[00:00:37] PROBLEM - Apache HTTP on mw1290 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [00:01:08] !log catrope@deploy1001 Synchronized php-1.34.0-wmf.16/extensions/CentralNotice/: T227711 among others (duration: 00m 48s) [00:01:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:01:17] T227711: BannerHistoryLogger issue prevents users reaching payments wiki - https://phabricator.wikimedia.org/T227711 [00:04:48] !log catrope@deploy1001 Synchronized php-1.34.0-wmf.15/extensions/CentralNotice/: T227711 among others (duration: 00m 47s) [00:04:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:05:00] AndyRussG: All done! [00:05:03] Thanks for sticking with me [00:05:13] RoanKattouw: thanks so much!!! [00:05:18] Yeah sorry it's still not quite standard [00:05:22] but still better than before :) [00:05:41] and apologies also for the Ci ugliness [00:06:33] (03PS2) 10Dzahn: mediawiki: use a better notes_url for the "DSH groups" Icinga alert [puppet] - 10https://gerrit.wikimedia.org/r/526561 (https://phabricator.wikimedia.org/T227547) [00:07:47] (03CR) 10Dzahn: [C: 03+2] mediawiki: use a better notes_url for the "DSH groups" Icinga alert [puppet] - 10https://gerrit.wikimedia.org/r/526561 (https://phabricator.wikimedia.org/T227547) (owner: 10Dzahn) [00:07:55] (03PS3) 10Dzahn: mediawiki: use a better notes_url for the "DSH groups" Icinga alert [puppet] - 10https://gerrit.wikimedia.org/r/526561 (https://phabricator.wikimedia.org/T227547) [00:28:39] (03PS2) 10Mholloway: Enable MachineVision on (beta) commonswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526543 (https://phabricator.wikimedia.org/T227348) [00:29:14] (03CR) 10Mholloway: Enable MachineVision on (beta) commonswiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526543 (https://phabricator.wikimedia.org/T227348) (owner: 10Mholloway) [00:31:54] (03PS3) 
10Mholloway: Load MachineVision extension if enabled [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526544 (https://phabricator.wikimedia.org/T227348) [01:02:41] PROBLEM - puppet last run on etcd1003 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [01:08:37] !log on mwmaint1002, editing wikiversions.json locally to move wikimania2006wiki to .16, to investigate T229366 [01:08:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:08:47] T229366: serialize(): "" returned as member variable from __sleep() but does not exist - https://phabricator.wikimedia.org/T229366 [01:30:35] RECOVERY - puppet last run on etcd1003 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [01:45:23] (03PS4) 10Dzahn: parsoid::testing: add mediawiki appserver profiles to role [puppet] - 10https://gerrit.wikimedia.org/r/526289 (https://phabricator.wikimedia.org/T228069) [01:49:40] (03CR) 10Dzahn: "needs mcrouter secrets created to be able to compile it. will continue there tomorrow. 
https://puppet-compiler.wmflabs.org/compiler1001/17" [puppet] - 10https://gerrit.wikimedia.org/r/526289 (https://phabricator.wikimedia.org/T228069) (owner: 10Dzahn) [01:49:51] (03PS5) 10Jeena Huneidi: Add Parsoid chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/525481 (https://phabricator.wikimedia.org/T228909) [01:51:15] (03PS6) 10Jeena Huneidi: Add Parsoid chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/525481 (https://phabricator.wikimedia.org/T228909) [01:51:54] (03PS1) 10Dzahn: add fake mcrouter keys for scandium.eqiad.wmnet [labs/private] - 10https://gerrit.wikimedia.org/r/526574 (https://phabricator.wikimedia.org/T228069) [01:52:09] PROBLEM - puppet last run on mc1031 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [01:52:32] (03PS2) 10Dzahn: add fake mcrouter keys for scandium.eqiad.wmnet [labs/private] - 10https://gerrit.wikimedia.org/r/526574 (https://phabricator.wikimedia.org/T228069) [01:52:43] (03CR) 10Dzahn: [V: 03+2 C: 03+2] add fake mcrouter keys for scandium.eqiad.wmnet [labs/private] - 10https://gerrit.wikimedia.org/r/526574 (https://phabricator.wikimedia.org/T228069) (owner: 10Dzahn) [02:09:39] (03PS5) 10Dzahn: parsoid::testing: add mediawiki appserver profiles to role [puppet] - 10https://gerrit.wikimedia.org/r/526289 (https://phabricator.wikimedia.org/T228069) [02:16:53] (03PS6) 10Dzahn: parsoid::testing: add mediawiki appserver profiles to role [puppet] - 10https://gerrit.wikimedia.org/r/526289 (https://phabricator.wikimedia.org/T228069) [02:20:13] RECOVERY - puppet last run on mc1031 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [02:23:23] (03CR) 10Dzahn: [C: 04-1] "added fake mcrouter keys to labs/private, added missing Hiera keys/values, now running into " 
Duplicate declaration: Class[Mediawiki::Pack" [puppet] - 10https://gerrit.wikimedia.org/r/526289 (https://phabricator.wikimedia.org/T228069) (owner: 10Dzahn) [02:24:32] !log on mwmaint1002 reverted previous change using scap pull [02:24:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:31:26] (03PS7) 10Dzahn: parsoid::testing: add mediawiki appserver profiles to role [puppet] - 10https://gerrit.wikimedia.org/r/526289 (https://phabricator.wikimedia.org/T228069) [02:34:43] RECOVERY - High lag on wdqs1009 is OK: (C)3600 ge (W)1200 ge 847.9 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [02:34:57] (03CR) 10Dzahn: "fixed duplicate declaration issue by removing fonts class from visualdiff module. compiles now: https://puppet-compiler.wmflabs.org/compi" [puppet] - 10https://gerrit.wikimedia.org/r/526289 (https://phabricator.wikimedia.org/T228069) (owner: 10Dzahn) [02:35:40] (03CR) 10Dzahn: "please check again now Giuseppe" [puppet] - 10https://gerrit.wikimedia.org/r/526289 (https://phabricator.wikimedia.org/T228069) (owner: 10Dzahn) [03:59:31] !log tstarling@deploy1001 Synchronized php-1.34.0-wmf.16/includes/parser/ParserOutput.php: T229366 (duration: 00m 47s) [03:59:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [03:59:41] T229366: serialize(): "" returned as member variable from __sleep() but does not exist - https://phabricator.wikimedia.org/T229366 [04:00:37] !log tstarling@deploy1001 Synchronized php-1.34.0-wmf.16/tests/phpunit/includes/parser/ParserOutputTest.php: T229366 (duration: 00m 46s) [04:00:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [04:19:39] PROBLEM - Check systemd state on an-coord1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [04:28:53] (03CR) 10Santhosh: [C: 03+1] Decrease idwiki MT threshold for publishing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526197 (https://phabricator.wikimedia.org/T228971) (owner: 10Petar.petkovic) [05:00:02] !log Compress s6 on labsdb1010 - T222978 [05:00:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:00:10] T222978: Compress and defragment tables on labsdb hosts - https://phabricator.wikimedia.org/T222978 [05:32:49] (03PS1) 10Marostegui: db-codfw.php: db2128 is now the sanitarium master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526598 [05:36:50] (03CR) 10Marostegui: [C: 03+2] db-codfw.php: db2128 is now the sanitarium master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526598 (owner: 10Marostegui) [05:37:45] (03Merged) 10jenkins-bot: db-codfw.php: db2128 is now the sanitarium master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526598 (owner: 10Marostegui) [05:38:00] (03CR) 10jenkins-bot: db-codfw.php: db2128 is now the sanitarium master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526598 (owner: 10Marostegui) [05:39:57] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Clarify that db2128 is the new sanitarium master (duration: 00m 47s) [05:40:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:41:12] (03PS1) 10Marostegui: mariadb: Provision db2125 into s2 [puppet] - 10https://gerrit.wikimedia.org/r/526600 (https://phabricator.wikimedia.org/T228969) [05:41:52] (03CR) 10Marostegui: [C: 03+2] mariadb: Provision db2125 into s2 [puppet] - 10https://gerrit.wikimedia.org/r/526600 (https://phabricator.wikimedia.org/T228969) (owner: 10Marostegui) [05:44:18] !log Drop abuse_filter_log.afl_log_id from s1 codfw with replication (this will cause lag in s1 codfw) - T226851 [05:44:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:44:25] 
T226851: Drop abuse_filter_log.afl_log_id in production - https://phabricator.wikimedia.org/T226851 [05:47:45] RECOVERY - Check systemd state on an-coord1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [06:00:49] (03PS1) 10Elukey: profile::druid::turnilo::proxy: add health check [puppet] - 10https://gerrit.wikimedia.org/r/526602 (https://phabricator.wikimedia.org/T227860) [06:01:45] (03CR) 10Elukey: [C: 03+2] profile::druid::turnilo::proxy: add health check [puppet] - 10https://gerrit.wikimedia.org/r/526602 (https://phabricator.wikimedia.org/T227860) (owner: 10Elukey) [06:29:47] PROBLEM - puppet last run on labmon1002 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [06:32:35] (03PS1) 10Elukey: profile::druid::turnilo::proxy: add Location to httpd Vhost [puppet] - 10https://gerrit.wikimedia.org/r/526605 (https://phabricator.wikimedia.org/T210411) [06:34:14] (03CR) 10Elukey: [C: 03+2] profile::druid::turnilo::proxy: add Location to httpd Vhost [puppet] - 10https://gerrit.wikimedia.org/r/526605 (https://phabricator.wikimedia.org/T210411) (owner: 10Elukey) [06:39:52] (03CR) 10Elukey: [C: 03+1] "Thanks!" 
[debs/spark2] (debian) - 10https://gerrit.wikimedia.org/r/526527 (https://phabricator.wikimedia.org/T222253) (owner: 10Ottomata) [06:57:41] RECOVERY - puppet last run on labmon1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [06:58:02] (03PS1) 10Marostegui: db-eqiad,db-codfw.php: Add db2125 to s2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526606 (https://phabricator.wikimedia.org/T228969) [07:00:00] (03PS1) 10Marostegui: dbctl: Add new instance [puppet] - 10https://gerrit.wikimedia.org/r/526607 (https://phabricator.wikimedia.org/T229070) [07:14:58] (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/526607 (https://phabricator.wikimedia.org/T229070) (owner: 10Marostegui) [07:15:14] (03PS2) 10Marostegui: dbctl: Add new instance [puppet] - 10https://gerrit.wikimedia.org/r/526607 (https://phabricator.wikimedia.org/T229070) [07:15:49] (03CR) 10Marostegui: [C: 03+2] dbctl: Add new instance [puppet] - 10https://gerrit.wikimedia.org/r/526607 (https://phabricator.wikimedia.org/T229070) (owner: 10Marostegui) [07:18:03] (03CR) 10Volans: [C: 03+1] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526606 (https://phabricator.wikimedia.org/T228969) (owner: 10Marostegui) [07:22:50] (03PS2) 10Marostegui: db-eqiad,db-codfw.php: Add db2125 to s2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526606 (https://phabricator.wikimedia.org/T228969) [07:25:41] !log Add db2125 to tendril and zarcillo T228969 [07:25:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:25:50] T228969: Productionize db21[21-30} - https://phabricator.wikimedia.org/T228969 [07:29:42] (03PS3) 10Marostegui: db-eqiad,db-codfw.php: Add db2125 to s2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526606 (https://phabricator.wikimedia.org/T228969) [07:29:54] !log restart-hhvm on mw1290 [07:29:59] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:30:49] RECOVERY - Nginx local proxy to apache on mw1290 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.080 second response time https://wikitech.wikimedia.org/wiki/Application_servers [07:31:03] RECOVERY - HHVM rendering on mw1290 is OK: HTTP OK: HTTP/1.1 200 OK - 76222 bytes in 0.237 second response time https://wikitech.wikimedia.org/wiki/Application_servers [07:31:08] (03CR) 10Volans: [C: 03+1] "LGTM both this patch and dbctl config diff" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526606 (https://phabricator.wikimedia.org/T228969) (owner: 10Marostegui) [07:31:21] RECOVERY - Apache HTTP on mw1290 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.044 second response time https://wikitech.wikimedia.org/wiki/Application_servers [07:31:33] (03CR) 10Marostegui: [C: 03+2] db-eqiad,db-codfw.php: Add db2125 to s2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526606 (https://phabricator.wikimedia.org/T228969) (owner: 10Marostegui) [07:32:29] (03Merged) 10jenkins-bot: db-eqiad,db-codfw.php: Add db2125 to s2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526606 (https://phabricator.wikimedia.org/T228969) (owner: 10Marostegui) [07:32:45] (03CR) 10jenkins-bot: db-eqiad,db-codfw.php: Add db2125 to s2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526606 (https://phabricator.wikimedia.org/T228969) (owner: 10Marostegui) [07:33:39] PROBLEM - Uncommitted dbctl configuration changes- check dbctl config diff on cumin1001 is CRITICAL: CRITICAL - Uncommitted dbctl configuration changes, check dbctl config diff https://wikitech.wikimedia.org/wiki/Dbctl%23Uncommitted_dbctl_diffs [07:33:55] PROBLEM - Uncommitted dbctl configuration changes- check dbctl config diff on cumin2001 is CRITICAL: CRITICAL - Uncommitted dbctl configuration changes, check dbctl config diff https://wikitech.wikimedia.org/wiki/Dbctl%23Uncommitted_dbctl_diffs [07:33:58] ^that will be fixed shortly 
[07:34:04] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Provision db2125 into s2 T228969 (duration: 00m 49s) [07:34:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:34:11] T228969: Productionize db21[21-30} - https://phabricator.wikimedia.org/T228969 [07:34:57] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Provision db2125 into s2 T228969 (duration: 00m 47s) [07:35:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:36:09] !log marostegui@cumin1001 dbctl commit of MediaWiki config (dc=codfw), diff saved to 'https://phabricator.wikimedia.org/P8833', previous config saved to /var/cache/conftool/dbconfig/20190731-073608-marostegui.json [07:36:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:39:11] RECOVERY - Uncommitted dbctl configuration changes- check dbctl config diff on cumin1001 is OK: OK - no diffs https://wikitech.wikimedia.org/wiki/Dbctl%23Uncommitted_dbctl_diffs [07:39:29] RECOVERY - Uncommitted dbctl configuration changes- check dbctl config diff on cumin2001 is OK: OK - no diffs https://wikitech.wikimedia.org/wiki/Dbctl%23Uncommitted_dbctl_diffs [07:39:59] !log Drop abuse_filter_log.afl_log_id in s1 eqiad - T226851 [07:40:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:40:06] T226851: Drop abuse_filter_log.afl_log_id in production - https://phabricator.wikimedia.org/T226851 [07:45:27] (03PS1) 10Marostegui: install_server: Do not reimage db2121-2130 [puppet] - 10https://gerrit.wikimedia.org/r/526609 (https://phabricator.wikimedia.org/T228969) [07:46:25] (03CR) 10Marostegui: [C: 03+2] install_server: Do not reimage db2121-2130 [puppet] - 10https://gerrit.wikimedia.org/r/526609 (https://phabricator.wikimedia.org/T228969) (owner: 10Marostegui) [07:47:27] (03PS1) 10Elukey: profile::cache::kafka::alerts: move alarms to prometheus [puppet] - 10https://gerrit.wikimedia.org/r/526611 
(https://phabricator.wikimedia.org/T229357) [08:00:37] (03PS6) 10Giuseppe Lavagetto: mtail: fix mediawiki access log metrics names [puppet] - 10https://gerrit.wikimedia.org/r/526388 [08:01:34] (03CR) 10Giuseppe Lavagetto: [C: 03+2] mtail: fix mediawiki access log metrics names [puppet] - 10https://gerrit.wikimedia.org/r/526388 (owner: 10Giuseppe Lavagetto) [08:02:43] (03PS1) 10Elukey: role::analytics_cluster::hadoop::master|standby: increase NN heap size [puppet] - 10https://gerrit.wikimedia.org/r/526613 (https://phabricator.wikimedia.org/T228620) [08:03:50] (03CR) 10Elukey: [C: 03+2] role::analytics_cluster::hadoop::master|standby: increase NN heap size [puppet] - 10https://gerrit.wikimedia.org/r/526613 (https://phabricator.wikimedia.org/T228620) (owner: 10Elukey) [08:05:16] (03CR) 10Volans: [C: 04-1] "I think there is a nicer way to do it, see inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/526562 (owner: 10CRusnov) [08:05:56] !log restart hadoop Namenodes on an-master100[12] to pick up new heap settings and new openjdk [08:06:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:06:57] <_joe_> !log running puppet (and restarting mtail) on all eqiad appservers [08:07:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:11:14] (03PS1) 10Marostegui: mariadb: Provision db2126 into s2 [puppet] - 10https://gerrit.wikimedia.org/r/526614 (https://phabricator.wikimedia.org/T228969) [08:12:53] (03PS2) 10Marostegui: mariadb: Provision db2126 into s2 [puppet] - 10https://gerrit.wikimedia.org/r/526614 (https://phabricator.wikimedia.org/T228969) [08:14:25] (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/526614 (https://phabricator.wikimedia.org/T228969) (owner: 10Marostegui) [08:14:48] (03CR) 10Marostegui: [C: 03+2] mariadb: Provision db2126 into s2 [puppet] - 10https://gerrit.wikimedia.org/r/526614 (https://phabricator.wikimedia.org/T228969) (owner: 10Marostegui) 
[08:18:32] (03PS1) 10Filippo Giunchedi: prometheus: add sentry4 snmp_exporter config [puppet] - 10https://gerrit.wikimedia.org/r/526615 (https://phabricator.wikimedia.org/T148541) [08:18:34] (03PS1) 10Filippo Giunchedi: prometheus: prefix pdu metrics [puppet] - 10https://gerrit.wikimedia.org/r/526616 (https://phabricator.wikimedia.org/T148541) [08:24:09] (03CR) 10Filippo Giunchedi: [C: 03+2] prometheus: add sentry4 snmp_exporter config [puppet] - 10https://gerrit.wikimedia.org/r/526615 (https://phabricator.wikimedia.org/T148541) (owner: 10Filippo Giunchedi) [08:24:19] (03CR) 10Filippo Giunchedi: [C: 03+2] prometheus: prefix pdu metrics [puppet] - 10https://gerrit.wikimedia.org/r/526616 (https://phabricator.wikimedia.org/T148541) (owner: 10Filippo Giunchedi) [08:37:49] !log restart Yarn Resource Managers on an-master100[12] to pick up the new openjdk version [08:37:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:50:38] (03PS1) 10Filippo Giunchedi: prometheus: add sentry4 PDUs support [puppet] - 10https://gerrit.wikimedia.org/r/526619 (https://phabricator.wikimedia.org/T148541) [08:54:08] (03PS1) 10Elukey: Add cumin aliases for Kafka Mirror Maker [puppet] - 10https://gerrit.wikimedia.org/r/526620 (https://phabricator.wikimedia.org/T229003) [08:58:00] (03PS3) 10Fsero: helmfile: Update README to mention ".hfenv" [deployment-charts] - 10https://gerrit.wikimedia.org/r/525468 (owner: 10Thcipriani) [08:58:12] (03CR) 10Fsero: [V: 03+2 C: 03+2] helmfile: Update README to mention ".hfenv" [deployment-charts] - 10https://gerrit.wikimedia.org/r/525468 (owner: 10Thcipriani) [08:58:33] (03PS2) 10Elukey: Add cumin aliases for Kafka Mirror Maker and zookeeper [puppet] - 10https://gerrit.wikimedia.org/r/526620 (https://phabricator.wikimedia.org/T229003) [09:12:35] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Minor comment, rest LGTM. Also keep in mind that chances are that parsoid will not be around in 6 to 12 months from now." 
(031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/525481 (https://phabricator.wikimedia.org/T228909) (owner: 10Jeena Huneidi) [09:17:40] (03PS1) 10Elukey: Add sre.kafka.roll-restart-mirror-maker.py [cookbooks] - 10https://gerrit.wikimedia.org/r/526624 (https://phabricator.wikimedia.org/T229003) [09:23:49] (03PS1) 10Filippo Giunchedi: prometheus: fix pdu_ metrics prefixing [puppet] - 10https://gerrit.wikimedia.org/r/526625 (https://phabricator.wikimedia.org/T148541) [09:27:21] (03PS4) 10Ema: ATS: add support for the compress plugin and enable it [puppet] - 10https://gerrit.wikimedia.org/r/526436 (https://phabricator.wikimedia.org/T227432) [09:27:52] (03CR) 10Filippo Giunchedi: [C: 03+2] prometheus: add sentry4 PDUs support [puppet] - 10https://gerrit.wikimedia.org/r/526619 (https://phabricator.wikimedia.org/T148541) (owner: 10Filippo Giunchedi) [09:28:04] (03CR) 10Filippo Giunchedi: [C: 03+2] prometheus: fix pdu_ metrics prefixing [puppet] - 10https://gerrit.wikimedia.org/r/526625 (https://phabricator.wikimedia.org/T148541) (owner: 10Filippo Giunchedi) [09:29:01] (03PS5) 10Ema: ATS: add support for the compress plugin and enable it [puppet] - 10https://gerrit.wikimedia.org/r/526436 (https://phabricator.wikimedia.org/T227432) [09:30:01] (03CR) 10Ema: [C: 03+2] ATS: add support for the compress plugin and enable it [puppet] - 10https://gerrit.wikimedia.org/r/526436 (https://phabricator.wikimedia.org/T227432) (owner: 10Ema) [09:36:08] (03PS2) 10Jbond: hiera backends: update the config and hiera backend with the correct names [puppet] - 10https://gerrit.wikimedia.org/r/526420 [09:37:02] (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/526620 (https://phabricator.wikimedia.org/T229003) (owner: 10Elukey) [09:38:01] (03CR) 10Jbond: [C: 03+2] hiera backends: update the config and hiera backend with the correct names [puppet] - 10https://gerrit.wikimedia.org/r/526420 (owner: 10Jbond) [09:40:38] (03CR) 10Volans: [C: 03+1] "LGTM" 
[cookbooks] - 10https://gerrit.wikimedia.org/r/526624 (https://phabricator.wikimedia.org/T229003) (owner: 10Elukey) [09:43:56] (03PS1) 10Lucas Werkmeister (WMDE): vcl: add Access-Control-Allow-Origin to mobile redirects [puppet] - 10https://gerrit.wikimedia.org/r/526627 (https://phabricator.wikimedia.org/T229385) [09:45:37] (03PS2) 10Cparle: Enable other statements on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/525538 (owner: 10Matthias Mullie) [09:45:44] (03CR) 10Lucas Werkmeister (WMDE): vcl: add Access-Control-Allow-Origin to mobile redirects (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/526627 (https://phabricator.wikimedia.org/T229385) (owner: 10Lucas Werkmeister (WMDE)) [09:48:28] (03PS1) 10Elukey: Add sre.zookeeper.roll-restart-zookeeper.py [cookbooks] - 10https://gerrit.wikimedia.org/r/526628 (https://phabricator.wikimedia.org/T229003) [09:48:30] (03PS3) 10Elukey: Add cumin aliases for Kafka Mirror Maker and zookeeper [puppet] - 10https://gerrit.wikimedia.org/r/526620 (https://phabricator.wikimedia.org/T229003) [09:49:40] (03CR) 10Elukey: [C: 03+2] Add cumin aliases for Kafka Mirror Maker and zookeeper [puppet] - 10https://gerrit.wikimedia.org/r/526620 (https://phabricator.wikimedia.org/T229003) (owner: 10Elukey) [09:49:44] <_joe_> !log pruning orphaned images on contint1001 [09:49:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:50:58] PROBLEM - puppet last run on poolcounter2003 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:51:00] PROBLEM - puppet last run on dbprov2001 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. 
https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:51:08] (03CR) 10Elukey: [C: 03+2] Add sre.kafka.roll-restart-mirror-maker.py [cookbooks] - 10https://gerrit.wikimedia.org/r/526624 (https://phabricator.wikimedia.org/T229003) (owner: 10Elukey) [09:51:20] PROBLEM - puppet last run on mc2027 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:52:18] PROBLEM - puppet last run on mw2180 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:52:18] PROBLEM - puppet last run on mw2205 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:52:44] PROBLEM - puppet last run on mc2024 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:52:44] PROBLEM - puppet last run on mc2033 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:52:44] PROBLEM - puppet last run on kubernetes2006 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:52:44] PROBLEM - puppet last run on mw2243 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. 
https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:52:44] PROBLEM - puppet last run on mw2198 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:52:49] ^^^ looking [09:52:51] what's going on? [09:53:25] i deployed https://gerrit.wikimedia.org/r/c/operations/puppet/+/526420 it should have been a no-op; the few servers i have checked seem to run ok [09:53:26] Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Function Call, undefined method `[]' for nil:NilClass at /etc/puppet/manifests/realm.pp:24:14 on node mw2198.codfw.wmnet [09:53:39] (03PS1) 10Jbond: Revert "hiera backends: update the config and hiera backend with the correct names" [puppet] - 10https://gerrit.wikimedia.org/r/526629 [09:53:50] (03PS2) 10Jbond: Revert "hiera backends: update the config and hiera backend with the correct names" [puppet] - 10https://gerrit.wikimedia.org/r/526629 [09:54:09] jbond42: I am trying to re-run puppet, maybe it is temporary [09:54:10] PROBLEM - puppet last run on kubetcd2002 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:54:16] gimme 1 min [09:54:23] yes just preparing thanks [09:54:34] now works fine [09:54:41] on mw2198 [09:54:55] i'm doing a puppet run on failed ones now from cumin [09:54:59] super [09:55:20] PROBLEM - puppet last run on db2101 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:55:22] PROBLEM - puppet last run on elastic2042 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. 
Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:55:28] PROBLEM - puppet last run on webperf2002 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:55:53] (03PS1) 10Alexandros Kosiaris: termbox: Make LVS paging [puppet] - 10https://gerrit.wikimedia.org/r/526631 [09:55:55] (03PS1) 10Alexandros Kosiaris: restrouter: Add kubernetes stanzas [puppet] - 10https://gerrit.wikimedia.org/r/526632 (https://phabricator.wikimedia.org/T223953) [09:56:32] !log elukey@cumin1001 START - Cookbook sre.kafka.roll-restart-mirror-maker [09:56:36] PROBLEM - puppet last run on mw2175 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:56:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:56:38] PROBLEM - puppet last run on ms-be2035 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:57:12] PROBLEM - puppet last run on kafka-main2004 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:57:12] PROBLEM - puppet last run on graphite2003 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. 
https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:57:12] PROBLEM - puppet last run on db2067 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:57:12] PROBLEM - puppet last run on db2080 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:57:12] PROBLEM - puppet last run on elastic2032 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:57:13] PROBLEM - puppet last run on kubernetes2003 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:57:13] PROBLEM - puppet last run on ms-be2026 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:57:14] PROBLEM - puppet last run on mw2150 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:57:14] RECOVERY - puppet last run on mw2198 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:57:15] PROBLEM - puppet last run on mw2239 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. 
https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:57:15] PROBLEM - puppet last run on mw2171 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:57:16] PROBLEM - puppet last run on mw2196 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:57:16] PROBLEM - puppet last run on mw2188 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:58:46] PROBLEM - puppet last run on restbase2013 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:58:46] PROBLEM - puppet last run on db2084 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:59:20] (PS2) Elukey: Add sre.zookeeper.roll-restart-zookeeper.py [cookbooks] - https://gerrit.wikimedia.org/r/526628 (https://phabricator.wikimedia.org/T229003) [09:59:30] PROBLEM - puppet last run on mw2136 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:01:28] (PS1) Filippo Giunchedi: facilities: add model to pdu monitoring [puppet] - https://gerrit.wikimedia.org/r/526633 (https://phabricator.wikimedia.org/T148541) [10:01:30] (PS1) Filippo Giunchedi: prometheus: query pdu resources based on model [puppet] - https://gerrit.wikimedia.org/r/526634 (https://phabricator.wikimedia.org/T148541) [10:01:54] (CR) jerkins-bot: [V: -1] facilities: add model to pdu monitoring [puppet] - https://gerrit.wikimedia.org/r/526633 (https://phabricator.wikimedia.org/T148541) (owner: Filippo Giunchedi) [10:02:04] RECOVERY - puppet last run on mw2175 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:02:04] RECOVERY - puppet last run on mw2136 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:02:04] PROBLEM - puppet last run on mw2212 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:03:24] PROBLEM - puppet last run on proton2002 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:03:30] PROBLEM - puppet last run on mw2290 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:03:50] (CR) Jbond: [C: +2] Revert "hiera backends: update the config and hiera backend with the correct names" [puppet] - https://gerrit.wikimedia.org/r/526629 (owner: Jbond) [10:04:10] PROBLEM - puppet last run on ms-fe2007 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:04:10] PROBLEM - puppet last run on acrux is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:04:12] PROBLEM - puppet last run on wtp2006 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:04:28] PROBLEM - puppet last run on deploy2001 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:04:34] PROBLEM - puppet last run on db2114 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:04:40] PROBLEM - puppet last run on cloudnet2003-dev is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:05:14] PROBLEM - puppet last run on db1067 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:05:14] PROBLEM - puppet last run on dbmonitor1001 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:05:14] PROBLEM - puppet last run on logstash1010 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:05:14] PROBLEM - puppet last run on mw1346 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:05:14] PROBLEM - puppet last run on mw1233 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:05:15] PROBLEM - puppet last run on scb1002 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:05:15] PROBLEM - puppet last run on db2128 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:05:16] PROBLEM - puppet last run on db2104 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:05:16] PROBLEM - puppet last run on kubernetes2005 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. 
Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:05:17] PROBLEM - puppet last run on ms-be2017 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:05:17] PROBLEM - puppet last run on ms-fe2005 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:05:18] PROBLEM - puppet last run on mw2141 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:05:18] PROBLEM - puppet last run on mw2274 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:05:18] !log rolling back https://gerrit.wikimedia.org/r/q/c9f876e9990fb171f27616515e7d125824d7a6ac [10:05:19] PROBLEM - puppet last run on mw2225 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:05:19] PROBLEM - puppet last run on mw2202 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:05:20] PROBLEM - puppet last run on releases2001 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. 
https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:05:20] PROBLEM - puppet last run on cp4028 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:05:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:06:52] PROBLEM - puppet last run on elastic1040 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:06:52] PROBLEM - puppet last run on restbase2019 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:06:52] PROBLEM - puppet last run on mw2157 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:07:34] PROBLEM - puppet last run on db2105 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:07:48] PROBLEM - puppet last run on mc2019 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:07:48] PROBLEM - puppet last run on ms-be2023 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. 
https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:08:30] PROBLEM - puppet last run on mw2277 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:08:30] PROBLEM - puppet last run on gerrit2001 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:08:39] !log elukey@cumin1001 END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) [10:08:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:08:46] PROBLEM - puppet last run on mw2251 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:08:55] volans: \o/ [10:08:58] PROBLEM - puppet last run on ms-be1029 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:09:00] PROBLEM - puppet last run on druid1003 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:09:01] jbond42: do you need help? [10:09:28] PROBLEM - puppet last run on ms-fe1008 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. 
https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:09:29] fsero: thanks, im rolling back now which should hopefully resolve this issue [10:09:30] PROBLEM - puppet last run on mw2176 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:09:33] (CR) Lucas Werkmeister (WMDE): "This can be tested as with the following commands:" [puppet] - https://gerrit.wikimedia.org/r/526627 (https://phabricator.wikimedia.org/T229385) (owner: Lucas Werkmeister (WMDE)) [10:10:12] PROBLEM - puppet last run on labstore1007 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:10:19] (PS2) Filippo Giunchedi: facilities: add model to pdu monitoring [puppet] - https://gerrit.wikimedia.org/r/526633 (https://phabricator.wikimedia.org/T148541) [10:10:20] PROBLEM - puppet last run on mw2177 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:10:21] (PS2) Filippo Giunchedi: prometheus: query pdu resources based on model [puppet] - https://gerrit.wikimedia.org/r/526634 (https://phabricator.wikimedia.org/T148541) [10:11:44] PROBLEM - puppet last run on elastic1031 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:12:09] elukey: yay!
[10:19:02] RECOVERY - puppet last run on db2101 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:19:02] RECOVERY - puppet last run on dbprov2001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:19:02] RECOVERY - puppet last run on mc2024 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:19:02] RECOVERY - puppet last run on mc2027 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:19:02] RECOVERY - puppet last run on kubernetes2003 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:19:03] RECOVERY - puppet last run on db2080 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:19:03] RECOVERY - puppet last run on poolcounter2003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:19:04] RECOVERY - puppet last run on mw2243 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:19:04] RECOVERY - puppet last run on webperf2002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:19:05] RECOVERY - puppet last run on mw2205 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:24:40] 
RECOVERY - puppet last run on elastic2042 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:24:40] RECOVERY - puppet last run on kafka-main2004 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:24:40] RECOVERY - puppet last run on mc2033 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:24:40] RECOVERY - puppet last run on graphite2003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:24:40] RECOVERY - puppet last run on db2067 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:24:41] RECOVERY - puppet last run on ms-be2026 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:24:41] RECOVERY - puppet last run on restbase2013 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:24:42] RECOVERY - puppet last run on db2084 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:24:42] RECOVERY - puppet last run on ms-be2035 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:24:43] RECOVERY - puppet last run on kubernetes2006 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:24:43] 
RECOVERY - puppet last run on elastic2032 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:24:44] RECOVERY - puppet last run on mw2180 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:24:44] RECOVERY - puppet last run on mw2239 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:24:45] RECOVERY - puppet last run on mw2150 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:24:45] RECOVERY - puppet last run on mw2171 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:24:46] RECOVERY - puppet last run on kubetcd2002 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:24:46] RECOVERY - puppet last run on mw2188 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:24:47] RECOVERY - puppet last run on mw2196 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:25:54] (CR) Volans: [C: +1] "LGTM one doubt inline" (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/526628 (https://phabricator.wikimedia.org/T229003) (owner: Elukey) [10:28:13] (CR) Giuseppe Lavagetto: [C: -1] utils: add run_ci_locally.sh (5 comments) [puppet] - https://gerrit.wikimedia.org/r/526469 (owner: Giuseppe Lavagetto) [10:30:16] RECOVERY - puppet last run on mw1346 is OK: OK: Puppet
is currently enabled, last run 12 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:30:16] RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:30:16] RECOVERY - puppet last run on db2104 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:30:16] RECOVERY - puppet last run on mc2019 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:30:16] RECOVERY - puppet last run on restbase2019 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:30:17] RECOVERY - puppet last run on ms-fe2007 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:30:17] RECOVERY - puppet last run on cloudnet2003-dev is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:30:18] RECOVERY - puppet last run on ms-be2017 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:30:18] RECOVERY - puppet last run on acrux is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:30:19] RECOVERY - puppet last run on ms-be2023 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:30:19] RECOVERY - puppet last run on deploy2001 is OK: OK: Puppet is 
currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:30:20] RECOVERY - puppet last run on mw2157 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:30:20] RECOVERY - puppet last run on proton2002 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:30:21] RECOVERY - puppet last run on mw2141 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:30:21] RECOVERY - puppet last run on mw2202 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:30:22] RECOVERY - puppet last run on mw2212 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:34:52] (PS1) Filippo Giunchedi: prometheus: generate targets for sentry4 PDUs too [puppet] - https://gerrit.wikimedia.org/r/526640 (https://phabricator.wikimedia.org/T148541) [10:35:54] RECOVERY - puppet last run on scb1002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:35:54] RECOVERY - puppet last run on logstash1010 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:35:54] RECOVERY - puppet last run on labstore1007 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:35:54] RECOVERY - puppet last run on ms-fe1008 is OK: OK: Puppet is currently enabled, last run 3
minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:35:54] RECOVERY - puppet last run on ms-be1029 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:38:39] welcome back icinga-wm [10:39:06] (PS1) Jbond: run-puppet-agent: updated failed-only consider zero resources and fail [puppet] - https://gerrit.wikimedia.org/r/526642 [10:40:19] volans: could you take a look at this ^^ [10:40:27] sure [10:40:40] thanks [10:40:46] (PS1) Ladsgroup: labs: Set tmpItemTermsMigrationStages to MIGRATION_WRITE_BOTH for half of Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/526643 (https://phabricator.wikimedia.org/T225057) [10:41:32] (CR) Volans: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/526642 (owner: Jbond) [10:41:46] (CR) Jbond: [C: +2] run-puppet-agent: updated failed-only consider zero resources and fail [puppet] - https://gerrit.wikimedia.org/r/526642 (owner: Jbond) [10:42:12] (PS2) Ladsgroup: labs: Set tmpItemTermsMigrationStages to MIGRATION_WRITE_BOTH for half of Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/526643 (https://phabricator.wikimedia.org/T225057) [10:42:47] (CR) Ladsgroup: [C: +2] labs: Set tmpItemTermsMigrationStages to MIGRATION_WRITE_BOTH for half of Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/526643 (https://phabricator.wikimedia.org/T225057) (owner: Ladsgroup) [10:43:45] (Merged) jenkins-bot: labs: Set tmpItemTermsMigrationStages to MIGRATION_WRITE_BOTH for half of Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/526643 (https://phabricator.wikimedia.org/T225057) (owner: Ladsgroup) [10:44:02] (CR) jenkins-bot: labs: Set tmpItemTermsMigrationStages to MIGRATION_WRITE_BOTH for half of Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/526643
(https://phabricator.wikimedia.org/T225057) (owner: Ladsgroup) [10:44:23] ^ rebased [10:51:20] (CR) Fsero: utils: add run_ci_locally.sh (1 comment) [puppet] - https://gerrit.wikimedia.org/r/526469 (owner: Giuseppe Lavagetto) [10:51:37] (PS1) Jbond: Revert "Revert "hiera backends: update the config and hiera backend with the correct names"" [puppet] - https://gerrit.wikimedia.org/r/526645 [10:54:20] (CR) Fsero: utils: add run_ci_locally.sh (1 comment) [puppet] - https://gerrit.wikimedia.org/r/526469 (owner: Giuseppe Lavagetto) [10:59:50] (PS2) Jbond: This change attempts to re-deploy a failed change[1] updating the hiera backends again. When the initial change was deployed a number of systems started to alert in icinga with 'Failed to apply catalog, zero resources tracked by'. I believe this was caused as i did not deploy the change to the puppet masters simultaneously first. This results in a node potentially hitting two puppet masters with differ [11:00:05] Amir1, Lucas_WMDE, and Urbanecm: My dear minions, it's time we take the moon! Just kidding. Time for European Mid-day SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190731T1100). [11:00:05] matthiasmullie, cormacparle_, Amir1, kart_, and kostajh: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [11:00:13] o/ [11:00:16] o/ [11:00:21] I would prefer going last [11:00:23] (CR) Matthias Mullie: [C: +1] "ready for swat" [mediawiki-config] - https://gerrit.wikimedia.org/r/525538 (owner: Matthias Mullie) [11:00:55] \o [11:01:00] (CR) jerkins-bot: [V: -1] This change attempts to re-deploy a failed change[1] updating the hiera backends again.
When the initial change was deployed a number of systems started to alert in icinga with 'Failed to apply catalog, zero resources tracked by'. I believe this was caused as i did not deploy the change to the puppet masters simultaneously first. This results in a node potentially hitting two puppet ma [11:01:01] https://gerrit.wikimedia.org/r/526645 (owner: Jbond) [11:01:45] ok for me to go ahead with mine? [11:01:45] * kart_ is here [11:02:20] (PS3) Jbond: hiera backends: update the config and hiera backend with the correct names [puppet] - https://gerrit.wikimedia.org/r/526645 [11:02:24] cormacparle__: go ahead :) [11:02:37] ok, starting ... [11:02:40] o/ [11:03:17] (PS3) Cparle: Enable other statements on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/525538 (owner: Matthias Mullie) [11:04:09] (CR) Cparle: [C: +2] Enable other statements on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/525538 (owner: Matthias Mullie) [11:04:37] (PS1) Ladsgroup: Fix typo in name of config [mediawiki-config] - https://gerrit.wikimedia.org/r/526646 (https://phabricator.wikimedia.org/T225055) [11:05:03] * Lucas_WMDE is excited for other statements on Commons [11:05:10] (Merged) jenkins-bot: Enable other statements on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/525538 (owner: Matthias Mullie) [11:06:32] (CR) jenkins-bot: Enable other statements on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/525538 (owner: Matthias Mullie) [11:07:50] (PS2) Jbond: puppetdb (buster): dont install the puppetdb4 component on buster servers [puppet] - https://gerrit.wikimedia.org/r/526470 [11:09:12] cormacparle__: ping me when done :) [11:09:42] (CR) Volans: hiera backends: update the config and hiera backend with the correct names (1 comment) [puppet] - https://gerrit.wikimedia.org/r/526645 (owner: Jbond) [11:09:54] (PS4) Jbond: hiera backends: update the config and hiera backend
with the correct names [puppet] - https://gerrit.wikimedia.org/r/526645 [11:10:07] (CR) Giuseppe Lavagetto: [C: +1] "plan LGTM. As I said on irc, you might also need to restart the puppetmasters after running puppet there." [puppet] - https://gerrit.wikimedia.org/r/526645 (owner: Jbond) [11:10:45] (CR) Giuseppe Lavagetto: [C: -1] utils: add run_ci_locally.sh (2 comments) [puppet] - https://gerrit.wikimedia.org/r/526469 (owner: Giuseppe Lavagetto) [11:11:02] (PS5) Jbond: hiera backends: update the config and hiera backend with the correct names [puppet] - https://gerrit.wikimedia.org/r/526645 [11:11:58] (PS6) Jbond: hiera backends: update the config and hiera backend with the correct names [puppet] - https://gerrit.wikimedia.org/r/526645 [11:12:23] (CR) Jbond: "> Patch Set 3: Code-Review+1" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/526645 (owner: Jbond) [11:15:52] (PS2) KartikMistry: Decrease idwiki MT threshold for publishing [mediawiki-config] - https://gerrit.wikimedia.org/r/526197 (https://phabricator.wikimedia.org/T228971) (owner: Petar.petkovic) [11:16:46] !log cparle@deploy1001 Synchronized wmf-config/InitialiseSettings.php: [SDC] Enable other statements on Commons (duration: 00m 48s) [11:16:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:18:30] PROBLEM - mobileapps endpoints health on scb2002 is CRITICAL: /{domain}/v1/media/image/featured/{year}/{month}/{day} (retrieve featured image data for April 29, 2016) is CRITICAL: Test retrieve featured image data for April 29, 2016 returned the unexpected status 504 (expecting: 200): /{domain}/v1/page/featured/{year}/{month}/{day} (retrieve title of the featured article for April 29, 2016) is CRITICAL: Test retrieve title of the [11:18:30] for April 29, 2016 returned the unexpected status 504 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps [11:18:35] ok all done kart_ [11:18:43]
cormacparle__: cool. [11:19:39] (03CR) 10KartikMistry: [C: 03+2] Decrease idwiki MT threshold for publishing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526197 (https://phabricator.wikimedia.org/T228971) (owner: 10Petar.petkovic) [11:20:08] RECOVERY - mobileapps endpoints health on scb2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps [11:21:26] (03Merged) 10jenkins-bot: Decrease idwiki MT threshold for publishing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526197 (https://phabricator.wikimedia.org/T228971) (owner: 10Petar.petkovic) [11:21:43] (03CR) 10jenkins-bot: Decrease idwiki MT threshold for publishing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526197 (https://phabricator.wikimedia.org/T228971) (owner: 10Petar.petkovic) [11:24:44] (03PS1) 10Jbond: puppetmaster - canary-hosts: remove sarin as a canary host. [puppet] - 10https://gerrit.wikimedia.org/r/526652 [11:25:18] (03CR) 10Jbond: [C: 03+2] puppetmaster - canary-hosts: remove sarin as a canary host. [puppet] - 10https://gerrit.wikimedia.org/r/526652 (owner: 10Jbond) [11:25:31] (03PS2) 10Giuseppe Lavagetto: utils: add run_ci_locally.sh [puppet] - 10https://gerrit.wikimedia.org/r/526469 [11:25:36] (03PS3) 10Jbond: puppetdb (buster): dont install the puppetdb4 component on buster servers [puppet] - 10https://gerrit.wikimedia.org/r/526470 [11:25:52] !log kartik@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:526197|Decrease idwiki MT threshold for publishing (T228971)]] (duration: 00m 48s) [11:25:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:26:00] T228971: Adjust the threshold for Indonesian to prevent publishing when overall unmodified content is higher than 70% - https://phabricator.wikimedia.org/T228971 [11:26:14] OK. Next patch. 
[11:26:49] (03PS1) 10Ladsgroup: labs: Set tmpPropertyTermsMigrationStage to MIGRATION_WRITE_NEW in wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526653 (https://phabricator.wikimedia.org/T225053) [11:26:54] (03CR) 10Jbond: [C: 03+2] puppetdb (buster): dont install the puppetdb4 component on buster servers [puppet] - 10https://gerrit.wikimedia.org/r/526470 (owner: 10Jbond) [11:26:59] Sigh, I should have +2 it earlier.. [11:27:29] kart_: Can I merge and rebase a patch in between? It's a pretty quick and noop (no need to deploy, just rebase) [11:27:37] Amir1: Do you want to go for your config patch? [11:27:47] Amir1: yes. Just pinged you for that :) [11:28:06] kart_: so that's not the actual main thing, I prefer to go last for that one [11:28:12] but a very small thing [11:28:24] (03CR) 10Ladsgroup: [C: 03+2] labs: Set tmpPropertyTermsMigrationStage to MIGRATION_WRITE_NEW in wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526653 (https://phabricator.wikimedia.org/T225053) (owner: 10Ladsgroup) [11:28:54] PROBLEM - puppet last run on bast3002 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [11:29:04] Rebase and merge for config patch is OK while CI for extension is running. 
Hope that I'm not wrong :) [11:29:22] (03Merged) 10jenkins-bot: labs: Set tmpPropertyTermsMigrationStage to MIGRATION_WRITE_NEW in wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526653 (https://phabricator.wikimedia.org/T225053) (owner: 10Ladsgroup) [11:29:23] yeah this is noop [11:29:37] (03CR) 10jenkins-bot: labs: Set tmpPropertyTermsMigrationStage to MIGRATION_WRITE_NEW in wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526653 (https://phabricator.wikimedia.org/T225053) (owner: 10Ladsgroup) [11:29:44] done [11:29:51] kart_: I'm done for now [11:30:11] Amir1: any particular reason for going last, just curious.. [11:30:53] I want to spend lots of time monitoring, etc. It moves 10k queries / sec from a table to set of tables [11:31:08] kart_: I'm around for mine if you want to do those next [11:31:33] kostajh: OK. wmf patches? It is going to take long time for CI :/ [11:31:42] yep [11:31:59] We could do the wmf.16 later [11:32:24] Yeah. I dropped my wmf.16 for same reason :) [11:32:44] kostajh: I'll ping once I start, so you can +2 your patch. [11:32:47] kart_: actually, let me just do it in the next window [11:32:48] As I have to be somewhere in 30 minutes or so [11:32:55] kart_: I'll postpone [11:33:00] ah, OK. [11:33:19] Please update deployment page. [11:33:20] Just updated [11:33:22] :) [11:35:40] kart_: Can I deploy something else? Not the big thing and noop for prod [11:35:42] https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/526646 [11:36:28] Yeah. Add in calendar too. 
[11:36:34] Sure [11:36:35] Thanks kostajh [11:36:40] (03CR) 10Ladsgroup: [C: 03+2] Fix typo in name of config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526646 (https://phabricator.wikimedia.org/T225055) (owner: 10Ladsgroup) [11:37:36] (03Merged) 10jenkins-bot: Fix typo in name of config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526646 (https://phabricator.wikimedia.org/T225055) (owner: 10Ladsgroup) [11:39:02] (03CR) 10jenkins-bot: Fix typo in name of config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526646 (https://phabricator.wikimedia.org/T225055) (owner: 10Ladsgroup) [11:39:04] Btw, there is: `modified: extensions/CheckUser (new commits)` - in wmf.15 branch while doing 'git status' [11:39:23] Anyone know if that affects deployment? [11:40:17] !log ladsgroup@deploy1001 Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:526646|Fix typo in name of config (T225055) (duration: 00m 47s) [11:40:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:40:23] T225055: Switch `tmpItemTermsMigrationStages` to MIGRATION_WRITE_BOTH - https://phabricator.wikimedia.org/T225055 [11:41:13] ^ done [11:41:21] !log disable puppet to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/526645 [11:41:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:41:43] (03PS7) 10Jbond: hiera backends: update the config and hiera backend with the correct names [puppet] - 10https://gerrit.wikimedia.org/r/526645 [11:43:42] Amir1: Do you know if untracked modification as I mention can cause issue? [11:44:08] Sorry I didn't understand it [11:44:14] (03CR) 10Jbond: [C: 03+2] hiera backends: update the config and hiera backend with the correct names [puppet] - 10https://gerrit.wikimedia.org/r/526645 (owner: 10Jbond) [11:44:32] Amir1: git status in wmf.15 shows extensions/CheckUser has new commits. [11:44:54] kart_: that might be security patches [11:44:58] double check [11:45:10] OK. Let me check. 
[11:51:24] Seems good so far. [11:51:27] Deploying. [11:52:20] !log kartik@deploy1001 Synchronized php-1.34.0-wmf.15/extensions/ExternalGuidance: SWAT: [[gerrit|526637|Provide the messages in the target language of translation (T228019)]] (duration: 00m 46s) [11:52:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:52:29] OK. Done. [11:52:39] Amir1: you can go ahead with your patch. [11:52:45] T228019: Injected info does not get translated - https://phabricator.wikimedia.org/T228019 [11:54:46] marostegui: [11:54:51] We are going live now [11:56:28] !log enable puppet fleet wide https://gerrit.wikimedia.org/r/c/operations/puppet/+/526645 deployed [11:56:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:56:56] RECOVERY - puppet last run on bast3002 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [11:57:52] PROBLEM - HHVM rendering on mw1235 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [11:58:09] Amir1: ok [11:59:20] RECOVERY - HHVM rendering on mw1235 is OK: HTTP OK: HTTP/1.1 200 OK - 76214 bytes in 0.132 second response time https://wikitech.wikimedia.org/wiki/Application_servers [12:00:07] (03PS3) 10Ladsgroup: Switch property terms migration to WRITE_NEW on production wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/519212 (https://phabricator.wikimedia.org/T225053) (owner: 10Alaa Sarhan) [12:00:25] (03CR) 10Ladsgroup: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/519212 (https://phabricator.wikimedia.org/T225053) (owner: 10Alaa Sarhan) [12:01:28] (03Merged) 10jenkins-bot: Switch property terms migration to WRITE_NEW on production wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/519212 (https://phabricator.wikimedia.org/T225053) (owner: 10Alaa Sarhan) [12:01:43] (03CR) 10jenkins-bot: 
Switch property terms migration to WRITE_NEW on production wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/519212 (https://phabricator.wikimedia.org/T225053) (owner: 10Alaa Sarhan) [12:05:01] !log ladsgroup@deploy1001 sync-file aborted: SWAT: [[gerrit:519212|Switch property terms migration to WRITE_NEW on production wikidata (T225053)]] (duration: 00m 03s) [12:05:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:05:09] T225053: Switch `tmpPropertyTermsMigrationStage` to MIGRATION_WRITE_NEW - https://phabricator.wikimedia.org/T225053 [12:06:11] !log ladsgroup@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:519212|Switch property terms migration to WRITE_NEW on production wikidata (T225053)]] (duration: 00m 47s) [12:06:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:10:13] marostegui: so I see a huge increase in rows read which is I think okay (if you read from several tables, it will be more reads) and specially it will be cached and the rows will be reduced [12:10:21] (03PS1) 10Jbond: statistics::gpu: add missing group [puppet] - 10https://gerrit.wikimedia.org/r/526656 [12:11:20] traffic stayed the same [12:11:40] I'm checking [12:14:36] I will double check bu this supposed to be cached, I don't know why that hasn't happened [12:15:40] (03PS1) 10Ladsgroup: Revert "Switch property terms migration to WRITE_NEW on production wikidata" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526657 [12:15:40] reverting [12:16:39] (03CR) 10Ladsgroup: [C: 03+2] Revert "Switch property terms migration to WRITE_NEW on production wikidata" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526657 (owner: 10Ladsgroup) [12:16:48] Amir1: From what I can see the number of processes remained the same and also the queries per second [12:17:40] (03Merged) 10jenkins-bot: Revert "Switch property terms migration to WRITE_NEW on production wikidata" [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/526657 (owner: 10Ladsgroup) [12:17:45] marostegui: This is scary: https://grafana.wikimedia.org/d/000000278/mysql-aggregated?panelId=8&fullscreen&orgId=1&var-dc=eqiad%20prometheus%2Fops&var-group=core&var-shard=s8&var-role=All&from=now-30m&to=now [12:17:53] yes, that is very scary [12:17:56] (03CR) 10jenkins-bot: Revert "Switch property terms migration to WRITE_NEW on production wikidata" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526657 (owner: 10Ladsgroup) [12:18:06] if you check the hosts individually, the reads sky rocketed [12:19:30] The revert is being deployed [12:19:36] !log ladsgroup@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:526657|Revert: Switch property terms migration to WRITE_NEW on production wikidata (T225053)]] (duration: 00m 47s) [12:19:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:19:44] T225053: Switch `tmpPropertyTermsMigrationStage` to MIGRATION_WRITE_NEW - https://phabricator.wikimedia.org/T225053 [12:22:17] !log EU SWAT is done [12:22:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:50:34] (03CR) 10Elukey: [C: 03+2] Add sre.zookeeper.roll-restart-zookeeper.py [cookbooks] - 10https://gerrit.wikimedia.org/r/526628 (https://phabricator.wikimedia.org/T229003) (owner: 10Elukey) [12:53:01] !log elukey@cumin1001 START - Cookbook sre.zookeeper.roll-restart-zookeeper [12:53:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:53:35] !log Drop abuse_filter_log.afl_log_id from s3 codfw with replication (this will cause lag in s3 codfw) - T226851 [12:53:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:53:43] T226851: Drop abuse_filter_log.afl_log_id in production - https://phabricator.wikimedia.org/T226851 [12:56:38] (03PS1) 10Filippo Giunchedi: prometheus: track total number of puppet resources [puppet] - 10https://gerrit.wikimedia.org/r/526662 
(https://phabricator.wikimedia.org/T229262) [12:57:37] (03CR) 10jerkins-bot: [V: 04-1] prometheus: track total number of puppet resources [puppet] - 10https://gerrit.wikimedia.org/r/526662 (https://phabricator.wikimedia.org/T229262) (owner: 10Filippo Giunchedi) [12:59:46] !log elukey@cumin1001 END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) [12:59:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:01:16] \o/ [13:01:51] (03PS2) 10Filippo Giunchedi: prometheus: track total number of puppet resources [puppet] - 10https://gerrit.wikimedia.org/r/526662 (https://phabricator.wikimedia.org/T229262) [13:02:07] (03PS1) 10Marostegui: db-eqiad.php: Depool db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526663 [13:05:42] !log elukey@cumin1001 START - Cookbook sre.zookeeper.roll-restart-zookeeper [13:05:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:10:09] (03PS1) 10Jbond: netbox/puppet: An example of how we may intergrate netbox data with puppet [puppet] - 10https://gerrit.wikimedia.org/r/526664 (https://phabricator.wikimedia.org/T229397) [13:11:33] (03PS1) 10Elukey: Improvements to kafka and zookeeper cookbooks [cookbooks] - 10https://gerrit.wikimedia.org/r/526665 (https://phabricator.wikimedia.org/T229003) [13:12:41] !log elukey@cumin1001 END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) [13:12:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:15:33] (03PS2) 10Jbond: netbox/puppet: An example of how we may intergrate netbox data with puppet [puppet] - 10https://gerrit.wikimedia.org/r/526664 (https://phabricator.wikimedia.org/T229397) [13:15:36] !log Drop abuse_filter_log.afl_log_id in s3 eqiad - T226851 [13:15:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:15:43] T226851: Drop abuse_filter_log.afl_log_id in production - https://phabricator.wikimedia.org/T226851 [13:17:43] (03CR) 
10Marostegui: [C: 03+2] db-eqiad.php: Depool db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526663 (owner: 10Marostegui) [13:18:38] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526663 (owner: 10Marostegui) [13:18:53] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526663 (owner: 10Marostegui) [13:19:01] !log marostegui@cumin1001 dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8834', previous config saved to /var/cache/conftool/dbconfig/20190731-131900-marostegui.json [13:19:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:19:40] !log Upgrade db1078 [13:19:45] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1078 for alter and upgrade (duration: 00m 47s) [13:19:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:19:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:21:30] (03PS2) 10Elukey: Improvements to kafka and zookeeper cookbooks [cookbooks] - 10https://gerrit.wikimedia.org/r/526665 (https://phabricator.wikimedia.org/T229003) [13:23:41] (03PS1) 10Marostegui: db-eqiad.php: Slowly repool db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526666 [13:27:50] !log roll restart of zookeeper on conf100[4-6] and conf200[1-3] for openjdk upgrades [13:27:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:28:49] (03CR) 10Elukey: [C: 03+2] Improvements to kafka and zookeeper cookbooks [cookbooks] - 10https://gerrit.wikimedia.org/r/526665 (https://phabricator.wikimedia.org/T229003) (owner: 10Elukey) [13:29:50] (03PS3) 10Filippo Giunchedi: facilities: add model to pdu monitoring [puppet] - 10https://gerrit.wikimedia.org/r/526633 (https://phabricator.wikimedia.org/T148541) [13:29:52] (03PS3) 10Filippo Giunchedi: prometheus: query pdu 
resources based on model [puppet] - 10https://gerrit.wikimedia.org/r/526634 (https://phabricator.wikimedia.org/T148541) [13:29:54] (03PS2) 10Filippo Giunchedi: prometheus: generate targets for sentry4 PDUs too [puppet] - 10https://gerrit.wikimedia.org/r/526640 (https://phabricator.wikimedia.org/T148541) [13:30:39] (03CR) 10Alexandros Kosiaris: [C: 04-1] utils: add run_ci_locally.sh (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/526469 (owner: 10Giuseppe Lavagetto) [13:31:11] !log elukey@cumin1001 START - Cookbook sre.zookeeper.roll-restart-zookeeper [13:31:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:32:57] !log rolling update of exim [13:33:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:35:03] !log beginning rolling restarts of codfw kafka-main brokers for security updates [13:35:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:35:45] (03PS1) 10Ema: 0.4: do not use ioutil.ReadAll() in fifo-log-tailer [software/fifo-log-demux] - 10https://gerrit.wikimedia.org/r/526668 (https://phabricator.wikimedia.org/T229414) [13:36:19] (03CR) 10Giuseppe Lavagetto: utils: add run_ci_locally.sh (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/526469 (owner: 10Giuseppe Lavagetto) [13:37:45] !log elukey@cumin1001 END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) [13:37:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:38:55] (03PS3) 10Giuseppe Lavagetto: utils: add run_ci_locally.sh [puppet] - 10https://gerrit.wikimedia.org/r/526469 [13:40:14] (03CR) 10Giuseppe Lavagetto: [C: 03+2] utils: add run_ci_locally.sh [puppet] - 10https://gerrit.wikimedia.org/r/526469 (owner: 10Giuseppe Lavagetto) [13:40:24] (03PS1) 10CDanis: dbctl: expand to 10% of appservers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526669 (https://phabricator.wikimedia.org/T229070) [13:41:09] (03CR) 10Jbond: [C: 03+1] "LGTM" 
[puppet] - 10https://gerrit.wikimedia.org/r/526662 (https://phabricator.wikimedia.org/T229262) (owner: 10Filippo Giunchedi) [13:43:56] PROBLEM - Disk space on phab2001 is CRITICAL: DISK CRITICAL - /var/spool/exim4/scan is not accessible: Permission denied https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=phab2001&var-datasource=codfw+prometheus/ops [13:44:25] jbond42: ^ [13:44:38] PROBLEM - Disk space on phab1003 is CRITICAL: DISK CRITICAL - /var/spool/exim4/db is not accessible: Permission denied https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=phab1003&var-datasource=eqiad+prometheus/ops [13:44:44] marostegui: thanks looking [13:44:49] thanks! [13:44:54] PROBLEM - DPKG on kerberos1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages https://wikitech.wikimedia.org/wiki/Monitoring/dpkg [13:45:32] RECOVERY - Disk space on phab2001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=phab2001&var-datasource=codfw+prometheus/ops [13:45:36] PROBLEM - puppet last run on mw2178 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. 
Failed resources (up to 3 shown): Package[exim4-config],Package[exim4-daemon-light] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [13:46:37] !log cp4021: test fifo-log-demux 0.4 T229414 [13:46:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:46:45] T229414: fifo-log-tailer: evergrowing memory usage - https://phabricator.wikimedia.org/T229414 [13:46:48] PROBLEM - Disk space on mx1001 is CRITICAL: DISK CRITICAL - /var/spool/exim4/db is not accessible: Permission denied https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=mx1001&var-datasource=eqiad+prometheus/ops [13:47:27] * Krinkle testing on mwdebug1002 [13:47:28] PROBLEM - Disk space on mx2001 is CRITICAL: DISK CRITICAL - /var/spool/exim4/scan is not accessible: Permission denied https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=mx2001&var-datasource=codfw+prometheus/ops [13:47:52] RECOVERY - Disk space on phab1003 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=phab1003&var-datasource=eqiad+prometheus/ops [13:48:26] RECOVERY - Disk space on mx1001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=mx1001&var-datasource=eqiad+prometheus/ops [13:49:01] !log elukey@cumin1001 START - Cookbook sre.zookeeper.roll-restart-zookeeper [13:49:06] RECOVERY - Disk space on mx2001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=mx2001&var-datasource=codfw+prometheus/ops [13:49:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:49:28] PROBLEM - PHP opcache health on mwdebug1002 is CRITICAL: CRITICAL: opcache free space is 
below 50 MB https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:50:14] Krinkle: ^ intentional? :D [13:50:25] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Slowly repool db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526666 (owner: 10Marostegui) [13:50:52] cdanis: nope, just snafu I guess. [13:51:09] I'm testing on hhvm at the moment [13:51:20] RECOVERY - DPKG on kerberos1001 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg [13:51:30] !log marostegui@cumin1001 dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8835', previous config saved to /var/cache/conftool/dbconfig/20190731-135129-marostegui.json [13:51:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:52:07] cdanis: all done with my testing [13:52:38] PROBLEM - puppet last run on mw2164 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 8 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[exim4-config],Package[exim4-daemon-light] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [13:55:09] (03Merged) 10jenkins-bot: db-eqiad.php: Slowly repool db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526666 (owner: 10Marostegui) [13:56:06] (03PS5) 10BPirkle: Add kask session storage configuration. 
Use only on testwiki, [mediawiki-config] - 10https://gerrit.wikimedia.org/r/519432 (https://phabricator.wikimedia.org/T222099) [13:56:16] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Slowly repool db1078 after upgrade and alter (duration: 00m 46s) [13:56:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:56:22] (03CR) 10jenkins-bot: db-eqiad.php: Slowly repool db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526666 (owner: 10Marostegui) [13:56:58] (03PS1) 10Marostegui: db-eqiad.php: More weight to db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526674 [13:58:12] RECOVERY - puppet last run on mw2164 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [13:58:40] (03CR) 10BPirkle: [C: 03+1] "CentralAuth concerns (and also similar concerns with OAuth) have been resolved via T227696 and T227097." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/519432 (https://phabricator.wikimedia.org/T222099) (owner: 10BPirkle) [14:00:04] cdanis: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for dbctl to 10%. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190731T1400). 
[14:00:43] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: More weight to db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526674 (owner: 10Marostegui) [14:01:02] PROBLEM - very high load average likely xfs on ms-be2018 is CRITICAL: CRITICAL - load average: 168.07, 109.25, 55.68 https://wikitech.wikimedia.org/wiki/Swift [14:01:25] !log marostegui@cumin1001 dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8836', previous config saved to /var/cache/conftool/dbconfig/20190731-140124-marostegui.json [14:01:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:01:47] (03PS2) 10Alexandros Kosiaris: termbox: Make LVS paging [puppet] - 10https://gerrit.wikimedia.org/r/526631 [14:01:56] (03CR) 10Alexandros Kosiaris: [C: 03+2] termbox: Make LVS paging [puppet] - 10https://gerrit.wikimedia.org/r/526631 (owner: 10Alexandros Kosiaris) [14:02:00] (03Merged) 10jenkins-bot: db-eqiad.php: More weight to db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526674 (owner: 10Marostegui) [14:02:03] (03PS1) 10Ayounsi: Netbox, enable Prometheus endpoint [puppet] - 10https://gerrit.wikimedia.org/r/526676 (https://phabricator.wikimedia.org/T226331) [14:02:22] RECOVERY - puppet last run on mw2178 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [14:02:54] (03CR) 10Filippo Giunchedi: [C: 03+1] Prometheus add bird prefix export count to global metrics [puppet] - 10https://gerrit.wikimedia.org/r/526536 (owner: 10Ayounsi) [14:03:03] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: More traffic to db1078 after upgrade and alter (duration: 00m 46s) [14:03:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:03:17] (03CR) 10Marostegui: [C: 03+1] dbctl: expand to 10% of appservers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526669 
(https://phabricator.wikimedia.org/T229070) (owner: 10CDanis) [14:04:18] !log elukey@cumin1001 END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) [14:04:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:07:12] PROBLEM - puppet last run on sessionstore1001 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [14:07:42] PROBLEM - dbctl differs from mediawiki-config in eqiad- did you forget to update both- on cumin1001 is CRITICAL: Mismatched loads for section DEFAULT: diff {(db1078, 250), (db1078, 100)} -- PHP {db1112: 400, db1078: 100, db1123: 300, db1075: 0} vs dbctl {db1112: 400, db1078: 250, db1123: 300, db1075: 0} https://wikitech.wikimedia.org/wiki/Dbctl%23Configuration_deltas_vs_PHP [14:08:01] marostegui: we got you :-P [14:08:10] volans: I committed the change some time ago [14:08:34] https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/526674/1/wmf-config/db-eqiad.php and https://phabricator.wikimedia.org/P8836 [14:08:34] config diff is empty in effect [14:08:42] the check runs every 5 minutes [14:08:46] so it should recover soon [14:13:20] RECOVERY - dbctl differs from mediawiki-config in eqiad- did you forget to update both- on cumin1001 is OK: OK: configurations match https://wikitech.wikimedia.org/wiki/Dbctl%23Configuration_deltas_vs_PHP [14:13:26] (03PS2) 10CDanis: dbctl: expand to 25% of appservers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526669 (https://phabricator.wikimedia.org/T229070) [14:13:29] (03PS1) 10Marostegui: db-eqiad.php: Fully repool db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526678 [14:13:31] (03CR) 10jenkins-bot: db-eqiad.php: More weight to db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526674 (owner: 10Marostegui) [14:14:32] (03CR) 10Giuseppe Lavagetto: [C: 03+1] dbctl: 
expand to 25% of appservers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526669 (https://phabricator.wikimedia.org/T229070) (owner: 10CDanis) [14:14:39] (03PS1) 10MSantos: WIP: First version of the wikifeeds chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/526679 (https://phabricator.wikimedia.org/T229287) [14:14:56] (03CR) 10CDanis: [C: 03+2] dbctl: expand to 25% of appservers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526669 (https://phabricator.wikimedia.org/T229070) (owner: 10CDanis) [14:16:33] (03Merged) 10jenkins-bot: dbctl: expand to 25% of appservers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526669 (https://phabricator.wikimedia.org/T229070) (owner: 10CDanis) [14:17:46] (03CR) 10jenkins-bot: dbctl: expand to 25% of appservers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526669 (https://phabricator.wikimedia.org/T229070) (owner: 10CDanis) [14:18:06] RECOVERY - puppet last run on sessionstore1001 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [14:18:29] !log cdanis@deploy1001 Synchronized wmf-config/etcd.php: I02d66736 expand dbctl to 25% of the fleet (duration: 00m 46s) [14:18:34] (03CR) 10Filippo Giunchedi: 0.4: do not use ioutil.ReadAll() in fifo-log-tailer (031 comment) [software/fifo-log-demux] - 10https://gerrit.wikimedia.org/r/526668 (https://phabricator.wikimedia.org/T229414) (owner: 10Ema) [14:18:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:21:12] (03CR) 10Ppchelko: [C: 03+1] "I don't quite understand whats going on here, but +1 I guess.." 
[puppet] - 10https://gerrit.wikimedia.org/r/526632 (https://phabricator.wikimedia.org/T223953) (owner: 10Alexandros Kosiaris) [14:26:22] (03CR) 10Giuseppe Lavagetto: [C: 04-1] restrouter: Add kubernetes stanzas (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/526632 (https://phabricator.wikimedia.org/T223953) (owner: 10Alexandros Kosiaris) [14:26:47] (03PS3) 10BBlack: anycast recdns: use for all install-time DNS [puppet] - 10https://gerrit.wikimedia.org/r/526177 (https://phabricator.wikimedia.org/T228190) [14:26:49] (03PS1) 10BBlack: anycast recdns: edge sites via realm.pp (nop) [puppet] - 10https://gerrit.wikimedia.org/r/526684 (https://phabricator.wikimedia.org/T228190) [14:26:51] (03PS1) 10BBlack: anycast recdns: set for canary api/appservers [puppet] - 10https://gerrit.wikimedia.org/r/526685 (https://phabricator.wikimedia.org/T228190) [14:27:08] PROBLEM - puppet last run on serpens is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [14:27:31] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Fully repool db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526678 (owner: 10Marostegui) [14:28:15] !log marostegui@cumin1001 dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8837', previous config saved to /var/cache/conftool/dbconfig/20190731-142814-marostegui.json [14:28:18] (03CR) 10Ppchelko: [C: 03+1] Add kask session storage configuration. 
Use only on testwiki, [mediawiki-config] - 10https://gerrit.wikimedia.org/r/519432 (https://phabricator.wikimedia.org/T222099) (owner: 10BPirkle) [14:28:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:28:51] (03Merged) 10jenkins-bot: db-eqiad.php: Fully repool db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526678 (owner: 10Marostegui) [14:28:59] !log beginning rolling reboots of codfw logstash hosts for security updates [14:29:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:29:32] (03CR) 10jenkins-bot: db-eqiad.php: Fully repool db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526678 (owner: 10Marostegui) [14:29:48] (03PS2) 10Ema: 0.4: do not use ioutil.ReadAll() in fifo-log-tailer [software/fifo-log-demux] - 10https://gerrit.wikimedia.org/r/526668 (https://phabricator.wikimedia.org/T229414) [14:29:55] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Fully repool db1078 after upgrade and alter (duration: 00m 47s) [14:30:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:30:14] PROBLEM - PHP opcache health on mwdebug2001 is CRITICAL: CRITICAL: opcache free space is below 50 MB https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:30:37] (03CR) 10Ema: 0.4: do not use ioutil.ReadAll() in fifo-log-tailer (031 comment) [software/fifo-log-demux] - 10https://gerrit.wikimedia.org/r/526668 (https://phabricator.wikimedia.org/T229414) (owner: 10Ema) [14:32:05] (03CR) 10Gergő Tisza: [C: 03+1] Enable MachineVision on (beta) commonswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526543 (https://phabricator.wikimedia.org/T227348) (owner: 10Mholloway) [14:37:42] (03PS1) 10Ayounsi: Netbox, make non sensitive models public [puppet] - 10https://gerrit.wikimedia.org/r/526686 (https://phabricator.wikimedia.org/T226331) [14:38:09] (03CR) 10Ayounsi: [C: 03+2] Prometheus add bird prefix export count 
to global metrics [puppet] - 10https://gerrit.wikimedia.org/r/526536 (owner: 10Ayounsi) [14:38:14] (03PS2) 10Ayounsi: Prometheus add bird prefix export count to global metrics [puppet] - 10https://gerrit.wikimedia.org/r/526536 [14:38:36] (03PS2) 10Alexandros Kosiaris: restrouter: Add kubernetes stanzas [puppet] - 10https://gerrit.wikimedia.org/r/526632 (https://phabricator.wikimedia.org/T223953) [14:38:38] (03PS1) 10Alexandros Kosiaris: populate kubeconfig resources if both token/username [puppet] - 10https://gerrit.wikimedia.org/r/526687 [14:40:22] (03PS1) 10Marostegui: db-eqiad.php: Depool db1112 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526689 [14:42:26] (03CR) 10BBlack: [C: 03+2] "NOP confirmation via compiler: https://puppet-compiler.wmflabs.org/compiler1001/17685/" [puppet] - 10https://gerrit.wikimedia.org/r/526684 (https://phabricator.wikimedia.org/T228190) (owner: 10BBlack) [14:42:28] (03PS2) 10BBlack: anycast recdns: edge sites via realm.pp (nop) [puppet] - 10https://gerrit.wikimedia.org/r/526684 (https://phabricator.wikimedia.org/T228190) [14:42:33] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Depool db1112 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526689 (owner: 10Marostegui) [14:42:36] (03CR) 10BBlack: [V: 03+2 C: 03+2] anycast recdns: edge sites via realm.pp (nop) [puppet] - 10https://gerrit.wikimedia.org/r/526684 (https://phabricator.wikimedia.org/T228190) (owner: 10BBlack) [14:43:27] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1112 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526689 (owner: 10Marostegui) [14:43:42] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1112 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526689 (owner: 10Marostegui) [14:44:48] (03PS1) 10Pmiazga: Enable MobileWebUIActionsTracking schema with 50% sampling rate [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526691 (https://phabricator.wikimedia.org/T220016) [14:45:08] !log marostegui@deploy1001 Synchronized 
wmf-config/db-eqiad.php: Depool db1112 (duration: 00m 46s) [14:45:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:45:36] (03CR) 10CRusnov: "> Patch Set 1: Code-Review-1" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/526562 (owner: 10CRusnov) [14:47:32] !log marostegui@cumin1001 dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8838', previous config saved to /var/cache/conftool/dbconfig/20190731-144731-marostegui.json [14:47:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:48:22] (03CR) 10Cwhite: [C: 03+1] prometheus: track total number of puppet resources [puppet] - 10https://gerrit.wikimedia.org/r/526662 (https://phabricator.wikimedia.org/T229262) (owner: 10Filippo Giunchedi) [14:49:54] (03CR) 10Filippo Giunchedi: [C: 03+1] 0.4: do not use ioutil.ReadAll() in fifo-log-tailer [software/fifo-log-demux] - 10https://gerrit.wikimedia.org/r/526668 (https://phabricator.wikimedia.org/T229414) (owner: 10Ema) [14:50:35] (03PS1) 10CRusnov: Update reqs with swift's requirements [software/netbox-deploy] - 10https://gerrit.wikimedia.org/r/526693 [14:52:59] (03CR) 10Alexandros Kosiaris: [C: 03+2] "PCC compiles again at https://puppet-compiler.wmflabs.org/compiler1001/17687/" [puppet] - 10https://gerrit.wikimedia.org/r/526687 (owner: 10Alexandros Kosiaris) [14:53:08] (03PS2) 10Alexandros Kosiaris: populate kubeconfig resources if both token/username [puppet] - 10https://gerrit.wikimedia.org/r/526687 [14:54:11] (03PS1) 10Giuseppe Lavagetto: run_ci_locally: fix docker test [puppet] - 10https://gerrit.wikimedia.org/r/526694 [14:54:36] (03CR) 10Giuseppe Lavagetto: [V: 03+2 C: 03+2] run_ci_locally: fix docker test [puppet] - 10https://gerrit.wikimedia.org/r/526694 (owner: 10Giuseppe Lavagetto) [14:54:44] RECOVERY - puppet last run on serpens is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures 
https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [14:57:50] !log ms-be2018 disablepd 1I:1:1 - T225630 [14:57:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:57:58] T225630: ms-be2018 sdc unreadable sector - https://phabricator.wikimedia.org/T225630 [14:59:12] (03PS1) 10Elukey: role::analytics_test_cluster::client: add hive option [puppet] - 10https://gerrit.wikimedia.org/r/526697 (https://phabricator.wikimedia.org/T226698) [15:00:15] (03CR) 10Elukey: [C: 03+2] role::analytics_test_cluster::client: add hive option [puppet] - 10https://gerrit.wikimedia.org/r/526697 (https://phabricator.wikimedia.org/T226698) (owner: 10Elukey) [15:00:45] (03CR) 10Volans: [C: 03+1] "LGTM but please check with @godog also to see if the metrics exposed are ok" [puppet] - 10https://gerrit.wikimedia.org/r/526676 (https://phabricator.wikimedia.org/T226331) (owner: 10Ayounsi) [15:01:25] (03PS3) 10Alexandros Kosiaris: populate kubeconfig resources if both token/username [puppet] - 10https://gerrit.wikimedia.org/r/526687 [15:01:29] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] populate kubeconfig resources if both token/username [puppet] - 10https://gerrit.wikimedia.org/r/526687 (owner: 10Alexandros Kosiaris) [15:03:50] !log power down re1:cr1-codfw (backup) - T226422 [15:03:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:03:58] T226422: update RE-S-X6-64G-S in cr[12]-codfw - https://phabricator.wikimedia.org/T226422 [15:06:51] going to move forward with train, deploying to group0. 
[15:12:55] (03CR) 10Ema: [C: 03+2] 0.4: do not use ioutil.ReadAll() in fifo-log-tailer [software/fifo-log-demux] - 10https://gerrit.wikimedia.org/r/526668 (https://phabricator.wikimedia.org/T229414) (owner: 10Ema) [15:13:05] (03PS1) 10Brennen Bearnes: Group0 to 1.34.0-wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526699 [15:13:07] (03CR) 10Brennen Bearnes: [C: 03+2] Group0 to 1.34.0-wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526699 (owner: 10Brennen Bearnes) [15:13:18] go, brennen, go! [15:14:26] (03Merged) 10jenkins-bot: Group0 to 1.34.0-wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526699 (owner: 10Brennen Bearnes) [15:15:31] !log upload fifo-log-demux 0.4 to stretch-wikimedia T229414 [15:15:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:15:38] T229414: fifo-log-tailer: evergrowing memory usage - https://phabricator.wikimedia.org/T229414 [15:16:24] (03CR) 10jenkins-bot: Group0 to 1.34.0-wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526699 (owner: 10Brennen Bearnes) [15:17:13] (03PS1) 10Elukey: cdh::hive: allow to render metastore's kerb option on client nodes [puppet] - 10https://gerrit.wikimedia.org/r/526701 (https://phabricator.wikimedia.org/T226698) [15:17:13] !log brennen@deploy1001 rebuilt and synchronized wikiversions files: Group0 to 1.34.0-wmf.16 [15:17:15] (03CR) 10Alexandros Kosiaris: restrouter: Add kubernetes stanzas (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/526632 (https://phabricator.wikimedia.org/T223953) (owner: 10Alexandros Kosiaris) [15:17:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:18:29] (03PS3) 10Alexandros Kosiaris: restrouter: Add kubernetes stanzas [puppet] - 10https://gerrit.wikimedia.org/r/526632 (https://phabricator.wikimedia.org/T223953) [15:18:49] (03PS2) 10CRusnov: netbox: Fix swift CA errors. 
[puppet] - 10https://gerrit.wikimedia.org/r/526562 [15:18:51] (03CR) 10Alexandros Kosiaris: [C: 04-1] "-1ed until we have the helmfile.d stuff ready" [puppet] - 10https://gerrit.wikimedia.org/r/526632 (https://phabricator.wikimedia.org/T223953) (owner: 10Alexandros Kosiaris) [15:19:32] PROBLEM - toolschecker: showmount succeeds on a labs instance on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/nfs/secondary_cluster_showmount - 177 bytes in 0.023 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker [15:19:42] (03CR) 10Elukey: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/17689/" [puppet] - 10https://gerrit.wikimedia.org/r/526701 (https://phabricator.wikimedia.org/T226698) (owner: 10Elukey) [15:20:23] (03CR) 10jerkins-bot: [V: 04-1] netbox: Fix swift CA errors. [puppet] - 10https://gerrit.wikimedia.org/r/526562 (owner: 10CRusnov) [15:22:44] !log cp-ats: upgrade fifo-log-demux to 0.4 and restart atsmtail@backend.service T229414 [15:22:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:22:51] T229414: fifo-log-tailer: evergrowing memory usage - https://phabricator.wikimedia.org/T229414 [15:23:11] 04Critical Alert for device cr1-codfw.wikimedia.org - Juniper alarm active [15:24:56] !log restarting jenkins for update [15:25:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:26:13] (03PS1) 10Elukey: role::analytics_test_cluster::client: add hive kerberos options [puppet] - 10https://gerrit.wikimedia.org/r/526703 (https://phabricator.wikimedia.org/T226698) [15:26:29] ACKNOWLEDGEMENT - HP RAID on ms-be2018 is CRITICAL: CRITICAL: Slot 3: Failed: 1I:1:1 - OK: 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, 2I:4:1, 2I:4:2 - Controller: OK - Battery/Capacitor: OK nagiosadmin RAID handler auto-ack: 
https://phabricator.wikimedia.org/T229438 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering [15:27:13] mhh I thought I silenced that [15:32:08] (03PS2) 10Elukey: role::analytics_test_cluster::client: add hive kerberos options [puppet] - 10https://gerrit.wikimedia.org/r/526703 (https://phabricator.wikimedia.org/T226698) [15:33:29] (03CR) 10Elukey: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/17691/an-tool1006.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/526703 (https://phabricator.wikimedia.org/T226698) (owner: 10Elukey) [15:33:40] (03PS3) 10CRusnov: netbox: Fix various swift errors [puppet] - 10https://gerrit.wikimedia.org/r/526562 [15:34:36] (03CR) 10Volans: [C: 03+1] "LGTM" [software/netbox-deploy] - 10https://gerrit.wikimedia.org/r/526693 (owner: 10CRusnov) [15:34:53] (03CR) 10CRusnov: [V: 03+2 C: 03+2] Update reqs with swift's requirements [software/netbox-deploy] - 10https://gerrit.wikimedia.org/r/526693 (owner: 10CRusnov) [15:36:00] (03PS3) 10Filippo Giunchedi: prometheus: track total number of puppet resources [puppet] - 10https://gerrit.wikimedia.org/r/526662 (https://phabricator.wikimedia.org/T229262) [15:37:39] RECOVERY - very high load average likely xfs on ms-be2018 is OK: OK - load average: 10.27, 10.21, 73.41 https://wikitech.wikimedia.org/wiki/Swift [15:38:03] (03CR) 10Filippo Giunchedi: [C: 03+2] prometheus: track total number of puppet resources [puppet] - 10https://gerrit.wikimedia.org/r/526662 (https://phabricator.wikimedia.org/T229262) (owner: 10Filippo Giunchedi) [15:39:46] !log mforns@deploy1001 Started deploy [analytics/refinery@eb2d9b0]: deploying analytics-refinery up to eb2d9b005b26f6dddab2b59f1ba591f1758ec99f [15:39:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:40:01] (03CR) 10Volans: "one nit inline, how the compiler looks like?" 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/526562 (owner: 10CRusnov) [15:41:15] (03PS4) 10CRusnov: netbox: Fix various swift errors [puppet] - 10https://gerrit.wikimedia.org/r/526562 [15:43:09] RECOVERY - Device not healthy -SMART- on db2063 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/SMART%23Alerts https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db2063&var-datasource=codfw+prometheus/ops [15:44:19] (03CR) 10CRusnov: "> Patch Set 3:" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/526562 (owner: 10CRusnov) [15:45:32] !log restarting nfs service on labstore1004 [15:45:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:49:17] (03PS3) 10Jbond: netbox/puppet: An example of how we may intergrate netbox data with puppet [puppet] - 10https://gerrit.wikimedia.org/r/526664 (https://phabricator.wikimedia.org/T229397) [15:50:24] (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/526562 (owner: 10CRusnov) [15:51:24] (03PS5) 10CRusnov: netbox: Fix various swift errors [puppet] - 10https://gerrit.wikimedia.org/r/526562 [15:51:26] (03CR) 10Holger Knust: "Oops, I left them in draft" (0356 comments) [software/cassandra-table-properties] - 10https://gerrit.wikimedia.org/r/524921 (https://phabricator.wikimedia.org/T220246) (owner: 10Holger Knust) [15:51:54] (03PS4) 10BBlack: anycast recdns: use for all install-time DNS [puppet] - 10https://gerrit.wikimedia.org/r/526177 (https://phabricator.wikimedia.org/T228190) [15:51:56] (03PS2) 10BBlack: anycast recdns: set for canary api/appservers [puppet] - 10https://gerrit.wikimedia.org/r/526685 (https://phabricator.wikimedia.org/T228190) [15:52:05] (03CR) 10CRusnov: [C: 03+2] netbox: Fix various swift errors [puppet] - 10https://gerrit.wikimedia.org/r/526562 (owner: 10CRusnov) [15:52:22] (03PS1) 10BBlack: VCL: Send 421 on apparently-faulty H/2 coalesce [puppet] - 10https://gerrit.wikimedia.org/r/526714 
(https://phabricator.wikimedia.org/T207340) [15:52:56] !log mforns@deploy1001 Finished deploy [analytics/refinery@eb2d9b0]: deploying analytics-refinery up to eb2d9b005b26f6dddab2b59f1ba591f1758ec99f (duration: 13m 09s) [15:53:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:54:34] (03PS2) 10BBlack: VCL: Send 421 on apparently-faulty H/2 coalesce [puppet] - 10https://gerrit.wikimedia.org/r/526714 (https://phabricator.wikimedia.org/T207340) [15:57:09] (03PS1) 10Giuseppe Lavagetto: run_ci_locally: bump container version [puppet] - 10https://gerrit.wikimedia.org/r/526716 [15:57:10] 04Critical Alert for device cr1-codfw.wikimedia.org - Juniper alarm active [15:57:50] (03CR) 10Giuseppe Lavagetto: [C: 03+2] run_ci_locally: bump container version [puppet] - 10https://gerrit.wikimedia.org/r/526716 (owner: 10Giuseppe Lavagetto) [15:59:44] (03CR) 10Ema: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/526714 (https://phabricator.wikimedia.org/T207340) (owner: 10BBlack) [16:00:04] MaxSem, RoanKattouw, Niharika, and Urbanecm: That opportune time is upon us again. Time for a Morning SWAT (Max 6 patches) deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190731T1600). [16:00:04] raynor, kostajh, and raynor: A patch you scheduled for Morning SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [16:00:23] hallo [16:00:26] hello [16:00:30] (03CR) 10Giuseppe Lavagetto: "> not even "Class[Mediawiki::Mwrepl]" is needed?" [puppet] - 10https://gerrit.wikimedia.org/r/525584 (https://phabricator.wikimedia.org/T228976) (owner: 10Giuseppe Lavagetto) [16:01:32] I can SWAT today! 
[16:01:54] (03PS6) 10Urbanecm: Enable editor gender surveys [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526461 (https://phabricator.wikimedia.org/T227793) (owner: 10Pmiazga) [16:01:58] Urbanecm - that would be great [16:02:08] (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526461 (https://phabricator.wikimedia.org/T227793) (owner: 10Pmiazga) [16:02:28] thx. [16:02:53] Urbanecm, also - there are 3 backports, merging backports takes some time, and most probably we will need some time to test the first config change [16:03:03] kostajh, I've +2'ed your backports, to give space for CI [16:03:08] maybe you can proceed and start merging kostajh backports [16:03:10] (03Merged) 10jenkins-bot: Enable editor gender surveys [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526461 (https://phabricator.wikimedia.org/T227793) (owner: 10Pmiazga) [16:03:24] thanks Urbanecm [16:03:25] (03CR) 10jenkins-bot: Enable editor gender surveys [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526461 (https://phabricator.wikimedia.org/T227793) (owner: 10Pmiazga) [16:03:35] oh, I was going to say that. thx. 
Urbanecm, you're way ahead :) [16:05:14] raynor, your config change is on mwdebug1002 [16:06:20] Urbanecm, thx, testing [16:06:45] (first config change, the enable editor gender survey) [16:07:42] (03PS3) 10BBlack: VCL: Send 421 on apparently-faulty H/2 coalesce [puppet] - 10https://gerrit.wikimedia.org/r/526714 (https://phabricator.wikimedia.org/T207340) [16:08:03] (03CR) 10BBlack: [C: 03+2] VCL: Send 421 on apparently-faulty H/2 coalesce [puppet] - 10https://gerrit.wikimedia.org/r/526714 (https://phabricator.wikimedia.org/T207340) (owner: 10BBlack) [16:10:53] raynor, your backport is also on mwdebug1002 [16:11:01] (https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaEvents/+/526688) [16:11:09] awesome, thx [16:12:10] (03PS1) 10Alexandros Kosiaris: restrouter: Add helmfile stanzas [deployment-charts] - 10https://gerrit.wikimedia.org/r/526719 (https://phabricator.wikimedia.org/T223953) [16:12:45] !log Poweroff pc2010 for on-site maintenance T227552 [16:12:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:12:54] T227552: pc2010 possibly broken memory - https://phabricator.wikimedia.org/T227552 [16:13:12] (03PS6) 10Giuseppe Lavagetto: mediawiki: allow installing php7 only [puppet] - 10https://gerrit.wikimedia.org/r/525584 (https://phabricator.wikimedia.org/T228976) [16:13:14] (03PS1) 10Giuseppe Lavagetto: mediawiki: make mw1270 a php7-only application server [puppet] - 10https://gerrit.wikimedia.org/r/526720 [16:13:22] (03CR) 10Alexandros Kosiaris: "@ppchelko, couples of TODO questions inline, could you please have a look ?" 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/526719 (https://phabricator.wikimedia.org/T223953) (owner: 10Alexandros Kosiaris) [16:14:21] !log deploying VCL for H/2 coalesce 421 responses - T207340 [16:14:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:14:29] T207340: Determine cause of upload.wikimedia.org requests routed to text-lb (404 Not Found) - https://phabricator.wikimedia.org/T207340 [16:16:16] (03CR) 10Alexandros Kosiaris: [C: 04-1] "helmfile stanzas uploaded on https://gerrit.wikimedia.org/r/#/c/operations/deployment-charts/+/526719/" [puppet] - 10https://gerrit.wikimedia.org/r/526632 (https://phabricator.wikimedia.org/T223953) (owner: 10Alexandros Kosiaris) [16:17:10] 04Critical Alert for device cr1-codfw.wikimedia.org - Juniper alarm active [16:18:31] (03PS1) 10Pmiazga: Revert "Enable editor gender surveys" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526721 [16:18:45] Urbanecm, sorry to bother you, could you revert the gender surveys? [16:18:46] https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/526721/ [16:18:49] I made a revert patch [16:18:54] looks like we're missing some translations ;/ [16:19:31] (03CR) 10Fsero: [C: 04-1] restrouter: Add helmfile stanzas (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/526719 (https://phabricator.wikimedia.org/T223953) (owner: 10Alexandros Kosiaris) [16:19:45] raynor, sure [16:19:53] thx [16:20:07] (03CR) 10Urbanecm: [V: 03+2 C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526721 (owner: 10Pmiazga) [16:20:28] raynor, done&reverted on mwdebug1002 [16:20:33] does the backport work correctly? 
[16:20:45] (03CR) 10jenkins-bot: Revert "Enable editor gender surveys" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526721 (owner: 10Pmiazga) [16:21:10] checking the backport [16:21:30] kostajh, your backports are on mwdebug1002, please check [16:21:37] Urbanecm: thanks, doing [16:22:39] (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526691 (https://phabricator.wikimedia.org/T220016) (owner: 10Pmiazga) [16:22:48] raynor, SWATting your other config change [16:23:04] (03CR) 10Fsero: [C: 04-1] "we can probably do a better job templating some things in helmfile to avoid some of the duplication present here." (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/526719 (https://phabricator.wikimedia.org/T223953) (owner: 10Alexandros Kosiaris) [16:23:18] Urbanecm: both are good, thank you [16:23:23] kostajh, syncing [16:24:11] 04Critical Alert for device cr1-codfw.wikimedia.org - Juniper alarm active got acknowledged [16:25:02] !log urbanecm@deploy1001 Synchronized php-1.34.0-wmf.15/extensions/GrowthExperiments/: SWAT: [[:gerrit:526612|Only set relevant title on mobile skin]] (T229263, T225659) (duration: 00m 56s) [16:25:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:25:11] T229263: Homepage: do not display tabs unrelated to the page - https://phabricator.wikimedia.org/T229263 [16:25:11] T225659: Homepage: different paths to User talk depending on origin tab - https://phabricator.wikimedia.org/T225659 [16:25:47] (03Merged) 10jenkins-bot: Enable MobileWebUIActionsTracking schema with 50% sampling rate [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526691 (https://phabricator.wikimedia.org/T220016) (owner: 10Pmiazga) [16:26:26] !log urbanecm@deploy1001 Synchronized php-1.34.0-wmf.16/extensions/GrowthExperiments/: SWAT: [[:gerrit:526610|Only set relevant title on mobile skin]] (T229263, T225659) (duration: 00m 51s) [16:26:29] (03CR) 10jenkins-bot: Enable 
MobileWebUIActionsTracking schema with 50% sampling rate [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526691 (https://phabricator.wikimedia.org/T220016) (owner: 10Pmiazga) [16:26:32] kostajh, synced [16:26:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:27:18] raynor, your other config change is on mwdebug1002 [16:27:18] Urbanecm, looks like backport is ok, but question [16:27:22] could you do also backport and config [16:27:29] wdym? [16:27:30] oh, looks like you're reading my mind [16:27:34] I'm processing changes in order [16:27:35] :-) [16:27:50] anyway, both backport and config change is on mwdebug1002 [16:27:53] the config was using the backport, it was much easier to test :) [16:28:10] instead of hacking around the code, we could just enable the feature that backport is fixing [16:28:40] :) [16:30:15] (03PS5) 10BBlack: Add cloudelastic LVS to DNS [dns] - 10https://gerrit.wikimedia.org/r/512924 (https://phabricator.wikimedia.org/T224324) (owner: 10EBernhardson) [16:30:52] (03CR) 10BBlack: [C: 03+2] Add cloudelastic LVS to DNS [dns] - 10https://gerrit.wikimedia.org/r/512924 (https://phabricator.wikimedia.org/T224324) (owner: 10EBernhardson) [16:30:55] lvcy [16:31:03] Urbanecm, please give me 5 more mins [16:31:10] raynor, sure [16:32:19] PROBLEM - Host pc2010.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [16:33:58] Urbanecm, looks good [16:34:06] can you merge the backport first [16:34:10] certainly [16:34:12] syncing backport [16:34:16] and then once backport gets deployed everywhere then the config [16:34:38] raynor, just to ensure, is your patch in wmf.16? 
[16:34:48] (if it needs to be there) [16:35:31] wmf.15 [16:35:36] and the patch is already in wmf.16 [16:35:49] great [16:36:05] so I only need to backport it to wmf.15 so we can enable the feature [16:36:07] just wanted to ensure group0 wikis won't be without the backport [16:36:09] syncing :) [16:37:52] !log urbanecm@deploy1001 Synchronized php-1.34.0-wmf.15/extensions/WikimediaEvents/: SWAT: [[:gerrit:526688|Improved MobileUIActions tracking schema]] (T220016) (duration: 00m 54s) [16:37:58] raynor, backport synced [16:37:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:37:59] RECOVERY - Host pc2010.mgmt is UP: PING OK - Packet loss = 0%, RTA = 37.08 ms [16:38:00] T220016: Create, and deploy working MobileWebUIActionsTracking schema - https://phabricator.wikimedia.org/T220016 [16:38:40] raynor, syncing the config change [16:38:44] \o/ [16:39:32] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:526691|Enable MobileWebUIActionsTracking schema with 50% sampling rate]] (T220016) (duration: 00m 58s) [16:39:36] raynor, synced! [16:39:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:39:42] ok, thx, checking once again :) [16:39:47] ok [16:46:37] (03CR) 10Ppchelko: [C: 04-1] "I think those 3 are all the TODOs?" (033 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/526719 (https://phabricator.wikimedia.org/T223953) (owner: 10Alexandros Kosiaris) [16:51:46] Urbanecm, everything is good on my side, the new schema works perfectly. Thanks for SWATting it [16:52:01] raynor, happy to help! 
[16:52:08] !log Morning SWAT done [16:52:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:56:47] PROBLEM - MariaDB Slave IO: s6 on db2058 is CRITICAL: CRITICAL slave_io_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave [16:57:17] PROBLEM - Disk space on db2058 is CRITICAL: DISK CRITICAL - /srv is not accessible: Input/output error https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db2058&var-datasource=codfw+prometheus/ops [16:57:29] PROBLEM - MariaDB disk space on db2058 is CRITICAL: DISK CRITICAL - /srv is not accessible: Input/output error https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting [16:57:45] PROBLEM - Check systemd state on db2058 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [16:57:47] PROBLEM - MariaDB Slave SQL: s6 on db2058 is CRITICAL: CRITICAL slave_sql_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave [16:58:07] PROBLEM - mysqld processes on db2058 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting [17:00:06] mmmh having a look at db2058 [17:01:53] PROBLEM - puppet last run on ms-be2018 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 14 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[parted-/dev/sdc] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [17:03:31] db2058 seems that has a doomed root partition (i/o error) depooling db2058, setting downtime for it and opening a task [17:07:26] need any help volans? 
[17:07:34] cdanis: nah, no need [17:07:45] it's codfw :D [17:07:49] yeah :) [17:07:56] some day that might matter ;)\ [17:08:35] PROBLEM - MariaDB Slave Lag: s6 on db2058 is CRITICAL: CRITICAL slave_sql_lag could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave [17:10:29] PROBLEM - HP RAID on db2058 is CRITICAL: I/O input error https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering [17:10:30] ACKNOWLEDGEMENT - HP RAID on db2058 is CRITICAL: I/O input error nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T229449 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering [17:12:19] PROBLEM - Device not healthy -SMART- on ms-be2018 is CRITICAL: cluster=swift device=cciss,13 instance=ms-be2018:9100 job=node site=codfw https://wikitech.wikimedia.org/wiki/SMART%23Alerts https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ms-be2018&var-datasource=codfw+prometheus/ops [17:13:16] (03PS1) 10Volans: db-eqiad.php: depool db2058, I/O error [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526730 (https://phabricator.wikimedia.org/T229449) [17:14:14] cdanis: if you want to double check this patch and the config diff on dbctl ;) [17:14:55] 10Operations, 10Phabricator, 10Release-Engineering-Team-TODO, 10Traffic, 10Release-Engineering-Team (Development services): Prepare Phame to support heavy traffic for a Tech Department blog - https://phabricator.wikimedia.org/T226044 (10faidon) a:03JAufrecht [17:14:59] RECOVERY - HP RAID on db2063 is OK: OK: Slot 0: OK: 1I:1:1, 1I:1:10, 1I:1:11, 1I:1:12, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9 - Controller: OK - Battery/Capacitor: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering [17:14:59] (03CR) 10CDanis: [C: 03+1] db-eqiad.php: depool 
db2058, I/O error [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526730 (https://phabricator.wikimedia.org/T229449) (owner: 10Volans) [17:15:03] volans: +1 to dbctl diff as well [17:15:09] thanks [17:15:11] (03PS5) 10BBlack: anycast recdns: use for all install-time DNS [puppet] - 10https://gerrit.wikimedia.org/r/526177 (https://phabricator.wikimedia.org/T228190) [17:15:37] !log volans@cumin1001 dbctl commit of MediaWiki config (dc=codfw), diff saved to 'https://phabricator.wikimedia.org/P8841', previous config saved to /var/cache/conftool/dbconfig/20190731-171536-volans.json [17:15:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:16:36] 10Operations, 10Analytics, 10SRE-Access-Requests: Access to HUE for Mayakpwiki - https://phabricator.wikimedia.org/T229143 (10Nuria) @Mayakp.wiki hue has no ability to connect to druid (which is the data that powers both superset and turnilo), it can only connect to the hive datastore; To see sampling data... [17:17:22] 10Operations, 10ops-codfw, 10Patch-For-Review: Degraded RAID on db2058 - https://phabricator.wikimedia.org/T229449 (10Volans) Unable to run hpssacli utility due to I/O error, I've depooled the host on dbctl and from db-codfw.php with the above patch (shortly). I'll look into logs after that. 
[17:17:24] (03CR) 10Volans: [C: 03+2] db-eqiad.php: depool db2058, I/O error [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526730 (https://phabricator.wikimedia.org/T229449) (owner: 10Volans) [17:17:48] (03CR) 10jenkins-bot: db-eqiad.php: depool db2058, I/O error [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526730 (https://phabricator.wikimedia.org/T229449) (owner: 10Volans) [17:21:24] !log volans@deploy1001 Synchronized wmf-config/db-codfw.php: depool db2058, I/O error, T229449 (duration: 00m 54s) [17:21:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:21:31] T229449: Degraded RAID on db2058 - https://phabricator.wikimedia.org/T229449 [17:24:06] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2063 - https://phabricator.wikimedia.org/T229302 (10Marostegui) 05Open→03Resolved All good! ` logicaldrive 1 (3.3 TB, RAID 1+0, OK) ` [17:26:10] 10Operations, 10ops-codfw, 10Patch-For-Review: Degraded RAID on db2058 - https://phabricator.wikimedia.org/T229449 (10Volans) host downtimed on icinga until Friday ~15UTC. chatted with @Marostegui and the host is due decommission, so no hurry, he'll take a look tomorrow. [17:26:51] 10Operations, 10Traffic: All files giving 404 for some people - https://phabricator.wikimedia.org/T229434 (10Aklapper) 05Resolved→03Invalid [17:27:27] 10Operations, 10Phabricator, 10Release-Engineering-Team-TODO, 10Traffic, 10Release-Engineering-Team (Development services): Prepare Phame to support heavy traffic for a Tech Department blog - https://phabricator.wikimedia.org/T226044 (10greg) >>! In T226044#5380759, @BBlack wrote: > 2. Configure a new ph... 
[17:28:34] (03CR) 10Dzahn: [C: 04-1] "tested the query and noticed the Phabricator links don't work, they are missing the "T" before the numbers" [puppet] - 10https://gerrit.wikimedia.org/r/525449 (https://phabricator.wikimedia.org/T228575) (owner: 10Aklapper) [17:28:37] (03CR) 10Nuria: "nice, hopefully piece of mind for a while" [puppet] - 10https://gerrit.wikimedia.org/r/526613 (https://phabricator.wikimedia.org/T228620) (owner: 10Elukey) [17:31:46] (03CR) 10BBlack: [C: 03+2] anycast recdns: use for all install-time DNS [puppet] - 10https://gerrit.wikimedia.org/r/526177 (https://phabricator.wikimedia.org/T228190) (owner: 10BBlack) [17:32:31] (03CR) 10Dzahn: [C: 04-1] "https://phabricator.wikimedia.org/P8842" [puppet] - 10https://gerrit.wikimedia.org/r/525449 (https://phabricator.wikimedia.org/T228575) (owner: 10Aklapper) [17:33:16] db1114 dead over a day in icinga with no ack, known issue I assume? [17:34:16] hmmm https://phabricator.wikimedia.org/T221282#5245519 says it's some kind of test host [17:34:42] 10Operations, 10Traffic: All files giving 404 for some people - https://phabricator.wikimedia.org/T229434 (10CDanis) FTR this was likely another instance of {T207340} [17:35:02] confirmed in SAL. j.ynus called it "s1-test" [17:35:08] eh, "test-s1" [17:35:13] bblack: probably downtime expired [17:35:36] it has notifications disabled but we also downtime it, so it probably expired [17:37:16] (03CR) 10Dzahn: [C: 04-1] "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/525449 (https://phabricator.wikimedia.org/T228575) (owner: 10Aklapper) [17:39:38] 10Operations, 10Traffic, 10Patch-For-Review: Roll out Anycast RecDNS to more servers - https://phabricator.wikimedia.org/T228190 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by bblack on cumin1001.eqiad.wmnet for hosts: ` ['cp1008.wikimedia.org'] ` The log can be found in `/var/log/wmf-auto-re... 
[17:40:02] jouncebot: next [17:40:02] In 0 hour(s) and 19 minute(s): Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190731T1800) [17:40:34] (03PS3) 10BBlack: anycast recdns: set for canary api/appservers [puppet] - 10https://gerrit.wikimedia.org/r/526685 (https://phabricator.wikimedia.org/T228190) [17:41:33] (03CR) 10Krinkle: vcl: add Access-Control-Allow-Origin to mobile redirects (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/526627 (https://phabricator.wikimedia.org/T229385) (owner: 10Lucas Werkmeister (WMDE)) [17:41:53] 10Operations, 10ops-eqiad, 10DBA: db1114 crashed due to memory issues (server under warranty) - https://phabricator.wikimedia.org/T229452 (10Marostegui) [17:41:57] (03PS1) 10CDanis: dbctl: disable on half of canary hosts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526735 (https://phabricator.wikimedia.org/T229070) [17:43:39] (03CR) 10CDanis: [C: 03+2] dbctl: disable on half of canary hosts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526735 (https://phabricator.wikimedia.org/T229070) (owner: 10CDanis) [17:44:10] 10Operations, 10ops-eqiad, 10Discovery-Search (Current work): elastic1031 - PSU status critical - https://phabricator.wikimedia.org/T229453 (10Dzahn) [17:44:31] ACKNOWLEDGEMENT - IPMI Sensor Status on elastic1031 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical, Status = Critical] daniel_zahn https://phabricator.wikimedia.org/T229453 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures [17:44:48] (03Merged) 10jenkins-bot: dbctl: disable on half of canary hosts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526735 (https://phabricator.wikimedia.org/T229070) (owner: 10CDanis) [17:46:29] (03CR) 10BBlack: [C: 03+2] anycast recdns: set for canary api/appservers [puppet] - 10https://gerrit.wikimedia.org/r/526685 
(https://phabricator.wikimedia.org/T228190) (owner: 10BBlack) [17:46:51] !log cdanis@deploy1001 Synchronized wmf-config/etcd.php: I45b705c8 disable dbctl on half of canary hosts (duration: 00m 57s) [17:46:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:47:51] ACKNOWLEDGEMENT - Device not healthy -SMART- on ms-be2018 is CRITICAL: cluster=swift device=cciss,13 instance=ms-be2018:9100 job=node site=codfw daniel_zahn https://phabricator.wikimedia.org/T225630 https://wikitech.wikimedia.org/wiki/SMART%23Alerts https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ms-be2018&var-datasource=codfw+prometheus/ops [17:47:51] ACKNOWLEDGEMENT - puppet last run on ms-be2018 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 29 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[parted-/dev/sdc] daniel_zahn https://phabricator.wikimedia.org/T225630 https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [17:50:29] (03CR) 10Jforrester: "recheck" [software/cassandra-table-properties] - 10https://gerrit.wikimedia.org/r/524921 (https://phabricator.wikimedia.org/T220246) (owner: 10Holger Knust) [17:53:13] (03CR) 10jenkins-bot: dbctl: disable on half of canary hosts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526735 (https://phabricator.wikimedia.org/T229070) (owner: 10CDanis) [17:54:14] 10Operations, 10Traffic, 10Patch-For-Review: Roll out Anycast RecDNS to more servers - https://phabricator.wikimedia.org/T228190 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cp1008.wikimedia.org'] ` Of which those **FAILED**: ` ['cp1008.wikimedia.org'] ` [17:59:08] bblack: failure because first puppet run did not finish probably. 
you can still do "[puppetmaster1001:~] $ sudo install-console cp1008.wikimedia.org [17:59:18] and then run puppet manually [18:00:05] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190731T1800) [18:00:48] yep, Unable to run wmf-auto-reimage-host: Failed to puppet_first_run [18:02:02] (03CR) 10Jforrester: "recheck" [software/cassandra-table-properties] - 10https://gerrit.wikimedia.org/r/524921 (https://phabricator.wikimedia.org/T220246) (owner: 10Holger Knust) [18:04:10] PROBLEM - HHVM rendering on mw1278 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Application_servers [18:05:00] mutante: yeah I'll fix it up in a little [18:05:09] cp nodes never succeed first puppet runs, so nothing new there heh [18:05:23] bblack: ack! ok [18:05:26] RECOVERY - HHVM rendering on mw1278 is OK: HTTP OK: HTTP/1.1 200 OK - 76307 bytes in 0.353 second response time https://wikitech.wikimedia.org/wiki/Application_servers [18:09:24] (03PS1) 10Ottomata: Deploy refinery to an-tool1006 in Hadoop test cluster [puppet] - 10https://gerrit.wikimedia.org/r/526743 (https://phabricator.wikimedia.org/T228291) [18:09:30] (03PS1) 10BBlack: H2 coalesce 421: fix cache::canary hieradata as well [puppet] - 10https://gerrit.wikimedia.org/r/526744 (https://phabricator.wikimedia.org/T207340) [18:09:49] 10Operations, 10Phabricator, 10Release-Engineering-Team-TODO, 10Traffic, 10Release-Engineering-Team (Development services): Prepare Phame to support heavy traffic for a Tech Department blog - https://phabricator.wikimedia.org/T226044 (10JAufrecht) > Decide on a public pretty domainname for these more pub... 
[18:09:53] 10Operations, 10ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T224794 (10Jclark-ctr) [18:10:13] (03CR) 10BBlack: [C: 03+2] H2 coalesce 421: fix cache::canary hieradata as well [puppet] - 10https://gerrit.wikimedia.org/r/526744 (https://phabricator.wikimedia.org/T207340) (owner: 10BBlack) [18:10:21] 10Operations, 10Phabricator, 10Release-Engineering-Team-TODO, 10Traffic, and 2 others: Prepare Phame to support heavy traffic for a Tech Department blog - https://phabricator.wikimedia.org/T226044 (10JAufrecht) a:05JAufrecht→03greg passing to Greg for step 2, configuring the blog and vanity name. [18:11:38] (03CR) 10Ottomata: [C: 03+2] Deploy refinery to an-tool1006 in Hadoop test cluster [puppet] - 10https://gerrit.wikimedia.org/r/526743 (https://phabricator.wikimedia.org/T228291) (owner: 10Ottomata) [18:11:46] (03PS2) 10Ottomata: Deploy refinery to an-tool1006 in Hadoop test cluster [puppet] - 10https://gerrit.wikimedia.org/r/526743 (https://phabricator.wikimedia.org/T228291) [18:11:48] (03CR) 10Ottomata: [V: 03+2 C: 03+2] Deploy refinery to an-tool1006 in Hadoop test cluster [puppet] - 10https://gerrit.wikimedia.org/r/526743 (https://phabricator.wikimedia.org/T228291) (owner: 10Ottomata) [18:13:44] 10Operations, 10Phabricator, 10Release-Engineering-Team-TODO, 10Traffic, and 2 others: Prepare Phame to support heavy traffic for a Tech Department blog - https://phabricator.wikimedia.org/T226044 (10BBlack) >>! In T226044#5381138, @JAufrecht wrote: > I think it should be techblog.wikimedia.org, because ev... [18:14:02] 10Operations, 10Phabricator, 10Release-Engineering-Team-TODO, 10Traffic, and 2 others: Prepare Phame to support heavy traffic for a Tech Department blog - https://phabricator.wikimedia.org/T226044 (10Krinkle) I tested the "Move Post" feature in Phame today - on the foresight that some people will likely tr... 
[18:19:24] 10Operations, 10Phabricator, 10Release-Engineering-Team-TODO, 10Traffic, and 2 others: Prepare Phame to support heavy traffic for a Tech Department blog - https://phabricator.wikimedia.org/T226044 (10BBlack) Replying to myself earlier: apparently they're datestamped URIs beginning with `/yyyy/mm/`, example... [18:20:32] (03PS2) 10Mforns: analytics::refinery::job::data_purge Migrate geoeditors timers to new script [puppet] - 10https://gerrit.wikimedia.org/r/519693 (https://phabricator.wikimedia.org/T226862) [18:24:09] 10Operations, 10ops-eqiad, 10Discovery-Search (Current work): elastic1031 - PSU status critical - https://phabricator.wikimedia.org/T229453 (10wiki_willy) a:03Jclark-ctr @Jclark-ctr - whenever you have a few min free, can you see if this is just a loose cable that maybe got accidentally pulled from the PDU... [18:24:11] (03PS2) 10Dzahn: phabricator weekly project changes email: List cookie-licked tasks [puppet] - 10https://gerrit.wikimedia.org/r/525449 (https://phabricator.wikimedia.org/T228575) (owner: 10Aklapper) [18:25:43] See https://phabricator.wikimedia.org/T228280#5381195, which has become a train-blocker: the grouped recent changes view no longer works with wmf.16, regardless of skin. 
[18:25:49] 10Operations, 10ops-eqiad, 10DBA: db1114 crashed due to memory issues (server under warranty) - https://phabricator.wikimedia.org/T229452 (10wiki_willy) a:03Cmjohnson [18:26:36] (03CR) 10Dzahn: [C: 03+2] phabricator weekly project changes email: List cookie-licked tasks [puppet] - 10https://gerrit.wikimedia.org/r/525449 (https://phabricator.wikimedia.org/T228575) (owner: 10Aklapper) [18:26:47] (03PS3) 10Dzahn: phabricator weekly project changes email: List cookie-licked tasks [puppet] - 10https://gerrit.wikimedia.org/r/525449 (https://phabricator.wikimedia.org/T228575) (owner: 10Aklapper) [18:27:13] 10Operations, 10Phabricator, 10Release-Engineering-Team-TODO, 10Traffic, and 2 others: Prepare Phame to support heavy traffic for a Tech Department blog - https://phabricator.wikimedia.org/T226044 (10Krinkle) >>! In T226044#5381193, @BBlack wrote: > [..] apparently they're datestamped URIs beginning with `... [18:27:31] (03CR) 10Ottomata: [C: 03+2] analytics::refinery::job::data_purge Migrate geoeditors timers to new script [puppet] - 10https://gerrit.wikimedia.org/r/519693 (https://phabricator.wikimedia.org/T226862) (owner: 10Mforns) [18:27:37] (03PS3) 10Ottomata: analytics::refinery::job::data_purge Migrate geoeditors timers to new script [puppet] - 10https://gerrit.wikimedia.org/r/519693 (https://phabricator.wikimedia.org/T226862) (owner: 10Mforns) [18:29:04] (03CR) 10Ottomata: [V: 03+2 C: 03+2] analytics::refinery::job::data_purge Migrate geoeditors timers to new script [puppet] - 10https://gerrit.wikimedia.org/r/519693 (https://phabricator.wikimedia.org/T226862) (owner: 10Mforns) [18:29:53] (03CR) 10Jforrester: "recheck" [software/cassandra-table-properties] - 10https://gerrit.wikimedia.org/r/524921 (https://phabricator.wikimedia.org/T220246) (owner: 10Holger Knust) [18:32:32] 10Operations, 10Traffic: Roll out Anycast RecDNS to more servers - https://phabricator.wikimedia.org/T228190 (10BBlack) Rollout status update: things that are using anycast recdns 
resolv.conf in production as of 2019-07-31: * All hosts in edge DCs (esams, ulsfo, eqsin) * All cp edge cache hosts globally * All... [18:32:49] 10Operations, 10Traffic: Roll out Anycast RecDNS to more servers - https://phabricator.wikimedia.org/T228190 (10BBlack) [18:33:55] (03PS4) 10Dzahn: phabricator weekly project changes email: List cookie-licked tasks [puppet] - 10https://gerrit.wikimedia.org/r/525449 (https://phabricator.wikimedia.org/T228575) (owner: 10Aklapper) [18:38:38] @mainframe98 thanks for catching that. A fix is on the way [18:40:09] 10Operations, 10ops-eqiad, 10Discovery-Search (Current work): elastic1031 - PSU status critical - https://phabricator.wikimedia.org/T229453 (10Jclark-ctr) inspected elastic1031 both PSU green inspected cables verified fully seated into recently replaced PDU. no physical faults found [18:44:28] PROBLEM - Host lvs5002 is DOWN: PING CRITICAL - Packet loss = 100% [18:44:48] RECOVERY - Host lvs5002 is UP: PING OK - Packet loss = 0%, RTA = 231.27 ms [18:45:02] PROBLEM - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is CRITICAL: /{domain}/v1/page/most-read/{year}/{month}/{day} (retrieve the most-read articles for January 1, 2016 (with aggregated=true)) timed out before a response was received https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29 [18:46:36] RECOVERY - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29 [18:46:58] lvs5002 wasn't me... [18:47:06] (neither was the other one) [18:49:01] !log phab1003 - manually running project_changes.sh to create mail to phabricator-reports@lists (T228575) [18:49:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:49:09] T228575: Decrease Phab task cookie licking (assignee field set for years without progress) - https://phabricator.wikimedia.org/T228575 [18:50:28] bblack: hmm..
both look like they could also be on the icinga server or networking [18:50:39] 10Operations, 10Phabricator, 10Release-Engineering-Team-TODO, 10Traffic, and 2 others: Prepare Phame to support heavy traffic for a Tech Department blog - https://phabricator.wikimedia.org/T226044 (10BBlack) Heh, apparently I can't even remember things I read and said before even when they're right above m... [18:50:50] yeah [19:00:04] brennen: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for MediaWiki train - American version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190731T1900). [19:00:52] train is presently blocked; sending status mail now. [19:03:19] (03CR) 10Aklapper: "Gosh. Thanks for fixing my typo and for merging, Daniel!" [puppet] - 10https://gerrit.wikimedia.org/r/525449 (https://phabricator.wikimedia.org/T228575) (owner: 10Aklapper) [19:08:11] 10Operations, 10Phabricator, 10Release-Engineering-Team-TODO, 10Traffic, and 2 others: Prepare Phame to support heavy traffic for a Tech Department blog - https://phabricator.wikimedia.org/T226044 (10Aklapper) >>! In T226044#5381170, @Krinkle wrote: > I tested the "Move Post" feature in Phame today [...] f... 
[19:25:00] (03CR) 10Ayounsi: "> Patch Set 1: Code-Review+1" [puppet] - 10https://gerrit.wikimedia.org/r/526676 (https://phabricator.wikimedia.org/T226331) (owner: 10Ayounsi) [19:25:05] (03CR) 10Ayounsi: [C: 03+2] Netbox, enable Prometheus endpoint [puppet] - 10https://gerrit.wikimedia.org/r/526676 (https://phabricator.wikimedia.org/T226331) (owner: 10Ayounsi) [19:25:13] (03PS2) 10Ayounsi: Netbox, enable Prometheus endpoint [puppet] - 10https://gerrit.wikimedia.org/r/526676 (https://phabricator.wikimedia.org/T226331) [19:27:32] (03PS1) 10Smalyshev: Add L and M to allowed statement starts [puppet] - 10https://gerrit.wikimedia.org/r/526755 [19:31:32] (03PS1) 10Ejegg: CSP for banner preview: allow remind me later host [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526756 (https://phabricator.wikimedia.org/T194019) [19:34:25] (03CR) 10Mepps: [C: 03+1] "I don't seem to have +2 permissions on this." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526756 (https://phabricator.wikimedia.org/T194019) (owner: 10Ejegg) [19:34:32] (03PS1) 10Smalyshev: Support /entity/ and other Wikidata URLs for Commons [puppet] - 10https://gerrit.wikimedia.org/r/526757 (https://phabricator.wikimedia.org/T222321) [19:52:14] 10Operations, 10ops-codfw, 10decommission: Decommission db2042 - https://phabricator.wikimedia.org/T225090 (10Papaul) [19:53:56] jdlrobson: around to test https://gerrit.wikimedia.org/r/c/mediawiki/skins/MinervaNeue/+/526754 ? [19:55:01] (03PS1) 10Papaul: DNS: Remove DNS entires for db2042 [dns] - 10https://gerrit.wikimedia.org/r/526762 [19:56:26] brennen: sure [19:57:28] jdlrobson: thanks. i am instructed to get on mwdebug, one sec while i figure that out. 
[19:59:53] !log mbsantos@deploy1001 Started deploy [proton/deploy@ed6ebd8]: Update chromium-renderer to 529c493 (T227124) [20:00:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:00:02] T227124: [Bug] Proton fails in CI and locally for node 6.x - https://phabricator.wikimedia.org/T227124 [20:00:04] cscott, arlolra, subbu, bearND, halfak, and accraze: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Services – Parsoid / Citoid / Mobileapps / ORES / … deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190731T2000). [20:00:12] no parsoid deploy today [20:01:37] !log mbsantos@deploy1001 Finished deploy [proton/deploy@ed6ebd8]: Update chromium-renderer to 529c493 (T227124) (duration: 01m 43s) [20:01:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:24:00] jdlrobson: staged on mwdebug1002, please to check. [20:24:35] (and thank you. :) ) [20:27:58] (03PS3) 10Ayounsi: pmacct, send more netflow data to analytics [puppet] - 10https://gerrit.wikimedia.org/r/524628 [20:28:26] (03CR) 10jerkins-bot: [V: 04-1] pmacct, send more netflow data to analytics [puppet] - 10https://gerrit.wikimedia.org/r/524628 (owner: 10Ayounsi) [20:28:44] reminder wmf.16 is only group0 - so mediawiki.org for comparison i think. [20:30:48] !log mholloway-shell@deploy1001 Started deploy [mobileapps/deploy@7c6ce69]: Update mobileapps to 5eb9068 [20:30:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:31:49] brennen: on it [20:32:26] (03PS4) 10Ayounsi: pmacct, send more netflow data to analytics [puppet] - 10https://gerrit.wikimedia.org/r/524628 [20:32:27] !log mholloway-shell@deploy1001 Finished deploy [mobileapps/deploy@7c6ce69]: Update mobileapps to 5eb9068 (duration: 01m 39s) [20:32:30] brennen: yup that fixed it. Please sync away! [20:32:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:32:38] jdlrobson: thanks! 
[20:32:52] (03PS1) 10CDanis: dbctl: require commit messages [software/conftool] - 10https://gerrit.wikimedia.org/r/526774 [20:32:56] !log mobileapps deploy failed, investigating [20:33:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:34:43] 10Operations, 10Maps (Kartotherian), 10Patch-For-Review, 10Reading-Infrastructure-Team-Backlog (Kanban), 10Wikimedia-Incident: Create test in spec.yaml for the kartotherian / geoshape service - https://phabricator.wikimedia.org/T217910 (10MSantos) a:03MSantos [20:34:53] 10Operations, 10ops-codfw, 10netops: update RE-S-X6-64G-S in cr[12]-codfw - https://phabricator.wikimedia.org/T226422 (10ayounsi) First JTAC suggestion is to re-seat the SCB. We didn't do that today as the doc wasn't clear if it could be done with the router online. JTAC is looking into the logs. [20:37:16] !log brennen@deploy1001 Synchronized php-1.34.0-wmf.16/skins/MinervaNeue/includes/MinervaHooks.php: [[gerrit:526754|Limit Recent Changes disable-table mode to Minerva skin]] T228280 (duration: 00m 56s) [20:37:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:37:23] T228280: Disable grouped results on RecentChanges page on mobile - https://phabricator.wikimedia.org/T228280 [20:38:30] jdlrobson: synched! [20:38:54] 10Operations, 10ops-codfw, 10netops: update RE-S-X6-64G-S in cr[12]-codfw - https://phabricator.wikimedia.org/T226422 (10ayounsi) Also noticed the following while looking at the doc again today: >4. Use the request chassis routing-engine master switch command to make the Routing Engine RE-S-X6-64G (RE1) the... [20:39:46] (03CR) 10CDanis: "Manuel, does this UI seem good to you?" [software/conftool] - 10https://gerrit.wikimedia.org/r/526774 (owner: 10CDanis) [20:41:09] jdlrobson: https://phabricator.wikimedia.org/T220741 is still open and shows up on the train blocker task - is this good? [20:41:22] (and/or am i being confused by phabricator?) 
[20:42:21] @brennen yeh that's not a train blocker now it's swatted. I've removed it from the subtasks. [20:42:23] thank you! [20:42:32] cheers, much appreciated. [20:43:05] going to go ahead and roll forward to group1. [20:44:19] (03PS1) 10Brennen Bearnes: group1 wikis to 1.34.0-wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526776 [20:44:21] (03CR) 10Brennen Bearnes: [C: 03+2] group1 wikis to 1.34.0-wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526776 (owner: 10Brennen Bearnes) [20:45:27] (03Merged) 10jenkins-bot: group1 wikis to 1.34.0-wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526776 (owner: 10Brennen Bearnes) [20:46:20] 10Operations, 10observability, 10Patch-For-Review, 10Performance-Team (Radar): Provision >= 50% of statsd/Graphite-only metrics in Prometheus - https://phabricator.wikimedia.org/T205870 (10colewhite) a:03colewhite [20:46:26] (03CR) 10jenkins-bot: group1 wikis to 1.34.0-wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526776 (owner: 10Brennen Bearnes) [20:47:17] !log brennen@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.16 [20:47:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:48:12] !log brennen@deploy1001 Synchronized php: group1 wikis to 1.34.0-wmf.16 (duration: 00m 54s) [20:48:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:51:00] PROBLEM - puppet last run on archiva1001 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. 
https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [20:51:42] PROBLEM - termbox codfw on termbox.svc.codfw.wmnet is CRITICAL: /termbox (get rendered termbox) is CRITICAL: Test get rendered termbox returned the unexpected status 500 (expecting: 200) https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/SSR_Service [20:53:05] (03PS1) 10Brennen Bearnes: Revert "group1 wikis to 1.34.0-wmf.16" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526778 [20:53:07] (03CR) 10Brennen Bearnes: [C: 03+2] Revert "group1 wikis to 1.34.0-wmf.16" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526778 (owner: 10Brennen Bearnes) [20:54:07] (03Merged) 10jenkins-bot: Revert "group1 wikis to 1.34.0-wmf.16" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526778 (owner: 10Brennen Bearnes) [20:54:22] (03CR) 10jenkins-bot: Revert "group1 wikis to 1.34.0-wmf.16" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526778 (owner: 10Brennen Bearnes) [20:55:25] !log brennen@deploy1001 rebuilt and synchronized wikiversions files: Revert group1 back to 1.34.0-wmf.15 [20:55:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:56:36] RECOVERY - termbox codfw on termbox.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/SSR_Service [20:57:56] (03CR) 10Ayounsi: "https://puppet-compiler.wmflabs.org/compiler1001/17695/netflow1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/524628 (owner: 10Ayounsi) [20:57:58] (03CR) 10Ayounsi: [C: 03+2] pmacct, send more netflow data to analytics [puppet] - 10https://gerrit.wikimedia.org/r/524628 (owner: 10Ayounsi) [20:58:07] (03PS5) 10Ayounsi: pmacct, send more netflow data to analytics [puppet] - 10https://gerrit.wikimedia.org/r/524628 [20:58:20] !log brennen@deploy1001 Synchronized php: Revert group1 back to 1.34.0-wmf.15 (duration: 00m 53s) [20:58:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:04:59] brennen: 
i'm going to try that mobileapps deploy once more. train all done for the time being? [21:06:11] mdholloway: just had to roll back group1 after hitting T229482 [21:06:12] T229482: PHP Warning: Wikibase\Lib\Store\Sql\WikiPageEntityRevisionLookup::getEntityRevision: Entity not loaded - https://phabricator.wikimedia.org/T229482 [21:09:23] (03PS1) 10Cwhite: logstash: update statsd exporter mappings and use exporter [puppet] - 10https://gerrit.wikimedia.org/r/526782 (https://phabricator.wikimedia.org/T205870) [21:09:23] brennen: yep. that one looks familiar to me, actually. i'll put the mobileapps aside for a minute. [21:19:00] RECOVERY - puppet last run on archiva1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [21:20:09] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to machines [stat1004, stat1007, stat1006, notebook1003 and notebook1004] and groups for cchen - https://phabricator.wikimedia.org/T228447 (10cchen) [21:21:00] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to machines [stat1004, stat1007, stat1006, notebook1003 and notebook1004] and groups for cchen - https://phabricator.wikimedia.org/T228447 (10cchen) a:05cchen→03Nuria [21:21:13] brennen: i think this warning was already getting logged at some relatively infrequent rate before, right? [21:21:32] mdholloway: i believe so [21:25:17] there was some relevant discussion on a ticket i'll dig up; i think this warning is because of a scenario that Shouldn't Happen in Wikidata but is expected in WikibaseMediaInfo / Structured Data on Commons [21:26:57] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to machines [stat1004, stat1007, stat1006, notebook1003 and notebook1004] and groups for cchen - https://phabricator.wikimedia.org/T228447 (10Nuria) Approved on my end once employment is verified. 
[21:27:25] mdholloway: cool - i am now second-guessing myself, and am going to make sure that in fact i reported the error that we were seeing blow up the most. [21:27:52] yeah, see https://phabricator.wikimedia.org/T229279#5375192 and https://phabricator.wikimedia.org/T229279#5375330 [21:28:41] (specifically the part of the former addressed to alaa_wmde) [21:28:47] (03PS1) 10Ayounsi: pmacct, remove tee plugin [puppet] - 10https://gerrit.wikimedia.org/r/526785 [21:29:22] actually, i think it's getting hit before because of a change in how CaptionsPanel is initialized [21:29:31] in WikibaseMediaInfo [21:29:40] (03CR) 10Ayounsi: [C: 03+2] pmacct, remove tee plugin [puppet] - 10https://gerrit.wikimedia.org/r/526785 (owner: 10Ayounsi) [21:30:17] i think you were right to report it and halt the train, but in this case the warning is misleading since the behavior is expected [21:30:21] (03CR) 10Ayounsi: "Actually, the tee plugin is not compatible with the Kafka plugin." [puppet] - 10https://gerrit.wikimedia.org/r/524628 (owner: 10Ayounsi) [21:30:35] 10Operations, 10Electron-PDFs, 10OfflineContentGenerator, 10Core Platform Team Legacy (Designing), 10Services (designing): Improve stability and maintainability of our browser-based PDF render service - https://phabricator.wikimedia.org/T172815 (10Pchelolo) 05Open→03Invalid After switching to #proton... 
[21:30:38] 10Operations, 10Electron-PDFs, 10Readers-Web-Backlog (Tracking), 10Services (done): pdfrender fails to serve requests since Mar 8 00:30:32 UTC on scb1003 - https://phabricator.wikimedia.org/T159922 (10Pchelolo) [21:30:41] just need to figure out the best way to isolate it to cases we care about [21:31:57] 10Operations, 10Analytics, 10ChangeProp, 10Core Platform Team, and 2 others: Consider the possibility of separating ChangeProp and JobQueue on Kafka level - https://phabricator.wikimedia.org/T199431 (10Pchelolo) [21:32:56] !log set cr1-eqiad's netflow target port to 2100 (nfacctd) [21:33:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:34:23] 10Operations, 10Analytics, 10ChangeProp, 10Core Platform Team Legacy (Designing), and 2 others: Separate dev Change-Prop from production Kafka cluster - https://phabricator.wikimedia.org/T199427 (10Pchelolo) 05Open→03Declined We don't really have/use change-prop in dev cluster anymore and I don't think... [21:34:55] (03CR) 10Cwhite: [C: 03+1] "Looks good!" [puppet] - 10https://gerrit.wikimedia.org/r/526611 (https://phabricator.wikimedia.org/T229357) (owner: 10Elukey) [21:41:29] thanks for context, mdholloway. [21:41:46] PROBLEM - Check systemd state on elastic2051 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [21:42:34] (03PS3) 10CRusnov: netbox: Add configuration and timers for csv dumps [puppet] - 10https://gerrit.wikimedia.org/r/521313 [21:54:42] RECOVERY - Check systemd state on elastic2051 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [22:11:50] 10Operations, 10RESTBase, 10Core Platform Team Legacy (Later), 10Services (later): Provide production jessie image with node 4.2; use this for service-runner build command - https://phabricator.wikimedia.org/T123237 (10Pchelolo) 05Open→03Invalid We're updating to k8s and not running node 4 anymore. [22:11:54] 10Blocked-on-Operations, 10Operations, 10RESTBase, 10Services: Switch RESTBase to use Node.js 4.2 - https://phabricator.wikimedia.org/T107762 (10Pchelolo) [22:19:08] (03CR) 10Cwhite: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/17696/" [puppet] - 10https://gerrit.wikimedia.org/r/526782 (https://phabricator.wikimedia.org/T205870) (owner: 10Cwhite) [22:27:25] (03PS1) 10BBlack: anycast recdns: enable for codfw clients [puppet] - 10https://gerrit.wikimedia.org/r/526788 (https://phabricator.wikimedia.org/T228190) [22:35:43] 10Operations, 10netops: Instability of the Level3 link between cr2-eqiad and cr2-esams - https://phabricator.wikimedia.org/T228827 (10ayounsi) > This circuit has been impacted by multiple planned maintenances and higher-level network events. They have all been different troubles that have been restored so the... 
[22:40:26] (03CR) 10Ayounsi: [C: 03+2] Bird anycast, add monitoring for anycast-healthchecker [puppet] - 10https://gerrit.wikimedia.org/r/520643 (https://phabricator.wikimedia.org/T186550) (owner: 10Ayounsi) [22:41:39] 10Operations, 10RESTBase, 10Traffic, 10Core Platform Team Legacy (Later), and 2 others: Split slash decoding from general percent normalization in Varnish VCL - https://phabricator.wikimedia.org/T127387 (10Pchelolo) 05Open→03Resolved a:03Pchelolo doesn't seem to be any more movement or things to be d... [22:41:43] 10Operations, 10Mobile-Content-Service, 10RESTBase, 10Reading-Infrastructure-Team-Backlog, and 2 others: Varnish not purging RESTBase URIs - https://phabricator.wikimedia.org/T127370 (10Pchelolo) [22:41:53] (03PS3) 10Ayounsi: Bird anycast, add monitoring for anycast-healthchecker [puppet] - 10https://gerrit.wikimedia.org/r/520643 (https://phabricator.wikimedia.org/T186550) [22:55:58] (03CR) 10Eevans: [C: 03+1] Add kask session storage configuration. Use only on testwiki, [mediawiki-config] - 10https://gerrit.wikimedia.org/r/519432 (https://phabricator.wikimedia.org/T222099) (owner: 10BPirkle) [22:57:29] (03PS6) 10BPirkle: Add kask session storage configuration. Use only on testwiki, [mediawiki-config] - 10https://gerrit.wikimedia.org/r/519432 (https://phabricator.wikimedia.org/T222099) [22:58:53] (03PS1) 10Ayounsi: Pmacct, add tag2 for traffic direction [puppet] - 10https://gerrit.wikimedia.org/r/526789 [23:00:04] MaxSem, RoanKattouw, and Niharika: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Evening SWAT (Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190731T2300). [23:00:04] bpirkle: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [23:00:15] I'm here [23:00:45] I can SWAT today! [23:00:53] Thanks Urbanecm! 
[23:01:01] yw Niharika [23:01:23] (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/519432 (https://phabricator.wikimedia.org/T222099) (owner: 10BPirkle) [23:02:25] (03Merged) 10jenkins-bot: Add kask session storage configuration. Use only on testwiki, [mediawiki-config] - 10https://gerrit.wikimedia.org/r/519432 (https://phabricator.wikimedia.org/T222099) (owner: 10BPirkle) [23:02:40] (03CR) 10jenkins-bot: Add kask session storage configuration. Use only on testwiki, [mediawiki-config] - 10https://gerrit.wikimedia.org/r/519432 (https://phabricator.wikimedia.org/T222099) (owner: 10BPirkle) [23:02:46] bpirkle, your patch is on mwdebug1002, please test [23:03:57] "hhvm mwdebug1002 - NOTICE Notice: Use of undefined constant ‘caches’ - assumed '‘caches’' in /srv/mediawiki/wmf-config/CommonSettings.php on line 517" is in logstash [23:04:00] bpirkle, ^^ [23:04:13] hrm [23:04:26] https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-2019.07.31/hhvm?id=AWxKRm_NQFnOyvY0y2Hy&_g=h@44136fa [23:05:25] it seems you used a bad version of apostrophe [23:05:29] bad type of ’ instead of ' ? 
[23:05:38] Ahhh, ok
[23:05:45] * Urbanecm is fixing that
[23:05:58] (CR) Ayounsi: [C: +2] Pmacct, add tag2 for traffic direction [puppet] - https://gerrit.wikimedia.org/r/526789 (owner: Ayounsi)
[23:06:08] fixed version is on mwdebug1002
[23:06:52] thank you
[23:08:03] yw bpirkle
[23:08:03] (PS1) Urbanecm: Fix a typo in CommonSettings.php [mediawiki-config] - https://gerrit.wikimedia.org/r/526790 (https://phabricator.wikimedia.org/T222099)
[23:08:31] (CR) Urbanecm: [C: +2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/526790 (https://phabricator.wikimedia.org/T222099) (owner: Urbanecm)
[23:09:04] bpirkle, let me know if it works
[23:09:34] (Merged) jenkins-bot: Fix a typo in CommonSettings.php [mediawiki-config] - https://gerrit.wikimedia.org/r/526790 (https://phabricator.wikimedia.org/T222099) (owner: Urbanecm)
[23:09:49] (CR) jenkins-bot: Fix a typo in CommonSettings.php [mediawiki-config] - https://gerrit.wikimedia.org/r/526790 (https://phabricator.wikimedia.org/T222099) (owner: Urbanecm)
[23:10:49] (PS1) Ayounsi: check_anycast_healthchecker, add sudo bird rights [puppet] - https://gerrit.wikimedia.org/r/526791
[23:10:51] Looks good
[23:11:04] thanks bpirkle, deploying
[23:12:52] !log urbanecm@deploy1001 Synchronized wmf-config/: SWAT: Add kask session storage configuration. Use only on testwiki, (ede989e, 862df8d, T222099) (duration: 00m 56s)
[23:12:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:13:00] T222099: Staging release of RESTBagOStuff using Kask - https://phabricator.wikimedia.org/T222099
[23:13:02] bpirkle, synced
[23:13:47] Thank you
[23:13:59] you're welcome
[23:14:23] (CR) Ayounsi: "https://puppet-compiler.wmflabs.org/compiler1001/17698/dns2002.wikimedia.org/" [puppet] - https://gerrit.wikimedia.org/r/526791 (owner: Ayounsi)
[23:14:32] !log Evening SWAT done
[23:14:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:16:35] OK, finally going to retry that mobileapps deployment now
[23:17:59] !log mholloway-shell@deploy1001 Started deploy [mobileapps/deploy@db795ec]: Update mobileapps to b8c4166
[23:18:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:22:20] !log mholloway-shell@deploy1001 Finished deploy [mobileapps/deploy@db795ec]: Update mobileapps to b8c4166 (duration: 04m 21s)
[23:22:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:23:26] (PS1) Urbanecm: urbanecm's dotfiles: gitconfig: Add push-for-review, use SSH for pushing [puppet] - https://gerrit.wikimedia.org/r/526796
[23:29:45] RECOVERY - PHP opcache health on mwdebug1002 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[23:30:13] RECOVERY - PHP opcache health on mwdebug2001 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[23:40:51] PROBLEM - Check if anycast-healthchecker and all configured threads are running on lithium is CRITICAL: NRPE: Command check_anycast_healthchecker not defined https://wikitech.wikimedia.org/wiki/Anycast_recursive_DNS%23Anycast_healthchecker_not_running
[23:51:16] Operations, Analytics, SRE-Access-Requests: Access to HUE for Mayakpwiki - https://phabricator.wikimedia.org/T229143 (Nuria) @Mayakp.wiki please give a try to jupyter and let me see on my end what is needed for access
[23:58:35] (PS7) Jeena Huneidi: Add Parsoid chart [deployment-charts] - https://gerrit.wikimedia.org/r/525481 (https://phabricator.wikimedia.org/T228909)