[00:04:34] PROBLEM - Time elapsed since the last kafka event processed by purged on cp1075 is CRITICAL: cluster=cache_text instance=cp1075 job=purged site=eqiad topic=codfw.resource-purge https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=eqiad+prometheus/ops&var-instance=cp1075
[00:04:46] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[00:05:24] PROBLEM - Time elapsed since the last kafka event processed by purged on cp1081 is CRITICAL: cluster=cache_text instance=cp1081 job=purged site=eqiad topic=codfw.resource-purge https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=eqiad+prometheus/ops&var-instance=cp1081
[00:06:42] PROBLEM - Time elapsed since the last kafka event processed by purged on cp2041 is CRITICAL: cluster=cache_text instance=cp2041 job=purged site=codfw topic=codfw.resource-purge https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=codfw+prometheus/ops&var-instance=cp2041
[00:08:32] RECOVERY - Time elapsed since the last kafka event processed by purged on cp1075 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=eqiad+prometheus/ops&var-instance=cp1075
[00:08:42] RECOVERY - Time elapsed since the last kafka event processed by purged on cp2041 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=codfw+prometheus/ops&var-instance=cp2041
[00:09:20] RECOVERY - Time elapsed since the last kafka event processed by purged on cp1081 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=eqiad+prometheus/ops&var-instance=cp1081
[00:10:18] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 53 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[00:16:14] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[00:50:20] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 52 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[01:02:14] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[01:07:48] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 51 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[01:12:19] (PS1) Dave Pifke: xhgui: scrape Prometheus exporter [puppet] - https://gerrit.wikimedia.org/r/622447 (https://phabricator.wikimedia.org/T256039)
[01:13:46] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[01:24:51] (PS2) Dave Pifke: xhgui: enable database access for admins [puppet] - https://gerrit.wikimedia.org/r/621100 (https://phabricator.wikimedia.org/T260640)
[01:31:40] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 53 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[01:41:29] (PS3) Dave Pifke: xhgui: enable database access for admins [puppet] - https://gerrit.wikimedia.org/r/621100 (https://phabricator.wikimedia.org/T260640)
[01:49:32] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[02:09:20] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 52 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[02:15:22] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[02:21:35] Puppet, Beta-Cluster-Infrastructure, Developer Productivity: puppetdb on deployment-puppetdb03 keeps getting OOMKilled - https://phabricator.wikimedia.org/T248041 (Krinkle)
[02:45:02] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 51 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[02:56:58] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[03:06:52] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 52 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[03:12:50] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[03:22:52] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 51 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[03:28:50] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[03:57:18] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[04:03:20] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[04:12:38] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 51 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[04:24:34] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[04:25:10] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[04:37:08] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[04:50:34] PROBLEM - Backup freshness on backup1001 is CRITICAL: All failures: 1 (dbprov1003), Fresh: 103 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[04:56:24] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 51 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[05:02:24] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[05:03:36] !log Update db1135 and db1114 after MCR changes
[05:03:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:08:49] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1114, db1135 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12354 and previous config saved to /var/cache/conftool/dbconfig/20200826-050849-marostegui.json
[05:08:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:09:42] PROBLEM - MariaDB Replica Lag: s1 on db2112 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 86356.21 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[05:09:52] PROBLEM - MariaDB Replica Lag: s1 on db2103 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 86364.61 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[05:09:53] ^expired downtime
[05:10:10] downtimed all those again
[05:12:22] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 51 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[05:22:53] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[05:23:05] (CR) Marostegui: "This looks good as a first approach indeed. There's work to do on the proxies (or DNS rr maybe to start with) and with pt-kill and all tha" [puppet] - https://gerrit.wikimedia.org/r/622444 (https://phabricator.wikimedia.org/T260843) (owner: Bstorm)
[05:23:55] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1114, db1135 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12355 and previous config saved to /var/cache/conftool/dbconfig/20200826-052355-marostegui.json
[05:23:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:24:16] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[05:27:45] (PS1) Marostegui: db2125: Reenable notifications [puppet] - https://gerrit.wikimedia.org/r/622468 (https://phabricator.wikimedia.org/T260670)
[05:28:24] (CR) Marostegui: [C: +2] db2125: Reenable notifications [puppet] - https://gerrit.wikimedia.org/r/622468 (https://phabricator.wikimedia.org/T260670) (owner: Marostegui)
[05:28:52] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[05:33:46] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1114, db1135 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12356 and previous config saved to /var/cache/conftool/dbconfig/20200826-053345-marostegui.json
[05:33:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:44:10] !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1114, db1135 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12357 and previous config saved to /var/cache/conftool/dbconfig/20200826-054409-marostegui.json
[05:44:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:45:57] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1091 for MCR change', diff saved to https://phabricator.wikimedia.org/P12358 and previous config saved to /var/cache/conftool/dbconfig/20200826-054557-marostegui.json
[05:45:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:46:36] (PS1) Marostegui: db1091: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/622471
[05:47:03] (CR) Marostegui: [C: +2] db1091: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/622471 (owner: Marostegui)
[05:48:44] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 51 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[05:59:24] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[06:04:54] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[06:10:18] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[06:16:01] (PS23) Ryan Kemper: elasticsearch: verify all write queues are empty [software/spicerack] - https://gerrit.wikimedia.org/r/619781 (https://phabricator.wikimedia.org/T261239)
[06:17:18] (PS10) Ryan Kemper: elasticsearch: Let spicerack handle wait for all write queues to clear [cookbooks] - https://gerrit.wikimedia.org/r/603731 (https://phabricator.wikimedia.org/T261239)
[06:18:20] (CR) jerkins-bot: [V: -1] elasticsearch: Let spicerack handle wait for all write queues to clear [cookbooks] - https://gerrit.wikimedia.org/r/603731 (https://phabricator.wikimedia.org/T261239) (owner: Ryan Kemper)
[06:18:24] (CR) jerkins-bot: [V: -1] elasticsearch: verify all write queues are empty [software/spicerack] - https://gerrit.wikimedia.org/r/619781 (https://phabricator.wikimedia.org/T261239) (owner: Ryan Kemper)
[06:18:53] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[06:24:26] (PS24) Ryan Kemper: elasticsearch: verify all write queues are empty [software/spicerack] - https://gerrit.wikimedia.org/r/619781 (https://phabricator.wikimedia.org/T261239)
[06:26:24] (CR) jerkins-bot: [V: -1] elasticsearch: verify all write queues are empty [software/spicerack] - https://gerrit.wikimedia.org/r/619781 (https://phabricator.wikimedia.org/T261239) (owner: Ryan Kemper)
[06:29:15] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[06:32:09] (PS25) Ryan Kemper: elasticsearch: verify all write queues are empty [software/spicerack] - https://gerrit.wikimedia.org/r/619781 (https://phabricator.wikimedia.org/T261239)
[06:34:33] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 51 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[06:34:46] (CR) jerkins-bot: [V: -1] elasticsearch: verify all write queues are empty [software/spicerack] - https://gerrit.wikimedia.org/r/619781 (https://phabricator.wikimedia.org/T261239) (owner: Ryan Kemper)
[06:40:13] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 49 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[06:45:19] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[06:48:38] (CR) Jcrespo: "Brooke: We have a global solution for port assignment that will work for all hosts (not only wikireplicas). I think it would be cleaner to" [puppet] - https://gerrit.wikimedia.org/r/622444 (https://phabricator.wikimedia.org/T260843) (owner: Bstorm)
[06:49:48] (PS26) Ryan Kemper: elasticsearch: verify all write queues are empty [software/spicerack] - https://gerrit.wikimedia.org/r/619781 (https://phabricator.wikimedia.org/T261239)
[06:50:26] (CR) Ryan Kemper: "Oops, just noticed I never hit reply for my draft comments for the Aug 20 CR from gehel, so those old draft comments are also tacked onto " (8 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/619781 (https://phabricator.wikimedia.org/T261239) (owner: Ryan Kemper)
[06:51:17] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[06:53:51] (PS1) Nikerabbit: Add $wgTranslateMessageNamespaces[] = NS_MEDIAWIKI; for commonswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/622473 (https://phabricator.wikimedia.org/T131300)
[06:58:27] (CR) Jcrespo: "> Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/622444 (https://phabricator.wikimedia.org/T260843) (owner: Bstorm)
[07:01:02] (CR) Jcrespo: "Do I remove the section_ports file (it will be mocked on CI and provided by puppet on production), change format to the new CI and merge (" [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/620291 (https://phabricator.wikimedia.org/T165358) (owner: Jcrespo)
[07:02:05] Operations, Traffic, conftool, serviceops, Patch-For-Review: confd's watch functionality appears to be partially broken when interacting with etcd 3.x - https://phabricator.wikimedia.org/T260889 (Joe)
[07:09:11] Operations, Traffic, conftool, serviceops, Patch-For-Review: confd's watch functionality appears to be partially broken when interacting with etcd 3.x - https://phabricator.wikimedia.org/T260889 (Joe) After more digging, it seems the problem always existed, and it has to do with how confd wat...
[07:22:02] (CR) Marostegui: "elukey this would also need to be applied to labsdb1012" [puppet] - https://gerrit.wikimedia.org/r/622444 (https://phabricator.wikimedia.org/T260843) (owner: Bstorm)
[07:23:20] Operations, ops-eqiad, DC-Ops: (Due By: 2020-07-17) rack/setup/install - https://phabricator.wikimedia.org/T255520 (elukey) @Cmjohnson we can use the standard recipe for misc nodes, these should have 4 disks so I'd say `partman/standard.cfg partman/raid10-4dev.cfg` ?
[07:23:35] (PS13) Jcrespo: wmfmariadbpy: Load and provide a method for section to port assignment [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/620291 (https://phabricator.wikimedia.org/T165358)
[07:24:00] (CR) Marostegui: [C: +1] Tidy up import ordering using isort. [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/622349 (owner: Kormat)
[07:26:58] Operations, video2commons: video-redis-buster.video.eqiad.wmflabs:6379. Connection refused. - https://phabricator.wikimedia.org/T261245 (Jidanni) Nobody is monitoring https://github.com/toolforge/video2commons/ , and the problem is a wmflabs machine.
[07:27:16] (PS1) Elukey: Add partman recipe for an-test-worker nodes [puppet] - https://gerrit.wikimedia.org/r/622476 (https://phabricator.wikimedia.org/T255520)
[07:28:32] (PS3) Jcrespo: wmfmariadbpy: Add unit tests for resolve method [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/620319
[07:28:55] Operations, ops-eqiad, Analytics-Clusters, decommission-hardware: Decommission analytics10[28-31,33-41] - https://phabricator.wikimedia.org/T227485 (elukey) Updating this task - we are setting up the new hadoop test cluster, once done I'll clear all puppet config and set this task as actionable.
[07:29:14] (CR) Elukey: [C: +2] Add partman recipe for an-test-worker nodes [puppet] - https://gerrit.wikimedia.org/r/622476 (https://phabricator.wikimedia.org/T255520) (owner: Elukey)
[07:44:21] (PS1) Elukey: Add partman recipe for an-test-(master|coord) hosts [puppet] - https://gerrit.wikimedia.org/r/622479 (https://phabricator.wikimedia.org/T255518)
[07:45:31] (CR) Elukey: [C: +2] Add partman recipe for an-test-(master|coord) hosts [puppet] - https://gerrit.wikimedia.org/r/622479 (https://phabricator.wikimedia.org/T255518) (owner: Elukey)
[07:51:34] RECOVERY - Backup freshness on backup1001 is OK: Fresh: 104 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[07:53:22] (PS1) Filippo Giunchedi: hieradata: move swiftrepl to codfw [puppet] - https://gerrit.wikimedia.org/r/622522
[07:55:49] PROBLEM - Rate of JVM GC Old generation-s runs - logstash1010-production-logstash-eqiad on logstash1010 is CRITICAL: 122 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-logstash-eqiad&var-instance=logstash1010&panelId=37
[07:57:25] (PS12) Jcrespo: mariadb: Setup section->port assignment on puppet [puppet] - https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033)
[07:57:27] (PS6) Jcrespo: mariadb: Apply the list of ports to the core::multiinstance class [puppet] - https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033)
[07:57:37] Operations, ops-eqiad, DC-Ops, Patch-For-Review: (Due By: 2020-07-17) rack/setup/install - https://phabricator.wikimedia.org/T255520 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-test-worker1002.eqiad.wmnet...
[08:00:52] (PS7) Jcrespo: mariadb: Apply the list of ports to the core::multiinstance class [puppet] - https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033)
[08:09:03] (CR) Marostegui: "One comment, inline." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033) (owner: Jcrespo)
[08:10:20] (PS8) Jcrespo: mariadb: Apply the list of ports to the core::multiinstance class [puppet] - https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033)
[08:11:19] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 51 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[08:14:32] !log elukey@cumin1001 START - Cookbook sre.hosts.downtime
[08:14:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:15:38] (CR) Jcrespo: "One solution is more flexible and the other more resistent to typos. Ideas?" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033) (owner: Jcrespo)
[08:16:17] (PS9) Jcrespo: mariadb: Apply the list of ports to the core::multiinstance class [puppet] - https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033)
[08:16:38] !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[08:16:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:19:11] Operations, Maps: OSM Replication failed at eqiad and codfw - https://phabricator.wikimedia.org/T237228 (MSantos)
[08:19:48] (CR) Jcrespo: "https://puppet-compiler.wmflabs.org/compiler1002/24664/db1090.eqiad.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033) (owner: Jcrespo)
[08:21:02] Operations, observability: Figure out switchover steps for mwlog hosts - https://phabricator.wikimedia.org/T261274 (fgiunchedi)
[08:23:29] (CR) Marostegui: mariadb: Apply the list of ports to the core::multiinstance class (1 comment) [puppet] - https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033) (owner: Jcrespo)
[08:23:43] (CR) Arturo Borrero Gonzalez: [C: +1] Switch openstack::serverpackages::rocky::stretch to component/ceph [puppet] - https://gerrit.wikimedia.org/r/622340 (https://phabricator.wikimedia.org/T256877) (owner: Muehlenhoff)
[08:24:11] (CR) Jcrespo: "> Patch Set 9:" [puppet] - https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033) (owner: Jcrespo)
[08:25:19] (CR) Jcrespo: "BTW, I tried finding the allowed units on mysql documentation and couldn't find them, could you help me?" [puppet] - https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033) (owner: Jcrespo)
[08:25:39] (CR) Arturo Borrero Gonzalez: [C: +2] dynamicproxy: update Content-Security-Policy-Report-Only header [puppet] - https://gerrit.wikimedia.org/r/622435 (owner: BryanDavis)
[08:29:01] (CR) Kormat: [C: +2] Tidy up import ordering using isort. [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/622349 (owner: Kormat)
[08:30:11] (Merged) jenkins-bot: Tidy up import ordering using isort. [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/622349 (owner: Kormat)
[08:31:16] (CR) Marostegui: "> Patch Set 9:" [puppet] - https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033) (owner: Jcrespo)
[08:34:22] (PS1) KartikMistry: Add --notify-age-in-days option to notify users before draft purge [puppet] - https://gerrit.wikimedia.org/r/622528 (https://phabricator.wikimedia.org/T261189)
[08:34:54] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[08:35:22] (PS2) Arturo Borrero Gonzalez: dynamicproxy: Remove X-Wikimedia-Debug error page overrides [puppet] - https://gerrit.wikimedia.org/r/622436 (owner: BryanDavis)
[08:36:10] (CR) Arturo Borrero Gonzalez: [C: +2] dynamicproxy: Remove X-Wikimedia-Debug error page overrides [puppet] - https://gerrit.wikimedia.org/r/622436 (owner: BryanDavis)
[08:38:14] (PS2) Arturo Borrero Gonzalez: dynamicproxy: Update proxy_redirect to use $host to limit scheme rewrites [puppet] - https://gerrit.wikimedia.org/r/622437 (owner: BryanDavis)
[08:38:22] (CR) Kormat: "> Patch Set 12:" [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/620291 (https://phabricator.wikimedia.org/T165358) (owner: Jcrespo)
[08:38:35] (PS1) Elukey: install_server: set stretch for an-test-* hosts [puppet] - https://gerrit.wikimedia.org/r/622529 (https://phabricator.wikimedia.org/T255520)
[08:38:39] (PS2) KartikMistry: Add --notify-age-in-days option to notify users before draft purge [puppet] - https://gerrit.wikimedia.org/r/622528 (https://phabricator.wikimedia.org/T261189)
[08:39:05] (CR) jerkins-bot: [V: -1] install_server: set stretch for an-test-* hosts [puppet] - https://gerrit.wikimedia.org/r/622529 (https://phabricator.wikimedia.org/T255520) (owner: Elukey)
[08:39:19] (CR) Arturo Borrero Gonzalez: [C: +2] dynamicproxy: Update proxy_redirect to use $host to limit scheme rewrites [puppet] - https://gerrit.wikimedia.org/r/622437 (owner: BryanDavis)
[08:39:48] (CR) Kormat: [C: +1] wmfmariadbpy: Add unit tests for resolve method [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/620319 (owner: Jcrespo)
[08:40:25] Operations, ops-eqiad, DC-Ops, Patch-For-Review: (Due By: 2020-07-17) rack/setup/install - https://phabricator.wikimedia.org/T255520 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-test-worker1002.eqiad.wmnet'] ` and were **ALL** successful.
[08:40:45] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1110 replication broken', diff saved to https://phabricator.wikimedia.org/P12360 and previous config saved to /var/cache/conftool/dbconfig/20200826-084044-marostegui.json
[08:40:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:41:48] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[08:42:08] ^ that should recover soon, I have depooled the broken slave
[08:42:25] created https://phabricator.wikimedia.org/T261276
[08:42:31] kormat: ^
[08:42:33] (CR) Arturo Borrero Gonzalez: dynamicproxy: serve default /robots.txt and /favicon.ico for Toolforge (1 comment) [puppet] - https://gerrit.wikimedia.org/r/622237 (https://phabricator.wikimedia.org/T251628) (owner: BryanDavis)
[08:42:39] (PS4) Arturo Borrero Gonzalez: dynamicproxy: serve default /robots.txt and /favicon.ico for Toolforge [puppet] - https://gerrit.wikimedia.org/r/622237 (https://phabricator.wikimedia.org/T251628) (owner: BryanDavis)
[08:43:02] marostegui: oh you trying to blame this one on me? :)
[08:43:08] nope, just a heads up
[08:43:12] :)
[08:43:20] I would totally blame kormat
[08:43:38] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[08:43:40] oh damn, he's back
[08:43:47] (CR) Arturo Borrero Gonzalez: [C: +2] dynamicproxy: serve default /robots.txt and /favicon.ico for Toolforge [puppet] - https://gerrit.wikimedia.org/r/622237 (https://phabricator.wikimedia.org/T251628) (owner: BryanDavis)
[08:43:52] ahahhah
[08:44:29] Operations, serviceops: Create a gateway in kubernetes for the execution of our "lambdas" - https://phabricator.wikimedia.org/T261277 (Joe)
[08:46:43] (PS4) Arturo Borrero Gonzalez: dynamicproxy: allow service workers in Toolforge [puppet] - https://gerrit.wikimedia.org/r/622238 (https://phabricator.wikimedia.org/T158216) (owner: BryanDavis)
[08:49:04] (PS2) Elukey: install_server: set stretch for an-test-* hosts [puppet] - https://gerrit.wikimedia.org/r/622529 (https://phabricator.wikimedia.org/T255520)
[08:49:53] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 58 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[08:50:17] (PS10) Jcrespo: mariadb: Apply the list of ports to the core::multiinstance class [puppet] - https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033)
[08:52:12] Operations, serviceops: Create a gateway in kubernetes for the execution of our "lambdas" - https://phabricator.wikimedia.org/T261277 (Joe)
[08:53:31] (CR) Elukey: [C: +2] install_server: set stretch for an-test-* hosts [puppet] - https://gerrit.wikimedia.org/r/622529 (https://phabricator.wikimedia.org/T255520) (owner: Elukey)
[08:54:52] (CR) Jcrespo: [C: -1] mariadb: Setup section->port assignment on puppet (1 comment) [puppet] - https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: Jcrespo)
[08:55:03] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[08:55:35] Operations, serviceops, Platform Team Workboards (Clinic Duty Team): PHP microservice for containerized shell execution - https://phabricator.wikimedia.org/T260330 (Joe) >>! In T260330#6408193, @tstarling wrote: > Has anyone got an idea for giving the HMAC key to the server without allowing the comma...
[08:56:54] (CR) Jcrespo: "https://puppet-compiler.wmflabs.org/compiler1001/24665/db1090.eqiad.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033) (owner: Jcrespo)
[08:57:51] (PS11) Jcrespo: mariadb: Apply the list of ports to the core::multiinstance class [puppet] - https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033)
[08:58:06] (CR) Jcrespo: "Checking regex." [puppet] - https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033) (owner: Jcrespo)
[08:58:32] Operations, ops-eqiad, DC-Ops, Patch-For-Review: (Due By: 2020-07-17) rack/setup/install - https://phabricator.wikimedia.org/T255520 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-test-worker1002.eqiad.wmnet...
[09:00:28] !log re-enable IPv6 BGP to Init7 in knams
[09:00:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:02:20] (CR) Jcrespo: "Error: Evaluation Error: Error while evaluating a Function Call, 's2' buffer pool: '185T' is not in the right format."
[puppet] - 10https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [09:02:30] (03PS12) 10Jcrespo: mariadb: Apply the list of ports to the core::multiinstance class [puppet] - 10https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033) [09:05:03] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 52 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [09:05:40] (03PS13) 10Jcrespo: mariadb: Setup section->port assignment on puppet [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) [09:06:49] PROBLEM - Thanos query has high latency for range queries on icinga1001 is CRITICAL: job=thanos-query https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/af36c91291a603f1d9fbdabdd127ac4a/thanos-query [09:08:17] RECOVERY - Thanos query has high latency for range queries on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/af36c91291a603f1d9fbdabdd127ac4a/thanos-query [09:09:09] that was me ^ [09:10:45] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [09:14:35] !log elukey@cumin1001 START - Cookbook sre.hosts.downtime [09:14:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:16:00] (03PS14) 10Jcrespo: mariadb: Setup section->port assignment on puppet [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) [09:17:11] (03CR) 10Jcrespo: "I will have to update the description, but please have a first look. For now this only touches cumin hosts." [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [09:18:48] !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [09:18:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:21:06] (03CR) 10Kormat: mariadb: Setup section->port assignment on puppet (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [09:21:34] (03PS15) 10Jcrespo: mariadb: Setup section->port assignment on puppet [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) [09:22:32] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Setup section->port assignment on puppet [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [09:23:36] (03CR) 10Jcrespo: "Question?" 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [09:24:14] (03PS16) 10Jcrespo: mariadb: Setup section->port assignment on puppet [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) [09:24:35] (03CR) 10Kormat: mariadb: Setup section->port assignment on puppet (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [09:24:42] (03CR) 10Jcrespo: "Server class, of course, is unused at the moment, but will it be needed by the db hosts?" [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [09:27:16] (03CR) 10Jcrespo: mariadb: Setup section->port assignment on puppet (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [09:29:07] (03PS17) 10Jcrespo: mariadb: Setup section->port assignment on puppet [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) [09:29:36] (03CR) 10Jcrespo: "This will fail, but please approve interface before implementation:" [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [09:31:52] (03CR) 10Kormat: mariadb: Setup section->port assignment on puppet (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [09:36:50] (03CR) 10Jcrespo: mariadb: Setup section->port assignment on puppet (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [09:37:39] (03CR) 10Kormat: mariadb: Setup section->port assignment on puppet (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [09:38:00] PROBLEM - IPv6 ping to esams 
on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 52 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [09:39:46] (03CR) 10Jcrespo: mariadb: Setup section->port assignment on puppet (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [09:39:51] 10Operations, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: (Due By: 2020-07-17) rack/setup/install - https://phabricator.wikimedia.org/T255520 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-test-worker1002.eqiad.wmnet'] ` and were **ALL** successful. [09:40:59] (03CR) 10Jcrespo: mariadb: Setup section->port assignment on puppet (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [09:41:08] (03PS1) 10Matthias Mullie: MediaSearchQueryBuilder should support keyword only queries [extensions/WikibaseMediaInfo] (wmf/1.36.0-wmf.5) - 10https://gerrit.wikimedia.org/r/622222 [09:41:22] (03PS1) 10Matthias Mullie: MediaSearchQueryBuilder should support keyword only queries [extensions/WikibaseMediaInfo] (wmf/1.36.0-wmf.6) - 10https://gerrit.wikimedia.org/r/622223 [09:42:27] (03CR) 10Jcrespo: "@Kormat See here for the example of why we should move the functionality to a module." 
[puppet] - 10https://gerrit.wikimedia.org/r/622444 (https://phabricator.wikimedia.org/T260843) (owner: 10Bstorm) [09:43:04] (03PS18) 10Jcrespo: mariadb: Setup section->port assignment on puppet [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) [09:43:44] 10Operations, 10observability: Grafana/Thanos serves 503s for long-time-window requests - https://phabricator.wikimedia.org/T260241 (10fgiunchedi) 05Open→03Resolved a:03fgiunchedi I'm resolving this task as the original issue is mitigated and queries work (albeit slow), please see the followup on further... [09:44:06] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Setup section->port assignment on puppet [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [09:44:56] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 53 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [09:47:09] (03PS19) 10Jcrespo: mariadb: Setup section->port assignment on puppet [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) [09:48:17] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Setup section->port assignment on puppet [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [09:49:31] (03CR) 10Filippo Giunchedi: "Overall LGTM, if we can get a sample PCC run on a subsection of hosts to validate all is well then I'm +1." 
[puppet] - 10https://gerrit.wikimedia.org/r/621759 (owner: 10Dzahn) [09:49:56] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [09:51:05] (03PS20) 10Jcrespo: mariadb: Setup section->port assignment on puppet [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) [09:52:04] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Setup section->port assignment on puppet [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [09:53:16] (03CR) 10Filippo Giunchedi: prometheus: add apache2 es-exporter config (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/621597 (https://phabricator.wikimedia.org/T256418) (owner: 10Cwhite) [09:56:41] (03CR) 10Filippo Giunchedi: [C: 03+1] "> Patch Set 11: Code-Review+1" [puppet] - 10https://gerrit.wikimedia.org/r/601429 (https://phabricator.wikimedia.org/T229584) (owner: 10Dave Pifke) [09:56:48] (03PS19) 10ZPapierski: Multiple instances of msearch_daemon [puppet] - 10https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305) [09:57:22] 10Operations, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: (Due By: 2020-07-17) rack/setup/install - https://phabricator.wikimedia.org/T255520 (10elukey) The an-test-worker1002 host is now running Stretch (not ready for Buster yet) with the following lvs volumes: ` elukey@an-test-w... [09:58:46] 10Operations, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: (Due By: 2020-07-17) rack/setup/install - https://phabricator.wikimedia.org/T255520 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-test-worker1001.eqiad.wmnet... 
[10:03:36] (03PS20) 10ZPapierski: Multiple instances of msearch_daemon [puppet] - 10https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305) [10:12:28] 10Operations, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: (Due By: 2020-07-17) rack/setup/install - https://phabricator.wikimedia.org/T255520 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-test-worker1003.eqiad.wmnet... [10:13:51] (03PS21) 10Jcrespo: mariadb: Setup section->port assignment on puppet [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) [10:14:50] !log elukey@cumin1001 START - Cookbook sre.hosts.downtime [10:14:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:14:53] !log elukey@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [10:14:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:15:14] (03PS21) 10ZPapierski: Multiple instances of msearch_daemon [puppet] - 10https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305) [10:15:31] 10Operations, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: (Due By: 2020-07-02) rack/setup/install 3 lightweight hadoop nodes - https://phabricator.wikimedia.org/T255518 (10elukey) @Cmjohnson added the config for partman and os, I think that we are missing DNS and then we can reimage. 
[10:16:10] (03CR) 10Jbond: "looks good see inline" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [10:17:55] (03CR) 10Ayounsi: [C: 03+1] "tested and works as expected, thanks :)" [software/homer] - 10https://gerrit.wikimedia.org/r/622356 (https://phabricator.wikimedia.org/T260769) (owner: 10Volans) [10:18:56] (03PS13) 10Jcrespo: mariadb: Apply the list of ports to the core::multiinstance class [puppet] - 10https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033) [10:19:39] (03CR) 10Jcrespo: "Applied one, thinking about the other, as we wouldn't have a nicer error message?" [puppet] - 10https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [10:20:32] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [10:22:42] (03PS22) 10Jcrespo: mariadb: Setup section->port assignment on puppet [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) [10:25:18] (03CR) 10Volans: [C: 03+2] junos: colorize configuration diff [software/homer] - 10https://gerrit.wikimedia.org/r/622356 (https://phabricator.wikimedia.org/T260769) (owner: 10Volans) [10:25:42] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [10:26:06] (03CR) 10Kormat: "One minor comment left." 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [10:26:37] (03Merged) 10jenkins-bot: junos: colorize configuration diff [software/homer] - 10https://gerrit.wikimedia.org/r/622356 (https://phabricator.wikimedia.org/T260769) (owner: 10Volans) [10:28:25] !log elukey@cumin1001 START - Cookbook sre.hosts.downtime [10:28:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:29:05] (03CR) 10Jcrespo: mariadb: Setup section->port assignment on puppet (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [10:30:37] !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [10:30:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:33:03] (03CR) 10Jcrespo: mariadb: Setup section->port assignment on puppet (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [10:34:08] (03PS23) 10Jcrespo: mariadb: Setup section->port assignment on puppet [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) [10:34:42] (03CR) 10Kormat: mariadb: Setup section->port assignment on puppet (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [10:36:02] (03PS24) 10Jcrespo: mariadb: Setup section->port assignment on puppet [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) [10:37:37] (03PS14) 10Jcrespo: mariadb: Apply the list of ports to the core::multiinstance class [puppet] - 10https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033) [10:38:38] (03CR) 10Kormat: mariadb: Setup section->port assignment on puppet (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/620722 
(https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [10:39:35] (03CR) 10Jcrespo: mariadb: Setup section->port assignment on puppet (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [10:40:19] 10Operations, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: (Due By: 2020-07-17) rack/setup/install - https://phabricator.wikimedia.org/T255520 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-test-worker1001.eqiad.wmnet'] ` and were **ALL** successful. [10:41:05] (03PS15) 10Jcrespo: mariadb: Apply the list of ports to the core::multiinstance class [puppet] - 10https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033) [10:42:26] (03PS25) 10Jcrespo: mariadb: Setup section->port assignment on puppet [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) [10:44:07] (03CR) 10Kormat: "This looks good. A PCC run would be good to see." [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [10:45:47] (03PS16) 10Jcrespo: mariadb: Apply the list of ports to the core::multiinstance class [puppet] - 10https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033) [10:45:51] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_cxserver_cluster_codfw site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [10:48:15] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [10:51:50] (03PS17) 10Jcrespo: mariadb: Apply the list of ports to the core::multiinstance class [puppet] - 10https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033) [10:53:01] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [10:53:45] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [10:53:48] 10Operations, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: (Due By: 2020-07-17) rack/setup/install - https://phabricator.wikimedia.org/T255520 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-test-worker1003.eqiad.wmnet'] ` and were **ALL** successful. 
[10:55:37] (03CR) 10Jcrespo: "https://puppet-compiler.wmflabs.org/compiler1002/24669/cumin1001.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [10:56:50] (03CR) 10Kormat: "> Patch Set 25:" [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [10:57:52] (03CR) 10Jcrespo: "> Patch Set 25:" [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [10:59:12] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [11:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: How many deployers does it take to do European mid-day backport window deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200826T1100). [11:00:04] Lucas_WMDE and matthiasmullie: A patch you scheduled for European mid-day backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [11:00:10] o/ [11:01:06] matthiasmullie: is it okay if we start with your backports? [11:01:12] (03PS26) 10Jcrespo: mariadb: Setup section->port assignment on puppet [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) [11:01:41] (03CR) 10Jcrespo: "I am thinking ruby cannot sort hashes, but it can short its keya?" 
[puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [11:03:38] (03CR) 10Matthias Mullie: [C: 04-1] MediaSearchQueryBuilder should support keyword only queries [extensions/WikibaseMediaInfo] (wmf/1.36.0-wmf.6) - 10https://gerrit.wikimedia.org/r/622223 (owner: 10Matthias Mullie) [11:03:42] (03CR) 10Matthias Mullie: [C: 04-1] MediaSearchQueryBuilder should support keyword only queries [extensions/WikibaseMediaInfo] (wmf/1.36.0-wmf.5) - 10https://gerrit.wikimedia.org/r/622222 (owner: 10Matthias Mullie) [11:03:57] (03CR) 10Jcrespo: "It's not that, any ideas https://puppet-compiler.wmflabs.org/compiler1002/24670/cumin1001.eqiad.wmnet/fulldiff.html" [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [11:04:02] Lucas_WMDE: having trouble with mine atm :D [11:04:10] need to make a minor fix first [11:04:34] ok [11:04:43] then I’ll sync my config change now :) [11:04:47] (just the first one, the second one probably won’t go in today) [11:05:25] (03PS2) 10Lucas Werkmeister (WMDE): Enable propagateChangeVisibility for testwikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/599344 (owner: 10Hoo man) [11:06:03] (03PS27) 10Jcrespo: mariadb: Setup section->port assignment on puppet [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) [11:06:23] (03CR) 10Jcrespo: "Let'see if the problem is sorting." 
[puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [11:06:56] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] Enable propagateChangeVisibility for testwikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/599344 (owner: 10Hoo man) [11:07:45] (03Merged) 10jenkins-bot: Enable propagateChangeVisibility for testwikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/599344 (owner: 10Hoo man) [11:08:11] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [11:08:16] (03CR) 10Jcrespo: "Doesn't work either: https://puppet-compiler.wmflabs.org/compiler1003/24671/cumin1001.eqiad.wmnet/fulldiff.html" [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [11:08:30] testing on mwdebug1001 [11:10:29] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 52 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [11:12:02] (03Abandoned) 10Matthias Mullie: MediaSearchQueryBuilder should support keyword only queries [extensions/WikibaseMediaInfo] (wmf/1.36.0-wmf.5) - 10https://gerrit.wikimedia.org/r/622222 (owner: 10Matthias Mullie) [11:12:24] (03CR) 10Jcrespo: "Found the issue- ruby variables vs template/puppet variables." 
[puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [11:13:00] (03Restored) 10Matthias Mullie: MediaSearchQueryBuilder should support keyword only queries [extensions/WikibaseMediaInfo] (wmf/1.36.0-wmf.5) - 10https://gerrit.wikimedia.org/r/622222 (owner: 10Matthias Mullie) [11:13:04] (03PS28) 10Jcrespo: mariadb: Setup section->port assignment on puppet [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) [11:14:16] (03PS2) 10Matthias Mullie: MediaSearchQueryBuilder should support keyword only queries [extensions/WikibaseMediaInfo] (wmf/1.36.0-wmf.5) - 10https://gerrit.wikimedia.org/r/622222 [11:15:16] (03CR) 10Jcrespo: "I think starting on some ruby version it is ordered by key by default: https://puppet-compiler.wmflabs.org/compiler1003/24672/cumin1001.eq" [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [11:15:20] (03PS2) 10Matthias Mullie: MediaSearchQueryBuilder should support keyword only queries [extensions/WikibaseMediaInfo] (wmf/1.36.0-wmf.6) - 10https://gerrit.wikimedia.org/r/622223 [11:15:50] Lucas_WMDE: I'll start to +2 my 2 patches - should take another half hour or so before they pass CI; ok? [11:16:32] (03PS18) 10Jcrespo: mariadb: Apply the list of ports to the core::multiinstance class [puppet] - 10https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033) [11:17:02] I plan to run script on mwmaint1002 (which is cron job, but need to test with new parameter and did dry-run already). Do I need to put it in Deployment calendar or running during Backport/Config Window is OK? 
[11:17:09] Lucas_WMDE: ^ [11:17:23] (03CR) 10Matthias Mullie: [C: 03+2] MediaSearchQueryBuilder should support keyword only queries [extensions/WikibaseMediaInfo] (wmf/1.36.0-wmf.6) - 10https://gerrit.wikimedia.org/r/622223 (owner: 10Matthias Mullie) [11:17:30] (03CR) 10Matthias Mullie: [C: 03+2] MediaSearchQueryBuilder should support keyword only queries [extensions/WikibaseMediaInfo] (wmf/1.36.0-wmf.5) - 10https://gerrit.wikimedia.org/r/622222 (owner: 10Matthias Mullie) [11:18:06] kart_: no matter what other people say, unless it is trivial, consider at least !log-ging it here when starting it [11:18:10] kart_: probably okay to run here if it doesn’t take too long [11:18:19] matthiasmullie: ok [11:18:34] (I’m currently trying to figure out why my config change doesn’t seem to be doing what it should be doing) [11:18:42] :D [11:19:00] no rush :) [11:19:02] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [11:19:40] (03CR) 10Jcrespo: "BTW, we don't accept the T because we are just replicating valid mysql formats options, as per Manuel's link: https://dev.mysql.com/doc/re" [puppet] - 10https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [11:20:35] jynus: sure sure. Will do that. [11:20:54] Lucas_WMDE: should take few minutes. 
[11:21:28] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [11:25:27] syncing my config change now [11:25:40] doesn’t quite work as expected but doesn’t break anything either [11:25:44] so I think syncing it is better than reverting [11:26:37] !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:599344|Enable propagateChangeVisibility for testwikidata]], part 1 (duration: 01m 19s) [11:26:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:27:11] (03CR) 10Jcrespo: "https://puppet-compiler.wmflabs.org/compiler1003/24673/db1090.eqiad.wmnet/fulldiff.html" [puppet] - 10https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [11:27:33] uh, did someone change the sudo/sudoers config on mwdebug1001 [11:27:51] it’s showing me the “with great power comes great responsibility” lecture and then asking for a password [11:28:09] (03CR) 10Jcrespo: "So this is mostly a proof of concept, to check the previous patch works in all cases." 
[puppet] - 10https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [11:28:17] both when I run `sudo` manually, and from the `scap sync-file` [11:29:05] Lucas_WMDE: I never used sudo with scap [11:29:08] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 51 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [11:29:18] it’s scap that runs sudo [11:29:36] scap indicates that it tried to run `/usr/bin/sudo -u root -- /usr/local/sbin/check-and-restart-php php7.2-fpm 100` [11:29:39] ah then I don't know :/ [11:29:47] !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/Wikibase.php: Config: [[gerrit:599344|Enable propagateChangeVisibility for testwikidata]], part 2 (duration: 01m 03s) [11:29:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:30:12] @seen Niharika [11:30:13] it only seems to affect mwdebug1001, so I’ll do another `scap pull` there [11:30:13] hauskatze: Last time I saw Niharika they were changing the nickname to Guest8262 and Guest8262 is still in the channel #wikimedia-dev at 8/20/2020 7:46:22 AM (6d3h43m50s ago) [11:30:19] and hope that has more or less the same effect [11:31:40] oh, I can’t sudo on mwmaint1002 either [11:31:49] and on deploy1001? [11:31:51] uhhh [11:32:11] unless the config says that you can only run certain commands with NOPASSWD, and `who am i` is not one of the commands? 
:D [11:32:16] I don’t think I tried that before [11:32:24] but previously `sudo` just worked without password as far as I recall [11:33:07] (03CR) 10Jbond: [C: 03+1] "LGTM and happy to review and refactor CR's" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [11:33:17] I guess we’ll continue with the deployment window for now… [11:33:24] matthiasmullie’s backports are still going through CI [11:33:31] kart_: ready for the maintenance script? [11:34:33] I'm about to run a maintenance script against a live site for the first time. (right? i know!) [11:34:36] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [11:34:54] What should the !log message look like that I post here? [11:35:09] (03PS22) 10ZPapierski: Multiple instances of msearch_daemon [puppet] - 10https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305) [11:35:15] Lucas_WMDE: OK. Starting it soon. [11:35:20] duesen: \o/ [11:35:46] duesen: I think usually people copy their entire prompt / command line [11:36:01] so, !log lucaswerkmeister-wmde@mwmaint1002:~$ mwscript script.php thewiki argument1 # task number [11:36:06] something like that [11:37:09] ah right! I was wondering whether the bot would pick up my name from irc, but just copying the prompt from the command line makes sense! [11:37:30] heh, I also had the “oh it’s that simple” moment when I saw Amir1 do it the first time :D [11:38:10] duesen: I'm running a script. Can you wait for some time?
:) [11:38:43] It is simple, the point is to be ready when things go south [11:38:46] RECOVERY - Rate of JVM GC Old generation-s runs - logstash1010-production-logstash-eqiad on logstash1010 is OK: (C)100 gt (W)80 gt 75.25 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-logstash-eqiad&var-instance=logstash1010&panelId=37 [11:38:49] :D [11:39:16] !log Started manual run of ContentTranslation/scripts/purge-unpublished-drafts.php script on mwmaint1002 (T261189) [11:39:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:39:20] T261189: Test user notifications for old unpublished cx draft purge script - https://phabricator.wikimedia.org/T261189 [11:42:42] (03PS1) 10Jbond: pki: add new PKI role [puppet] - 10https://gerrit.wikimedia.org/r/622549 (https://phabricator.wikimedia.org/T259117) [11:43:21] (03Merged) 10jenkins-bot: MediaSearchQueryBuilder should support keyword only queries [extensions/WikibaseMediaInfo] (wmf/1.36.0-wmf.6) - 10https://gerrit.wikimedia.org/r/622223 (owner: 10Matthias Mullie) [11:43:24] (03Merged) 10jenkins-bot: MediaSearchQueryBuilder should support keyword only queries [extensions/WikibaseMediaInfo] (wmf/1.36.0-wmf.5) - 10https://gerrit.wikimedia.org/r/622222 (owner: 10Matthias Mullie) [11:44:09] (03CR) 10Jbond: [C: 03+2] pki: add new PKI role [puppet] - 10https://gerrit.wikimedia.org/r/622549 (https://phabricator.wikimedia.org/T259117) (owner: 10Jbond) [11:45:18] kart_: you think our scripts will interfere? I'm running the script in analysis mode for now, for investigating T205936. I'll check in with you before running it in write mode. [11:45:18] duesen: how urgent is your maintenance script? 
I think I’d prefer to deploy matthiasmullie’s backports first, to ensure they don’t get missed [11:45:18] T205936: Unable to view some pages due to fatal RevisionAccessException: "Failed to load data blob from tt" - https://phabricator.wikimedia.org/T205936 [11:46:22] (I should be quick - should be over in a couple of mins) [11:46:44] duesen: won't interfere it seems. [11:47:04] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 558 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [11:47:16] deployment calendar is empty after this, so your maintenance script wouldn’t conflict with any other slot [11:47:59] (03PS1) 10DCausse: [wdqs] cleanup the munge path when doing a data-reload [cookbooks] - 10https://gerrit.wikimedia.org/r/622550 [11:48:45] (03CR) 10Jforrester: [C: 03+1] Add $wgTranslateMessageNamespaces[] = NS_MEDIAWIKI; for commonswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622473 (https://phabricator.wikimedia.org/T131300) (owner: 10Nikerabbit) [11:50:02] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 52 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [11:50:50] kart_: can you ping me when it's ok for me to scap? [11:52:45] (03CR) 10Kormat: "> Patch Set 28:" [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [11:53:20] matthiasmullie: OK. [11:53:26] matthiasmullie: all done. Go ahead. 
[11:53:28] !log Finished manual run of ContentTranslation/scripts/purge-unpublished-drafts.php script on mwmaint1002 (T261189) [11:53:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:53:32] T261189: Test user notifications for old unpublished cx draft purge script - https://phabricator.wikimedia.org/T261189 [11:53:35] thanks [11:55:02] uh, i see no entry for s3 in https://noc.wikimedia.org/dbconfig/eqiad.json [11:55:03] !log mlitn@deploy1001 Synchronized php-1.36.0-wmf.5/extensions/WikibaseMediaInfo: MediaSearchQueryBuilder should support keyword only queries (duration: 01m 08s) [11:55:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:55:06] what am i missing? [11:55:23] duesen: s3 is called: default [11:55:38] ah :) [11:55:58] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [11:56:43] !log mlitn@deploy1001 Synchronized php-1.36.0-wmf.6/extensions/WikibaseMediaInfo: MediaSearchQueryBuilder should support keyword only queries (duration: 01m 00s) [11:56:44] * duesen is feeling like a noob [11:56:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:57:07] duesen: I'm done - the floor is yours [11:57:09] sql oswiki --host db1078 [11:57:09] Error: [11:57:09] oswiki is not a valid DBURL [11:57:15] ^--- what am i doing wrong? 
[11:58:50] !log kormat@cumin1001 dbctl commit (dc=all): 'Start repooling db1110 T261276', diff saved to https://phabricator.wikimedia.org/P12361 and previous config saved to /var/cache/conftool/dbconfig/20200826-115850-kormat.json [11:58:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:58:54] T261276: Replication broken on db1110 - https://phabricator.wikimedia.org/T261276 [11:59:00] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 558 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [12:00:01] duesen: sql oswiki works for me [12:00:22] duesen: you don't pass host [12:00:30] duesen: and sql --wiki=oswiki --host db1078 too [12:00:37] (don't need to) [12:00:37] i think i know why. i'm on mwdebug, not mwmaint [12:00:43] that explains it :) [12:00:46] i guess i'm talking to the wrong "sql" [12:00:46] ha [12:03:03] 10Operations, 10ops-eqiad, 10netops: eqiad row D switch fabric recabling - https://phabricator.wikimedia.org/T256112 (10ayounsi) [12:06:56] 10Puppet, 10Scap, 10Wikimedia-production-error: `scap sync-file` cannot restart php on mwdebug1001, sudo wants password - https://phabricator.wikimedia.org/T261304 (10Lucas_Werkmeister_WMDE) [12:07:12] created that task ^ for the scap / sudo error I got [12:10:48] (03PS1) 10Jbond: pki: add cfssl and pki profile to pki role [puppet] - 10https://gerrit.wikimedia.org/r/622552 (https://phabricator.wikimedia.org/T259117) [12:11:46] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 53 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [12:11:52] (03CR) 10Jbond: [C: 03+2] pki: add cfssl and pki profile to pki role [puppet] -
10https://gerrit.wikimedia.org/r/622552 (https://phabricator.wikimedia.org/T259117) (owner: 10Jbond) [12:12:50] !log upgrade nagios-nrpe-server to 2.15-2 on jessie hosts - T261198 [12:12:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:12:54] T261198: nagios-nrpe-server in jessie not compatibile with Buster version - https://phabricator.wikimedia.org/T261198 [12:14:29] (03PS1) 10Jbond: pki: Use correct profile name [puppet] - 10https://gerrit.wikimedia.org/r/622553 (https://phabricator.wikimedia.org/T259117) [12:15:20] I’m about to leave for a bit – duesen, when you’re done, can you !log that the deployment window is over? [12:15:37] (03CR) 10Jbond: [C: 03+2] pki: Use correct profile name [puppet] - 10https://gerrit.wikimedia.org/r/622553 (https://phabricator.wikimedia.org/T259117) (owner: 10Jbond) [12:15:50] (03PS1) 10Kormat: Remove unused 'charset' attribute. [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/622554 [12:17:14] (03PS1) 10Kormat: Move WMFMariaDB.__init__() to the top of the class. 
[software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/622555 [12:19:45] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [12:20:17] (03PS1) 10Jbond: cfssl: manage base config dir [puppet] - 10https://gerrit.wikimedia.org/r/622556 [12:21:00] !log kormat@cumin1001 dbctl commit (dc=all): 'Repooling db1110 @ 20% T261276', diff saved to https://phabricator.wikimedia.org/P12362 and previous config saved to /var/cache/conftool/dbconfig/20200826-122059-kormat.json [12:21:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:21:04] T261276: Replication broken on db1110 - https://phabricator.wikimedia.org/T261276 [12:21:37] (03CR) 10Jbond: [C: 03+2] cfssl: manage base config dir [puppet] - 10https://gerrit.wikimedia.org/r/622556 (owner: 10Jbond) [12:22:13] 10Operations, 10ops-eqiad, 10netops: Upgrade eqiad rack D4 to 10G switch - https://phabricator.wikimedia.org/T196487 (10ayounsi) [12:24:33] 10Operations, 10ops-eqiad, 10netops: Upgrade eqiad rack D4 to 10G switch - https://phabricator.wikimedia.org/T196487 (10Marostegui) Any expected downtime for row D hosts? [12:25:39] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 558 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [12:26:47] 10Operations, 10ops-eqiad, 10netops: Upgrade eqiad rack D4 to 10G switch - https://phabricator.wikimedia.org/T196487 (10ayounsi) [12:27:33] (03PS29) 10Jcrespo: mariadb: Setup section->port assignment on puppet [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) [12:27:50] Lucas_WMDE: I'm still fiddling.
screen is not my friend. But yea, will do. [12:27:58] 10Operations, 10ops-eqiad, 10netops: Upgrade eqiad rack D4 to 10G switch - https://phabricator.wikimedia.org/T196487 (10ayounsi) @Marostegui I'm going to send an email, but partially yes, this means a hard downtime of ~1h for all hosts in D4, see the full list on https://netbox.wikimedia.org/dcim/devices/?q=... [12:31:01] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 558 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [12:31:16] (03CR) 10Jcrespo: "It orders it in lexicographical order, not the order in which they were defined: https://puppet-compiler.wmflabs.org/compiler1002/24675/cu" [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [12:32:42] (03CR) 10Kormat: [C: 03+1] "> Patch Set 29:" [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [12:34:15] (03PS30) 10Jcrespo: mariadb: Setup section->port assignment on puppet [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) [12:34:30] 10Operations, 10ops-eqiad, 10netops: Upgrade eqiad rack D4 to 10G switch - https://phabricator.wikimedia.org/T196487 (10Marostegui) Thank you! 
[12:34:37] (03CR) 10Kormat: [C: 03+1] mariadb: Setup section->port assignment on puppet [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [12:36:47] 10Operations, 10observability, 10User-fgiunchedi: nagios-nrpe-server in jessie not compatibile with Buster version - https://phabricator.wikimedia.org/T261198 (10fgiunchedi) 05Open→03Resolved a:03fgiunchedi [12:38:49] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 52 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [12:39:56] (03PS1) 10Filippo Giunchedi: prometheus: minimal default alerts for Prometheus [puppet] - 10https://gerrit.wikimedia.org/r/622557 (https://phabricator.wikimedia.org/T258948) [12:47:01] !log kormat@cumin1001 dbctl commit (dc=all): 'Repooling db1110 @ 30% T261276', diff saved to https://phabricator.wikimedia.org/P12363 and previous config saved to /var/cache/conftool/dbconfig/20200826-124700-kormat.json [12:47:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:47:05] T261276: Replication broken on db1110 - https://phabricator.wikimedia.org/T261276 [12:48:54] (03PS1) 10Filippo Giunchedi: prometheus: add 'alertmanagers' setting to all instances [puppet] - 10https://gerrit.wikimedia.org/r/622558 (https://phabricator.wikimedia.org/T258948) [12:50:01] (03CR) 10jerkins-bot: [V: 04-1] prometheus: add 'alertmanagers' setting to all instances [puppet] - 10https://gerrit.wikimedia.org/r/622558 (https://phabricator.wikimedia.org/T258948) (owner: 10Filippo Giunchedi) [12:50:26] (03PS1) 10Jbond: pki: add managed root CSR certificate [puppet] - 10https://gerrit.wikimedia.org/r/622559 (https://phabricator.wikimedia.org/T259117) [12:51:29] (03PS2) 10Filippo Giunchedi: prometheus: add 'alertmanagers' setting to all instances 
[puppet] - 10https://gerrit.wikimedia.org/r/622558 (https://phabricator.wikimedia.org/T258948) [12:51:57] 10Operations, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: (Due By: 2020-07-17) rack/setup/install - https://phabricator.wikimedia.org/T255520 (10elukey) Hosts reimaged, and the status on netbox is "Staged". @Cmjohnson please check if there is anything left to do, if not let's close :) [12:51:58] (03CR) 10Jbond: [C: 03+2] pki: add managed root CSR certificate [puppet] - 10https://gerrit.wikimedia.org/r/622559 (https://phabricator.wikimedia.org/T259117) (owner: 10Jbond) [12:52:36] (03CR) 10jerkins-bot: [V: 04-1] prometheus: add 'alertmanagers' setting to all instances [puppet] - 10https://gerrit.wikimedia.org/r/622558 (https://phabricator.wikimedia.org/T258948) (owner: 10Filippo Giunchedi) [12:55:53] (03PS1) 10Jbond: pki::server: use correct key paramater [puppet] - 10https://gerrit.wikimedia.org/r/622560 (https://phabricator.wikimedia.org/T259117) [12:56:23] (03CR) 10Jcrespo: [C: 03+2] mariadb: Setup section->port assignment on puppet [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [12:59:11] (03PS2) 10Jbond: pki::server: use correct key paramater [puppet] - 10https://gerrit.wikimedia.org/r/622560 (https://phabricator.wikimedia.org/T259117) [13:00:20] (03PS2) 10Filippo Giunchedi: prometheus: minimal default alerts for Prometheus [puppet] - 10https://gerrit.wikimedia.org/r/622557 (https://phabricator.wikimedia.org/T258948) [13:00:22] (03PS3) 10Filippo Giunchedi: prometheus: add 'alertmanagers' setting to all instances [puppet] - 10https://gerrit.wikimedia.org/r/622558 (https://phabricator.wikimedia.org/T258948) [13:00:24] (03PS1) 10Filippo Giunchedi: prometheus: move beta to its own profile [puppet] - 10https://gerrit.wikimedia.org/r/622561 (https://phabricator.wikimedia.org/T258948) [13:00:42] (03CR) 10Jbond: [C: 03+2] pki::server: use correct key paramater [puppet] - 
10https://gerrit.wikimedia.org/r/622560 (https://phabricator.wikimedia.org/T259117) (owner: 10Jbond) [13:02:04] (03CR) 10Kormat: [C: 03+1] wmfmariadbpy: Load and provide a method for section to port assignment [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/620291 (https://phabricator.wikimedia.org/T165358) (owner: 10Jcrespo) [13:02:11] (03CR) 10Vgutierrez: [C: 03+2] vcl: Use synthetic warning for DHE-RSA-AES128-SHA pageviews [puppet] - 10https://gerrit.wikimedia.org/r/622321 (https://phabricator.wikimedia.org/T258405) (owner: 10Vgutierrez) [13:02:23] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [13:02:49] (03CR) 10Jcrespo: [C: 03+2] wmfmariadbpy: Load and provide a method for section to port assignment [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/620291 (https://phabricator.wikimedia.org/T165358) (owner: 10Jcrespo) [13:03:10] (03CR) 10Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/compiler1003/24677/prometheus1003.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/622558 (https://phabricator.wikimedia.org/T258948) (owner: 10Filippo Giunchedi) [13:04:00] (03CR) 10Jcrespo: [C: 03+2] wmfmariadbpy: Add unit tests for resolve method [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/620319 (owner: 10Jcrespo) [13:05:45] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 558 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [13:05:57] RECOVERY - MariaDB Replica Lag: s1 on db2103 is OK: OK slave_sql_lag Replication lag: 0.18 seconds 
https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [13:06:01] !log serve a synthetic warn page to DHE-RSA-AES128-SHA users - T258405 [13:06:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:06:05] T258405: Deprecate TLSv1.2 weak ciphersuites - https://phabricator.wikimedia.org/T258405 [13:06:13] RECOVERY - MariaDB Replica Lag: s1 on db2112 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [13:07:36] !log kormat@cumin1001 dbctl commit (dc=all): 'Repooling db1110 @ 50% T261276', diff saved to https://phabricator.wikimedia.org/P12364 and previous config saved to /var/cache/conftool/dbconfig/20200826-130735-kormat.json [13:07:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:07:39] T261276: Replication broken on db1110 - https://phabricator.wikimedia.org/T261276 [13:08:19] 10Operations, 10Release-Engineering-Team, 10Scap, 10Security: scap displays an error message about mwdebug1001 after synching a file in /private - https://phabricator.wikimedia.org/T261167 (10thcipriani) [13:09:21] 10Puppet, 10Scap, 10Wikimedia-production-error: `scap sync-file` cannot restart php on mwdebug1001, sudo wants password - https://phabricator.wikimedia.org/T261304 (10thcipriani) [13:09:51] 10Operations, 10Release-Engineering-Team, 10Scap, 10Security: scap displays an error message about mwdebug1001 after synching a file in /private - https://phabricator.wikimedia.org/T261167 (10thcipriani) [13:09:56] 10Operations, 10Release-Engineering-Team, 10Scap, 10Security: `scap sync-file` cannot restart php on mwdebug1001, sudo wants password - https://phabricator.wikimedia.org/T261167 (10thcipriani) [13:11:43] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 558 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map 
https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [13:11:46] (03PS5) 10JMeybohm: sre.discovery: Refactor [cookbooks] - 10https://gerrit.wikimedia.org/r/621721 (https://phabricator.wikimedia.org/T260663) [13:11:52] ok, i'm finally ready to run my script to mark some broken revisions as "known bad" per T205936. This affects 12 revisions on oswiki, and 21 revisions on dewiki. All of them more than ten years old. [13:11:53] T205936: Unable to view some pages due to fatal RevisionAccessException: "Failed to load data blob from tt" - https://phabricator.wikimedia.org/T205936 [13:12:10] (03CR) 10JMeybohm: sre.discovery: Refactor (0312 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/621721 (https://phabricator.wikimedia.org/T260663) (owner: 10JMeybohm) [13:12:19] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 51 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [13:12:30] Anyone not happy with me running that now? [13:13:40] 10Operations, 10Release-Engineering-Team, 10Scap, 10Security: `scap sync-file` cannot restart php on mwdebug1001, sudo wants password - https://phabricator.wikimedia.org/T261167 (10Urbanecm) @jijiki Can we un-break this? Deployers are expected to get confused about this, potentionally halting the window to... 
[13:14:45] (03CR) 10Abijeet Patro: [C: 03+1] Add $wgTranslateMessageNamespaces[] = NS_MEDIAWIKI; for commonswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622473 (https://phabricator.wikimedia.org/T131300) (owner: 10Nikerabbit) [13:16:47] !log daniel@mwmaint1002:/srv/mediawiki/php-1.36.0-wmf.5$ mwscript maintenance/findBadBlobs.php oswiki --mark T205936 --revisions - < ~/T205936-oswiki-20090309200000.ids # marking known bad revisions for T205936 [13:16:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:17:13] (03PS1) 10Elukey: Remove druid-public-overlord records since they are not used [dns] - 10https://gerrit.wikimedia.org/r/622563 [13:17:33] !log kormat@cumin1001 dbctl commit (dc=all): 'Repooling db1110 @ 75% T261276', diff saved to https://phabricator.wikimedia.org/P12365 and previous config saved to /var/cache/conftool/dbconfig/20200826-131732-kormat.json [13:17:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:17:36] T261276: Replication broken on db1110 - https://phabricator.wikimedia.org/T261276 [13:18:00] !log daniel@mwmaint1002:/srv/mediawiki/php-1.36.0-wmf.5$ mwscript maintenance/findBadBlobs.php dewiki --mark T205936 --revisions - < ~/T205936-dewiki-20050512070000.ids # marking known bad revisions for T205936 [13:18:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:18:03] T205936: Unable to view some pages due to fatal RevisionAccessException: "Failed to load data blob from tt" - https://phabricator.wikimedia.org/T205936 [13:20:03] (03PS1) 10Jbond: pki::server: Manage root certificate [puppet] - 10https://gerrit.wikimedia.org/r/622564 (https://phabricator.wikimedia.org/T259117) [13:21:15] (03CR) 10Jbond: [C: 03+2] pki::server: Manage root certificate [puppet] - 10https://gerrit.wikimedia.org/r/622564 (https://phabricator.wikimedia.org/T259117) (owner: 10Jbond) [13:23:24] (03CR) 10Hashar: "That is heavily tied to how the CI image spawns mysqld. 
The definition is at https://gerrit.wikimedia.org/r/plugins/gitiles/integration/co" [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/621762 (https://phabricator.wikimedia.org/T261098) (owner: 10Hashar) [13:23:34] 10Operations, 10Release-Engineering-Team, 10Scap, 10Security: `scap sync-file` cannot restart php on mwdebug1001, sudo wants password - https://phabricator.wikimedia.org/T261167 (10Lucas_Werkmeister_WMDE) I suspect this task also means that changes can’t actually be tested on mwdebug1001 at the moment. Whe... [13:25:01] (03PS1) 10Jbond: pki::server: fix secret location [puppet] - 10https://gerrit.wikimedia.org/r/622565 [13:26:03] (03CR) 10Jbond: [C: 03+2] pki::server: fix secret location [puppet] - 10https://gerrit.wikimedia.org/r/622565 (owner: 10Jbond) [13:31:39] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 52 probes of 558 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [13:33:00] (03CR) 10Kormat: "> See proposed patches https://gerrit.wikimedia.org/r/c/operations/puppet/+/620722 and https://gerrit.wikimedia.org/r/c/operations/puppet/" [puppet] - 10https://gerrit.wikimedia.org/r/622444 (https://phabricator.wikimedia.org/T260843) (owner: 10Bstorm) [13:34:20] (03PS1) 10Filippo Giunchedi: icinga: redirect to https if not already proxied [puppet] - 10https://gerrit.wikimedia.org/r/622566 (https://phabricator.wikimedia.org/T258948) [13:34:43] (03PS2) 10Filippo Giunchedi: icinga: redirect to https if not already proxied [puppet] - 10https://gerrit.wikimedia.org/r/622566 (https://phabricator.wikimedia.org/T258948) [13:36:02] (03PS1) 10Elukey: Add new schema[12]00[34] vms to the related LVS endpoints [puppet] - 10https://gerrit.wikimedia.org/r/622567 (https://phabricator.wikimedia.org/T255026) [13:36:13] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - 
failed 50 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [13:37:35] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 558 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [13:37:37] (03CR) 10Ottomata: [C: 03+1] "+1, but I don't have a comprehensive off the top of my head understanding of all the moving parts needed to make sure LVS works. We shoul" [puppet] - 10https://gerrit.wikimedia.org/r/622567 (https://phabricator.wikimedia.org/T255026) (owner: 10Elukey) [13:37:53] !log kormat@cumin1001 dbctl commit (dc=all): 'Repooling db1110 @ 100% T261276', diff saved to https://phabricator.wikimedia.org/P12366 and previous config saved to /var/cache/conftool/dbconfig/20200826-133753-kormat.json [13:37:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:37:57] T261276: Replication broken on db1110 - https://phabricator.wikimedia.org/T261276 [13:41:08] (03PS1) 10Filippo Giunchedi: pontoon: latest additions to observability stack [puppet] - 10https://gerrit.wikimedia.org/r/622568 [13:41:51] (03PS1) 10Kormat: mariadb: Simplify mariadb::packages_wmf [puppet] - 10https://gerrit.wikimedia.org/r/622569 [13:42:15] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Simplify mariadb::packages_wmf [puppet] - 10https://gerrit.wikimedia.org/r/622569 (owner: 10Kormat) [13:43:27] (03PS2) 10Kormat: mariadb: Simplify mariadb::packages_wmf [puppet] - 10https://gerrit.wikimedia.org/r/622569 [13:44:01] (03PS3) 10Kormat: mariadb: Simplify mariadb::packages_wmf [puppet] - 10https://gerrit.wikimedia.org/r/622569 [13:44:46] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Simplify mariadb::packages_wmf [puppet] - 10https://gerrit.wikimedia.org/r/622569 
(owner: 10Kormat) [13:45:38] (03PS4) 10Kormat: mariadb: Simplify mariadb::packages_wmf [puppet] - 10https://gerrit.wikimedia.org/r/622569 [13:45:40] 10Operations, 10Goal: Verify ATS handling of DNS TTLs - https://phabricator.wikimedia.org/T261312 (10ema) [13:45:48] 10Operations, 10Goal: Verify ATS handling of DNS TTLs - https://phabricator.wikimedia.org/T261312 (10ema) p:05Triage→03Medium [13:45:55] 10Operations, 10Traffic, 10Goal: Verify ATS handling of DNS TTLs - https://phabricator.wikimedia.org/T261312 (10ema) [13:46:07] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 53 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [13:47:50] (03PS1) 10Ema: ATS: trace hostdb handling of TTLs [puppet] - 10https://gerrit.wikimedia.org/r/622570 (https://phabricator.wikimedia.org/T261312) [13:48:08] (03CR) 10Volans: "I didn't check the dnspython API calls, but in general looks sane. Few comments inline" (0310 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/621721 (https://phabricator.wikimedia.org/T260663) (owner: 10JMeybohm) [14:00:00] (03PS5) 10Kormat: WIP mariadb: Reduce duplication. [puppet] - 10https://gerrit.wikimedia.org/r/622569 [14:00:12] 10Operations, 10ops-eqiad, 10netops: eqiad row D switch fabric recabling - https://phabricator.wikimedia.org/T256112 (10ayounsi) [14:01:02] 10Operations, 10CommRel-Specialists-Support (Jul-Sep-2020), 10User-notice: CommRel support for FY2020-2021 Q1 DC switchover - https://phabricator.wikimedia.org/T244808 (10Trizek-WMF) [14:03:31] 10Operations, 10ops-codfw, 10decommission-hardware, 10serviceops: decommission mc2028.codfw.wmnet - https://phabricator.wikimedia.org/T261168 (10Papaul) ` [edit interfaces interface-range vlan-private1-c-codfw] - member ge-1/0/6; [edit interfaces interface-range disabled] member ge-5/0/23 { ... } +... 
[14:04:36] 10Operations, 10ops-codfw, 10decommission-hardware, 10serviceops: decommission mc2028.codfw.wmnet - https://phabricator.wikimedia.org/T261168 (10Papaul) [14:05:52] 10Operations, 10ops-eqiad, 10DBA, 10netops, 10User-Kormat: Upgrade eqiad rack D4 to 10G switch - https://phabricator.wikimedia.org/T196487 (10Kormat) [14:06:21] 10Operations, 10ops-eqiad, 10DBA, 10netops, 10User-Kormat: Upgrade eqiad rack D4 to 10G switch - https://phabricator.wikimedia.org/T196487 (10Kormat) I'll be the contact person for the data-persistence team for this. [14:07:03] 10Operations, 10Release-Engineering-Team, 10Scap, 10Security: `scap sync-file` cannot restart php on mwdebug1001, sudo wants password - https://phabricator.wikimedia.org/T261167 (10Urbanecm) >>! In T261167#6412217, @Lucas_Werkmeister_WMDE wrote: > I suspect this task also means that changes can’t actually... [14:12:14] (03PS6) 10Kormat: WIP mariadb: Reduce duplication. [puppet] - 10https://gerrit.wikimedia.org/r/622569 [14:12:19] 10Operations, 10ops-codfw, 10decommission-hardware, 10serviceops: decommission mc2028.codfw.wmnet - https://phabricator.wikimedia.org/T261168 (10Papaul) [14:13:57] !log pt1979@cumin2001 START - Cookbook sre.dns.netbox [14:14:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:14:59] (03CR) 10Elukey: [C: 03+2] Add new schema[12]00[34] vms to the related LVS endpoints [puppet] - 10https://gerrit.wikimedia.org/r/622567 (https://phabricator.wikimedia.org/T255026) (owner: 10Elukey) [14:20:22] 10Operations, 10CommRel-Specialists-Support (Jul-Sep-2020), 10User-notice: CommRel support for FY2020-2021 Q1 DC switchover - https://phabricator.wikimedia.org/T244808 (10Trizek-WMF) [14:20:31] !log Upgrade mysql on db1091 after MCR changes [14:20:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:21:23] !log pt1979@cumin2001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [14:21:25] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:21:30] (03CR) 10Bstorm: "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/622444 (https://phabricator.wikimedia.org/T260843) (owner: 10Bstorm) [14:23:11] (03CR) 10Ema: [C: 03+2] ATS: trace hostdb handling of TTLs [puppet] - 10https://gerrit.wikimedia.org/r/622570 (https://phabricator.wikimedia.org/T261312) (owner: 10Ema) [14:24:34] 10Operations, 10Traffic, 10Platform Team Initiatives (API Gateway), 10Platform Team Sprints Board (Sprint 1), and 2 others: Client Developer has a cookie-free API call - https://phabricator.wikimedia.org/T258748 (10Naike) [14:27:47] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1091 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12367 and previous config saved to /var/cache/conftool/dbconfig/20200826-142746-marostegui.json [14:27:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:30:06] (03PS1) 10Filippo Giunchedi: hieradata: use FQDN for chartmuseum service configuration [puppet] - 10https://gerrit.wikimedia.org/r/622577 [14:30:19] (03PS1) 10Kormat: mariadb: Add profile::mariadb::common [puppet] - 10https://gerrit.wikimedia.org/r/622578 [14:31:07] (03PS6) 10JMeybohm: sre.discovery: Refactor [cookbooks] - 10https://gerrit.wikimedia.org/r/621721 (https://phabricator.wikimedia.org/T260663) [14:32:11] (03CR) 10jerkins-bot: [V: 04-1] sre.discovery: Refactor [cookbooks] - 10https://gerrit.wikimedia.org/r/621721 (https://phabricator.wikimedia.org/T260663) (owner: 10JMeybohm) [14:32:35] (03CR) 10JMeybohm: "Thanks for the reviews! Some follow up questions..." 
(0310 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/621721 (https://phabricator.wikimedia.org/T260663) (owner: 10JMeybohm) [14:33:36] !log elukey@puppetmaster1001 conftool action : set/pooled=yes:weight=10; selector: name=schema1003.eqiad.wmnet [14:33:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:33:42] !log elukey@puppetmaster1001 conftool action : set/pooled=yes:weight=10; selector: name=schema1004.eqiad.wmnet [14:33:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:34:15] (03PS2) 10Kormat: mariadb: Add profile::mariadb::common [puppet] - 10https://gerrit.wikimedia.org/r/622578 [14:34:19] !log elukey@puppetmaster1001 conftool action : set/pooled=yes:weight=10; selector: name=schema2004.codfw.wmnet [14:34:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:34:22] !log elukey@puppetmaster1001 conftool action : set/pooled=yes:weight=10; selector: name=schema2003.codfw.wmnet [14:34:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:35:04] 10Operations, 10ops-eqiad, 10netops: eqiad row D switch fabric recabling - https://phabricator.wikimedia.org/T256112 (10ayounsi) [14:36:02] 10Operations, 10serviceops: assess and re-evaluate 'weight' settings of appservers in codfw - https://phabricator.wikimedia.org/T261159 (10Dzahn) Please see this new spreadsheet I made: https://docs.google.com/spreadsheets/d/1rtg4DMx4glZA6T_XVLzt_OlFHQx53Eb_U8criLzCQs4/edit?usp=sharing If you go to the "hard... 
[14:36:13] (03PS3) 10Kormat: mariadb: Add profile::mariadb::common [puppet] - 10https://gerrit.wikimedia.org/r/622578 [14:36:23] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1091 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12368 and previous config saved to /var/cache/conftool/dbconfig/20200826-143623-marostegui.json [14:36:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:36:31] <_joe_> jayme: incoming... [14:36:35] (03PS1) 10Giuseppe Lavagetto: termbox: switch to use envoy to call MediaWiki [deployment-charts] - 10https://gerrit.wikimedia.org/r/622580 (https://phabricator.wikimedia.org/T244843) [14:36:37] (03PS1) 10Giuseppe Lavagetto: Correctly treat fixtures files for new-style deployments. [deployment-charts] - 10https://gerrit.wikimedia.org/r/622581 [14:36:39] (03PS1) 10Giuseppe Lavagetto: Refresh the documentation of the helmfile.d/services [deployment-charts] - 10https://gerrit.wikimedia.org/r/622582 (https://phabricator.wikimedia.org/T258572) [14:36:41] (03PS1) 10Giuseppe Lavagetto: Add an helper script for the conversion to the new layout. 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/622583 (https://phabricator.wikimedia.org/T258572) [14:36:43] (03PS1) 10Giuseppe Lavagetto: Convert termbox to the new layout using the convert script [deployment-charts] - 10https://gerrit.wikimedia.org/r/622584 (https://phabricator.wikimedia.org/T258572) [14:36:45] (03PS1) 10Giuseppe Lavagetto: Convert citoid to new layout using the conversion script [deployment-charts] - 10https://gerrit.wikimedia.org/r/622585 (https://phabricator.wikimedia.org/T258572) [14:36:52] (03PS4) 10Kormat: mariadb: Add profile::mariadb::common [puppet] - 10https://gerrit.wikimedia.org/r/622578 [14:38:38] (03PS5) 10Kormat: mariadb: Add profile::mariadb::common [puppet] - 10https://gerrit.wikimedia.org/r/622578 [14:39:12] (03CR) 10jerkins-bot: [V: 04-1] Convert termbox to the new layout using the convert script [deployment-charts] - 10https://gerrit.wikimedia.org/r/622584 (https://phabricator.wikimedia.org/T258572) (owner: 10Giuseppe Lavagetto) [14:39:25] (03CR) 10jerkins-bot: [V: 04-1] Convert citoid to new layout using the conversion script [deployment-charts] - 10https://gerrit.wikimedia.org/r/622585 (https://phabricator.wikimedia.org/T258572) (owner: 10Giuseppe Lavagetto) [14:40:49] (03PS6) 10Kormat: mariadb: Add profile::mariadb::common [puppet] - 10https://gerrit.wikimedia.org/r/622578 [14:40:59] (03PS1) 10Elukey: Remove schema[12]00[12] from their LVS endpoint configs [puppet] - 10https://gerrit.wikimedia.org/r/622587 (https://phabricator.wikimedia.org/T255026) [14:42:42] (03CR) 10Ottomata: [C: 03+1] Remove schema[12]00[12] from their LVS endpoint configs [puppet] - 10https://gerrit.wikimedia.org/r/622587 (https://phabricator.wikimedia.org/T255026) (owner: 10Elukey) [14:42:50] (03PS1) 10Filippo Giunchedi: icinga: support contactgroups stubs [puppet] - 10https://gerrit.wikimedia.org/r/622588 [14:43:45] (03PS1) 10Jbond: pki add fake key [labs/private] - 10https://gerrit.wikimedia.org/r/622590 [14:44:35] (03CR) 10Jbond: [V: 03+2 
C: 03+2] pki add fake key [labs/private] - 10https://gerrit.wikimedia.org/r/622590 (owner: 10Jbond) [14:44:47] (03CR) 10Volans: "replies to questions inline" (032 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/621721 (https://phabricator.wikimedia.org/T260663) (owner: 10JMeybohm) [14:44:59] !log elukey@puppetmaster1001 conftool action : set/pooled=inactive:weight=0; selector: name=schema2002.codfw.wmnet [14:45:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:45:02] (03CR) 10Giuseppe Lavagetto: [C: 03+1] Use include instead of template to include defines [deployment-charts] - 10https://gerrit.wikimedia.org/r/622354 (owner: 10JMeybohm) [14:45:04] !log elukey@puppetmaster1001 conftool action : set/pooled=inactive:weight=0; selector: name=schema2001.codfw.wmnet [14:45:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:45:08] (03PS1) 10Jbond: pki::server: refactor to pass through params [puppet] - 10https://gerrit.wikimedia.org/r/622591 (https://phabricator.wikimedia.org/T259117) [14:45:13] !log elukey@puppetmaster1001 conftool action : set/pooled=inactive:weight=0; selector: name=schema1001.eqiad.wmnet [14:45:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:45:17] !log elukey@puppetmaster1001 conftool action : set/pooled=inactive:weight=0; selector: name=schema1002.eqiad.wmnet [14:45:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:45:32] (03PS7) 10Kormat: mariadb: Add profile::mariadb::common [puppet] - 10https://gerrit.wikimedia.org/r/622578 [14:45:39] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [14:46:07] (03CR) 10jerkins-bot: [V: 04-1] pki::server: refactor to pass through params [puppet] - 
10https://gerrit.wikimedia.org/r/622591 (https://phabricator.wikimedia.org/T259117) (owner: 10Jbond) [14:46:20] (03CR) 10Giuseppe Lavagetto: [C: 03+1] Update patch: Detect kubeconfig as known argument in plugin invocations [debs/helm] - 10https://gerrit.wikimedia.org/r/620890 (owner: 10JMeybohm) [14:47:50] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1091 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12369 and previous config saved to /var/cache/conftool/dbconfig/20200826-144750-marostegui.json [14:47:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:48:20] (03CR) 10JMeybohm: [C: 03+1] "Thanks. LGTM," [puppet] - 10https://gerrit.wikimedia.org/r/622577 (owner: 10Filippo Giunchedi) [14:49:10] 10Operations, 10Traffic, 10Goal, 10Patch-For-Review: Verify ATS handling of DNS TTLs - https://phabricator.wikimedia.org/T261312 (10ema) Running [[ https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/trafficserver/files/hostdb_ttls.stp | hostdb_ttls.stp ]] on cp... [14:50:02] (03CR) 10Filippo Giunchedi: [C: 03+2] hieradata: use FQDN for chartmuseum service configuration [puppet] - 10https://gerrit.wikimedia.org/r/622577 (owner: 10Filippo Giunchedi) [14:50:13] jayme: thanks for the quick review ! [14:50:34] thanks for taking care! 
:) [14:53:58] 10Operations, 10observability: Grafana link redirecting to port :3000 - https://phabricator.wikimedia.org/T261184 (10fgiunchedi) Thanks for the report, AFAICT this is because we never set `root_url` in `grafana.ini` and I guess the Grafana 7 upgrade made the setting significant now [14:55:31] !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1091 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12370 and previous config saved to /var/cache/conftool/dbconfig/20200826-145531-marostegui.json [14:55:32] (03CR) 10JMeybohm: [C: 03+2] Use include instead of template to include defines [deployment-charts] - 10https://gerrit.wikimedia.org/r/622354 (owner: 10JMeybohm) [14:55:33] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 51 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [14:55:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:56:13] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1134 for MCR change', diff saved to https://phabricator.wikimedia.org/P12371 and previous config saved to /var/cache/conftool/dbconfig/20200826-145612-marostegui.json [14:56:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:57:24] (03PS1) 10Marostegui: db1134: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/622593 [14:57:26] (03CR) 10Giuseppe Lavagetto: mariadb: Add profile::mariadb::common (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/622578 (owner: 10Kormat) [14:57:41] (03Merged) 10jenkins-bot: Use include instead of template to include defines [deployment-charts] - 10https://gerrit.wikimedia.org/r/622354 (owner: 10JMeybohm) [14:58:25] (03CR) 10Marostegui: [C: 03+2] db1134: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/622593 
(owner: 10Marostegui) [14:58:48] (03CR) 10Giuseppe Lavagetto: [C: 03+2] termbox: switch to use envoy to call MediaWiki [deployment-charts] - 10https://gerrit.wikimedia.org/r/622580 (https://phabricator.wikimedia.org/T244843) (owner: 10Giuseppe Lavagetto) [14:59:16] (03PS1) 10Filippo Giunchedi: grafana: set root_url to fix dashboard redirects [puppet] - 10https://gerrit.wikimedia.org/r/622594 (https://phabricator.wikimedia.org/T261184) [15:00:47] 10Operations, 10Traffic, 10Goal, 10Patch-For-Review: Verify ATS handling of DNS TTLs - https://phabricator.wikimedia.org/T261312 (10Volans) @ema as one of the requester for this test thanks a lot for the effort. It looks like we're in good shape here. [15:01:11] (03PS2) 10Filippo Giunchedi: grafana: set root_url to fix dashboard redirects [puppet] - 10https://gerrit.wikimedia.org/r/622594 (https://phabricator.wikimedia.org/T261184) [15:01:15] (03Merged) 10jenkins-bot: termbox: switch to use envoy to call MediaWiki [deployment-charts] - 10https://gerrit.wikimedia.org/r/622580 (https://phabricator.wikimedia.org/T244843) (owner: 10Giuseppe Lavagetto) [15:01:31] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [15:01:47] (03PS2) 10Jbond: pki::server: refactor to pass through params [puppet] - 10https://gerrit.wikimedia.org/r/622591 (https://phabricator.wikimedia.org/T259117) [15:02:47] (03CR) 10jerkins-bot: [V: 04-1] pki::server: refactor to pass through params [puppet] - 10https://gerrit.wikimedia.org/r/622591 (https://phabricator.wikimedia.org/T259117) (owner: 10Jbond) [15:03:17] 10Operations, 10LDAP-Access-Requests: LDAP access to the wmf group for Seve Kim - https://phabricator.wikimedia.org/T261208 (10jijiki) p:05Triage→03Medium a:03jijiki [15:03:22] 10Operations, 10ops-codfw, 10netops: (Need 
by: ) codfw:rack/setup/new management switches - https://phabricator.wikimedia.org/T253154 (10Jgreen) >>! In T253154#6409864, @Papaul wrote: > @Jgreen is it okay for me to replace fmsw on the 27th (start time 9:30am end time 11:30am) CT Yes, no problem. [15:03:31] 10Operations, 10DC-Ops, 10fundraising-tech-ops: RAID controller failing on frdb1002.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T261221 (10jijiki) p:05Triage→03High [15:04:17] (03PS3) 10Filippo Giunchedi: grafana: set root_url to fix dashboard redirects [puppet] - 10https://gerrit.wikimedia.org/r/622594 (https://phabricator.wikimedia.org/T261184) [15:04:19] 10Operations, 10observability, 10serviceops: Figure out switchover steps for mwlog hosts - https://phabricator.wikimedia.org/T261274 (10jijiki) p:05Triage→03High [15:04:22] (03PS8) 10Kormat: mariadb: Add profile::mariadb::common [puppet] - 10https://gerrit.wikimedia.org/r/622578 [15:05:01] (03PS7) 10JMeybohm: sre.discovery: Refactor [cookbooks] - 10https://gerrit.wikimedia.org/r/621721 (https://phabricator.wikimedia.org/T260663) [15:05:24] 10Operations, 10observability, 10serviceops: Figure out switchover steps for mwlog hosts - https://phabricator.wikimedia.org/T261274 (10jijiki) @RLazarus I am setting priority to high as the switchover is scheduled for next week. 
[15:05:25] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Add profile::mariadb::common [puppet] - 10https://gerrit.wikimedia.org/r/622578 (owner: 10Kormat) [15:05:32] (03PS3) 10Jbond: pki::server: refactor to pass through params [puppet] - 10https://gerrit.wikimedia.org/r/622591 (https://phabricator.wikimedia.org/T259117) [15:05:35] (03PS9) 10Kormat: mariadb: Add profile::mariadb::common [puppet] - 10https://gerrit.wikimedia.org/r/622578 [15:05:37] 10Operations, 10serviceops: Create a gateway in kubernetes for the execution of our "lambdas" - https://phabricator.wikimedia.org/T261277 (10jijiki) p:05Triage→03Medium [15:05:53] (03PS1) 10Giuseppe Lavagetto: termbox: add service-proxy file to eqiad,codfw deployments [deployment-charts] - 10https://gerrit.wikimedia.org/r/622595 [15:06:33] (03CR) 10jerkins-bot: [V: 04-1] pki::server: refactor to pass through params [puppet] - 10https://gerrit.wikimedia.org/r/622591 (https://phabricator.wikimedia.org/T259117) (owner: 10Jbond) [15:06:42] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Add profile::mariadb::common [puppet] - 10https://gerrit.wikimedia.org/r/622578 (owner: 10Kormat) [15:06:49] (03CR) 10JMeybohm: "Thanks!" 
(032 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/621721 (https://phabricator.wikimedia.org/T260663) (owner: 10JMeybohm) [15:06:54] (03CR) 10Giuseppe Lavagetto: [C: 03+2] termbox: add service-proxy file to eqiad,codfw deployments [deployment-charts] - 10https://gerrit.wikimedia.org/r/622595 (owner: 10Giuseppe Lavagetto) [15:08:23] (03PS10) 10Kormat: mariadb: Add profile::mariadb::common [puppet] - 10https://gerrit.wikimedia.org/r/622578 [15:09:15] (03CR) 10Kormat: mariadb: Add profile::mariadb::common (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/622578 (owner: 10Kormat) [15:09:51] (03CR) 10Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/compiler1002/24697/" [puppet] - 10https://gerrit.wikimedia.org/r/622594 (https://phabricator.wikimedia.org/T261184) (owner: 10Filippo Giunchedi) [15:11:02] 10Operations, 10Patch-For-Review: Tracking and Reducing cron-spam to root@ - https://phabricator.wikimedia.org/T132324 (10jijiki) [15:11:05] 10Operations, 10DBA, 10observability: smart-data-dump --syslog producing errors and spamming root@ - https://phabricator.wikimedia.org/T252500 (10jijiki) 05Resolved→03Open [15:11:27] !log oblivian@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' . [15:11:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:11:37] 10Operations, 10observability, 10serviceops: Figure out switchover steps for mwlog hosts - https://phabricator.wikimedia.org/T261274 (10fgiunchedi) p:05High→03Medium @jijiki thank you for the triaging, we'll be likely skipping mwlog hosts this time around though (i.e. 
leave in eqiad) [15:12:41] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_mobileapps_cluster_codfw site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [15:14:05] 10Operations, 10Patch-For-Review: Tracking and Reducing cron-spam to root@ - https://phabricator.wikimedia.org/T132324 (10jijiki) [15:14:07] 10Operations, 10DBA, 10observability: smart-data-dump --syslog producing errors and spamming root@ - https://phabricator.wikimedia.org/T252500 (10jijiki) 05Open→03Resolved Reopened the wrong task, re-closing. Nothing to see here, move along. [15:14:19] (03PS4) 10Jbond: pki::server: refactor to pass through params [puppet] - 10https://gerrit.wikimedia.org/r/622591 (https://phabricator.wikimedia.org/T259117) [15:14:39] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [15:14:57] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 558 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [15:15:32] (03CR) 10jerkins-bot: [V: 04-1] pki::server: refactor to pass through params [puppet] - 10https://gerrit.wikimedia.org/r/622591 (https://phabricator.wikimedia.org/T259117) (owner: 10Jbond) [15:17:18] (03PS11) 10Kormat: mariadb: Add profile::mariadb::common [puppet] - 10https://gerrit.wikimedia.org/r/622578 [15:17:27] (03PS5) 10Jbond: pki::server: refactor to pass through params [puppet] - 10https://gerrit.wikimedia.org/r/622591 (https://phabricator.wikimedia.org/T259117) [15:19:10] Lucas_WMDE: uh, o fogot to do what I promised i'd do. Sorry about that. 
Took me a lot longer to get my ducks in a row than I expected. [15:19:17] anyway... [15:19:32] I don’t think it’s that big a deal ^^ [15:19:36] probably not worth logging now, at least [15:19:36] (03PS12) 10Kormat: mariadb: Add profile::mariadb::common [puppet] - 10https://gerrit.wikimedia.org/r/622578 (https://phabricator.wikimedia.org/T256972) [15:19:38] is there a sane reason why mwscript redirects all script output to stderr? Took me a while to figure that one out... [15:19:45] ?! [15:19:48] hum [15:19:54] haven’t noticed that before, no… [15:20:10] but I also remember using `tee` successfully with mwscript… [15:20:13] I was trying to capture output with tee, and got empty files... [15:20:17] 2>&1 fixed it [15:20:24] e. g. https://sal.toolforge.org/log/YONM_HMBj_Bg1xd3_Wxc [15:20:36] so that sounds like it must be a recent change [15:20:53] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 558 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [15:22:43] 10Operations, 10CommRel-Specialists-Support (Jul-Sep-2020), 10User-notice: CommRel support for FY2020-2021 Q1 DC switchover - https://phabricator.wikimedia.org/T244808 (10RLazarus) @Trizek-WMF Question from @debt earlier, will you be posting to wikitech-l also, or only wikitech-ambassadors? No wrong answer A... [15:23:49] 10Operations, 10CommRel-Specialists-Support (Jul-Sep-2020), 10User-notice: CommRel support for FY2020-2021 Q1 DC switchover - https://phabricator.wikimedia.org/T244808 (10Trizek-WMF) I sent a message to wikitech-l earlier today. Maybe it is pending moderation. [15:24:12] (03PS6) 10Jbond: pki::server: refactor to pass through params [puppet] - 10https://gerrit.wikimedia.org/r/622591 (https://phabricator.wikimedia.org/T259117) [15:26:20] Lucas_WMDE: you know what? I can't reproduce the issue.
I must have been imagining things. [15:26:29] strange ^^ [15:26:34] (03PS1) 10Elukey: presto: set hive.parquet.use-column-names to true [puppet] - 10https://gerrit.wikimedia.org/r/622598 (https://phabricator.wikimedia.org/T261261) [15:27:31] PROBLEM - Check correctness of the icinga configuration on icinga1001 is CRITICAL: Icinga configuration contains errors https://wikitech.wikimedia.org/wiki/Icinga [15:27:36] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "given parsoid-php is the only parsoid remaining, I'd rather switch parsoid-php to use the "parsoid" conftool service. That needs some addi" [puppet] - 10https://gerrit.wikimedia.org/r/559705 (https://phabricator.wikimedia.org/T241207) (owner: 10Dzahn) [15:27:44] (03PS7) 10Jbond: pki::server: refactor to pass through params [puppet] - 10https://gerrit.wikimedia.org/r/622591 (https://phabricator.wikimedia.org/T259117) [15:29:53] (03CR) 10Jbond: [C: 03+2] pki::server: refactor to pass through params [puppet] - 10https://gerrit.wikimedia.org/r/622591 (https://phabricator.wikimedia.org/T259117) (owner: 10Jbond) [15:29:59] checking the icinga configuration alert [15:30:48] *ahem* I was wrong with the chartmuseu patch :( I'll revert [15:30:56] jayme: FYI ^ [15:31:52] (03CR) 10Elukey: [C: 03+2] presto: set hive.parquet.use-column-names to true [puppet] - 10https://gerrit.wikimedia.org/r/622598 (https://phabricator.wikimedia.org/T261261) (owner: 10Elukey) [15:33:20] (03CR) 10Nuria: [C: 03+1] presto: set hive.parquet.use-column-names to true [puppet] - 10https://gerrit.wikimedia.org/r/622598 (https://phabricator.wikimedia.org/T261261) (owner: 10Elukey) [15:33:54] (03PS6) 10Lucas Werkmeister (WMDE): Add new slow-bot group for Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618245 (https://phabricator.wikimedia.org/T258354) (owner: 10Tobias Andersson) [15:34:00] (03PS1) 10Filippo Giunchedi: Revert "hieradata: use FQDN for chartmuseum service configuration" [puppet] - 10https://gerrit.wikimedia.org/r/622599 
[15:34:26] godog: hmm...you know what went wrong? [15:34:37] (03CR) 10Filippo Giunchedi: [C: 03+2] Revert "hieradata: use FQDN for chartmuseum service configuration" [puppet] - 10https://gerrit.wikimedia.org/r/622599 (owner: 10Filippo Giunchedi) [15:35:08] jayme: yeah icinga hosts definitions are not FQDNs, which is expected [15:35:10] (03CR) 10Lucas Werkmeister (WMDE): Add new slow-bot group for Wikidata (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618245 (https://phabricator.wikimedia.org/T258354) (owner: 10Tobias Andersson) [15:35:19] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 51 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [15:36:51] godog: and puppet does some magic on mangling the other entries then? Stripping .svc....? [15:38:12] (03PS1) 10Milimetric: analytics_cluster/turnilo: Configure url shortner [puppet] - 10https://gerrit.wikimedia.org/r/622600 (https://phabricator.wikimedia.org/T233336) [15:38:33] jayme: eehh "it depends", hostnames are unqualified in icinga *but* service hostnames are not, for historical raisins AIUI [15:39:00] the alert should recover shortly [15:40:21] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [15:41:10] !log oblivian@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
[15:41:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:46:32] (03PS1) 10CDanis: ripeatlas: bump IPv6 failure threshold [puppet] - 10https://gerrit.wikimedia.org/r/622602 [15:47:40] 10Operations, 10SRE-Access-Requests: Request for access to analytics-privatedata-users - https://phabricator.wikimedia.org/T260450 (10Cparle) @MarkTraceur is on vacation this week, so maybe @dr0ptp4kt could sign this off for me instead? [15:47:44] (03CR) 10Ayounsi: [C: 03+1] "Thanks" [puppet] - 10https://gerrit.wikimedia.org/r/622602 (owner: 10CDanis) [15:47:51] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 51 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [15:48:05] (03CR) 10CDanis: [C: 03+2] ripeatlas: bump IPv6 failure threshold [puppet] - 10https://gerrit.wikimedia.org/r/622602 (owner: 10CDanis) [15:48:07] (03PS1) 10Giuseppe Lavagetto: termbox: enable envoy telemetry in production [deployment-charts] - 10https://gerrit.wikimedia.org/r/622603 [15:48:37] (03PS1) 10Bearloga: wgEventStreams: Streams for testing MEP-based analytics instruments [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622604 (https://phabricator.wikimedia.org/T259714) [15:49:07] RECOVERY - Check correctness of the icinga configuration on icinga1001 is OK: Icinga configuration is correct https://wikitech.wikimedia.org/wiki/Icinga [15:49:09] (03CR) 10jerkins-bot: [V: 04-1] wgEventStreams: Streams for testing MEP-based analytics instruments [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622604 (https://phabricator.wikimedia.org/T259714) (owner: 10Bearloga) [15:50:21] (03PS2) 10Bearloga: wgEventStreams: Streams for testing MEP-based analytics instruments [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622604 (https://phabricator.wikimedia.org/T259714) [15:50:57] PROBLEM - 
PHP7 rendering on mwdebug1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [15:51:23] <_joe_> effie: is that ^^ you? [15:51:42] today I have not touched it tbh [15:51:53] I will check it since I am here [15:51:57] thank you [15:52:03] <_joe_> thanks :) [15:52:13] <_joe_> I assumed it was you given your experiments ongoing [15:52:26] (03CR) 10Giuseppe Lavagetto: [C: 03+2] termbox: enable envoy telemetry in production [deployment-charts] - 10https://gerrit.wikimedia.org/r/622603 (owner: 10Giuseppe Lavagetto) [15:53:12] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 52 probes of 556 (alerts on 65) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [15:53:59] !log oblivian@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' . [15:54:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:55:48] 10Operations, 10SRE-Access-Requests: Request for access to analytics-privatedata-users - https://phabricator.wikimedia.org/T260450 (10dr0ptp4kt) Approved. [15:56:17] (03PS1) 10Jbond: cfssl: add additional usages [puppet] - 10https://gerrit.wikimedia.org/r/622605 [15:57:51] (03CR) 10Jbond: [C: 03+2] cfssl: add additional usages [puppet] - 10https://gerrit.wikimedia.org/r/622605 (owner: 10Jbond) [15:59:03] <_joe_> jbond42: what are we using cfssl for? [15:59:31] !log oblivian@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' . 
[15:59:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:01:14] _joe_: the plan is to set it up as a new pki solution [16:01:23] <_joe_> oh, nice :) [16:01:49] i have an okr to essentially get a PoC up this Q [16:02:26] 10Operations, 10LDAP-Access-Requests: LDAP access to wmf for Ryan Brounley - https://phabricator.wikimedia.org/T261324 (10RBrounley_WMF) [16:04:25] (03PS1) 10Elukey: role::analytics_cluster::coordinator: set hive.parquet.use-column-names to true [puppet] - 10https://gerrit.wikimedia.org/r/622607 (https://phabricator.wikimedia.org/T261261) [16:05:30] (03PS1) 10Filippo Giunchedi: alertmanager: assign AM-specific active_host/partners variables [puppet] - 10https://gerrit.wikimedia.org/r/622608 (https://phabricator.wikimedia.org/T258948) [16:10:00] RECOVERY - PHP7 rendering on mwdebug1001 is OK: HTTP OK: HTTP/1.1 302 Found - 648 bytes in 4.534 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [16:11:21] (03CR) 10Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/compiler1003/24708/" [puppet] - 10https://gerrit.wikimedia.org/r/622608 (https://phabricator.wikimedia.org/T258948) (owner: 10Filippo Giunchedi) [16:13:31] (03CR) 10Ottomata: [C: 03+1] "Nice, I should be able merge and deploy this later today, or tomorrow my morning." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622604 (https://phabricator.wikimedia.org/T259714) (owner: 10Bearloga) [16:16:12] ottomata: awesome, thank you!!! 
[16:21:02] (03CR) 10Bstorm: "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/622444 (https://phabricator.wikimedia.org/T260843) (owner: 10Bstorm) [16:23:41] (03PS1) 10Cwhite: profile: re-enable grafana rsync codfw->eqiad [puppet] - 10https://gerrit.wikimedia.org/r/622610 (https://phabricator.wikimedia.org/T259143) [16:26:51] (03PS1) 10Itamar Givon: Rename localEntitySourceName setting [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622612 (https://phabricator.wikimedia.org/T258060) [16:29:17] (03PS2) 10Itamar Givon: Rename localEntitySourceName setting [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622612 (https://phabricator.wikimedia.org/T258060) [16:32:28] (03CR) 10Cwhite: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/622608 (https://phabricator.wikimedia.org/T258948) (owner: 10Filippo Giunchedi) [16:38:27] (03PS2) 10Milimetric: analytics_cluster/turnilo: Configure url shortner [puppet] - 10https://gerrit.wikimedia.org/r/622600 (https://phabricator.wikimedia.org/T233336) [16:47:12] (03PS3) 10Hashar: Merge tag 'debian/1.8.19-1~exp1' into debian/buster-wikimedia [debs/doxygen] (debian/buster-wikimedia) - 10https://gerrit.wikimedia.org/r/621291 (https://phabricator.wikimedia.org/T254465) [16:54:31] (03PS1) 10Andrew Bogott: wmcs galera tests: don't page the whole SRE team for galera issues! [puppet] - 10https://gerrit.wikimedia.org/r/622613 (https://phabricator.wikimedia.org/T260688) [16:55:11] (03CR) 10Bstorm: [C: 03+1] wmcs galera tests: don't page the whole SRE team for galera issues! [puppet] - 10https://gerrit.wikimedia.org/r/622613 (https://phabricator.wikimedia.org/T260688) (owner: 10Andrew Bogott) [16:56:07] (03CR) 10Andrew Bogott: [C: 03+2] wmcs galera tests: don't page the whole SRE team for galera issues! 
[puppet] - 10https://gerrit.wikimedia.org/r/622613 (https://phabricator.wikimedia.org/T260688) (owner: 10Andrew Bogott) [16:56:26] (03CR) 10jerkins-bot: [V: 04-1] Merge tag 'debian/1.8.19-1~exp1' into debian/buster-wikimedia [debs/doxygen] (debian/buster-wikimedia) - 10https://gerrit.wikimedia.org/r/621291 (https://phabricator.wikimedia.org/T254465) (owner: 10Hashar) [17:02:34] 10Operations, 10Release-Engineering-Team, 10Scap, 10Security: `scap sync-file` cannot restart php on mwdebug1001, sudo wants password - https://phabricator.wikimedia.org/T261167 (10thcipriani) >>! In T261167#6412564, @Urbanecm wrote: > I see. However, it seems your goal is to not let anyone to test changes... [17:05:49] jouncebot: next [17:05:49] In 0 hour(s) and 54 minute(s): Train log triage with CPT (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200826T1800) [17:05:49] In 0 hour(s) and 54 minute(s): Morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200826T1800) [17:06:12] I'm grabbing the conch for a VE config patch that we should have deployed before the train (oops). 
[17:07:28] (03PS3) 10Jforrester: Prepare for VE's new Beta Feature preference [mediawiki-config] - 10https://gerrit.wikimedia.org/r/620953 (https://phabricator.wikimedia.org/T254349) (owner: 10Esanders) [17:07:32] (03CR) 10Jforrester: [C: 03+2] Prepare for VE's new Beta Feature preference [mediawiki-config] - 10https://gerrit.wikimedia.org/r/620953 (https://phabricator.wikimedia.org/T254349) (owner: 10Esanders) [17:08:16] (03Merged) 10jenkins-bot: Prepare for VE's new Beta Feature preference [mediawiki-config] - 10https://gerrit.wikimedia.org/r/620953 (https://phabricator.wikimedia.org/T254349) (owner: 10Esanders) [17:12:41] !log jforrester@deploy1001 Synchronized wmf-config/CommonSettings.php: T254349 Set wgVisualEditorEnableBetaFeature true on wikis that need it (duration: 01m 03s) [17:12:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:12:45] T254349: A default install of MW+VE still shows VE in beta features and defaults the user option to disabled - https://phabricator.wikimedia.org/T254349 [17:13:50] RoanKattouw: Double-checking you don't want 621108 deployed now? [17:14:26] James_F: I don't, it's for tomorrow [17:14:27] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Documentation-only change; sync for line sanity (duration: 01m 04s) [17:14:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:15:30] 10Operations, 10Release-Engineering-Team, 10Scap, 10Security: `scap sync-file` cannot restart php on mwdebug1001, sudo wants password - https://phabricator.wikimedia.org/T261167 (10Urbanecm) For those who connect to mwdebug1001 sure, but not for those who use 1002 by default, and are just surprised by the... [17:15:48] OK, I'm releasing the conch. 
[17:22:37] (03PS1) 10Cmjohnson: Adding an-test-worker1001-3 to netboot.cfg [puppet] - 10https://gerrit.wikimedia.org/r/622619 (https://phabricator.wikimedia.org/T255520) [17:22:53] (03CR) 10jerkins-bot: [V: 04-1] Adding an-test-worker1001-3 to netboot.cfg [puppet] - 10https://gerrit.wikimedia.org/r/622619 (https://phabricator.wikimedia.org/T255520) (owner: 10Cmjohnson) [17:28:21] cmjohnson1: o/ I've already done the work for an-test-worker [17:28:30] 10Operations, 10SRE-Access-Requests: Request for access to analytics-privatedata-users - https://phabricator.wikimedia.org/T260450 (10Nuria) I see, approved on my end. Please make sure to read https://wikitech.wikimedia.org/wiki/Analytics/Data_Access_Guidelines the main takes are that data cannot leave our bo... [17:29:07] (03CR) 10Elukey: [C: 04-1] "Already added the partman recipe for an-test-worker, please don't merge this one" [puppet] - 10https://gerrit.wikimedia.org/r/622619 (https://phabricator.wikimedia.org/T255520) (owner: 10Cmjohnson) [17:33:30] (03CR) 10Hashar: [C: 03+1] "AFAIK all the data have been rsynced out of releases1001 and we should have a backup stored somewhere in case we missed something? 
So I g" [puppet] - 10https://gerrit.wikimedia.org/r/621090 (https://phabricator.wikimedia.org/T260742) (owner: 10Dzahn) [17:35:27] (03Abandoned) 10Cmjohnson: Adding an-test-worker1001-3 to netboot.cfg [puppet] - 10https://gerrit.wikimedia.org/r/622619 (https://phabricator.wikimedia.org/T255520) (owner: 10Cmjohnson) [17:35:51] (03PS1) 10Jason Linehan: Enables error logging on Hebrew Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622620 (https://phabricator.wikimedia.org/T255585) [17:36:14] (03PS1) 10Jdlrobson: Update Thai and Greek taglines [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622621 (https://phabricator.wikimedia.org/T258552) [17:36:24] (03CR) 10jerkins-bot: [V: 04-1] Update Thai and Greek taglines [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622621 (https://phabricator.wikimedia.org/T258552) (owner: 10Jdlrobson) [17:36:38] 10Operations, 10DNS, 10Traffic: 'skip_first' feature flag for gdnsd GeoIP plugin - https://phabricator.wikimedia.org/T261340 (10CDanis) [17:36:41] (03CR) 10Mholloway: [C: 03+2] Enables error logging on Hebrew Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622620 (https://phabricator.wikimedia.org/T255585) (owner: 10Jason Linehan) [17:36:51] 10Operations, 10DNS, 10Traffic: 'skip_first' feature flag for gdnsd GeoIP plugin - https://phabricator.wikimedia.org/T261340 (10CDanis) [17:36:55] 10Operations, 10Epic, 10Goal: automatically collect network error reports from users' browsers (Network Error Logging API) - https://phabricator.wikimedia.org/T257527 (10CDanis) [17:37:10] (03PS2) 10Jdlrobson: Update Thai and Greek taglines [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622621 (https://phabricator.wikimedia.org/T258552) [17:37:26] (03Merged) 10jenkins-bot: Enables error logging on Hebrew Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622620 (https://phabricator.wikimedia.org/T255585) (owner: 10Jason Linehan) [17:39:00] 10Operations, 10Epic, 10Goal: automatically 
collect network error reports from users' browsers (Network Error Logging API) - https://phabricator.wikimedia.org/T257527 (10CDanis) [17:41:25] !log mholloway-shell@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable client side error logging on hewiki (T255585) (duration: 01m 04s) [17:41:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:41:29] T255585: Extend client-side error logging coverage - https://phabricator.wikimedia.org/T255585 [17:46:15] (03PS1) 10Cmjohnson: Adding production dns an-test-master100[12] & an-test-coord1001 [dns] - 10https://gerrit.wikimedia.org/r/622622 (https://phabricator.wikimedia.org/T255518) [17:46:59] (03CR) 10Cmjohnson: [C: 03+2] Adding production dns an-test-master100[12] & an-test-coord1001 [dns] - 10https://gerrit.wikimedia.org/r/622622 (https://phabricator.wikimedia.org/T255518) (owner: 10Cmjohnson) [17:47:41] (03CR) 10Hashar: [C: 04-1] "Looks like we are now missing helm charts that were published at: https://releases.wikimedia.org/charts/" [puppet] - 10https://gerrit.wikimedia.org/r/621090 (https://phabricator.wikimedia.org/T260742) (owner: 10Dzahn) [17:48:55] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By:2020-08-17) label/setup/install pki1001 - https://phabricator.wikimedia.org/T259826 (10Cmjohnson) @jbond Can you please tell me which partman recipe you need for this server. [17:50:16] 10Operations, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: (Due By: 2020-07-02) rack/setup/install 3 lightweight hadoop nodes - https://phabricator.wikimedia.org/T255518 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-test-master1001.eqiad.wmn... [18:00:04] marxarelli and longma: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Train log triage with CPT. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200826T1800). 
[18:00:05] RoanKattouw, Niharika, and Urbanecm: How many deployers does it take to do Morning backport window deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200826T1800). [18:00:05] Jdlrobson: A patch you scheduled for Morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [18:00:11] 10Operations, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: (Due By: 2020-07-02) rack/setup/install 3 lightweight hadoop nodes - https://phabricator.wikimedia.org/T255518 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-test-master1001.eqiad.wmnet'] ` Of which those **FAILED**: ` ['an-test-mast... [18:00:26] I can deploy today Jdlrobson ! [18:00:43] (03CR) 10Dzahn: "> Patch Set 3: Code-Review-1" [puppet] - 10https://gerrit.wikimedia.org/r/621090 (https://phabricator.wikimedia.org/T260742) (owner: 10Dzahn) [18:01:29] Urbanecm: great [18:01:39] just a small one [18:02:02] 10Operations, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: (Due By: 2020-07-02) rack/setup/install 3 lightweight hadoop nodes - https://phabricator.wikimedia.org/T255518 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-test-master1002.eqiad.wmn... 
[18:02:02] (03PS3) 10Urbanecm: Update Thai and Greek taglines [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622621 (https://phabricator.wikimedia.org/T258552) (owner: 10Jdlrobson) [18:02:05] small ones are better than big :-) [18:02:05] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.downtime [18:02:06] !log cmjohnson@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [18:02:08] (03CR) 10Urbanecm: [C: 03+2] Update Thai and Greek taglines [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622621 (https://phabricator.wikimedia.org/T258552) (owner: 10Jdlrobson) [18:02:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:02:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:02:41] (03Merged) 10jenkins-bot: Update Thai and Greek taglines [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622621 (https://phabricator.wikimedia.org/T258552) (owner: 10Jdlrobson) [18:03:15] Jdlrobson: btw, could I make sure you know about https://phabricator.wikimedia.org/T258552#6400256? I've actually deployed that a few days ago, because I felt the issue was quite urgent for the community, but I'd still like to be sure you know about that :-) [18:03:23] 10Operations, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: (Due By: 2020-07-02) rack/setup/install 3 lightweight hadoop nodes - https://phabricator.wikimedia.org/T255518 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-test-coord1001.eqiad.wmne... [18:04:18] Urbanecm: that's fine [18:04:37] we have a task open for fixing Chinese [18:04:49] okay, cool! [18:05:06] Jdlrobson: ready for you at mwdebug1002! [18:05:21] hashar: regarding the helm charts you are missing, see https://github.com/helm/chartmuseum [18:06:19] Urbanecm: looking [18:06:39] perfect Urbanecm please sync! [18:06:43] doing! [18:07:07] mutante: yeah that was for marxarelli ;] [18:07:11] I am off, dinner time! 
[18:07:23] hashar: https://releases.wikimedia.org/charts/ should have been replaced by https://helm-charts.wikimedia.org/ and it's not related to replacement of ... [18:07:29] Urbanecm: thanks for taking care of the chinese logo deployment though. I appreciate it! [18:08:00] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review: Upgrade ELK Stack - https://phabricator.wikimedia.org/T234854 (10herron) >>! In T234854#6344456, @jcrespo wrote: > I am getting a lot of 500 internal server errors on logstash-next instance. I am guessing that is expected/WIP? Not necessarily expected,... [18:08:44] !log upgraded eqiad elk v7 cluster from 7.8.0 to 7.9.0 T234854 [18:08:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:08:47] T234854: Upgrade ELK Stack - https://phabricator.wikimedia.org/T234854 [18:09:04] !log urbanecm@deploy1001 Synchronized static/images/mobile/copyright/: 40092898d8c70191324e844d2c222469b954e9ef: Update Thai and Greek taglines (T258552) (duration: 01m 05s) [18:09:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:09:07] T258552: Add wordmarks and taglines for 26 more Wikipedias - https://phabricator.wikimedia.org/T258552 [18:11:17] 10Operations, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: (Due By: 2020-07-02) rack/setup/install 3 lightweight hadoop nodes - https://phabricator.wikimedia.org/T255518 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-test-master1002.eqiad.wmnet'] ` Of which those **FAILED**: ` ['an-test-mast... 
[18:11:25] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: 40092898d8c70191324e844d2c222469b954e9ef: Update Thai and Greek taglines (T258552) (duration: 01m 03s) [18:11:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:12:05] !log Purge Thai and Greek taglines, URLs are at P12372 (T258552) [18:12:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:12:10] Jdlrobson: should be all live then :) [18:12:16] thanks Urbanecm great!! [18:13:08] 10Operations, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: (Due By: 2020-07-02) rack/setup/install 3 lightweight hadoop nodes - https://phabricator.wikimedia.org/T255518 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-test-coord1001.eqiad.wmnet'] ` Of which those **FAILED**: ` ['an-test-coord... [18:13:20] 10Operations, 10Product-Infrastructure-Data, 10Epic, 10Goal: automatically collect network error reports from users' browsers (Network Error Logging API) - https://phabricator.wikimedia.org/T257527 (10jlinehan) [18:15:14] (03CR) 10Dzahn: [C: 04-1] "one inline comment about matching path in data.yaml" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/621343 (https://phabricator.wikimedia.org/T260389) (owner: 10Bstorm) [18:15:22] (03PS4) 10Urbanecm: Added import sources for mlwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/621741 (https://phabricator.wikimedia.org/T260716) (owner: 10Jayprakash12345) [18:15:24] (03CR) 10Urbanecm: [C: 03+2] Added import sources for mlwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/621741 (https://phabricator.wikimedia.org/T260716) (owner: 10Jayprakash12345) [18:16:21] (03Merged) 10jenkins-bot: Added import sources for mlwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/621741 (https://phabricator.wikimedia.org/T260716) (owner: 10Jayprakash12345) [18:18:08] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review: Upgrade ELK Stack to version 7 - 
https://phabricator.wikimedia.org/T234854 (10Krinkle) [18:19:59] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: 945b97cff8b8a1e4bb43b613fc93b099f74945f7: Added import sources for mlwiktionary (T260716) (duration: 01m 05s) [18:20:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:20:03] T260716: Add import sources for mlwiktionary - https://phabricator.wikimedia.org/T260716 [18:20:09] (03PS5) 10Bstorm: cumin: for new wmcs. prefix for cookbooks, grant access to wmcs-admins [puppet] - 10https://gerrit.wikimedia.org/r/621343 (https://phabricator.wikimedia.org/T260389) [18:20:37] (03CR) 10Bstorm: "> Patch Set 4: Code-Review-1" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/621343 (https://phabricator.wikimedia.org/T260389) (owner: 10Bstorm) [18:21:26] !log Morning B&C done [18:21:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:27:49] (03Abandoned) 10Dzahn: conftool: remove parsoid, keep parsoid-php [puppet] - 10https://gerrit.wikimedia.org/r/559705 (https://phabricator.wikimedia.org/T241207) (owner: 10Dzahn) [18:28:55] (03CR) 10Dzahn: "> Patch Set 5:" [puppet] - 10https://gerrit.wikimedia.org/r/621759 (owner: 10Dzahn) [18:34:01] (03CR) 10Dzahn: [C: 03+1] "The intention and the puppet part all seem good to me now. +1 for that. Let's have more reviews of secure-cookbook.py though." 
[puppet] - 10https://gerrit.wikimedia.org/r/621343 (https://phabricator.wikimedia.org/T260389) (owner: 10Bstorm) [18:34:22] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: TBD) install new controller into frdb1001 OR add to spares - https://phabricator.wikimedia.org/T261348 (10RobH) [18:34:32] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: TBD) install new controller into frdb1001 OR add to spares - https://phabricator.wikimedia.org/T261348 (10RobH) [18:37:02] (03PS7) 10Dzahn: prometheus: hiera() -> lookup(), add data type for prometheus_nodes [puppet] - 10https://gerrit.wikimedia.org/r/621759 [18:37:55] (03CR) 10Dzahn: "PS7: sed -i 's/Stdlib::Fqdn/Stdlib::Host/g' *.pp (though I am not fully convinced we need to use IP addresses, but it's fine with me eit" [puppet] - 10https://gerrit.wikimedia.org/r/621759 (owner: 10Dzahn) [18:39:34] ; [18:54:44] 10Operations, 10serviceops, 10Patch-For-Review: decom releases1001 and releases2001 - https://phabricator.wikimedia.org/T260742 (10Dzahn) 05Open→03Stalled [18:54:48] 10Operations, 10Continuous-Integration-Infrastructure, 10serviceops, 10Patch-For-Review: replace backends for releases.wikimedia.org with buster VMs - https://phabricator.wikimedia.org/T247652 (10Dzahn) [18:56:37] (03PS1) 10Ppchelko: Api-gateway: implement fallback for anon users in lua until envoy 1.16 [deployment-charts] - 10https://gerrit.wikimedia.org/r/622650 (https://phabricator.wikimedia.org/T254914) [18:57:31] (03CR) 10Dzahn: "Can I use gerrit to get a general OK to move on with T260654 before the switch? I created https://docs.google.com/spreadsheets/d/1rtg4DMx4" [puppet] - 10https://gerrit.wikimedia.org/r/621783 (https://phabricator.wikimedia.org/T260654) (owner: 10Dzahn) [18:57:36] (03PS2) 10Ppchelko: Api-gateway: implement fallback for anon users in lua until envoy 1.16 [deployment-charts] - 10https://gerrit.wikimedia.org/r/622650 (https://phabricator.wikimedia.org/T254914) [19:00:04] marxarelli and longma: That opportune time is upon us again. 
Time for a Mediawiki train - American Version deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200826T1900). [19:02:16] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=205 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [19:02:46] (03PS1) 10Herron: kibana: move kibana.yml settings to paramaters [puppet] - 10https://gerrit.wikimedia.org/r/622651 [19:03:36] (03PS2) 10Herron: kibana: move kibana.yml settings to parameters [puppet] - 10https://gerrit.wikimedia.org/r/622651 [19:04:00] (03Restored) 10Ppchelko: api-gateway: Restrict unauthenticated write HTTP methods, permit read HTTP methods [deployment-charts] - 10https://gerrit.wikimedia.org/r/613650 (https://phabricator.wikimedia.org/T256769) (owner: 10Hnowlan) [19:04:08] 10Operations, 10Mail, 10OTRS, 10Trust-and-Safety, and 2 others: Forward emails addressed to privacy@wikidata to privacy@wikimedia - https://phabricator.wikimedia.org/T255733 (10Dzahn) a:05Dzahn→03jrbs @jrbs Assinging over to you because it seems remaining questions are for you and this is currently sta... [19:04:12] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [19:04:36] (03CR) 10jerkins-bot: [V: 04-1] kibana: move kibana.yml settings to parameters [puppet] - 10https://gerrit.wikimedia.org/r/622651 (owner: 10Herron) [19:05:41] (03PS3) 10Herron: kibana: move kibana.yml settings to parameters [puppet] - 10https://gerrit.wikimedia.org/r/622651 [19:06:55] (03CR) 10Ppchelko: "After much discussion, we decided that ok, we are going to resurrect this." (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/613650 (https://phabricator.wikimedia.org/T256769) (owner: 10Hnowlan) [19:16:41] o/ rolling train to group1 shortly [19:20:22] (03PS1) 10Dduvall: group1 wikis to 1.36.0-wmf.6 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622654 [19:20:24] (03CR) 10Dduvall: [C: 03+2] group1 wikis to 1.36.0-wmf.6 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622654 (owner: 10Dduvall) [19:21:48] (03Merged) 10jenkins-bot: group1 wikis to 1.36.0-wmf.6 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622654 (owner: 10Dduvall) [19:23:25] !log dduvall@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.6 [19:23:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:24:28] !log dduvall@deploy1001 Synchronized php: group1 wikis to 1.36.0-wmf.6 (duration: 01m 03s) [19:24:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:32:34] all quiet on the wmf.6 front [19:33:09] nice. [19:33:11] !log 1.36.0-wmf.6 promoted to group1 (T257974). 
logs show no new errors [19:33:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:33:15] T257974: 1.36.0-wmf.6 deployment blockers - https://phabricator.wikimedia.org/T257974 [19:35:39] (03Abandoned) 10Jdlrobson: WIP: Make it easier to configure wordmark [mediawiki-config] - 10https://gerrit.wikimedia.org/r/621303 (owner: 10Jdlrobson) [19:35:43] * marxarelli wonders if we should add "read some wikinews" to the group1 deployment docs [19:36:11] (03PS1) 10Bstorm: shared-storage: add specific NFS volume monitoring for cleanups [puppet] - 10https://gerrit.wikimedia.org/r/622655 (https://phabricator.wikimedia.org/T261335) [19:42:20] (03PS3) 10Ppchelko: api-gateway: Restrict unauthenticated write HTTP methods, permit read HTTP methods [deployment-charts] - 10https://gerrit.wikimedia.org/r/613650 (https://phabricator.wikimedia.org/T256769) (owner: 10Hnowlan) [19:44:14] (03CR) 10Ppchelko: "Made it a bit simpler using the fact matches are executed in order. So, we allow anything for /, /w, /wiki, and then add a requirement for" [deployment-charts] - 10https://gerrit.wikimedia.org/r/613650 (https://phabricator.wikimedia.org/T256769) (owner: 10Hnowlan) [19:51:21] (03PS4) 10Ppchelko: api-gateway: Restrict unauthenticated write HTTP methods, permit read HTTP methods [deployment-charts] - 10https://gerrit.wikimedia.org/r/613650 (https://phabricator.wikimedia.org/T256769) (owner: 10Hnowlan) [19:51:29] !log standardize pfw3-eqiad [19:51:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:51:58] (03CR) 10Ppchelko: "And, made the rules a bit more sophisticated - allow missing JWT, but if it is present - verify it. 
It makes more sense then just ignoring" [deployment-charts] - 10https://gerrit.wikimedia.org/r/613650 (https://phabricator.wikimedia.org/T256769) (owner: 10Hnowlan) [19:54:32] (03CR) 10Ppchelko: api-gateway: Restrict unauthenticated write HTTP methods, permit read HTTP methods (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/613650 (https://phabricator.wikimedia.org/T256769) (owner: 10Hnowlan) [20:00:04] halfak and accraze: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Services – Graphoid / ORES. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200826T2000). [20:09:40] (03CR) 10Clarakosi: [C: 03+2] Api-gateway: implement fallback for anon users in lua until envoy 1.16 [deployment-charts] - 10https://gerrit.wikimedia.org/r/622650 (https://phabricator.wikimedia.org/T254914) (owner: 10Ppchelko) [20:12:08] (03Merged) 10jenkins-bot: Api-gateway: implement fallback for anon users in lua until envoy 1.16 [deployment-charts] - 10https://gerrit.wikimedia.org/r/622650 (https://phabricator.wikimedia.org/T254914) (owner: 10Ppchelko) [20:52:26] (03CR) 10Dzahn: "here is compiler output from running on *" [puppet] - 10https://gerrit.wikimedia.org/r/621759 (owner: 10Dzahn) [20:53:47] (03CR) 10RLazarus: "> We also need to actually pool the ro records in the dc_to, and depool them in dc_from" [cookbooks] - 10https://gerrit.wikimedia.org/r/621304 (owner: 10RLazarus) [20:57:42] (03Abandoned) 10Dzahn: confd/redis::multidc: switch to systemd::service [puppet] - 10https://gerrit.wikimedia.org/r/503097 (https://phabricator.wikimedia.org/T194724) (owner: 10Dzahn) [20:58:25] (03Abandoned) 10Dzahn: mediawiki::cgroup: base::service_unit -> systemd::service [puppet] - 10https://gerrit.wikimedia.org/r/448778 (https://phabricator.wikimedia.org/T194724) (owner: 10Dzahn) [20:59:23] (03PS2) 10Dzahn: mediawiki/php: use new data type for PHP version [puppet] - 10https://gerrit.wikimedia.org/r/605179 [21:01:03] (03Abandoned) 
10Dzahn: jenkins: replace system user/group with systemd-sysuser [puppet] - 10https://gerrit.wikimedia.org/r/606286 (https://phabricator.wikimedia.org/T224591) (owner: 10Dzahn) [21:01:08] (03Abandoned) 10Dzahn: zuul: replace user/group with systemd-sysuser and reserved UID [puppet] - 10https://gerrit.wikimedia.org/r/607853 (https://phabricator.wikimedia.org/T224591) (owner: 10Dzahn) [21:05:25] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1003/24710/" [puppet] - 10https://gerrit.wikimedia.org/r/605179 (owner: 10Dzahn) [21:06:26] (03CR) 10RLazarus: [C: 03+1] "Ping. :) We can include this and the two following patches in the test on Friday, if they're merged in time. Otherwise let's wait and merg" [puppet] - 10https://gerrit.wikimedia.org/r/617745 (owner: 10Krinkle) [21:07:35] (03CR) 10Krinkle: "Ping for whom? I've tested this already on mwmaint in codfw. I don't have merge rights there though :)" [puppet] - 10https://gerrit.wikimedia.org/r/617745 (owner: 10Krinkle) [21:08:35] (03Abandoned) 10Dzahn: httpd: fix mpm_event module conflict with mpm_prefork [puppet] - 10https://gerrit.wikimedia.org/r/451206 (https://phabricator.wikimedia.org/T196968) (owner: 10Dzahn) [21:08:51] (03Abandoned) 10Dzahn: mediawiki: also include the MW apache2.conf on jobrunners [puppet] - 10https://gerrit.wikimedia.org/r/599683 (https://phabricator.wikimedia.org/T190111) (owner: 10Dzahn) [21:08:57] (03CR) 10RLazarus: [C: 03+2] "Whoops! Sorry, I'll go ahead and merge all three then. Thanks!" 
[puppet] - 10https://gerrit.wikimedia.org/r/617745 (owner: 10Krinkle) [21:09:19] (03Abandoned) 10Dzahn: role::mediawiki::appserver: merge role::mediawiki::common in [puppet] - 10https://gerrit.wikimedia.org/r/526290 (owner: 10Dzahn) [21:09:30] (03CR) 10RLazarus: [C: 03+2] mediawiki-cache-warmup: Add "dry" mode [puppet] - 10https://gerrit.wikimedia.org/r/617746 (owner: 10Krinkle) [21:10:10] (03CR) 10RLazarus: [C: 03+2] mediawiki-cache-warmup: Limit warmup URLs to large wikis [puppet] - 10https://gerrit.wikimedia.org/r/617747 (owner: 10Krinkle) [21:16:27] 10Operations, 10Graphoid, 10serviceops, 10MW-1.35-notes (1.35.0-wmf.34; 2020-05-26), 10Platform Engineering (Icebox): Undeploy graphoid - https://phabricator.wikimedia.org/T242855 (10Iniquity) Hello, when is it planned to complete undeploy? :) [21:34:09] (03CR) 10Dzahn: [C: 03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/622655 (https://phabricator.wikimedia.org/T261335) (owner: 10Bstorm) [21:39:28] (03CR) 10BryanDavis: [C: 03+1] "Seems reasonable to me. I checked with Brooke on IRC about disabling the existing system alerts. 
Her thought is that those will fire *afte" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/622655 (https://phabricator.wikimedia.org/T261335) (owner: 10Bstorm) [21:53:12] (03PS1) 10Dzahn: labstore: add missing data types and minor stye fixes [puppet] - 10https://gerrit.wikimedia.org/r/622666 [21:59:33] (03CR) 10Dzahn: [C: 04-1] "https://puppet-compiler.wmflabs.org/compiler1001/24711/labstore1004.eqiad.wmnet/change.labstore1004.eqiad.wmnet.err" [puppet] - 10https://gerrit.wikimedia.org/r/622666 (owner: 10Dzahn) [22:04:53] (03PS2) 10Dzahn: labstore: add missing data types and minor stye fixes [puppet] - 10https://gerrit.wikimedia.org/r/622666 [22:05:44] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=205 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [22:07:44] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [22:09:30] (03PS3) 10Dzahn: labstore: add missing data types and minor stye fixes [puppet] - 10https://gerrit.wikimedia.org/r/622666 [22:13:29] (03PS1) 10Volans: sre.hosts.decommission: delete ifaces on Netbox [cookbooks] - 10https://gerrit.wikimedia.org/r/622676 (https://phabricator.wikimedia.org/T258729) [22:17:19] (03PS4) 10Dzahn: labstore: add data types and some other style fixes [puppet] - 10https://gerrit.wikimedia.org/r/622666 [22:20:38] !log dzahn@cumin1001 START - Cookbook sre.ganeti.makevm [22:20:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:26:02] (03PS5) 10Dzahn: labstore: add data types and some other style fixes [puppet] - 10https://gerrit.wikimedia.org/r/622666 [22:26:55] !log dzahn@cumin1001 START - Cookbook sre.ganeti.makevm [22:26:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:28:02] (03CR) 10Dzahn: [V: 03+1] "https://puppet-compiler.wmflabs.org/compiler1002/24717/" [puppet] - 10https://gerrit.wikimedia.org/r/622666 (owner: 10Dzahn) [22:29:10] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=205 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [22:30:22] !log dzahn@cumin1001 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) [22:30:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:30:26] (03CR) 10Bstorm: "PCC suggests that it uses the drbd_role variable right 
at least https://puppet-compiler.wmflabs.org/compiler1002/24715/" [puppet] - 10https://gerrit.wikimedia.org/r/622655 (https://phabricator.wikimedia.org/T261335) (owner: 10Bstorm) [22:31:06] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [22:34:03] (03PS1) 10BryanDavis: dynamicproxy: un-nest /favicon.ico and /robots.txt locations [puppet] - 10https://gerrit.wikimedia.org/r/622677 (https://phabricator.wikimedia.org/T251628) [22:36:50] !log dzahn@cumin1001 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) [22:36:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:42:57] (03PS1) 10Dzahn: DHCP: add install3001 and install5001 MAC addresses [puppet] - 10https://gerrit.wikimedia.org/r/622680 (https://phabricator.wikimedia.org/T254157) [22:44:23] (03CR) 10Dzahn: [C: 03+2] DHCP: add install3001 and install5001 MAC addresses [puppet] - 10https://gerrit.wikimedia.org/r/622680 (https://phabricator.wikimedia.org/T254157) (owner: 10Dzahn) [22:55:38] (03PS1) 10Dzahn: DHCP: use install1003 as next-server for install4001 [puppet] - 10https://gerrit.wikimedia.org/r/622685 (https://phabricator.wikimedia.org/T254157) [22:56:08] (03CR) 10Dzahn: [C: 03+2] DHCP: use install1003 as next-server for install4001 [puppet] - 10https://gerrit.wikimedia.org/r/622685 (https://phabricator.wikimedia.org/T254157) (owner: 10Dzahn) [23:00:05] RoanKattouw, Niharika, and Urbanecm: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Evening backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200826T2300). 
[23:00:05] !log ryankemper@cumin1001 END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
[23:00:05] No GERRIT patches in the queue for this window AFAICS.
[23:00:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:06:47] (CR) Bstorm: [C: +2] dynamicproxy: un-nest /favicon.ico and /robots.txt locations [puppet] - https://gerrit.wikimedia.org/r/622677 (https://phabricator.wikimedia.org/T251628) (owner: BryanDavis)
[23:07:19] (CR) BryanDavis: "PCC output: https://puppet-compiler.wmflabs.org/compiler1003/24718/" [puppet] - https://gerrit.wikimedia.org/r/622677 (https://phabricator.wikimedia.org/T251628) (owner: BryanDavis)
[23:28:08] PROBLEM - PHP7 rendering on mwdebug1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[23:30:04] RECOVERY - PHP7 rendering on mwdebug1001 is OK: HTTP OK: HTTP/1.1 302 Found - 649 bytes in 7.090 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[23:38:09] (PS1) Dzahn: DHCP: remove next-server settings for new install servers [puppet] - https://gerrit.wikimedia.org/r/622687 (https://phabricator.wikimedia.org/T254157)
[23:38:59] (CR) Dzahn: [C: +2] DHCP: remove next-server settings for new install servers [puppet] - https://gerrit.wikimedia.org/r/622687 (https://phabricator.wikimedia.org/T254157) (owner: Dzahn)
[23:50:04] (PS1) Ladsgroup: Initial configuration for jawikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/622689 (https://phabricator.wikimedia.org/T260320)
[23:54:35] (PS6) Huji: Set $wgCheckUserLogLogins to true for fawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/599492 (https://phabricator.wikimedia.org/T253802)
[23:55:09] (CR) CRusnov: [C: +1] "LGTM" [cookbooks] - https://gerrit.wikimedia.org/r/622676 (https://phabricator.wikimedia.org/T258729) (owner: Volans)
[23:56:00] (PS7) Huji: Set $wgCheckUserLogLogins to true for fawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/599492 (https://phabricator.wikimedia.org/T253802)
[23:57:45] (PS8) Huji: Start logging log-ins on select wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/599492 (https://phabricator.wikimedia.org/T253802)