[06:08:37] (CR) Marostegui: [C: 2] Revert "db-codfw.php: Depool db2038" [mediawiki-config] - https://gerrit.wikimedia.org/r/462632 (owner: Marostegui)
[06:10:33] (Merged) jenkins-bot: Revert "db-codfw.php: Depool db2038" [mediawiki-config] - https://gerrit.wikimedia.org/r/462632 (owner: Marostegui)
[06:10:46] (CR) jenkins-bot: Revert "db-codfw.php: Depool db2059" [mediawiki-config] - https://gerrit.wikimedia.org/r/462627 (owner: Marostegui)
[06:10:48] (CR) jenkins-bot: db-codfw.php: Depool db2038 [mediawiki-config] - https://gerrit.wikimedia.org/r/462631 (owner: Marostegui)
[06:10:50] (CR) jenkins-bot: Revert "db-codfw.php: Depool db2038" [mediawiki-config] - https://gerrit.wikimedia.org/r/462632 (owner: Marostegui)
[06:11:43] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Repool db2038 (duration: 00m 49s)
[06:11:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:42:38] !log Deploy schema change on s4 eqiad master (db1068), might generate lag on s4 eqiad - T204006
[06:42:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:42:46] T204006: Execute the schema change for Partial Blocks - https://phabricator.wikimedia.org/T204006
[06:46:03] (CR) Giuseppe Lavagetto: mediawiki::web::prod_sites: convert meta.w.o (1 comment) [puppet] - https://gerrit.wikimedia.org/r/461396 (https://phabricator.wikimedia.org/T196968) (owner: Giuseppe Lavagetto)
[06:46:10] !log Deploy schema change on s7 eqiad master (db1062), might generate lag on s4 eqiad - T204006
[06:46:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:46:31] (CR) Giuseppe Lavagetto: "> Patch Set 5:" [puppet] - https://gerrit.wikimedia.org/r/461396 (https://phabricator.wikimedia.org/T196968) (owner: Giuseppe Lavagetto)
[06:49:00] !log Deploy schema change on s2 eqiad master (db1066), might generate lag on s2 eqiad - T204006
[06:49:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:49:08] T204006: Execute the schema change for Partial Blocks - https://phabricator.wikimedia.org/T204006
[06:49:45] (PS6) Giuseppe Lavagetto: mediawiki::web::prod_sites: convert meta.w.o [puppet] - https://gerrit.wikimedia.org/r/461396 (https://phabricator.wikimedia.org/T196968)
[06:52:12] !log Deploy schema change on s1 eqiad master (db1067), might generate lag on s1 eqiad - T204006
[06:52:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:58:38] (CR) Giuseppe Lavagetto: "https://puppet-compiler.wmflabs.org/compiler1002/12581/mw1261.eqiad.wmnet/ no diffs." [puppet] - https://gerrit.wikimedia.org/r/461396 (https://phabricator.wikimedia.org/T196968) (owner: Giuseppe Lavagetto)
[06:58:39] PROBLEM - Varnish traffic drop between 30min ago and now at eqiad on einsteinium is CRITICAL: 33.09 le 60 https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[06:58:42] (CR) Giuseppe Lavagetto: [C: 2] mediawiki::web::prod_sites: convert meta.w.o [puppet] - https://gerrit.wikimedia.org/r/461396 (https://phabricator.wikimedia.org/T196968) (owner: Giuseppe Lavagetto)
[06:58:50] !log Deploy schema change on labtestwiki - T204006
[06:58:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:59:00] T204006: Execute the schema change for Partial Blocks - https://phabricator.wikimedia.org/T204006
[07:00:05] !log Deploy schema change on labswiki (wikitech) m5 master db1073 - T204006
[07:00:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:09:02] (CR) Muehlenhoff: [C: 1] "Looks good to me" [puppet] - https://gerrit.wikimedia.org/r/453093 (https://phabricator.wikimedia.org/T201140) (owner: Giuseppe Lavagetto)
[07:11:06] Operations, ops-eqiad: helium (bacula) - Device not healthy -SMART- - https://phabricator.wikimedia.org/T205364 (MoritzMuehlenhoff)
[07:12:44] (PS1) Elukey: profile::mariadb::misc::el::sanitization: swap cron with systemd timer [puppet] - https://gerrit.wikimedia.org/r/462638 (https://phabricator.wikimedia.org/T172532)
[07:13:25] !log Deploy schema change on s3 eqiad master (db1075), might generate lag on s3 eqiad - T204006
[07:13:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:13:33] T204006: Execute the schema change for Partial Blocks - https://phabricator.wikimedia.org/T204006
[07:17:19] RECOVERY - Varnish traffic drop between 30min ago and now at eqiad on einsteinium is OK: (C)60 le (W)70 le 70.81 https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[07:19:29] PROBLEM - Varnish traffic drop between 30min ago and now at eqiad on einsteinium is CRITICAL: 56.04 le 60 https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[07:20:24] (CR) Muehlenhoff: [C: 1] rsync::server: add parameter to use IPv6 [puppet] - https://gerrit.wikimedia.org/r/456522 (owner: Dzahn)
[07:20:49] !log repair sdm sdi on ms-be2043 - T199198
[07:20:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:21:00] T199198: Some swift filesystems reporting negative disk usage - https://phabricator.wikimedia.org/T199198
[07:23:03] Operations, Wikidata, Wikidata-Query-Service, User-Addshore: Wikidata produces a lot of failed requests for recentchanges API - https://phabricator.wikimedia.org/T202764 (Smalyshev) Vast majority of remaining failures is from wdqs2003: https://logstash.wikimedia.org/goto/fe077467d39c2ee03ce8127bd...
[07:23:50] RECOVERY - Varnish traffic drop between 30min ago and now at eqiad on einsteinium is OK: (C)60 le (W)70 le 70.67 https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[07:25:31] (CR) Elukey: [C: 1] rsync::server: add parameter to use IPv6 [puppet] - https://gerrit.wikimedia.org/r/456522 (owner: Dzahn)
[07:33:09] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[07:33:27] (PS32) Gehel: convert role::logstash::elasticsearch to profiles [puppet] - https://gerrit.wikimedia.org/r/441894 (https://phabricator.wikimedia.org/T198351) (owner: EBernhardson)
[07:34:49] (CR) Gehel: [C: 2] "PPC is happy: https://puppet-compiler.wmflabs.org/compiler1002/12583/" [puppet] - https://gerrit.wikimedia.org/r/441894 (https://phabricator.wikimedia.org/T198351) (owner: EBernhardson)
[07:34:51] (CR) Giuseppe Lavagetto: mediawiki::web::prod_sites: convert wikisource.org (1 comment) [puppet] - https://gerrit.wikimedia.org/r/461397 (https://phabricator.wikimedia.org/T196968) (owner: Giuseppe Lavagetto)
[07:37:09] (CR) Filippo Giunchedi: [C: 1] Scrape Kafka jmx exporters in deployment-prep [puppet] - https://gerrit.wikimedia.org/r/462567 (https://phabricator.wikimedia.org/T204088) (owner: Ottomata)
[07:37:29] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[07:38:45] (CR) Elukey: [C: 2] profile::mariadb::misc::el::sanitization: swap cron with systemd timer [puppet] - https://gerrit.wikimedia.org/r/462638 (https://phabricator.wikimedia.org/T172532) (owner: Elukey)
[07:38:52] (PS2) Elukey: profile::mariadb::misc::el::sanitization: swap cron with systemd timer [puppet] - https://gerrit.wikimedia.org/r/462638 (https://phabricator.wikimedia.org/T172532)
[07:44:50] PROBLEM - Varnish traffic drop between 30min ago and now at eqiad on einsteinium is CRITICAL: 28.98 le 60 https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[07:47:39] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 239, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:48:42] I've silenced the traffic drop for eqiad until oct 9th
[07:49:22] ack
[07:49:26] thanks godog
[07:49:59] np, I was thinking on whether it is really useful that alert
[07:50:40] (PS6) Giuseppe Lavagetto: mediawiki::web::prod_sites: convert wikisource.org [puppet] - https://gerrit.wikimedia.org/r/461397 (https://phabricator.wikimedia.org/T196968)
[07:50:51] (PS1) Elukey: profile::mariadb::misc::el::sanitization: add correct user to timer [puppet] - https://gerrit.wikimedia.org/r/462642 (https://phabricator.wikimedia.org/T172532)
[07:51:43] (CR) Elukey: [C: 2] profile::mariadb::misc::el::sanitization: add correct user to timer [puppet] - https://gerrit.wikimedia.org/r/462642 (https://phabricator.wikimedia.org/T172532) (owner: Elukey)
[07:54:41] !log Deploy schema change on s4 codfw - T204006
[07:54:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:54:50] T204006: Execute the schema change for Partial Blocks - https://phabricator.wikimedia.org/T204006
[07:55:56] (CR) Alexandros Kosiaris: [C: 1] icinga: remove nsca::firewall class, use ferm::service in profile [puppet] - https://gerrit.wikimedia.org/r/462607 (https://phabricator.wikimedia.org/T202782) (owner: Dzahn)
[07:57:29] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 241, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:58:42] Operations, ops-eqiad, Patch-For-Review: setup backup1001.eqiad.wmnet - https://phabricator.wikimedia.org/T189801 (akosiaris) Open>Invalid This was impossible to happen, a new box was procured in T196478
[07:59:09] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 229, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:59:39] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[08:00:31] (CR) Muehlenhoff: [C: 1] "Looks good to me" [puppet] - https://gerrit.wikimedia.org/r/461397 (https://phabricator.wikimedia.org/T196968) (owner: Giuseppe Lavagetto)
[08:01:12] RECOVERY - Varnish traffic drop between 30min ago and now at eqiad on einsteinium is OK: (C)60 le (W)70 le 78.09 https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[08:01:21] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 231, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[08:01:51] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[08:03:27] !log start of the maintenance to swap Hadoop masters from analytics100[1,2] to an-master100[1,2] - T203635
[08:03:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:03:34] T203635: Replace the Analytics HDFS/Yarn masters (hardware refresh) - https://phabricator.wikimedia.org/T203635
[08:03:34] this means complete Hadoop shutdown --^
[08:03:51] !log installing reportbug DLA update on jessie hosts
[08:03:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:04:44] (CR) Alexandros Kosiaris: [C: -1] icinga: on stretch, use systemd::service and upstream unit file (1 comment) [puppet] - https://gerrit.wikimedia.org/r/462600 (https://phabricator.wikimedia.org/T202782) (owner: Dzahn)
[08:10:08] (CR) Alexandros Kosiaris: [C: 1] "Unfortunately this FIXME has been kept around far too long and now thing are more complicated because of it. the icinga package actually r" [puppet] - https://gerrit.wikimedia.org/r/462593 (https://phabricator.wikimedia.org/T202782) (owner: Dzahn)
[08:10:53] (CR) Alexandros Kosiaris: [C: 1] icinga: remove icinga::group class [puppet] - https://gerrit.wikimedia.org/r/462590 (https://phabricator.wikimedia.org/T202782) (owner: Dzahn)
[08:11:54] (CR) Alexandros Kosiaris: [C: -1] icinga: make icinga user configurable (1 comment) [puppet] - https://gerrit.wikimedia.org/r/462604 (owner: Dzahn)
[08:13:13] (PS34) Gehel: prometheus/elasticsearch support multiple exporters per host [puppet] - https://gerrit.wikimedia.org/r/441321 (https://phabricator.wikimedia.org/T198351) (owner: EBernhardson)
[08:17:43] Operations, Traffic, HTTPS: WMF servers support ESNI? - https://phabricator.wikimedia.org/T205378 (Shizhao)
[08:18:19] !log mobrovac@deploy1001 Started deploy [restbase/deploy@40b81a8] (dev-cluster): Do not dynamically generate Parsoid content if TID is provided
[08:18:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:18:29] (PS2) Gehel: maps: migrate maps1004 to stretch [puppet] - https://gerrit.wikimedia.org/r/459535 (https://phabricator.wikimedia.org/T198622)
[08:18:48] (PS7) Petar.petkovic: Remove unused default source language config for CX [mediawiki-config] - https://gerrit.wikimedia.org/r/460492
[08:20:07] Operations, Traffic, HTTPS: WMF servers support ESNI? - https://phabricator.wikimedia.org/T205378 (Krenair) I would think this needs to come from nginx upstream
[08:21:11] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@40b81a8] (dev-cluster): Do not dynamically generate Parsoid content if TID is provided (duration: 02m 52s)
[08:21:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:21:27] Operations, Wikimedia-Mailing-lists, User-Urbanecm: Non-working archive for wikimediacz-l list - https://phabricator.wikimedia.org/T205380 (Urbanecm)
[08:22:09] (CR) Gehel: [C: 2] "PPC looks good: https://puppet-compiler.wmflabs.org/compiler1002/12588/" [puppet] - https://gerrit.wikimedia.org/r/459535 (https://phabricator.wikimedia.org/T198622) (owner: Gehel)
[08:23:17] !log upgrading & rebooting dbproxy1004
[08:23:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:24:03] !log installing libapache2-mod-perl2 security updates
[08:24:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:25:06] !log mobrovac@deploy1001 Started deploy [restbase/deploy@40b81a8]: Do not dynamically generate Parsoid content if TID is provided - T204880
[08:25:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:25:14] T204880: RESTBase should 404 if it cannot satisfy requested TID - https://phabricator.wikimedia.org/T204880
[08:34:01] (PS2) Gehel: maps: change partitioning scheme for new SSDs in maps1004 [puppet] - https://gerrit.wikimedia.org/r/459536 (https://phabricator.wikimedia.org/T195285)
[08:35:16] (CR) Gehel: [C: 2] maps: change partitioning scheme for new SSDs in maps1004 [puppet] - https://gerrit.wikimedia.org/r/459536 (https://phabricator.wikimedia.org/T195285) (owner: Gehel)
[08:35:18] (Abandoned) Giuseppe Lavagetto: mediawiki::web: convert mediawiki.org to mediawiki::vhost [puppet] - https://gerrit.wikimedia.org/r/443843 (https://phabricator.wikimedia.org/T196968) (owner: Giuseppe Lavagetto)
[08:36:31] Operations, Wikimedia-Mailing-lists, User-Urbanecm: Non-working archive for wikimediacz-l list - https://phabricator.wikimedia.org/T205380 (Peachey88) First thing to double check is, is it still set to archive in the list admin view?
[08:37:05] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@40b81a8]: Do not dynamically generate Parsoid content if TID is provided - T204880 (duration: 12m 00s)
[08:37:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:37:14] T204880: RESTBase should 404 if it cannot satisfy requested TID - https://phabricator.wikimedia.org/T204880
[08:37:31] !log mobrovac@deploy1001 Started deploy [restbase/deploy@40b81a8]: Do not dynamically generate Parsoid content if TID is provided, take #2
[08:37:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:38:05] !log installing twitter-bootstrap3 security updates
[08:38:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:39:54] !log upgrade & reboot dbproxy1009
[08:39:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:41:06] Operations, Traffic, HTTPS: WMF servers support ESNI? - https://phabricator.wikimedia.org/T205378 (Aklapper) @Shizhao: Is this a [[ https://www.mediawiki.org/wiki/How_to_report_a_bug | feature request ]]? Currently it looks like a question, and questions can be asked on mailing lists, on IRC, or in f...
[08:42:35] (PS1) Jcrespo: mariadb: Add recommendation api database accounts to m2 [puppet] - https://gerrit.wikimedia.org/r/462650 (https://phabricator.wikimedia.org/T205294)
[08:43:01] (CR) Giuseppe Lavagetto: [C: 2] "https://puppet-compiler.wmflabs.org/compiler1002/12585/mw1261.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/461397 (https://phabricator.wikimedia.org/T196968) (owner: Giuseppe Lavagetto)
[08:43:16] (PS7) Giuseppe Lavagetto: mediawiki::web::prod_sites: convert wikisource.org [puppet] - https://gerrit.wikimedia.org/r/461397 (https://phabricator.wikimedia.org/T196968)
[08:44:00] (PS2) Jcrespo: mariadb: Add recommendation api database accounts to m2 [puppet] - https://gerrit.wikimedia.org/r/462650 (https://phabricator.wikimedia.org/T205294)
[08:47:26] (PS3) Jcrespo: mariadb: Add recommendation api database accounts to m2 [puppet] - https://gerrit.wikimedia.org/r/462650 (https://phabricator.wikimedia.org/T205294)
[08:49:50] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@40b81a8]: Do not dynamically generate Parsoid content if TID is provided, take #2 (duration: 12m 19s)
[08:49:51] (PS1) Effie Mouzeli: mediawiki: Added redirect for wikisource.gr [puppet] - https://gerrit.wikimedia.org/r/462652 (https://phabricator.wikimedia.org/T205077)
[08:49:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:50:28] (PS5) Elukey: Swap analytics100[1,2] with an-master100[1,2] [puppet] - https://gerrit.wikimedia.org/r/461979 (https://phabricator.wikimedia.org/T203635)
[08:51:27] (CR) Elukey: [C: 2] Swap analytics100[1,2] with an-master100[1,2] [puppet] - https://gerrit.wikimedia.org/r/461979 (https://phabricator.wikimedia.org/T203635) (owner: Elukey)
[08:51:51] (CR) Marostegui: mariadb: Add recommendation api database accounts to m2 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/462650 (https://phabricator.wikimedia.org/T205294) (owner: Jcrespo)
[08:52:36] !log installing libtirpc security updates
[08:52:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:52:59] (CR) Jcrespo: [C: -1] mariadb: Add recommendation api database accounts to m2 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/462650 (https://phabricator.wikimedia.org/T205294) (owner: Jcrespo)
[08:53:08] (CR) Gehel: [C: 1] "LGTM" [software/cumin] - https://gerrit.wikimedia.org/r/458357 (owner: Volans)
[08:53:34] (CR) Volans: [C: 2] Tests: improve naming for SSH key file [software/cumin] - https://gerrit.wikimedia.org/r/458357 (owner: Volans)
[08:54:56] !log upgrade & reboot dbproxy1005
[08:55:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:56:22] (PS4) Jcrespo: mariadb: Add recommendation api database accounts to m2 [puppet] - https://gerrit.wikimedia.org/r/462650 (https://phabricator.wikimedia.org/T205294)
[08:56:57] (Merged) jenkins-bot: Tests: improve naming for SSH key file [software/cumin] - https://gerrit.wikimedia.org/r/458357 (owner: Volans)
[08:57:25] (PS5) Jcrespo: mariadb: Add recommendation api database accounts to m2 [puppet] - https://gerrit.wikimedia.org/r/462650 (https://phabricator.wikimedia.org/T205294)
[08:58:02] (CR) Marostegui: [C: 1] mariadb: Add recommendation api database accounts to m2 [puppet] - https://gerrit.wikimedia.org/r/462650 (https://phabricator.wikimedia.org/T205294) (owner: Jcrespo)
[08:58:16] (CR) Effie Mouzeli: [C: 1] "https://puppet-compiler.wmflabs.org/compiler1002/12590/mw2290.codfw.wmnet/ looks alright" [puppet] - https://gerrit.wikimedia.org/r/462652 (https://phabricator.wikimedia.org/T205077) (owner: Effie Mouzeli)
[08:58:18] (CR) jenkins-bot: Tests: improve naming for SSH key file [software/cumin] - https://gerrit.wikimedia.org/r/458357 (owner: Volans)
[08:58:40] (CR) Jcrespo: [C: 2] mariadb: Add recommendation api database accounts to m2 [puppet] - https://gerrit.wikimedia.org/r/462650 (https://phabricator.wikimedia.org/T205294) (owner: Jcrespo)
[08:59:13] (PS2) Gehel: Remove unused 'style' var from Kartotherian module [puppet] - https://gerrit.wikimedia.org/r/462340 (https://phabricator.wikimedia.org/T195328) (owner: Mholloway)
[09:03:50] (CR) Mholloway: "Hmm, what about tilerator_style...?" [puppet] - https://gerrit.wikimedia.org/r/462340 (https://phabricator.wikimedia.org/T195328) (owner: Mholloway)
[09:04:07] (CR) Gehel: "puppet compiler looks happy: https://puppet-compiler.wmflabs.org/compiler1002/12591/" [puppet] - https://gerrit.wikimedia.org/r/462340 (https://phabricator.wikimedia.org/T195328) (owner: Mholloway)
[09:04:46] (PS1) Muehlenhoff: Add library hint for libtirpc [puppet] - https://gerrit.wikimedia.org/r/462654
[09:06:07] (CR) Muehlenhoff: [C: 2] Add library hint for libtirpc [puppet] - https://gerrit.wikimedia.org/r/462654 (owner: Muehlenhoff)
[09:06:56] (CR) Gehel: [C: -1] "> Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/462340 (https://phabricator.wikimedia.org/T195328) (owner: Mholloway)
[09:09:02] PROBLEM - Hue Server on analytics-tool1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name python2.7, args /usr/lib/hue/build/env/bin/hue
[09:09:33] this is expected, downtime expired, no big deal --^
[09:10:11] (PS3) Gehel: Remove unused 'style' var from Kartotherian / Tilerator modules [puppet] - https://gerrit.wikimedia.org/r/462340 (https://phabricator.wikimedia.org/T195328) (owner: Mholloway)
[09:18:05] !log upgrade and reboot dbproxy1007
[09:18:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:19:48] Operations, Wikimedia-Mailing-lists, Security, Upstream: Implement proper AAA for lists.wikimedia.org (mailman) - https://phabricator.wikimedia.org/T118641 (MoritzMuehlenhoff) p:Triage>Normal
[09:19:51] (CR) Filippo Giunchedi: "LGTM overall, see nits inline." (3 comments) [puppet] - https://gerrit.wikimedia.org/r/441321 (https://phabricator.wikimedia.org/T198351) (owner: EBernhardson)
[09:21:45] Operations, Wikidata, Wikidata-Query-Service, Wikimedia-Logstash, Patch-For-Review: Rate limit wdqs logs - https://phabricator.wikimedia.org/T204364 (MoritzMuehlenhoff) a:Gehel
[09:23:20] Operations, ops-eqiad: Degraded RAID on rdb1004 - https://phabricator.wikimedia.org/T205284 (MoritzMuehlenhoff)
[09:23:22] Operations: Degraded RAID on rdb1004 - https://phabricator.wikimedia.org/T205287 (MoritzMuehlenhoff)
[09:23:42] Operations, ops-eqiad: Degraded RAID on rdb1004 - https://phabricator.wikimedia.org/T205284 (MoritzMuehlenhoff) p:Triage>Low
[09:25:38] !log upgrade and reboot dbproxy1008
[09:25:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:31:21] Operations, Puppet: Why doesn't profile::mediawiki::nutcracker create /var/run/nutcracker/ ? - https://phabricator.wikimedia.org/T204450 (fgiunchedi) FWIW this is also a shortcoming of our tmpfiles abstraction in puppet I think, namely we should be notifying in puppet `systemd-tmpfiles --create` when pup...
[09:33:25] Operations, monitoring: Evaluate/integrate rasdaemon as a replacement for mcelog - https://phabricator.wikimedia.org/T205396 (MoritzMuehlenhoff)
[09:33:46] (CR) Gehel: "puppet compiler still looks good: https://puppet-compiler.wmflabs.org/compiler1002/12595/" [puppet] - https://gerrit.wikimedia.org/r/462340 (https://phabricator.wikimedia.org/T195328) (owner: Mholloway)
[09:34:45] !log rebooting again dbproxy1004, 1005, 1007, 1009
[09:34:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:35:42] (PS2) Mathew.onipe: Add elasticsearch [cookbooks] - https://gerrit.wikimedia.org/r/462514 (https://phabricator.wikimedia.org/T202885)
[09:38:16] !log ladsgroup@mwmaint2001:~$ foreachwikiindblist all populateChangeTagDef.php --set-user-tags-only
[09:38:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:39:20] (PS4) Jcrespo: mariadb: First version of mariadb backup fresness alert [puppet] - https://gerrit.wikimedia.org/r/461665 (https://phabricator.wikimedia.org/T203969)
[09:40:08] Operations, Patch-For-Review: mcelog is deprecated in kernel >= 4.12 - https://phabricator.wikimedia.org/T205366 (MoritzMuehlenhoff) Open>Resolved a:MoritzMuehlenhoff I have created https://phabricator.wikimedia.org/T205366 to migrate away from mcelog. In principle we could make the installa...
[09:41:20] (PS5) Jcrespo: mariadb: First version of mariadb backup fresness alert [puppet] - https://gerrit.wikimedia.org/r/461665 (https://phabricator.wikimedia.org/T203969)
[09:41:53] Operations, monitoring: Evaluate/integrate rasdaemon as a replacement for mcelog - https://phabricator.wikimedia.org/T205396 (MoritzMuehlenhoff)
[09:42:00] Operations, monitoring: Evaluate/integrate rasdaemon as a replacement for mcelog - https://phabricator.wikimedia.org/T205396 (MoritzMuehlenhoff) p:Triage>Normal
[09:42:03] (PS1) Gehel: wdqs: activate throttling of log messages to logstash [puppet] - https://gerrit.wikimedia.org/r/462661 (https://phabricator.wikimedia.org/T204364)
[09:46:35] Operations, Operations-Software-Development, User-Joe, User-jijiki: Convert makevm to spicerack cookbook - https://phabricator.wikimedia.org/T203963 (MoritzMuehlenhoff) p:Triage>Normal
[09:46:50] Operations: SRE quarterly goal: allow MediaWiki requests to be served by PHP7 alongside HHVM - https://phabricator.wikimedia.org/T203959 (MoritzMuehlenhoff) p:Triage>High
[09:47:07] Operations, Release-Engineering-Team, Scap: mwdebug1001 and mwdebug1002 are reliably the last two hosts to finish scap-cdb-rebuild - https://phabricator.wikimedia.org/T203625 (MoritzMuehlenhoff) p:Triage>Normal
[09:48:35] !log upgrading & rebooting dbproxy1006
[09:48:37] (PS6) Jcrespo: mariadb: First version of mariadb backup fresness alert [puppet] - https://gerrit.wikimedia.org/r/461665 (https://phabricator.wikimedia.org/T203969)
[09:48:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:53:20] (CR) Marostegui: mariadb: First version of mariadb backup fresness alert (1 comment) [puppet] - https://gerrit.wikimedia.org/r/461665 (https://phabricator.wikimedia.org/T203969) (owner: Jcrespo)
[09:54:28] (CR) Gehel: "Already a few comments inline. Note that some of the steps taken in the current script have disappeared." (7 comments) [cookbooks] - https://gerrit.wikimedia.org/r/462514 (https://phabricator.wikimedia.org/T202885) (owner: Mathew.onipe)
[09:58:46] (PS7) Jcrespo: mariadb: First version of mariadb backup fresness alert [puppet] - https://gerrit.wikimedia.org/r/461665 (https://phabricator.wikimedia.org/T203969)
[09:59:06] Operations, Patch-For-Review: mcelog is deprecated in kernel >= 4.12 - https://phabricator.wikimedia.org/T205366 (GTirloni) Thanks Moritz, that makes sense.
[09:59:55] (CR) Jcrespo: ">" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/461665 (https://phabricator.wikimedia.org/T203969) (owner: Jcrespo)
[10:00:14] PROBLEM - Check systemd state on an-master1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:00:16] (PS5) Giuseppe Lavagetto: mediawiki::web::prod_sites: convert wikivoyage.org [puppet] - https://gerrit.wikimedia.org/r/461976 (https://phabricator.wikimedia.org/T196968)
[10:00:39] (CR) Alexandros Kosiaris: [C: 1] "Per https://puppet-compiler.wmflabs.org/compiler1002/12597/mw1221.eqiad.wmnet/ +1" [puppet] - https://gerrit.wikimedia.org/r/462652 (https://phabricator.wikimedia.org/T205077) (owner: Effie Mouzeli)
[10:01:30] (CR) Alexandros Kosiaris: [C: 1] "I just saw the comment about PCC already. My PCC run is redundant, but +1 anyway" [puppet] - https://gerrit.wikimedia.org/r/462652 (https://phabricator.wikimedia.org/T205077) (owner: Effie Mouzeli)
[10:01:32] (CR) Marostegui: [C: 1] "Nice! :-)" [puppet] - https://gerrit.wikimedia.org/r/461665 (https://phabricator.wikimedia.org/T203969) (owner: Jcrespo)
[10:03:07] (CR) Jcrespo: [C: 2] mariadb: First version of mariadb backup fresness alert [puppet] - https://gerrit.wikimedia.org/r/461665 (https://phabricator.wikimedia.org/T203969) (owner: Jcrespo)
[10:03:53] PROBLEM - Hadoop HDFS Zookeeper failover controller on an-master1001 is CRITICAL: NRPE: Command check_hadoop-hdfs-zkfc not defined
[10:04:24] PROBLEM - puppet last run on an-master1001 is CRITICAL: CRITICAL: Puppet has 4 failures. Last run 15 minutes ago with 4 failures. Failed resources (up to 3 shown): Exec[hdfs_put_mysql-analytics-research-client-pw.txt],Exec[hdfs_put_mysql-analytics-labsdb-client-pw.txt],Package[hadoop-client],Package[libhdfs0]
[10:04:53] RECOVERY - Hadoop HDFS Zookeeper failover controller on an-master1001 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.tools.DFSZKFailoverController
[10:09:03] RECOVERY - Filesystem available is greater than filesystem size on ms-be2043 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=ms-be2043&var-datasource=codfw%2520prometheus%252Fops
[10:09:24] RECOVERY - puppet last run on an-master1001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[10:10:23] (CR) Giuseppe Lavagetto: [C: -1] "https://puppet-compiler.wmflabs.org/compiler1002/12598/mw1261.eqiad.wmnet/ we have quite some differences here, but most are just due to u" [puppet] - https://gerrit.wikimedia.org/r/461976 (https://phabricator.wikimedia.org/T196968) (owner: Giuseppe Lavagetto)
[10:14:04] (CR) Volans: "I was reviewing it while it was merged, publishing anyway my comments in case they want to be picked for later revisions." (11 comments) [puppet] - https://gerrit.wikimedia.org/r/461665 (https://phabricator.wikimedia.org/T203969) (owner: Jcrespo)
[10:16:35] volans: your comments contradict each other "I don't know why you check sections", and then "parameter injection"
[10:17:33] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[10:17:34] jynus: as I suggest to not hardcode the sections, it will become mandatory to safely pass the parameters
[10:17:37] to the query
[10:18:11] see line 51
[10:18:12] and anyway I would suggest to use the proper parameter replacement always, because maybe in 6 months the upper part of the script will be changed, the check removed and the query not modified
[10:18:27] (CR) Mholloway: [C: 1] "Yes, thanks for updating. I only caught the tilerator_style thing because of your first round of edits. :) LGTM!" [puppet] - https://gerrit.wikimedia.org/r/462340 (https://phabricator.wikimedia.org/T195328) (owner: Mholloway)
[10:19:34] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[10:22:09] Operations, Release-Engineering-Team, Scap, Datacenter-Switchover-2018, Wikimedia-Incident: Scap is checking canary servers in dormant instead of active-dc - https://phabricator.wikimedia.org/T204907 (hashar)
[10:26:45] (CR) Jcrespo: [C: 2] "Please backup vulnerability claims with POC. I can demonstrate that my method is actual more secure than prepared statements, which is eas" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/461665 (https://phabricator.wikimedia.org/T203969) (owner: Jcrespo)
[10:28:37] Operations, Cloud-VPS, DNS, Traffic: Inconsistent lists of labs-ns* nameservers - https://phabricator.wikimedia.org/T205344 (MoritzMuehlenhoff) p:Triage>Normal
[10:28:44] jouncebot: next
[10:28:44] In 0 hour(s) and 31 minute(s): European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180925T1100)
[10:29:21] (PS5) Arturo Borrero Gonzalez: cloudvps: add prometheus-openstack-exporter [puppet] - https://gerrit.wikimedia.org/r/462455 (https://phabricator.wikimedia.org/T203177)
[10:29:53] RECOVERY - Hue Server on analytics-tool1001 is OK: PROCS OK: 1 process with command name python2.7, args /usr/lib/hue/build/env/bin/hue
[10:30:00] (CR) jerkins-bot: [V: -1] cloudvps: add prometheus-openstack-exporter [puppet] - https://gerrit.wikimedia.org/r/462455 (https://phabricator.wikimedia.org/T203177) (owner: Arturo Borrero Gonzalez)
[10:30:32] PROBLEM - Check systemd state on an-master1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:35:44] (03PS6) 10Arturo Borrero Gonzalez: cloudvps: add prometheus-openstack-exporter [puppet] - 10https://gerrit.wikimedia.org/r/462455 (https://phabricator.wikimedia.org/T203177) [10:36:32] (03CR) 10jerkins-bot: [V: 04-1] cloudvps: add prometheus-openstack-exporter [puppet] - 10https://gerrit.wikimedia.org/r/462455 (https://phabricator.wikimedia.org/T203177) (owner: 10Arturo Borrero Gonzalez) [10:43:57] (03PS7) 10Arturo Borrero Gonzalez: cloudvps: add prometheus-openstack-exporter [puppet] - 10https://gerrit.wikimedia.org/r/462455 (https://phabricator.wikimedia.org/T203177) [10:52:06] (03PS1) 10Muehlenhoff: Install subversion on application servers [puppet] - 10https://gerrit.wikimedia.org/r/462673 (https://phabricator.wikimedia.org/T204801) [10:53:08] !log onimisionipe: starting inplace reindex of eqiad enwiki - T204362 [10:53:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:53:17] T204362: Resolve elasticsearch shard size alert by doing an in place reindex - https://phabricator.wikimedia.org/T204362 [10:53:57] onimisionipe: in the future you don't need to include your username, the !log command will automatically list whoever issued it in the channel [10:55:24] Oh...Ok. thanks! 
[10:57:16] zeljkof, hello :), I'm here [10:59:03] PROBLEM - Hadoop Namenode - Primary on analytics1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.hdfs.server.namenode.NameNode [10:59:10] PROBLEM - Hadoop Namenode - Stand By on analytics1002 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.hdfs.server.namenode.NameNode [10:59:10] PROBLEM - At least one Hadoop HDFS NameNode is active on analytics1001 is CRITICAL: Hadoop Active NameNode CRITICAL: no namenodes are active [10:59:22] PROBLEM - Hadoop ResourceManager on analytics1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.resourcemanager.ResourceManager [10:59:24] elukey: that you ^ [10:59:24] PROBLEM - Hadoop HistoryServer on analytics1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer [10:59:32] 10Operations, 10Wikimedia-Mailing-lists: Open Foundation West Africa (OFWA) mailing list - https://phabricator.wikimedia.org/T203966 (10MoritzMuehlenhoff) I've created the mailing list. The list admin password will be sent to you shortly. [10:59:32] PROBLEM - Hive Server on analytics1003 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hive.service.server.HiveServer2 [10:59:33] <_joe_> I assume so [10:59:36] I believe so, there was planned hdfs maintenance [10:59:39] yeah I think it is expired downtime [10:59:42] expired downtime maybe? 
[10:59:42] sorry for the noise [10:59:43] PROBLEM - Oozie Server on analytics1003 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.catalina.startup.Bootstrap [10:59:43] yeah [10:59:48] np [10:59:52] PROBLEM - Hive Metastore on analytics1003 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.hive.metastore.HiveMetaStore [11:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: I, the Bot under the Fountain, allow thee, The Deployer, to do European Mid-day SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180925T1100). [11:00:04] raynor: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [11:00:30] expired downtime >> forgetting and creating an outage >> an unscheduled outage [11:00:41] * raynor is here and has deployment rights [11:00:51] '>>' meaning "better than" [11:01:00] can I start with SWAT? is there anything going on on servers atm? [11:01:48] there was an alert, but I would let elukey or someone else to greenlight deployments [11:03:15] zeljkof, elukey hashar ^ [11:03:39] please go ahead only noise [11:04:11] sorry for the issues :( [11:04:12] RECOVERY - Hive Metastore on analytics1003 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hive.metastore.HiveMetaStore [11:04:13] \o godog, is there any chance you could check the aggregation of a couple of graphite metrics for me? 
I'm doing an investigation similar to https://phabricator.wikimedia.org/T199968 [11:04:23] PROBLEM - Filesystem available is greater than filesystem size on ms-be2042 is CRITICAL: cluster=swift device=/dev/sde1 fstype=xfs instance=ms-be2042:9100 job=node mountpoint=/srv/swift-storage/sde1 site=codfw https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=ms-be2042&var-datasource=codfw%2520prometheus%252Fops [11:04:27] np [11:04:45] o/ [11:04:47] godog: the metric is MediaWiki.RevisionSlider.event.load.sum [11:04:55] raynor: I'm around, in case you need help [11:04:56] elukey, I never started the SWAT deployment yet, is there anything I need to log? (I know that I log when I finish) [11:05:03] RECOVERY - Hive Server on analytics1003 is OK: PROCS OK: 1 process with command name java, args org.apache.hive.service.server.HiveServer2 [11:05:14] raynor: jouncebot announces the start, you should log the end :) [11:05:58] awesome, thx, I start with https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/462042/ [11:06:09] addshore: hi, sure, yeah same thing, aggregationMethod: average [11:06:26] (03PS3) 10Pmiazga: Increase sampling ratio for ReadingDepth [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462042 (https://phabricator.wikimedia.org/T205176) (owner: 10HaeB) [11:06:28] addshore: I'm off to lunch, bbl [11:07:14] godog: thanks, I'll file a ticket for fixing it so we can track this! 
[11:08:46] (03CR) 10Pmiazga: [C: 032] Increase sampling ratio for ReadingDepth [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462042 (https://phabricator.wikimedia.org/T205176) (owner: 10HaeB) [11:11:02] godog: tracked as https://phabricator.wikimedia.org/T205416 [11:11:08] (03Merged) 10jenkins-bot: Increase sampling ratio for ReadingDepth [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462042 (https://phabricator.wikimedia.org/T205176) (owner: 10HaeB) [11:13:02] RECOVERY - Oozie Server on analytics1003 is OK: PROCS OK: 1 process with command name java, args org.apache.catalina.startup.Bootstrap [11:13:35] 10Operations, 10Revision-Slider, 10TCB-Team, 10WMDE-Analytics-Engineering, 10Graphite: Fix aggregation of "MediaWiki.RevisionSlider.event.load.sum" from average to sum - https://phabricator.wikimedia.org/T205416 (10Addshore) [11:13:54] (03Abandoned) 10GTirloni: Do not enable mcelog if kernel >= 4.12 [puppet] - 10https://gerrit.wikimedia.org/r/462613 (https://phabricator.wikimedia.org/T205366) (owner: 10GTirloni) [11:16:27] (03CR) 10jenkins-bot: Increase sampling ratio for ReadingDepth [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462042 (https://phabricator.wikimedia.org/T205176) (owner: 10HaeB) [11:19:29] 10Operations, 10Cloud-Services, 10Mail: Create a Cloud VPS SMTP smarthost - https://phabricator.wikimedia.org/T41785 (10faidon) This is exciting to see materialize, it has been a long time coming :) So, a few different things: - This is of course completely up to the WMCS team, but I'd recommend not using a... 
[11:20:31] !log pmiazga@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:462042|Increase sampling ratio for ReadingDepth (T205176)]] (duration: 00m 50s) [11:20:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:20:39] T205176: Increase default sampling ratio of ReadingDepth - https://phabricator.wikimedia.org/T205176 [11:24:42] PROBLEM - Disk space on notebook1003 is CRITICAL: DISK CRITICAL - free space: /srv 505 MB (0% inode=87%) [11:26:02] ^ elukey [11:26:58] ah yes this is not me! [11:27:06] I think it is a user [11:27:15] going to triple check in 2 mins [11:27:52] RECOVERY - Disk space on notebook1003 is OK: DISK OK [11:37:12] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [11:37:52] it takes ages to finish gate-and-submit-swat1 job for the skin ;/ [11:38:28] 10Operations, 10Performance-Team, 10Traffic, 10Wikimedia-General-or-Unknown, and 2 others: Search engines continue to link to JS-redirect destination after Wikipedia copyright protest - https://phabricator.wikimedia.org/T199252 (10Deskana) There is a task to try to quantify if the addition of the sitemap a... 
[11:41:32] (03PS2) 10Jcrespo: backups: Setup new backup check for s1-eqiad on tendril db [puppet] - 10https://gerrit.wikimedia.org/r/461679 (https://phabricator.wikimedia.org/T203969) [11:41:33] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [11:42:26] (03CR) 10jerkins-bot: [V: 04-1] backups: Setup new backup check for s1-eqiad on tendril db [puppet] - 10https://gerrit.wikimedia.org/r/461679 (https://phabricator.wikimedia.org/T203969) (owner: 10Jcrespo) [11:42:39] (03PS3) 10Jcrespo: backups: Setup new backup check for s1-eqiad on tendril db [puppet] - 10https://gerrit.wikimedia.org/r/461679 (https://phabricator.wikimedia.org/T203969) [11:46:43] PROBLEM - swift-object-replicator on ms-be2043 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [11:46:53] PROBLEM - swift-object-auditor on ms-be2043 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [11:47:03] PROBLEM - swift-object-updater on ms-be2043 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-updater [11:47:12] PROBLEM - swift-object-server on ms-be2043 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [11:47:16] (03CR) 10Jcrespo: "I think I should move check_section resource to the mariadb module? How to select the sections to check, hiera? 
Should I use the same than" [puppet] - 10https://gerrit.wikimedia.org/r/461679 (https://phabricator.wikimedia.org/T203969) (owner: 10Jcrespo) [11:47:27] !log pmiazga@deploy1001 Started scap: php-1.32.0-wmf.22/skins/MinervaNeue/resources/skins.minerva.scripts/pageIssues.js SWAT: [[gerrit:462598|It should be possible to opt into new page issues treatment via query string parameter (T204746)]] [11:47:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:47:35] T204746: It should be possible to opt into new page issues treatment via query string parameter - https://phabricator.wikimedia.org/T204746 [11:49:55] (03PS1) 10Elukey: Swap occurrences of analytics1002 with an-master1002 [puppet] - 10https://gerrit.wikimedia.org/r/462684 (https://phabricator.wikimedia.org/T203635) [11:50:18] (03CR) 10Elukey: [C: 032] Swap occurrences of analytics1002 with an-master1002 [puppet] - 10https://gerrit.wikimedia.org/r/462684 (https://phabricator.wikimedia.org/T203635) (owner: 10Elukey) [11:50:25] 10Operations, 10CommRel-Specialists-Support (Jul-Sep-2018), 10Goal, 10User-Johan: Community Relations support for the 2018 data center switchover - https://phabricator.wikimedia.org/T199676 (10Qgil) [11:53:04] (03PS4) 10Jcrespo: backups: Setup new backup check for s1-eqiad on tendril db [puppet] - 10https://gerrit.wikimedia.org/r/461679 (https://phabricator.wikimedia.org/T203969) [11:53:31] I'm still swatting, merging and syncing is pretty long ;( [11:59:22] 10Operations, 10Maps, 10Maps-Sprint, 10Reading-Infrastructure-Team-Backlog, 10Patch-For-Review: migrate maps servers to stretch with the current style - https://phabricator.wikimedia.org/T198622 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` [... 
[11:59:59] 10Operations, 10Maps, 10Maps-Sprint, 10Reading-Infrastructure-Team-Backlog, 10Patch-For-Review: migrate maps servers to stretch with the current style - https://phabricator.wikimedia.org/T198622 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` [... [12:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180925T1200) [12:01:04] !log end of the maintenance to swap Hadoop masters from analytics100[1,2] to an-master100[1,2] - T203635 [12:01:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:01:12] T203635: Replace the Analytics HDFS/Yarn masters (hardware refresh) - https://phabricator.wikimedia.org/T203635 [12:01:18] Hadoop cluster is back and working [12:01:45] I am going to clean up the old masters in a few hours (just as precaution) [12:05:06] !log pmiazga@deploy1001 Finished scap: php-1.32.0-wmf.22/skins/MinervaNeue/resources/skins.minerva.scripts/pageIssues.js SWAT: [[gerrit:462598|It should be possible to opt into new page issues treatment via query string parameter (T204746)]] (duration: 17m 39s) [12:05:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:05:14] T204746: It should be possible to opt into new page issues treatment via query string parameter - https://phabricator.wikimedia.org/T204746 [12:07:14] (03CR) 10Gehel: prometheus/elasticsearch support multiple exporters per host (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/441321 (https://phabricator.wikimedia.org/T198351) (owner: 10EBernhardson) [12:14:42] elukey, zeljkof - I'm almost done, last question, can I sync the whole folder [12:14:46] or do I have to go file by file? 
[12:14:57] https://gerrit.wikimedia.org/r/#/c/mediawiki/skins/MinervaNeue/+/462658/ [12:15:08] raynor: you _have to_ sync the entire folder [12:15:11] it's the rule [12:15:16] ok, thx [12:15:36] the entire patch has to be synced at once, to avoid any problems of a file needing another file [12:15:39] (03PS1) 10BBlack: eeden: remove from authdns set, reimage to jessie test [puppet] - 10https://gerrit.wikimedia.org/r/462687 [12:16:07] (03PS2) 10BBlack: eeden: remove from authdns set, reimage to jessie test [puppet] - 10https://gerrit.wikimedia.org/r/462687 (https://phabricator.wikimedia.org/T156208) [12:16:30] (03PS35) 10Gehel: prometheus/elasticsearch support multiple exporters per host [puppet] - 10https://gerrit.wikimedia.org/r/441321 (https://phabricator.wikimedia.org/T198351) (owner: 10EBernhardson) [12:17:16] 10Operations, 10Wikimedia-Mailing-lists, 10User-Urbanecm: Non-working archive for wikimediacz-l list - https://phabricator.wikimedia.org/T205380 (10Urbanecm) I see the following (it's in Czech, I have no idea how to change the interface's language): {F26196582} There is "Yes" ("Ano") next to "Archive posts... [12:17:36] !log pmiazga@deploy1001 Started scap: php-1.32.0-wmf.22/skins/MinervaNeue/includes SWAT: [[gerrit:462658|Minerva A/B tests are not subject to HTML caching time (T205355)]] [12:17:40] (03CR) 10Gehel: prometheus/elasticsearch support multiple exporters per host (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/441321 (https://phabricator.wikimedia.org/T198351) (owner: 10EBernhardson) [12:17:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:17:44] T205355: A/B config flag should be subject to ResourceLoader caching rules not HTML caching rules - https://phabricator.wikimedia.org/T205355 [12:18:02] just out of curiosity, why skin/extension sync takes so much time to process? 
[12:18:18] config sync goes almost immediately, the skin takes like 20mins [12:19:07] (03CR) 10MSantos: [C: 031] Remove unused 'style' var from Kartotherian / Tilerator modules [puppet] - 10https://gerrit.wikimedia.org/r/462340 (https://phabricator.wikimedia.org/T195328) (owner: 10Mholloway) [12:19:20] (03PS5) 10Jcrespo: backups: Setup new backup check for all sections on zarcillo db [puppet] - 10https://gerrit.wikimedia.org/r/461679 (https://phabricator.wikimedia.org/T203969) [12:20:10] (03CR) 10jerkins-bot: [V: 04-1] backups: Setup new backup check for all sections on zarcillo db [puppet] - 10https://gerrit.wikimedia.org/r/461679 (https://phabricator.wikimedia.org/T203969) (owner: 10Jcrespo) [12:23:23] (03PS6) 10Jcrespo: backups: Setup new backup check for all sections on zarcillo db [puppet] - 10https://gerrit.wikimedia.org/r/461679 (https://phabricator.wikimedia.org/T203969) [12:24:04] (03CR) 10jerkins-bot: [V: 04-1] backups: Setup new backup check for all sections on zarcillo db [puppet] - 10https://gerrit.wikimedia.org/r/461679 (https://phabricator.wikimedia.org/T203969) (owner: 10Jcrespo) [12:26:02] (03PS7) 10Jcrespo: backups: Setup new backup check for all sections on zarcillo db [puppet] - 10https://gerrit.wikimedia.org/r/461679 (https://phabricator.wikimedia.org/T203969) [12:26:14] (03CR) 10BBlack: [C: 032] eeden: remove from authdns set, reimage to jessie test [puppet] - 10https://gerrit.wikimedia.org/r/462687 (https://phabricator.wikimedia.org/T156208) (owner: 10BBlack) [12:26:36] (03CR) 10jerkins-bot: [V: 04-1] backups: Setup new backup check for all sections on zarcillo db [puppet] - 10https://gerrit.wikimedia.org/r/461679 (https://phabricator.wikimedia.org/T203969) (owner: 10Jcrespo) [12:27:15] (03PS8) 10Jcrespo: backups: Setup new backup check for all sections on zarcillo db [puppet] - 10https://gerrit.wikimedia.org/r/461679 (https://phabricator.wikimedia.org/T203969) [12:27:59] (03CR) 10jerkins-bot: [V: 04-1] backups: Setup new backup check for all 
sections on zarcillo db [puppet] - 10https://gerrit.wikimedia.org/r/461679 (https://phabricator.wikimedia.org/T203969) (owner: 10Jcrespo) [12:31:59] RECOVERY - swift-object-updater on ms-be2043 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [12:32:00] RECOVERY - swift-object-server on ms-be2043 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [12:33:03] (03PS9) 10Jcrespo: backups: Setup new backup check for all sections on zarcillo db [puppet] - 10https://gerrit.wikimedia.org/r/461679 (https://phabricator.wikimedia.org/T203969) [12:34:24] (03CR) 10jerkins-bot: [V: 04-1] backups: Setup new backup check for all sections on zarcillo db [puppet] - 10https://gerrit.wikimedia.org/r/461679 (https://phabricator.wikimedia.org/T203969) (owner: 10Jcrespo) [12:34:25] !log repair sde on ms-be2042 - T199198 [12:34:27] !log pmiazga@deploy1001 Finished scap: php-1.32.0-wmf.22/skins/MinervaNeue/includes SWAT: [[gerrit:462658|Minerva A/B tests are not subject to HTML caching time (T205355)]] (duration: 16m 51s) [12:34:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:34:33] T199198: Some swift filesystems reporting negative disk usage - https://phabricator.wikimedia.org/T199198 [12:34:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:34:40] T205355: A/B config flag should be subject to ResourceLoader caching rules not HTML caching rules - https://phabricator.wikimedia.org/T205355 [12:34:50] RECOVERY - swift-object-auditor on ms-be2043 is OK: PROCS OK: 3 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [12:36:39] RECOVERY - swift-object-replicator on ms-be2043 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [12:36:57] !log EU SWAT finished [12:37:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:37:09] sorry for making this SWAT 
window so long, again [12:41:09] PROBLEM - cache_text: Varnishkafka Webrequest Delivery Errors per second on einsteinium is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [5.0] https://grafana.wikimedia.org/dashboard/db/varnishkafka?panelId=20&fullscreen&orgId=1&var-instance=webrequest&var-host=All [12:41:29] (03PS1) 10Reedy: Add a place to link Education Program extension sql dumps [puppet] - 10https://gerrit.wikimedia.org/r/462691 (https://phabricator.wikimedia.org/T174802) [12:41:34] checking vk [12:42:50] !log installing chromium security updates on proton* (tested the new Chromium version in deployment-prep) [12:42:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:44:19] RECOVERY - cache_text: Varnishkafka Webrequest Delivery Errors per second on einsteinium is OK: OK: Less than 1.00% above the threshold [1.0] https://grafana.wikimedia.org/dashboard/db/varnishkafka?panelId=20&fullscreen&orgId=1&var-instance=webrequest&var-host=All [12:49:28] (03PS1) 10BBlack: Switch CAA records to proper RR format [dns] - 10https://gerrit.wikimedia.org/r/462693 [12:49:41] (03CR) 10jerkins-bot: [V: 04-1] Switch CAA records to proper RR format [dns] - 10https://gerrit.wikimedia.org/r/462693 (owner: 10BBlack) [12:51:46] !log rebooting mw1221-mw1239 for kernel security updates [12:51:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:53:54] (03PS10) 10Jcrespo: backups: Setup new backup check for all sections on zarcillo db [puppet] - 10https://gerrit.wikimedia.org/r/461679 (https://phabricator.wikimedia.org/T203969) [12:54:32] (03CR) 10jerkins-bot: [V: 04-1] backups: Setup new backup check for all sections on zarcillo db [puppet] - 10https://gerrit.wikimedia.org/r/461679 (https://phabricator.wikimedia.org/T203969) (owner: 10Jcrespo) [12:56:18] (03PS11) 10Jcrespo: backups: Setup new backup check for all sections on zarcillo db [puppet] - 10https://gerrit.wikimedia.org/r/461679 
(https://phabricator.wikimedia.org/T203969) [12:57:09] (03CR) 10jerkins-bot: [V: 04-1] backups: Setup new backup check for all sections on zarcillo db [puppet] - 10https://gerrit.wikimedia.org/r/461679 (https://phabricator.wikimedia.org/T203969) (owner: 10Jcrespo) [12:59:01] So the failure in: https://integration.wikimedia.org/ci/job/operations-dns-lint/5558/console is because the authdns job runs on (a) a jessie node rather than stretch and/or (b) in either case might need a package update for the gdnsd package. Is it possible to require stretch? [13:00:05] zeljkof: That opportune time is upon us again. Time for a MediaWiki train - European version deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180925T1300). [13:00:26] (03PS12) 10Jcrespo: backups: Setup new backup check for all sections on zarcillo db [puppet] - 10https://gerrit.wikimedia.org/r/461679 (https://phabricator.wikimedia.org/T203969) [13:01:10] (03CR) 10jerkins-bot: [V: 04-1] backups: Setup new backup check for all sections on zarcillo db [puppet] - 10https://gerrit.wikimedia.org/r/461679 (https://phabricator.wikimedia.org/T203969) (owner: 10Jcrespo) [13:02:09] RECOVERY - cassandra CQL 10.64.48.154:9042 on maps1004 is OK: TCP OK - 0.002 second response time on 10.64.48.154 port 9042 [13:04:29] RECOVERY - Filesystem available is greater than filesystem size on ms-be2042 is OK: All metrics within thresholds. 
https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=ms-be2042&var-datasource=codfw%2520prometheus%252Fops [13:06:21] bblack: slightly related, see also T191764 (there is a mention about stretch CI) [13:06:22] T191764: CI: run tests with multiple Python3 versions - https://phabricator.wikimedia.org/T191764 [13:06:42] o/ [13:07:07] !log Deploy schema change on s5 eqiad - this might generate lag on s5 eqiad - T203709 [13:07:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:07:15] T203709: Schema change for adding indexes of ct_tag_id - https://phabricator.wikimedia.org/T203709 [13:07:21] no blockers for train T191069 [13:07:22] T191069: 1.32.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T191069 [13:07:50] 10Operations, 10RESTBase, 10Availability, 10Performance, and 5 others: RFC: Request timeouts and retries - https://phabricator.wikimedia.org/T97204 (10tstarling) [13:07:56] 10Operations, 10MediaWiki-API, 10Availability, 10HHVM, and 6 others: HHVM request timeouts not working; support lowering the API request timeout per request - https://phabricator.wikimedia.org/T97192 (10tstarling) 05Open>03Resolved [13:08:16] (03PS13) 10Jcrespo: backups: Setup new backup check for all sections on zarcillo db [puppet] - 10https://gerrit.wikimedia.org/r/461679 (https://phabricator.wikimedia.org/T203969) [13:08:58] (03CR) 10jerkins-bot: [V: 04-1] backups: Setup new backup check for all sections on zarcillo db [puppet] - 10https://gerrit.wikimedia.org/r/461679 (https://phabricator.wikimedia.org/T203969) (owner: 10Jcrespo) [13:09:34] 10Operations, 10Cloud-Services, 10Mail: Create a Cloud VPS SMTP smarthost - https://phabricator.wikimedia.org/T41785 (10Krenair) >>! In T41785#4614434, @faidon wrote: > - This is of course completely up to the WMCS team, but I'd recommend not using a per-project domain name, but a global one, as these are go... 
[13:10:34] Krenair: ack, thanks :) [13:13:26] !log Deploy schema change on s6 eqiad - this might generate lag on s6 eqiad - T203709 [13:13:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:13:34] T203709: Schema change for adding indexes of ct_tag_id - https://phabricator.wikimedia.org/T203709 [13:14:39] 10Operations, 10Fundraising-Backlog, 10Traffic, 10fundraising-tech-ops: SSL cert for links.email.wikimedia.org - https://phabricator.wikimedia.org/T188561 (10Jgreen) @CCogdill_WMF @BBlack there's no change as far as I can see in Trilogy's SSL rating according to Qualys, still a B with the main issues being... [13:17:47] (03PS1) 10Elukey: profile::prometheus::alert: increase eventlogging critical threshold [puppet] - 10https://gerrit.wikimedia.org/r/462700 [13:18:18] (03PS14) 10Jcrespo: backups: Setup new backup check for all sections on zarcillo db [puppet] - 10https://gerrit.wikimedia.org/r/461679 (https://phabricator.wikimedia.org/T203969) [13:19:19] (03CR) 10jerkins-bot: [V: 04-1] backups: Setup new backup check for all sections on zarcillo db [puppet] - 10https://gerrit.wikimedia.org/r/461679 (https://phabricator.wikimedia.org/T203969) (owner: 10Jcrespo) [13:20:20] (03CR) 10Elukey: [C: 032] profile::prometheus::alert: increase eventlogging critical threshold [puppet] - 10https://gerrit.wikimedia.org/r/462700 (owner: 10Elukey) [13:20:25] (03PS3) 10Alex Monk: Compatibility with new flask version [software/certcentral] - 10https://gerrit.wikimedia.org/r/459841 [13:20:28] (03PS10) 10Alex Monk: Remove maximum version for acme library [software/certcentral] - 10https://gerrit.wikimedia.org/r/459866 [13:20:43] (03PS4) 10Alex Monk: [WIP] Detect when cert config changes and re-issue [software/certcentral] - 10https://gerrit.wikimedia.org/r/460382 [13:20:54] (03PS2) 10Alex Monk: [WIP] Check for outdated/expired certs in the main loop [software/certcentral] - 10https://gerrit.wikimedia.org/r/460397 [13:21:20] (03PS15) 10Jcrespo: backups: 
Setup new backup check for all sections on zarcillo db [puppet] - 10https://gerrit.wikimedia.org/r/461679 (https://phabricator.wikimedia.org/T203969) [13:22:01] (03CR) 10jerkins-bot: [V: 04-1] backups: Setup new backup check for all sections on zarcillo db [puppet] - 10https://gerrit.wikimedia.org/r/461679 (https://phabricator.wikimedia.org/T203969) (owner: 10Jcrespo) [13:22:44] (03CR) 10Vgutierrez: [C: 04-1] Compatibility with new flask version (031 comment) [software/certcentral] - 10https://gerrit.wikimedia.org/r/459841 (owner: 10Alex Monk) [13:22:50] (03PS1) 10Gehel: maps: migrate maps1004 to stretch [puppet] - 10https://gerrit.wikimedia.org/r/462702 (https://phabricator.wikimedia.org/T198622) [13:24:13] (03CR) 10jerkins-bot: [V: 04-1] [WIP] Check for outdated/expired certs in the main loop [software/certcentral] - 10https://gerrit.wikimedia.org/r/460397 (owner: 10Alex Monk) [13:25:03] PROBLEM - Restbase edge esams on text-lb.esams.wikimedia.org is CRITICAL: /api/rest_v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received [13:25:21] (03CR) 10Bstorm: "Let's use other/educationprogram/ as the link here. It will cut down on the symlinking." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/462691 (https://phabricator.wikimedia.org/T174802) (owner: 10Reedy) [13:25:38] (03CR) 10Gehel: [C: 032] maps: migrate maps1004 to stretch [puppet] - 10https://gerrit.wikimedia.org/r/462702 (https://phabricator.wikimedia.org/T198622) (owner: 10Gehel) [13:25:49] (03CR) 10Reedy: "Fine by me! 
Will amend" [puppet] - 10https://gerrit.wikimedia.org/r/462691 (https://phabricator.wikimedia.org/T174802) (owner: 10Reedy) [13:25:53] RECOVERY - Restbase edge esams on text-lb.esams.wikimedia.org is OK: All endpoints are healthy [13:26:18] (03PS4) 10Alex Monk: Compatibility with new flask version [software/certcentral] - 10https://gerrit.wikimedia.org/r/459841 [13:26:36] (03PS2) 10Reedy: Add a place to link Education Program extension sql dumps [puppet] - 10https://gerrit.wikimedia.org/r/462691 (https://phabricator.wikimedia.org/T174802) [13:27:00] 10Operations, 10Fundraising-Backlog, 10Traffic, 10fundraising-tech-ops: SSL cert for links.email.wikimedia.org - https://phabricator.wikimedia.org/T188561 (10BBlack) I would choose not to proceed with a vendor who cares so little about security. The "Weak DH" issue, in particular, made security headlines... [13:27:03] (03PS16) 10Jcrespo: backups: Setup new backup check for all sections on zarcillo db [puppet] - 10https://gerrit.wikimedia.org/r/461679 (https://phabricator.wikimedia.org/T203969) [13:27:57] (03CR) 10jerkins-bot: [V: 04-1] backups: Setup new backup check for all sections on zarcillo db [puppet] - 10https://gerrit.wikimedia.org/r/461679 (https://phabricator.wikimedia.org/T203969) (owner: 10Jcrespo) [13:29:51] !log Deploy schema change on s2 eqiad - this might generate lag on s2 eqiad - T203709 [13:29:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:29:59] T203709: Schema change for adding indexes of ct_tag_id - https://phabricator.wikimedia.org/T203709 [13:30:46] 10Operations, 10Maps, 10Maps-Sprint, 10Reading-Infrastructure-Team-Backlog, 10Patch-For-Review: migrate maps servers to stretch with the current style - https://phabricator.wikimedia.org/T198622 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` [... 
[13:30:58] (03CR) 10Jcrespo: "I don't understand what jenkins actually complains about, https://puppet-compiler.wmflabs.org/compiler1002/12604/db1115.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/461679 (https://phabricator.wikimedia.org/T203969) (owner: 10Jcrespo) [13:31:17] (03PS4) 10Gehel: Remove unused 'style' var from Kartotherian / Tilerator modules [puppet] - 10https://gerrit.wikimedia.org/r/462340 (https://phabricator.wikimedia.org/T195328) (owner: 10Mholloway) [13:32:25] (03CR) 10Gehel: [C: 032] Remove unused 'style' var from Kartotherian / Tilerator modules [puppet] - 10https://gerrit.wikimedia.org/r/462340 (https://phabricator.wikimedia.org/T195328) (owner: 10Mholloway) [13:35:06] (03CR) 10Vgutierrez: [C: 031] Compatibility with new flask version [software/certcentral] - 10https://gerrit.wikimedia.org/r/459841 (owner: 10Alex Monk) [13:35:26] (03CR) 10Effie Mouzeli: [C: 032] mediawiki: Added redirect for wikisource.gr [puppet] - 10https://gerrit.wikimedia.org/r/462652 (https://phabricator.wikimedia.org/T205077) (owner: 10Effie Mouzeli) [13:35:46] (03PS2) 10Effie Mouzeli: mediawiki: Added redirect for wikisource.gr [puppet] - 10https://gerrit.wikimedia.org/r/462652 (https://phabricator.wikimedia.org/T205077) [13:37:26] (03CR) 10Vgutierrez: "Nice catch. I'd like to add a regression test to avoid falling into the same issue in the future." 
[software/certcentral] - 10https://gerrit.wikimedia.org/r/459662 (owner: 10Alex Monk) [13:39:33] (03CR) 10Filippo Giunchedi: "> Patch Set 16:" [puppet] - 10https://gerrit.wikimedia.org/r/461679 (https://phabricator.wikimedia.org/T203969) (owner: 10Jcrespo) [13:40:10] (03CR) 10Jcrespo: "> > Patch Set 16:" [puppet] - 10https://gerrit.wikimedia.org/r/461679 (https://phabricator.wikimedia.org/T203969) (owner: 10Jcrespo) [13:42:11] (03CR) 10Vgutierrez: [C: 031] setup.py test dependencies: Remove pylint maximum version [software/certcentral] - 10https://gerrit.wikimedia.org/r/459811 (owner: 10Alex Monk) [13:42:55] (03CR) 10Vgutierrez: [C: 031] Log command we run for DNS zone updates [software/certcentral] - 10https://gerrit.wikimedia.org/r/459799 (owner: 10Alex Monk) [13:43:29] jynus: the output I was looking at is the from jenkins' failure message on gerrit, https://integration.wikimedia.org/ci/job/operations-puppet-tests-docker/28705/console [13:44:30] yes, I had reached there already, but still don't understand it [13:44:45] I am not using global variables there, but local ones [13:44:47] (03PS2) 10Ottomata: Scrape Kafka jmx exporters in deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/462567 (https://phabricator.wikimedia.org/T204088) [13:45:10] (03CR) 10Ottomata: [V: 032 C: 032] Scrape Kafka jmx exporters in deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/462567 (https://phabricator.wikimedia.org/T204088) (owner: 10Ottomata) [13:45:30] (03CR) 10Vgutierrez: [C: 031] Be a lot more verbose about problems in the ACME process [software/certcentral] - 10https://gerrit.wikimedia.org/r/459798 (owner: 10Alex Monk) [13:46:53] 10Operations, 10ops-codfw: ms-be2030 spontaneous reboot - https://phabricator.wikimedia.org/T204567 (10Papaul) a:05Papaul>03fgiunchedi @fgiunchedi Since 9-20-18 after making the power settings change, I have been monitoring the server; so far the server has been up with no reset. Is it possible to put back... 
[13:47:23] (03PS1) 10Zfilipin: Group0 to 1.32.0-wmf.23 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462709 [13:48:16] (03CR) 10Filippo Giunchedi: prometheus/elasticsearch support multiple exporters per host (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/441321 (https://phabricator.wikimedia.org/T198351) (owner: 10EBernhardson) [13:49:25] jynus: the variables in the title are singular, but in the class declaration are plural [13:49:56] godog: they are different variables! [13:50:23] ugh, indeed, not sure what's wrong [13:50:41] so you understad my question ,now, it is ok if you don't have the answer! :-D [13:51:12] maybe the linter is not puppet-future-parser proof or I don't know what I am doing [13:51:38] 10Operations, 10Traffic, 10Continuous-Integration-Config: CI jobs for authdns linting need to run on Stretch - https://phabricator.wikimedia.org/T205439 (10BBlack) p:05Triage>03Normal [13:52:56] I will ask some of the puppet masters when they have time [13:55:28] 10Operations, 10ops-codfw: ms-be2030 spontaneous reboot - https://phabricator.wikimedia.org/T204567 (10fgiunchedi) a:05fgiunchedi>03Papaul Thanks @Papaul, the server has been in production the whole time and indeed no power resets so far. [13:56:13] (03PS1) 10Marostegui: db-eqiad.php: Depool db1098:3316 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462711 [13:57:01] (03CR) 10Elukey: "Couple of nits, the overall changes looks good even if I don't have a huge context :)" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/460417 (https://phabricator.wikimedia.org/T203804) (owner: 10Ottomata) [13:57:44] 10Operations, 10Maps-Sprint, 10Maps (Tilerator), 10Reading-Infrastructure-Team-Backlog (Kanban): investigate tilerator crash on maps eqiad - https://phabricator.wikimedia.org/T204047 (10MSantos) Tilerator is receiving an `AccessShareLock` error from postgresql while `populate_admin` script is running. ``... 
[13:58:07] (03PS1) 10MSantos: Chaging populate admin cron [puppet] - 10https://gerrit.wikimedia.org/r/462712 (https://phabricator.wikimedia.org/T204047) [13:59:17] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1098:3316 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462711 (owner: 10Marostegui) [13:59:30] (03CR) 10Vgutierrez: "I'm clearly missing something here, on a self signed certificate it doesn't make any sense to generate a chain only PEM or a chained PEM f" [software/certcentral] - 10https://gerrit.wikimedia.org/r/458939 (owner: 10Alex Monk) [14:00:30] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1098:3316 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462711 (owner: 10Marostegui) [14:00:36] (03PS1) 10Effie Mouzeli: mediawiki: Added redirect for wikimedia.gr [puppet] - 10https://gerrit.wikimedia.org/r/462713 (https://phabricator.wikimedia.org/T205077) [14:00:45] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1098:3316 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462711 (owner: 10Marostegui) [14:01:38] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [14:02:17] !log zfilipin@deploy1001 Pruned MediaWiki: 1.32.0-wmf.16 (duration: 08m 27s) [14:02:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:03:01] (03CR) 10Alex Monk: "The point of the self-signed certificates is to get webservers working with basic functionality before we've got a fully trusted one (for " [software/certcentral] - 10https://gerrit.wikimedia.org/r/458939 (owner: 10Alex Monk) [14:03:21] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1098:3316 (duration: 00m 50s) [14:03:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:03:38] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: 
Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [14:05:20] !log Stop db1088 and db1098:3316 in sync [14:05:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:08:40] (03CR) 10Effie Mouzeli: [C: 032] "https://puppet-compiler.wmflabs.org/compiler1002/12605/mwdebug1002.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/462713 (https://phabricator.wikimedia.org/T205077) (owner: 10Effie Mouzeli) [14:10:35] !log Upgrade db1088 [14:10:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:12:25] !log zfilipin@deploy1001 Pruned MediaWiki: 1.32.0-wmf.18 (duration: 03m 02s) [14:12:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:13:13] (03CR) 10Vgutierrez: [C: 031] "right, I feel we should document that 0 bytes PEM size files are expected to be generated during this step, otherwise LGTM" [software/certcentral] - 10https://gerrit.wikimedia.org/r/458939 (owner: 10Alex Monk) [14:14:12] godog: if you have a min, can you help Krenair and I with prometheus exporter stuff in beta? 
[14:15:23] !log zfilipin@deploy1001 Pruned MediaWiki: 1.32.0-wmf.19 [keeping static files] (duration: 01m 29s) [14:15:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:15:45] ottomata: in meetings for ~1.5h though I'll read the backscroll [14:16:41] godog: ok we were in analytics [14:16:57] but basically, we've got it all applied in deployment-prep, but query_resources isn't finding anything [14:17:07] not sure if $site is the problem (i believe it shouldn't be, site should be eqiad there too) [14:17:22] query_resources shoudl work though, since they are using it to collect ssh keys currently [14:17:23] (03PS1) 10Effie Mouzeli: mediawiki: tabs vs spaces in redirects.dat [puppet] - 10https://gerrit.wikimedia.org/r/462717 (https://phabricator.wikimedia.org/T205077) [14:18:04] and we aren't really sure how to test, apart from making puppet patches and notfify {} [14:18:14] !log zfilipin@deploy1001 Started scap: testwiki to php-1.32.0-wmf.23 and rebuild l10n cache [14:18:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:20:28] PROBLEM - Unmerged changes on repository puppet on puppetmaster1001 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). [14:21:29] jijiki: is that your change? 
^^^ [14:22:05] probably [14:22:18] hangon [14:23:34] (03PS3) 10Bstorm: Add a place to link Education Program extension sql dumps [puppet] - 10https://gerrit.wikimedia.org/r/462691 (https://phabricator.wikimedia.org/T174802) (owner: 10Reedy) [14:23:41] (03PS1) 10Marostegui: db-eqiad.php: Depool db1098:3317 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462719 [14:25:01] (03CR) 10Bstorm: [C: 032] Add a place to link Education Program extension sql dumps [puppet] - 10https://gerrit.wikimedia.org/r/462691 (https://phabricator.wikimedia.org/T174802) (owner: 10Reedy) [14:25:52] 10Operations, 10Scap, 10Datacenter-Switchover-2018, 10Release-Engineering-Team (Watching / External), 10Wikimedia-Incident: Scap is checking canary servers in dormant instead of active-dc - https://phabricator.wikimedia.org/T204907 (10greg) (I assume SRE will do the adding to conftool and the editing/ext... [14:26:03] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1098:3317 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462719 (owner: 10Marostegui) [14:27:21] (03PS1) 10Elukey: Set analytics100[1,2] to role spare system [puppet] - 10https://gerrit.wikimedia.org/r/462720 (https://phabricator.wikimedia.org/T203635) [14:27:24] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1098:3317 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462719 (owner: 10Marostegui) [14:28:06] zeljkof: there's an unstaged change in deploy1001 [14:28:16] zeljkof: modified: wikiversions.json [14:30:17] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1098:3317 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462719 (owner: 10Marostegui) [14:30:35] marostegui: just pushing to testwiki, will revert in a few minutes https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment/Train_deploys#Sync_to_cluster_and_verify_on_testwiki [14:30:49] sync-apaches at 59% [14:30:51] ok! 
[14:31:05] zeljkof: forgot the train was now :( [14:31:17] 🚂 [14:31:33] it's in the calendar ;) [14:31:59] zeljkof: I know, I just didn't realise what time it was [14:32:05] (03PS8) 10Petar.petkovic: Remove unused default source language config for CX [mediawiki-config] - 10https://gerrit.wikimedia.org/r/460492 [14:32:19] 10Operations, 10Maps, 10Maps-Sprint, 10Reading-Infrastructure-Team-Backlog, 10Patch-For-Review: migrate maps servers to stretch with the current style - https://phabricator.wikimedia.org/T198622 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['maps1004.eqiad.wmnet'] ``` and were **ALL** suc... [14:36:27] RECOVERY - Unmerged changes on repository puppet on puppetmaster1001 is OK: No changes to merge. [14:36:50] (03PS1) 10Jcrespo: mariadb backup monitoring: Add size checks [puppet] - 10https://gerrit.wikimedia.org/r/462724 (https://phabricator.wikimedia.org/T203969) [14:38:52] !log rebooting mw1240-mw1258 for kernel security updates [14:38:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:39:04] !log zfilipin@deploy1001 Finished scap: testwiki to php-1.32.0-wmf.23 and rebuild l10n cache (duration: 20m 50s) [14:39:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:39:19] (03PS2) 10Effie Mouzeli: mediawiki: tabs vs spaces in redirects.dat [puppet] - 10https://gerrit.wikimedia.org/r/462717 (https://phabricator.wikimedia.org/T205077) [14:43:33] (03PS1) 10Bstorm: dumps: add initial slash to url path [puppet] - 10https://gerrit.wikimedia.org/r/462726 (https://phabricator.wikimedia.org/T174802) [14:44:20] (03CR) 10Effie Mouzeli: [C: 032] mediawiki: tabs vs spaces in redirects.dat [puppet] - 10https://gerrit.wikimedia.org/r/462717 (https://phabricator.wikimedia.org/T205077) (owner: 10Effie Mouzeli) [14:44:27] (03CR) 10Bstorm: [C: 032] dumps: add initial slash to url path [puppet] - 10https://gerrit.wikimedia.org/r/462726 (https://phabricator.wikimedia.org/T174802) (owner: 
10Bstorm) [14:44:39] (03PS3) 10Effie Mouzeli: mediawiki: tabs vs spaces in redirects.dat [puppet] - 10https://gerrit.wikimedia.org/r/462717 (https://phabricator.wikimedia.org/T205077) [14:44:45] <_joe_> moritzm, zeljkof I suggest you two coordinate a bit [14:45:00] <_joe_> scap and mw reboots don't work well at the same time :P [14:45:28] I just wanted to ask if something is wrong, testwiki takes forever to load [14:49:27] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1098:3317 (duration: 00m 57s) [14:49:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:50:42] !log Stop MySQL on db1098 s6 and s7 for upgrade [14:50:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:51:02] zeljkof: those servers should all have come back up beforre the deploy started, or did you ran into any unreachable servers? [14:52:14] moritzm: I didn't notice anything, but I wasn't looking at the output all the time (it takes 10-20 minutes), but I'm sure scap would scream and shout if something was wrong [14:54:01] (03PS1) 10Tarrow: Invalidate wikidatawiki cache with wgCacheEpoch [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462728 (https://phabricator.wikimedia.org/T205330) [14:55:35] jouncebot: now [14:55:36] For the next 0 hour(s) and 4 minute(s): MediaWiki train - European version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180925T1300) [14:55:38] jouncebot: next [14:55:38] In 1 hour(s) and 4 minute(s): Puppet SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180925T1600) [14:56:39] (03CR) 10Zfilipin: [C: 032] "train 🚂" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462709 (owner: 10Zfilipin) [14:56:56] Reedy: finishing train 🚂 [14:57:08] :) [14:57:19] zeljkof: ack [14:59:30] 10Operations, 10Cloud-Services, 10Mail: Create a Cloud VPS SMTP smarthost - https://phabricator.wikimedia.org/T41785 (10herron) >>! 
In T41785#4614826, @Krenair wrote: >>>! In T41785#4614434, @faidon wrote: >> - This is of course completely up to the WMCS team, but I'd recommend not using a per-project domain... [14:59:43] (03Merged) 10jenkins-bot: Group0 to 1.32.0-wmf.23 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462709 (owner: 10Zfilipin) [14:59:47] 10Operations, 10ops-codfw: ms-be2030 spontaneous reboot - https://phabricator.wikimedia.org/T204567 (10Papaul) @fgiunchedi thank you will leave this task open until the end of the week. [15:02:24] !log zfilipin@deploy1001 rebuilt and synchronized wikiversions files: group0 to 1.32.0-wmf.23 [15:02:29] (03PS2) 10Gehel: Chaging populate admin cron [puppet] - 10https://gerrit.wikimedia.org/r/462712 (https://phabricator.wikimedia.org/T204047) (owner: 10MSantos) [15:02:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:03:20] (03CR) 10Gehel: [C: 032] Chaging populate admin cron [puppet] - 10https://gerrit.wikimedia.org/r/462712 (https://phabricator.wikimedia.org/T204047) (owner: 10MSantos) [15:05:28] (03PS6) 10Thcipriani: Replace "wikimedia-polygerrit-style" plugin with gerrit-theme [puppet] - 10https://gerrit.wikimedia.org/r/458523 (https://phabricator.wikimedia.org/T196835) (owner: 10Paladox) [15:05:41] (03PS1) 10Giuseppe Lavagetto: mediawiki::web::vhost: add variant-aliases ProxyPassMatch [puppet] - 10https://gerrit.wikimedia.org/r/462729 [15:05:51] (03PS1) 10Effie Mouzeli: Added zone wikipedia.gr [dns] - 10https://gerrit.wikimedia.org/r/462730 [15:06:18] (03CR) 10jerkins-bot: [V: 04-1] mediawiki::web::vhost: add variant-aliases ProxyPassMatch [puppet] - 10https://gerrit.wikimedia.org/r/462729 (owner: 10Giuseppe Lavagetto) [15:06:33] <_joe_> zeljkof: I need your help on ^^ if you have a minute [15:06:55] Reedy: done with the trains for today [15:06:55] <_joe_> ah scratch that, damn [15:07:03] _joe_: what's up? 
[15:07:06] <_joe_> zeljkof: nevermind, PEBKAC [15:07:10] :D [15:07:10] heh [15:07:16] zeljkof: thanks! [15:08:14] (03PS3) 10Reedy: Disable EducationProgram everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447945 (https://phabricator.wikimedia.org/T125618) [15:08:32] (03CR) 10Reedy: [C: 032] Disable EducationProgram everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447945 (https://phabricator.wikimedia.org/T125618) (owner: 10Reedy) [15:08:40] (03PS2) 10Giuseppe Lavagetto: mediawiki::web::vhost: add variant-aliases ProxyPassMatch [puppet] - 10https://gerrit.wikimedia.org/r/462729 [15:09:21] (03CR) 10jerkins-bot: [V: 04-1] Disable EducationProgram everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447945 (https://phabricator.wikimedia.org/T125618) (owner: 10Reedy) [15:09:23] (03CR) 10jenkins-bot: Group0 to 1.32.0-wmf.23 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462709 (owner: 10Zfilipin) [15:09:48] (03PS1) 10Jcrespo: mariadb: Add alternative, read-only account for recomm. api db [puppet] - 10https://gerrit.wikimedia.org/r/462731 (https://phabricator.wikimedia.org/T205294) [15:09:50] (03CR) 10Reedy: [C: 032] Disable EducationProgram everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447945 (https://phabricator.wikimedia.org/T125618) (owner: 10Reedy) [15:10:06] (03PS2) 10Jcrespo: mariadb: Add alternative, read-only account for recomm. api db [puppet] - 10https://gerrit.wikimedia.org/r/462731 (https://phabricator.wikimedia.org/T205294) [15:11:44] (03Merged) 10jenkins-bot: Disable EducationProgram everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447945 (https://phabricator.wikimedia.org/T125618) (owner: 10Reedy) [15:12:57] (03CR) 10Marostegui: [C: 031] mariadb: Add alternative, read-only account for recomm. 
api db [puppet] - 10https://gerrit.wikimedia.org/r/462731 (https://phabricator.wikimedia.org/T205294) (owner: 10Jcrespo) [15:14:35] !log reedy@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Turn off EducationProgram T188411 T125618 (duration: 00m 57s) [15:14:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:14:44] T125618: Deprecate and remove the EducationProgram extension from Wikimedia servers after June 30, 2018 - https://phabricator.wikimedia.org/T125618 [15:14:45] T188411: Remove the EducationProgram extension from individual wikis - https://phabricator.wikimedia.org/T188411 [15:14:54] (03CR) 10Jcrespo: [C: 032] mariadb: Add alternative, read-only account for recomm. api db [puppet] - 10https://gerrit.wikimedia.org/r/462731 (https://phabricator.wikimedia.org/T205294) (owner: 10Jcrespo) [15:15:21] (03PS2) 10Zoranzoki21: Enable VisualEditor in Project namespace on srwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462191 (https://phabricator.wikimedia.org/T205206) [15:15:29] (03PS2) 10Tarrow: Invalidate wikidatawiki cache with wgCacheEpoch [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462728 (https://phabricator.wikimedia.org/T205330) [15:16:49] (03CR) 10Tarrow: "This change is ready for review." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/462728 (https://phabricator.wikimedia.org/T205330) (owner: 10Tarrow) [15:17:40] (03PS3) 10Zoranzoki21: Enable VisualEditor at fiwikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446583 (https://phabricator.wikimedia.org/T192135) [15:25:30] (03CR) 10jenkins-bot: Disable EducationProgram everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447945 (https://phabricator.wikimedia.org/T125618) (owner: 10Reedy) [15:33:41] (03PS3) 10Giuseppe Lavagetto: mediawiki::web::vhost: add variant-aliases ProxyPassMatch [puppet] - 10https://gerrit.wikimedia.org/r/462729 [15:38:16] PROBLEM - Device not healthy -SMART- on ms-be2027 is CRITICAL: cluster=swift device=cciss,13 instance=ms-be2027:9100 job=node site=codfw https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ms-be2027&var-datasource=codfw%2520prometheus%252Fops [15:44:11] 10Operations, 10DBA, 10Research, 10Services (designing): Storage of data for recommendation API - https://phabricator.wikimedia.org/T203039 (10bmansurov) 05Open>03Resolved a:03bmansurov @Pchelolo the database has been setup (T205294). I think this task is complete as far as storage is concerned. I'll... [15:46:45] \o Reedy, should i be worried about bumping the wgCacheEpoch of wikidata.org? :P [15:47:00] Depends how close to today you're bumping it to :P [15:47:07] 10Operations, 10DBA, 10Research, 10Services (designing): Storage of data for recommendation API - https://phabricator.wikimedia.org/T203039 (10jcrespo) [15:48:41] addshore: Less than a week... [15:51:56] 10Operations, 10ops-codfw, 10fundraising-tech-ops, 10Patch-For-Review: move/setup/install frauth2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T204079 (10Jgreen) a:05Jgreen>03cwdent [15:51:59] Reedy: indeed [15:52:02] Reedy: thoughts? [15:52:13] Make sure SRE are aware for starters [15:52:15] Perf team too [15:52:41] I could mail to ops? whats the process / is that the process? 
[15:52:45] No idea if it'll be enough to cause problems... But it's definitely gonna cause some more load [15:53:32] Should work for a heads up [15:53:49] (03CR) 10Giuseppe Lavagetto: "https://puppet-compiler.wmflabs.org/compiler1002/12609/ noop" [puppet] - 10https://gerrit.wikimedia.org/r/462729 (owner: 10Giuseppe Lavagetto) [15:54:12] the last time it was bumped was https://github.com/wikimedia/operations-mediawiki-config/commit/b6039f66bc2cf5d81ca36be57e464ecefce267d3? O_o naaah, that cant be right [15:54:31] addshore: check IS [15:54:48] * addshore is trying to find the commit, it used to be in Wikibase.php [15:54:49] 'wgCacheEpoch' => [ [15:54:50] 'default' => '20130601000000', [15:54:50] 'wikidatawiki' => '20170724130500', [15:54:50] 'testwikidatawiki' => '20170724130500', [15:54:50] ], [15:55:00] I'm guessing around July or August 2017 [15:56:55] (03PS1) 10Thcipriani: Update l10nupdate-1 PHP version to match scap [puppet] - 10https://gerrit.wikimedia.org/r/462748 (https://phabricator.wikimedia.org/T205313) [15:58:17] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki::web::vhost: add variant-aliases ProxyPassMatch [puppet] - 10https://gerrit.wikimedia.org/r/462729 (owner: 10Giuseppe Lavagetto) [16:00:01] Reedy: https://github.com/wikimedia/operations-mediawiki-config/commit/ff904e18f61f52dc3e86ab76850e14e49ab15e7d [16:00:04] godog and _joe_: That opportune time is upon us again. Time for a Puppet SWAT(Max 6 patches) deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180925T1600). [16:00:04] thcipriani: A patch you scheduled for Puppet SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. 
[16:00:16] o/ [16:00:25] <_joe_> thcipriani: you need to wait a couple minutes [16:00:28] lol [16:00:29] <_joe_> then I'm all yours [16:00:40] sure thing [16:00:52] <_joe_> well maybe 5-6 :P [16:01:04] Reedy: last time it was actually bumped on the day, to the value of the day :) [16:01:09] Scary [16:01:19] The size and traffic of wikidata has increased since [16:01:30] <_joe_> also wikidata is killing memcached already [16:01:35] I still think telling SRE and perf is a good prerequisite [16:01:42] <_joe_> there is an open ticket about that I think [16:01:52] Reedy: yes, especially as Krinkle nearly reverted that one too :P https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/367853/ [16:01:53] <_joe_> yes, write to ops@ with some estimates [16:02:01] * addshore will look into that, thanks all! [16:03:54] <_joe_> thcipriani: I'm almost there [16:04:25] ack, no rush [16:05:47] <_joe_> thcipriani: ok let's start with https://gerrit.wikimedia.org/r/c/operations/puppet/+/458523 [16:05:57] sounds good [16:06:08] <_joe_> can you +1 it? 
[16:06:14] (03CR) 10Gehel: prometheus/elasticsearch support multiple exporters per host (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/441321 (https://phabricator.wikimedia.org/T198351) (owner: 10EBernhardson) [16:06:17] <_joe_> since you're not the author [16:06:28] (03CR) 10Thcipriani: [C: 031] "Moves already existing logic, lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/458523 (https://phabricator.wikimedia.org/T196835) (owner: 10Paladox) [16:06:43] (03CR) 10Thcipriani: [C: 031] "Working for me in test" [puppet] - 10https://gerrit.wikimedia.org/r/458833 (https://phabricator.wikimedia.org/T196835) (owner: 10Paladox) [16:08:02] (03CR) 10Giuseppe Lavagetto: [C: 032] Replace "wikimedia-polygerrit-style" plugin with gerrit-theme [puppet] - 10https://gerrit.wikimedia.org/r/458523 (https://phabricator.wikimedia.org/T196835) (owner: 10Paladox) [16:08:09] (03PS7) 10Giuseppe Lavagetto: Replace "wikimedia-polygerrit-style" plugin with gerrit-theme [puppet] - 10https://gerrit.wikimedia.org/r/458523 (https://phabricator.wikimedia.org/T196835) (owner: 10Paladox) [16:08:18] (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] Replace "wikimedia-polygerrit-style" plugin with gerrit-theme [puppet] - 10https://gerrit.wikimedia.org/r/458523 (https://phabricator.wikimedia.org/T196835) (owner: 10Paladox) [16:08:54] (03CR) 10Thcipriani: [V: 032 C: 032] Remove wikimedia-polygerrit-style.html plugin [software/gerrit] (deploy/wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/458524 (owner: 10Paladox) [16:09:25] who broke Commons ? 
Getting PHP timeout errors when heading for MassDelete [16:09:48] <_joe_> the train happened earlier [16:09:56] <_joe_> NotASpy: open a ticket I guess [16:10:06] <_joe_> thcipriani: zeljkof ^^ [16:10:07] train didn't hit commons yet [16:10:11] <_joe_> oh ok [16:10:19] https://tools.wmflabs.org/versions/ [16:10:27] commons is group1 [16:11:02] <_joe_> NotASpy: a timeout of 60 seconds per request was introduced some time ago [16:11:05] <_joe_> like weeks ago [16:11:11] <_joe_> maybe you're hitting that? [16:11:33] yeah, it's Special:Nuke that seems to be hitting that limit now. Was only trying to delete two files in one go due to extreme lazyness. [16:12:07] <_joe_> NotASpy: definitely open a ticket if you have time [16:12:12] VFC works so used that. I'll see if it fixes itself in an hour, if not, I'll check Phab and file a bug. [16:12:25] <_joe_> thanks <3 [16:12:40] <_joe_> thcipriani: I ran puppet on cobalt but nothing happened [16:12:47] <_joe_> isn't cobalt the gerrit server? [16:13:03] <_joe_> in fact, that file is not referenced anywhere [16:13:13] <_joe_> so yeah tha change didn't really do anything [16:13:28] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: nathante/groceryheist shell request for researchers, statistics-privatedata-users, analytics-privatedata-users - https://phabricator.wikimedia.org/T204790 (10Groceryheist) 05Resolved>03Open I still don't have access to SWAP. I understand that I ne... 
[16:13:48] _joe_: the folder is referenced.../me digs [16:13:59] (03CR) 10Cwhite: [C: 032] monitoring: set mode on host and service configs [puppet] - 10https://gerrit.wikimedia.org/r/462024 (https://phabricator.wikimedia.org/T202782) (owner: 10Cwhite) [16:14:08] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: nathante/groceryheist shell request for researchers, statistics-privatedata-users, analytics-privatedata-users - https://phabricator.wikimedia.org/T204790 (10Reedy) [16:14:09] (03PS5) 10Cwhite: monitoring: set mode on host and service configs [puppet] - 10https://gerrit.wikimedia.org/r/462024 (https://phabricator.wikimedia.org/T202782) [16:14:24] (03PS4) 10Giuseppe Lavagetto: Gerrit: Add footer link for CoC and Privacy Policy [puppet] - 10https://gerrit.wikimedia.org/r/458833 (https://phabricator.wikimedia.org/T196835) (owner: 10Paladox) [16:15:02] (03PS2) 10Elukey: Set analytics100[1,2] to role spare system [puppet] - 10https://gerrit.wikimedia.org/r/462720 (https://phabricator.wikimedia.org/T203635) [16:15:31] <_joe_> jenkins is making me wait [16:15:36] <_joe_> like forever [16:16:36] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/production/modules/gerrit/manifests/jetty.pp#131 [16:17:43] <_joe_> thcipriani: you also need recurse => true IIRC [16:17:51] <_joe_> let me check though [16:18:32] (03PS5) 10Giuseppe Lavagetto: Gerrit: Add footer link for CoC and Privacy Policy [puppet] - 10https://gerrit.wikimedia.org/r/458833 (https://phabricator.wikimedia.org/T196835) (owner: 10Paladox) [16:18:41] (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] Gerrit: Add footer link for CoC and Privacy Policy [puppet] - 10https://gerrit.wikimedia.org/r/458833 (https://phabricator.wikimedia.org/T196835) (owner: 10Paladox) [16:19:27] (03PS3) 10Elukey: Set analytics100[1,2] to role spare system [puppet] - 10https://gerrit.wikimedia.org/r/462720 (https://phabricator.wikimedia.org/T203635) [16:19:28] (03PS1) 10Bstorm: Change mgmt dns from labstore1008/9 
to cloudstore1008/9 [dns] - 10https://gerrit.wikimedia.org/r/462755 (https://phabricator.wikimedia.org/T193655) [16:19:55] _joe_: we have it set to "remote" since we don't want to purge that directory of unmanaged files but we do want to use source [16:19:59] <_joe_> ok who merged my change? [16:20:11] NotASpy, _joe_, thcipriani: huh, I was really worried for a minute :D [16:20:18] <_joe_> shdubsh: did you puppet-merge my change too? [16:20:32] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: nathante/groceryheist shell request for researchers, statistics-privatedata-users, analytics-privatedata-users - https://phabricator.wikimedia.org/T204790 (10Groceryheist) 05Open>03Resolved a:03Groceryheist Created task https://phabricator.wikime... [16:21:55] _joe_: yes, that bad? [16:22:06] 10Operations, 10Recommendation-API, 10Research, 10SCB, 10Services (next): Setup access from service to mysql - https://phabricator.wikimedia.org/T205452 (10mobrovac) The first step would be to add the user/pass combo to the private puppet repo (ping @Joe @fgiunchedi cna you help out for this step?). Afte... [16:22:07] <_joe_> not this time, but in general, better to ask [16:22:22] 10Operations, 10Recommendation-API, 10Research, 10SCB, 10Services (next): Setup access from service to mysql - https://phabricator.wikimedia.org/T205452 (10mobrovac) p:05Triage>03Normal [16:22:30] it looked pretty innocuous. I'll ask next time :) [16:22:35] <_joe_> at least, I wouldn't have been confused :P [16:23:38] <_joe_> thcipriani: ok your change had no effect [16:24:36] those 2 touched the same file, so if the first one didn't change anything neither would the 2nd, but I still don't quite get why (recurse => 'remote' and all) [16:24:39] <_joe_> I need to understand what, but frankly I need a break before trying to understand it [16:24:41] (03CR) 10Volans: [C: 031] "LGTM but having a new version of gdnsd just been deployed last week and being this a new zone check it with Brandon before merging." 
[dns] - 10https://gerrit.wikimedia.org/r/462730 (owner: 10Effie Mouzeli) [16:25:10] <_joe_> thcipriani: I'll get back to you in ~ 20 minutes, ok? [16:25:18] _joe_: sure, thank you! [16:25:23] <_joe_> or am I missing another patch? [16:25:26] <_joe_> let me check [16:26:16] <_joe_> thcipriani: should we merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/460914 too? [16:27:01] _joe_: yes please, that one is for the old UI [16:27:12] (03PS2) 10Giuseppe Lavagetto: Gerrit: Add CoC and Privacy Policy to old UI [puppet] - 10https://gerrit.wikimedia.org/r/460914 (https://phabricator.wikimedia.org/T196835) (owner: 10Thcipriani) [16:27:20] (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] Gerrit: Add CoC and Privacy Policy to old UI [puppet] - 10https://gerrit.wikimedia.org/r/460914 (https://phabricator.wikimedia.org/T196835) (owner: 10Thcipriani) [16:28:02] (03CR) 10Elukey: [C: 032] Set analytics100[1,2] to role spare system [puppet] - 10https://gerrit.wikimedia.org/r/462720 (https://phabricator.wikimedia.org/T203635) (owner: 10Elukey) [16:28:10] (03PS4) 10Elukey: Set analytics100[1,2] to role spare system [puppet] - 10https://gerrit.wikimedia.org/r/462720 (https://phabricator.wikimedia.org/T203635) [16:28:12] (03CR) 10Elukey: [V: 032 C: 032] Set analytics100[1,2] to role spare system [puppet] - 10https://gerrit.wikimedia.org/r/462720 (https://phabricator.wikimedia.org/T203635) (owner: 10Elukey) [16:28:27] 10Operations, 10Recommendation-API, 10Research, 10SCB, 10Services (next): Setup access from service to mysql - https://phabricator.wikimedia.org/T205452 (10jcrespo) > the user/pass combo to the private puppet repo That is already done. For the firewall, I need to know the source (mysql client) ips. 
[16:28:46] relies on the same directory declaration, so may also have no effect in a puppet run :\ [16:29:00] it should :) [16:29:05] <_joe_> thcipriani: oh got the issue [16:29:07] <_joe_> I think [16:29:13] (03PS2) 10Cwhite: icinga: make icinga user configurable [puppet] - 10https://gerrit.wikimedia.org/r/462604 (owner: 10Dzahn) [16:29:37] <_joe_> https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/production/modules/gerrit/manifests/jetty.pp#134 [16:30:07] (03CR) 10jerkins-bot: [V: 04-1] icinga: make icinga user configurable [puppet] - 10https://gerrit.wikimedia.org/r/462604 (owner: 10Dzahn) [16:30:22] <_joe_> you don't do recurse => remote there [16:30:28] <_joe_> so I think puppet doesn't manage it [16:30:38] <_joe_> anyways, I'll test that hypothesis later [16:30:48] (03CR) 10Filippo Giunchedi: [C: 031] prometheus/elasticsearch support multiple exporters per host (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/441321 (https://phabricator.wikimedia.org/T198351) (owner: 10EBernhardson) [16:31:58] (03PS36) 10Gehel: prometheus/elasticsearch support multiple exporters per host [puppet] - 10https://gerrit.wikimedia.org/r/441321 (https://phabricator.wikimedia.org/T198351) (owner: 10EBernhardson) [16:32:39] (03PS1) 10Elukey: camus: set proper number of consumers for Webrequest [puppet] - 10https://gerrit.wikimedia.org/r/462761 (https://phabricator.wikimedia.org/T200822) [16:33:17] (03CR) 10Gehel: [C: 032] prometheus/elasticsearch support multiple exporters per host [puppet] - 10https://gerrit.wikimedia.org/r/441321 (https://phabricator.wikimedia.org/T198351) (owner: 10EBernhardson) [16:33:32] (03CR) 10Elukey: [C: 032] camus: set proper number of consumers for Webrequest [puppet] - 10https://gerrit.wikimedia.org/r/462761 (https://phabricator.wikimedia.org/T200822) (owner: 10Elukey) [16:33:38] (03PS2) 10Elukey: camus: set proper number of consumers for Webrequest [puppet] - 10https://gerrit.wikimedia.org/r/462761 
(https://phabricator.wikimedia.org/T200822) [16:33:48] (03CR) 10Cwhite: [C: 031] icinga: remove icinga::group class [puppet] - 10https://gerrit.wikimedia.org/r/462590 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn) [16:33:57] (03PS3) 10Cwhite: icinga: remove icinga::group class [puppet] - 10https://gerrit.wikimedia.org/r/462590 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn) [16:34:07] (03CR) 10Elukey: [V: 032 C: 032] camus: set proper number of consumers for Webrequest [puppet] - 10https://gerrit.wikimedia.org/r/462761 (https://phabricator.wikimedia.org/T200822) (owner: 10Elukey) [16:37:06] PROBLEM - Check systemd state on elastic1035 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:37:09] (03CR) 10Ottomata: [C: 031] camus: set proper number of consumers for Webrequest [puppet] - 10https://gerrit.wikimedia.org/r/462761 (https://phabricator.wikimedia.org/T200822) (owner: 10Elukey) [16:37:17] PROBLEM - Check systemd state on relforge1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:37:40] damn, there's a bunch of errors coming up from icinga, nothing too bad, I'm on it! [16:38:07] PROBLEM - Check systemd state on elastic1027 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:38:16] PROBLEM - Check systemd state on elastic1039 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:38:17] RECOVERY - Device not healthy -SMART- on ms-be2027 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ms-be2027&var-datasource=codfw%2520prometheus%252Fops [16:38:27] PROBLEM - Check systemd state on elastic2010 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[16:38:27] PROBLEM - Check systemd state on logstash1006 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:38:36] PROBLEM - Check systemd state on elastic2024 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:38:59] (03CR) 10Cwhite: "I think this change should be merged with https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/462604/" [puppet] - 10https://gerrit.wikimedia.org/r/462593 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn) [16:39:17] gehel: need help? [16:39:24] ottomata: did you figure the prometheus + exported resources thing out? [16:39:46] PROBLEM - Check systemd state on elastic1028 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:39:56] PROBLEM - Check systemd state on elastic2014 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:39:57] volans: silencing icinga if it gets too noisy, fix coming up in a few minutes and nothing bad breaking (loosing a few metrics) [16:40:07] PROBLEM - Check correctness of the icinga configuration on einsteinium is CRITICAL: Icinga configuration contains errors [16:40:13] gehel: ack [16:40:16] PROBLEM - Check systemd state on elastic1046 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:40:17] PROBLEM - Check systemd state on elastic1029 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:40:23] here as well. is einsteinium related? [16:40:28] (03CR) 10Cwhite: "The systemd service is managed by the package. Is it necessary to manage here?" 
[puppet] - 10https://gerrit.wikimedia.org/r/462600 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn) [16:40:29] I'm checking [16:40:45] herron, volans: einsteinium should not be related [16:40:46] PROBLEM - Check systemd state on elastic1052 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:40:47] Error: Invalid host object directive 'mode'. [16:40:53] (03CR) 10Cwhite: [C: 031] icinga: remove nsca::firewall class, use ferm::service in profile [puppet] - 10https://gerrit.wikimedia.org/r/462607 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn) [16:40:56] PROBLEM - Check systemd state on elastic1032 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:41:00] shdubsh: your change? Error: Invalid host object directive 'mode'. [16:41:12] 10Operations, 10Gadgets, 10MediaWiki-Cache, 10Performance-Team, and 2 others: Mcrouter periodically reports soft TKOs for mc[1,2]035 leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (10Krinkle) a:05Krinkle>03None [16:41:16] PROBLEM - Check systemd state on elastic2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:41:16] PROBLEM - Check systemd state on elastic2008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:41:22] 10Operations, 10Gadgets, 10MediaWiki-Cache, 10Performance-Team (Radar), and 2 others: Mcrouter periodically reports soft TKOs for mc[1,2]035 leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (10Krinkle) [16:41:36] Puppet failures? [16:42:02] shdubsh: no, icinga config broken [16:42:07] PROBLEM - Check systemd state on elastic1042 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[16:42:08] the mode parameter made it to the host{} define [16:42:17] in /etc/icinga/puppet_hosts.cfg [16:42:37] 10Operations, 10Gadgets, 10MediaWiki-Cache, 10Performance-Team (Radar), and 2 others: Mcrouter periodically reports soft TKOs for mc[1,2]035 leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (10Krinkle) Un-assigning for now as this doesn't appear actionable for me. I'll keep an... [16:42:47] PROBLEM - Check systemd state on elastic2026 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:42:47] shdubsh: I'd say revert for now [16:43:16] PROBLEM - Check systemd state on elastic2015 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:43:37] PROBLEM - Check systemd state on elastic1034 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:43:40] (03PS1) 10Gehel: elasticsearch: fix systemd unit for prometheus elasticsearch exporter [puppet] - 10https://gerrit.wikimedia.org/r/462765 [16:43:48] shdubsh: there is a useful button on Gerrit for that ;) [16:44:07] PROBLEM - Check systemd state on elastic2034 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:44:26] PROBLEM - Check systemd state on elastic2031 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:44:27] PROBLEM - Check systemd state on elastic1037 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[16:44:28] (03CR) 10Gehel: [C: 032] elasticsearch: fix systemd unit for prometheus elasticsearch exporter [puppet] - 10https://gerrit.wikimedia.org/r/462765 (owner: 10Gehel) [16:44:56] (03PS1) 10Cwhite: Revert "monitoring: set mode on host and service configs" [puppet] - 10https://gerrit.wikimedia.org/r/462766 [16:45:07] PROBLEM - Check systemd state on elastic1044 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:45:17] (03CR) 10Cwhite: [C: 032] Revert "monitoring: set mode on host and service configs" [puppet] - 10https://gerrit.wikimedia.org/r/462766 (owner: 10Cwhite) [16:45:25] (03PS2) 10Cwhite: Revert "monitoring: set mode on host and service configs" [puppet] - 10https://gerrit.wikimedia.org/r/462766 [16:45:29] 10Operations, 10ops-eqiad: Degraded RAID on rdb1004 - https://phabricator.wikimedia.org/T205284 (10Cmjohnson) The disks are hardware raided, 500GB SATA, The server is out of warranty but I have spares on-site that I can replace, it looks like it's the SATA disk in slot 3 . . [16:45:41] (03CR) 10Cwhite: [V: 032 C: 032] Revert "monitoring: set mode on host and service configs" [puppet] - 10https://gerrit.wikimedia.org/r/462766 (owner: 10Cwhite) [16:45:56] PROBLEM - Check systemd state on elastic1049 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:46:00] reverted [16:46:07] RECOVERY - Check systemd state on relforge1002 is OK: OK - running: The system is fully operational [16:46:07] PROBLEM - Check systemd state on elastic2027 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:46:11] volans: fixed, elasticsearch should start recovering [16:46:16] PROBLEM - Check systemd state on elastic2025 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:47:26] PROBLEM - Check systemd state on elastic1023 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[16:47:47] PROBLEM - Check systemd state on elastic2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:48:27] RECOVERY - Check systemd state on elastic2025 is OK: OK - running: The system is fully operational [16:49:38] <_joe_> thcipriani: confirmed, the culprit is that one [16:51:01] <_joe_> if you declare a subpath of the one you declare with recurse => remote, it gets ignored even if it's on disk [16:52:11] (03CR) 10Addshore: [C: 04-1] "pending a small discussion to see if we are scared about this" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462728 (https://phabricator.wikimedia.org/T205330) (owner: 10Tarrow) [16:52:16] <_joe_> actually, even if you declare recurse => true and add a source [16:52:25] (03PS1) 10Gehel: elasticsearch: expose prometheus on the correct port [puppet] - 10https://gerrit.wikimedia.org/r/462768 [16:52:26] <_joe_> it also makes sense [16:52:32] <_joe_> ok, I'm logging off now [16:52:35] _joe_: interesting, thanks for the troubleshooting, it does make sense [16:52:37] (03PS2) 10Gehel: elasticsearch: expose prometheus on the correct port [puppet] - 10https://gerrit.wikimedia.org/r/462768 [16:52:56] <_joe_> you can submit another patch if you want, but beware of possible consequences [16:52:56] RECOVERY - Check systemd state on logstash1006 is OK: OK - running: The system is fully operational [16:53:03] (03CR) 10jerkins-bot: [V: 04-1] elasticsearch: expose prometheus on the correct port [puppet] - 10https://gerrit.wikimedia.org/r/462768 (owner: 10Gehel) [16:53:22] <_joe_> ping me tomorrow and we can take a look, hopefully [16:53:31] _joe_: ok, will do, thanks again [16:54:10] <_joe_> gehel: it mioght be a good idea to force a puppet run [16:54:22] <_joe_> so that we reduce the time we are without metrics [16:55:00] _joe_: that's my plan, but I just realized that my fix is ofc wrong [16:55:28] we also have a problem with icinga config, that will take some time to recover (exported 
resources) [16:55:33] (03PS3) 10Gehel: elasticsearch: expose prometheus on the correct port [puppet] - 10https://gerrit.wikimedia.org/r/462768 [16:56:34] volans: not related to my CR right? [16:56:47] (03CR) 10Gehel: [C: 032] elasticsearch: expose prometheus on the correct port [puppet] - 10https://gerrit.wikimedia.org/r/462768 (owner: 10Gehel) [16:56:56] gehel: no [16:57:08] volans: kool, thanks! [16:57:24] (03CR) 10Cmjohnson: [C: 032] Change mgmt dns from labstore1008/9 to cloudstore1008/9 [dns] - 10https://gerrit.wikimedia.org/r/462755 (https://phabricator.wikimedia.org/T193655) (owner: 10Bstorm) [16:57:46] PROBLEM - puppet last run on cp3036 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:58:46] PROBLEM - puppet last run on maps2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:58:56] 10Operations, 10Traffic: gdnsd plugin support for ACME DNS challenges - https://phabricator.wikimedia.org/T194965 (10BBlack) 05Open>03Resolved Beta releases of gdnsd (supporting this new feature) have been stable on our production authdns since mid-last-week. The code hasn't been released officially as gd... 
[16:58:58] 10Operations, 10Traffic, 10Patch-For-Review: Create and deploy a centralized letsencrypt service - https://phabricator.wikimedia.org/T194962 (10BBlack) [16:59:09] !log forcing puppet run on all elastic nodes (including logstash) to recover prometheus metric exporter [16:59:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:59:26] (03PS1) 10Arturo Borrero Gonzalez: mariadb: fix typo in template max_allowed_packet [puppet] - 10https://gerrit.wikimedia.org/r/462769 [16:59:42] gehel: batch it ;) [16:59:56] RECOVERY - Check systemd state on elastic2003 is OK: OK - running: The system is fully operational [16:59:57] RECOVERY - Check systemd state on elastic2002 is OK: OK - running: The system is fully operational [17:00:04] cscott, arlolra, subbu, halfak, and Amir1: #bothumor I � Unicode. All rise for Services – Graphoid / Parsoid / Citoid / ORES deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180925T1700). [17:00:09] volans: what could go wrong with 80 puppet run in parallel? 
[17:00:16] no parsoid deploy today [17:00:22] killing puppetmasters ;) [17:00:41] (03PS1) 10Thcipriani: Revert "Gerrit: Add missing resource /var/lib/gerrit2/review_site" [puppet] - 10https://gerrit.wikimedia.org/r/462770 [17:01:27] RECOVERY - Check systemd state on elastic2026 is OK: OK - running: The system is fully operational [17:01:37] RECOVERY - Check systemd state on elastic1028 is OK: OK - running: The system is fully operational [17:01:46] (03PS2) 10Thcipriani: Revert "Gerrit: Add missing resource /var/lib/gerrit2/review_site" [puppet] - 10https://gerrit.wikimedia.org/r/462770 (https://phabricator.wikimedia.org/T196835) [17:01:56] RECOVERY - Check systemd state on elastic2014 is OK: OK - running: The system is fully operational [17:02:45] (03PS2) 10Arturo Borrero Gonzalez: mariadb: fix typo in template max_allowed_packet [puppet] - 10https://gerrit.wikimedia.org/r/462769 [17:03:47] RECOVERY - Check systemd state on elastic1044 is OK: OK - running: The system is fully operational [17:03:47] RECOVERY - puppet last run on maps2003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [17:04:27] RECOVERY - Check systemd state on elastic1049 is OK: OK - running: The system is fully operational [17:04:35] (03CR) 10Jdlrobson: [C: 04-1] "This should not be swatted until the patch is in production (Thursday)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462573 (https://phabricator.wikimedia.org/T202306) (owner: 10Jdlrobson) [17:04:46] PROBLEM - Varnish traffic drop between 30min ago and now at eqsin on einsteinium is CRITICAL: 59.47 le 60 https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1 [17:05:17] RECOVERY - Check systemd state on elastic2031 is OK: OK - running: The system is fully operational [17:05:25] (03CR) 10Zhuyifei1999: "Thanks :)" [puppet] - 10https://gerrit.wikimedia.org/r/462769 (owner: 10Arturo Borrero Gonzalez) [17:05:37] RECOVERY - Check systemd state on elastic1027 is OK: 
OK - running: The system is fully operational [17:05:37] RECOVERY - Check systemd state on elastic1035 is OK: OK - running: The system is fully operational [17:05:47] RECOVERY - Varnish traffic drop between 30min ago and now at eqsin on einsteinium is OK: (C)60 le (W)70 le 73.25 https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1 [17:05:57] RECOVERY - Check systemd state on elastic1023 is OK: OK - running: The system is fully operational [17:05:57] RECOVERY - Check systemd state on elastic2010 is OK: OK - running: The system is fully operational [17:06:07] RECOVERY - Check systemd state on elastic2024 is OK: OK - running: The system is fully operational [17:06:17] RECOVERY - Check systemd state on elastic1042 is OK: OK - running: The system is fully operational [17:06:38] (03CR) 10Paladox: "Need to update the other places too." [puppet] - 10https://gerrit.wikimedia.org/r/462770 (https://phabricator.wikimedia.org/T196835) (owner: 10Thcipriani) [17:06:56] RECOVERY - Check systemd state on elastic1039 is OK: OK - running: The system is fully operational [17:07:36] RECOVERY - Check systemd state on elastic1037 is OK: OK - running: The system is fully operational [17:08:06] (03CR) 10Paladox: [C: 031] "Once you fix ^^, it works for me." 
[puppet] - 10https://gerrit.wikimedia.org/r/462770 (https://phabricator.wikimedia.org/T196835) (owner: 10Thcipriani) [17:08:56] RECOVERY - Check systemd state on elastic1046 is OK: OK - running: The system is fully operational [17:08:56] RECOVERY - Check systemd state on elastic1029 is OK: OK - running: The system is fully operational [17:09:17] RECOVERY - Check systemd state on elastic2027 is OK: OK - running: The system is fully operational [17:09:26] RECOVERY - Check systemd state on elastic1052 is OK: OK - running: The system is fully operational [17:09:36] RECOVERY - Check systemd state on elastic1032 is OK: OK - running: The system is fully operational [17:09:57] RECOVERY - Check systemd state on elastic2008 is OK: OK - running: The system is fully operational [17:10:17] (03PS3) 10Thcipriani: Revert "Gerrit: Add missing resource /var/lib/gerrit2/review_site" [puppet] - 10https://gerrit.wikimedia.org/r/462770 (https://phabricator.wikimedia.org/T196835) [17:10:47] (03PS1) 10Ayounsi: Depool ulsfo for DC move [dns] - 10https://gerrit.wikimedia.org/r/462771 [17:10:56] RECOVERY - Check systemd state on elastic2015 is OK: OK - running: The system is fully operational [17:11:06] (03CR) 10Paladox: [C: 031] Revert "Gerrit: Add missing resource /var/lib/gerrit2/review_site" [puppet] - 10https://gerrit.wikimedia.org/r/462770 (https://phabricator.wikimedia.org/T196835) (owner: 10Thcipriani) [17:11:16] RECOVERY - Check systemd state on elastic1034 is OK: OK - running: The system is fully operational [17:11:47] RECOVERY - Check systemd state on elastic2034 is OK: OK - running: The system is fully operational [17:11:48] (03CR) 10Ayounsi: [C: 032] Depool ulsfo for DC move [dns] - 10https://gerrit.wikimedia.org/r/462771 (owner: 10Ayounsi) [17:12:28] !log depool ulsfo for DC move [17:12:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:14:01] (03CR) 10Ottomata: Refactor refine_job to use new spark_job and ConfigHelper properties (032 comments) 
[puppet] - 10https://gerrit.wikimedia.org/r/460417 (https://phabricator.wikimedia.org/T203804) (owner: 10Ottomata) [17:15:12] (03PS1) 10Cmjohnson: Adding production dns for cloudstore100[89] [dns] - 10https://gerrit.wikimedia.org/r/462773 (https://phabricator.wikimedia.org/T193655) [17:15:42] 10Operations, 10Core-Platform-Team, 10HHVM, 10TechCom-RFC (TechCom-Approved), 10User-ArielGlenn: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370 (10Krinkle) [17:15:48] (03PS2) 10Volans: Add cumin1001 IPs and PTRs [dns] - 10https://gerrit.wikimedia.org/r/462274 (https://phabricator.wikimedia.org/T201346) [17:16:55] (03PS3) 10Volans: cumin: installation of cumin1001 [puppet] - 10https://gerrit.wikimedia.org/r/462278 (https://phabricator.wikimedia.org/T201346) [17:18:27] (03CR) 10Arturo Borrero Gonzalez: [C: 031] "LGTM" [dns] - 10https://gerrit.wikimedia.org/r/462773 (https://phabricator.wikimedia.org/T193655) (owner: 10Cmjohnson) [17:19:04] (03PS2) 10Cmjohnson: Adding production dns for cloudstore100[89] [dns] - 10https://gerrit.wikimedia.org/r/462773 (https://phabricator.wikimedia.org/T193655) [17:21:46] PROBLEM - Varnish traffic drop between 30min ago and now at ulsfo on einsteinium is CRITICAL: 59.24 le 60 https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1 [17:23:27] (03CR) 10Volans: [C: 032] Add cumin1001 IPs and PTRs [dns] - 10https://gerrit.wikimedia.org/r/462274 (https://phabricator.wikimedia.org/T201346) (owner: 10Volans) [17:26:03] (03CR) 10Volans: [C: 032] cumin: installation of cumin1001 [puppet] - 10https://gerrit.wikimedia.org/r/462278 (https://phabricator.wikimedia.org/T201346) (owner: 10Volans) [17:26:04] 10Operations, 10ops-eqiad, 10Cloud-VPS, 10Patch-For-Review, 10cloud-services-team (Kanban): rack/setup/install cloudstore1008 & cloudstore1009 - https://phabricator.wikimedia.org/T193655 (10Cmjohnson) a:05Cmjohnson>03None please let me know the partman recipe you want current labstore1006/7 
is dumps... [17:28:21] RECOVERY - puppet last run on cp3036 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [17:29:12] RECOVERY - Check correctness of the icinga configuration on einsteinium is OK: Icinga configuration is correct [17:33:30] 10Operations, 10ops-eqiad, 10Operations-Software-Development, 10Patch-For-Review: rack/setup/install clustermgmt1001.eqiad.wmnet (new cumin master) - https://phabricator.wikimedia.org/T201346 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by volans on cumin2001.codfw.wmnet for hosts: ``` cumin... [17:33:53] (03PS1) 10Bstorm: cloudstore: Change the names of labstore1008/9 to cloudstore1008/9 [puppet] - 10https://gerrit.wikimedia.org/r/462776 (https://phabricator.wikimedia.org/T193655) [17:34:33] 10Operations, 10ops-eqiad, 10Cloud-VPS, 10Patch-For-Review, 10cloud-services-team (Kanban): rack/setup/install cloudstore1008 & cloudstore1009 - https://phabricator.wikimedia.org/T193655 (10Bstorm) It's already there, just under the wrong names. Fixing that. 
[17:34:48] (03PS7) 10Ottomata: Refactor refine_job to use new spark_job and ConfigHelper properties [puppet] - 10https://gerrit.wikimedia.org/r/460417 (https://phabricator.wikimedia.org/T203804) [17:35:55] (03PS2) 10Bstorm: cloudstore: Change the names of labstore1008/9 to cloudstore1008/9 [puppet] - 10https://gerrit.wikimedia.org/r/462776 (https://phabricator.wikimedia.org/T193655) [17:39:34] !log otto@deploy1001 Started deploy [analytics/refinery@ce8f0b3]: Deploying refinery-source 0.0.75 for ConfigHelper Refine - T203804 [17:39:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:39:42] T203804: Refactor Refine job scalaopt to use property files and CLI overrides - https://phabricator.wikimedia.org/T203804 [17:40:32] (03PS3) 10Bstorm: cloudstore: Change the names of labstore1008/9 to cloudstore1008/9 [puppet] - 10https://gerrit.wikimedia.org/r/462776 (https://phabricator.wikimedia.org/T193655) [17:42:00] (03CR) 10Bstorm: [C: 032] cloudstore: Change the names of labstore1008/9 to cloudstore1008/9 [puppet] - 10https://gerrit.wikimedia.org/r/462776 (https://phabricator.wikimedia.org/T193655) (owner: 10Bstorm) [17:43:08] (03CR) 10Cmjohnson: [C: 032] Adding production dns for cloudstore100[89] [dns] - 10https://gerrit.wikimedia.org/r/462773 (https://phabricator.wikimedia.org/T193655) (owner: 10Cmjohnson) [17:43:14] (03PS3) 10Cmjohnson: Adding production dns for cloudstore100[89] [dns] - 10https://gerrit.wikimedia.org/r/462773 (https://phabricator.wikimedia.org/T193655) [17:48:39] 10Operations, 10Datacenter-Switchover-2018, 10Discovery-Search (Current work), 10Patch-For-Review: Warn when CirrusSearch is not configured to use local DC for an extended time - https://phabricator.wikimedia.org/T204135 (10TJones) [17:49:55] PROBLEM - etcd request latencies on neon is CRITICAL: instance=10.64.0.40:6443 operation={compareAndSwap,get,list} https://grafana.wikimedia.org/dashboard/db/kubernetes-api [17:49:57] !log otto@deploy1001 Finished deploy 
[analytics/refinery@ce8f0b3]: Deploying refinery-source 0.0.75 for ConfigHelper Refine - T203804 (duration: 10m 22s) [17:50:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:50:04] T203804: Refactor Refine job scalaopt to use property files and CLI overrides - https://phabricator.wikimedia.org/T203804 [17:50:52] (03PS8) 10Ottomata: Refactor refine_job to use new spark_job and ConfigHelper properties [puppet] - 10https://gerrit.wikimedia.org/r/460417 (https://phabricator.wikimedia.org/T203804) [17:51:55] RECOVERY - etcd request latencies on neon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [17:53:19] (03PS9) 10Ottomata: Refactor refine_job to use new spark_job and ConfigHelper properties [puppet] - 10https://gerrit.wikimedia.org/r/460417 (https://phabricator.wikimedia.org/T203804) [17:55:17] (03CR) 10Ottomata: [C: 032] Refactor refine_job to use new spark_job and ConfigHelper properties [puppet] - 10https://gerrit.wikimedia.org/r/460417 (https://phabricator.wikimedia.org/T203804) (owner: 10Ottomata) [17:57:56] (03PS1) 10Ottomata: Fix rebase conflict error in refine.pp [puppet] - 10https://gerrit.wikimedia.org/r/462782 (https://phabricator.wikimedia.org/T203804) [17:58:40] (03CR) 10Ottomata: [V: 032 C: 032] Fix rebase conflict error in refine.pp [puppet] - 10https://gerrit.wikimedia.org/r/462782 (https://phabricator.wikimedia.org/T203804) (owner: 10Ottomata) [18:01:13] 10Operations, 10ops-eqiad, 10Cloud-VPS, 10Patch-For-Review, 10cloud-services-team (Kanban): rack/setup/install cloudstore1008 & cloudstore1009 - https://phabricator.wikimedia.org/T193655 (10Bstorm) So at this point, we've got the naming solid (and have added prod DNS). However, we still have the problem... 
[18:01:16] (03PS4) 10Andrew Bogott: keystone: Create top-level domain for each new project [puppet] - 10https://gerrit.wikimedia.org/r/375089 (https://phabricator.wikimedia.org/T162977) (owner: 10Alex Monk) [18:02:00] (03CR) 10jerkins-bot: [V: 04-1] keystone: Create top-level domain for each new project [puppet] - 10https://gerrit.wikimedia.org/r/375089 (https://phabricator.wikimedia.org/T162977) (owner: 10Alex Monk) [18:04:12] (03PS5) 10Andrew Bogott: keystone: Create top-level domain for each new project [puppet] - 10https://gerrit.wikimedia.org/r/375089 (https://phabricator.wikimedia.org/T162977) (owner: 10Alex Monk) [18:05:04] (03PS6) 10Andrew Bogott: keystone: Create top-level domain for each new project [puppet] - 10https://gerrit.wikimedia.org/r/375089 (https://phabricator.wikimedia.org/T162977) (owner: 10Alex Monk) [18:07:37] (03PS1) 10Ottomata: Sort refine job config properties keys [puppet] - 10https://gerrit.wikimedia.org/r/462786 (https://phabricator.wikimedia.org/T203804) [18:08:33] 10Operations, 10ops-eqiad, 10Operations-Software-Development, 10Patch-For-Review: rack/setup/install clustermgmt1001.eqiad.wmnet (new cumin master) - https://phabricator.wikimedia.org/T201346 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['cumin1001.eqiad.wmnet'] ``` and were **ALL** success... [18:09:35] banyek, jijiki, onimisionipe, shdubsh, gtirloni: just finieshed ^^^ [18:10:10] *mic drop* [18:10:43] :D [18:11:00] Nice! [18:11:04] volans: nice! thanks for the training [18:11:39] 10Operations, 10ops-eqiad, 10Cloud-VPS, 10Patch-For-Review, 10cloud-services-team (Kanban): rack/setup/install cloudstore1008 & cloudstore1009 - https://phabricator.wikimedia.org/T193655 (10Cmjohnson) @bstorm cloudstore1008 and 1009 were in the wrong vlans on the switch port. I updated the ports. you sho... [18:14:07] volans: is that recorded? I don't remember attending any training session :-P [18:17:04] arturo: long story, so yeah is something new we're trying. 
At the last offsite it was decided to improve our onboarding in general [18:17:55] AFAIK we're not recording them and that is on purpose to emphasize on the interactivity of the sessions and Q&A, but I'm just a presenter :) [18:18:22] ok fair enough :-) but recording can be good anyway [18:19:55] (03CR) 10Ottomata: [C: 032] Sort refine job config properties keys [puppet] - 10https://gerrit.wikimedia.org/r/462786 (https://phabricator.wikimedia.org/T203804) (owner: 10Ottomata) [18:20:00] arturo: you might want to ping guill.aume and/or jo.el for that [18:20:13] ack [18:20:50] 10Operations, 10Puppet, 10Analytics-Kanban, 10Beta-Cluster-Infrastructure, and 2 others: exported puppet resources are not queryable: cannot create grafana graphs of EventLogging running in beta cluster - https://phabricator.wikimedia.org/T204088 (10Niedzielski) @Ottomata, @fgiunchedi hello! We're still f... [18:20:50] sensors say my fan is really fast: fan1: 65535 RPM [18:20:59] but that number is suspicious :) [18:21:12] and each time i boot i have to bypass a fan error , heh [18:23:30] 10Operations, 10Cloud-Services, 10Mail: Create a Cloud VPS SMTP smarthost - https://phabricator.wikimedia.org/T41785 (10Krenair) >>! In T41785#4615256, @herron wrote: >>>! In T41785#4614826, @Krenair wrote: >>>>! In T41785#4614434, @faidon wrote: >>> - This is of course completely up to the WMCS team, but I'... 
[18:27:41] (03PS4) 10Dzahn: icinga: remove icinga::group class [puppet] - 10https://gerrit.wikimedia.org/r/462590 (https://phabricator.wikimedia.org/T202782) [18:28:05] (03CR) 10Dzahn: [C: 032] icinga: remove icinga::group class [puppet] - 10https://gerrit.wikimedia.org/r/462590 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn) [18:29:35] (03CR) 10Smalyshev: [C: 031] wdqs: activate throttling of log messages to logstash [puppet] - 10https://gerrit.wikimedia.org/r/462661 (https://phabricator.wikimedia.org/T204364) (owner: 10Gehel) [18:29:43] 10Operations, 10Puppet, 10Analytics-Kanban, 10Beta-Cluster-Infrastructure, and 2 others: exported puppet resources are not queryable: cannot create grafana graphs of EventLogging running in beta cluster - https://phabricator.wikimedia.org/T204088 (10Ottomata) They should be. Something is not working thoug... [18:29:46] RECOVERY - Varnish traffic drop between 30min ago and now at ulsfo on einsteinium is OK: (C)60 le (W)70 le 77.86 https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1 [18:30:25] PROBLEM - Unmerged changes on repository puppet on puppetmaster1001 is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). [18:30:42] ottomata: ^ i just got the "multiple changes" warning [18:30:54] merging yours too? [18:31:02] oh yes [18:31:04] sorry thank mutante [18:31:23] no problem, done [18:31:34] RECOVERY - Unmerged changes on repository puppet on puppetmaster1001 is OK: No changes to merge. [18:33:24] volans, still busy? 
[18:37:52] (03PS1) 10Ottomata: Use --files to upload job properties file for refine spark job [puppet] - 10https://gerrit.wikimedia.org/r/462789 (https://phabricator.wikimedia.org/T203804) [18:38:55] 10Operations, 10Puppet, 10Analytics-Kanban, 10Beta-Cluster-Infrastructure, and 2 others: exported puppet resources are not queryable: cannot create grafana graphs of EventLogging running in beta cluster - https://phabricator.wikimedia.org/T204088 (10Ottomata) BTW, I updated https://wikitech.wikimedia.org/w... [18:39:09] (03CR) 10Ottomata: [C: 032] Use --files to upload job properties file for refine spark job [puppet] - 10https://gerrit.wikimedia.org/r/462789 (https://phabricator.wikimedia.org/T203804) (owner: 10Ottomata) [18:39:15] (03PS2) 10Ottomata: Use --files to upload job properties file for refine spark job [puppet] - 10https://gerrit.wikimedia.org/r/462789 (https://phabricator.wikimedia.org/T203804) [18:39:17] (03CR) 10Ottomata: [V: 032 C: 032] Use --files to upload job properties file for refine spark job [puppet] - 10https://gerrit.wikimedia.org/r/462789 (https://phabricator.wikimedia.org/T203804) (owner: 10Ottomata) [18:40:18] (03PS1) 10Cwhite: naggen2: restrict generated defines to valid options [puppet] - 10https://gerrit.wikimedia.org/r/462791 (https://phabricator.wikimedia.org/T202782) [18:46:24] (03CR) 10Krinkle: [C: 031] Install subversion on application servers [puppet] - 10https://gerrit.wikimedia.org/r/462673 (https://phabricator.wikimedia.org/T204801) (owner: 10Muehlenhoff) [18:47:21] (03Abandoned) 10EBernhardson: [WIP] Update cirrus server counts to match reality [mediawiki-config] - 10https://gerrit.wikimedia.org/r/359376 (owner: 10EBernhardson) [18:49:05] (03PS1) 10Cwhite: Revert "Revert "monitoring: set mode on host and service configs"" https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/462791/ should fix the problems this caused. 
[puppet] - 10https://gerrit.wikimedia.org/r/462793 [18:51:36] 10Operations, 10Cloud-Services, 10Mail: Create a Cloud VPS SMTP smarthost - https://phabricator.wikimedia.org/T41785 (10Andrew) I don't care a whole lot about having one infrastructure project or a bunch of small ones but it would be moderately easier to manage security policy if there's one big one. So, pu... [18:52:01] 10Operations, 10MediaWiki-extensions-CodeReview, 10Patch-For-Review, 10Wikimedia-production-error: Exec error "Possibly missing executable file: svn diff" from Special:Code - https://phabricator.wikimedia.org/T204801 (10Reedy) Isn't the problem here that the code review proxy (and/or it's config) is gone... [18:52:24] (03CR) 10jerkins-bot: [V: 04-1] Revert "Revert "monitoring: set mode on host and service configs"" https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/462791/ should fix the problems this caused. [puppet] - 10https://gerrit.wikimedia.org/r/462793 (owner: 10Cwhite) [19:00:04] Deploy window MediaWiki train - Americas version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180925T1900) [19:01:41] (03PS2) 10Gehel: wdqs: activate throttling of log messages to logstash [puppet] - 10https://gerrit.wikimedia.org/r/462661 (https://phabricator.wikimedia.org/T204364) [19:01:57] (03PS2) 10Gehel: Switch internal cluster to Kafka event source [puppet] - 10https://gerrit.wikimedia.org/r/462564 (owner: 10Smalyshev) [19:03:27] (03CR) 10Gehel: [C: 032] Switch internal cluster to Kafka event source [puppet] - 10https://gerrit.wikimedia.org/r/462564 (owner: 10Smalyshev) [19:05:48] (03PS3) 10Gehel: wdqs: activate throttling of log messages to logstash [puppet] - 10https://gerrit.wikimedia.org/r/462661 (https://phabricator.wikimedia.org/T204364) [19:07:27] (03PS2) 10Thcipriani: Update l10nupdate-1 PHP version to match scap [puppet] - 10https://gerrit.wikimedia.org/r/462748 (https://phabricator.wikimedia.org/T205313) [19:07:36] (03CR) 10Gehel: [C: 032] wdqs: activate 
throttling of log messages to logstash [puppet] - 10https://gerrit.wikimedia.org/r/462661 (https://phabricator.wikimedia.org/T204364) (owner: 10Gehel) [19:12:49] (03PS1) 10Jforrester: [GovernanceWiki] Enable BotPasswords [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462799 (https://phabricator.wikimedia.org/T205368) [19:14:11] (03CR) 10Reedy: [C: 031] [GovernanceWiki] Enable BotPasswords [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462799 (https://phabricator.wikimedia.org/T205368) (owner: 10Jforrester) [19:14:45] (03PS2) 10Cwhite: Revert "Revert "monitoring: set mode on host and service configs"" [puppet] - 10https://gerrit.wikimedia.org/r/462793 [19:17:03] 10Operations, 10ops-eqiad, 10Cloud-VPS, 10Patch-For-Review, 10cloud-services-team (Kanban): rack/setup/install cloudstore1008 & cloudstore1009 - https://phabricator.wikimedia.org/T193655 (10Cmjohnson) For cloudstore1008, I updated asw2-a5-eqiad to put this server in the public vlan. Everything was accept... [19:17:19] 10Operations, 10ops-eqiad, 10Cloud-VPS, 10Patch-For-Review, 10cloud-services-team (Kanban): rack/setup/install cloudstore1008 & cloudstore1009 - https://phabricator.wikimedia.org/T193655 (10Cmjohnson) [19:17:35] 10Operations, 10ops-eqiad, 10Cloud-VPS, 10Patch-For-Review, 10cloud-services-team (Kanban): rack/setup/install cloudstore1008 & cloudstore1009 - https://phabricator.wikimedia.org/T193655 (10Cmjohnson) [19:17:45] Krenair: what's up? 
[19:17:54] !log catrope@deploy1001 Synchronized php-1.32.0-wmf.23/extensions/ORES/maintenance/BackfillPageTriageQueue.php: I3f1ae92d8645 (duration: 00m 58s) [19:18:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:18:50] volans, been looking at a weird puppetdb thing with ottomata, found cumin could be a handy tool for checking stuff in there [19:18:57] but I noticed this [19:19:01] krenair@deployment-cumin:~$ sudo cumin "P{R:Class = profile::kafka::broker::monitoring}" [19:19:05] 4 hosts will be targeted: [19:19:06] deployment-kafka-jumbo-[1-2].deployment-prep.eqiad.wmflabs,deployment-kafka-main-[1-2].deployment-prep.eqiad.wmflabs [19:19:11] krenair@deployment-cumin:~$ sudo cumin "P{R:Prometheus::Jmx_exporter_instance ~ kafka_broker_.*}" [19:19:15] 4 hosts will be targeted: [19:19:15] deployment-kafka-jumbo-[1-2].deployment-prep.eqiad.wmflabs,deployment-kafka-main-[1-2].deployment-prep.eqiad.wmflabs [19:19:21] so far so good [19:19:21] but [19:19:25] krenair@deployment-cumin:~$ sudo cumin "P{(R:Class = profile::kafka::broker::monitoring) and (R:Prometheus::Jmx_exporter_instance ~ kafka_broker_.*)}" [19:19:29] No hosts found that matches the query [19:19:53] you need 2 puppetdb queries, so P{} and P{} [19:20:03] bah [19:20:06] right [19:20:07] pro tip: R:Class = profile:: --> P: [19:20:25] so P{P:kafka::broker::monitoring} and ... 
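The empty result of the combined query can be reproduced with a toy model. The sketch below assumes (this is not taken from cumin's source) that every R: clause inside a single P{} is evaluated against the same PuppetDB resource row, so one row would have to be both a Class and a Prometheus::Jmx_exporter_instance at once. Two separate P{} queries each return a host set, and "and" then intersects those sets. The RESOURCES rows, the short host names and the helper functions are hypothetical.

```python
import re

# Hypothetical stand-in for PuppetDB resource rows: (certname, type, title).
RESOURCES = [
    ("kafka-jumbo-1", "Class", "Profile::Kafka::Broker::Monitoring"),
    ("kafka-jumbo-1", "Prometheus::Jmx_exporter_instance", "kafka_broker_kafka-jumbo-1"),
    ("kafka-main-1", "Class", "Profile::Kafka::Broker::Monitoring"),
    ("kafka-main-1", "Prometheus::Jmx_exporter_instance", "kafka_broker_kafka-main-1"),
    ("prometheus01", "Class", "Profile::Prometheus"),
]


def is_monitoring_class(rtype, title):
    """Mimics R:Class = profile::kafka::broker::monitoring."""
    return rtype == "Class" and title == "Profile::Kafka::Broker::Monitoring"


def is_jmx_exporter(rtype, title):
    """Mimics R:Prometheus::Jmx_exporter_instance ~ kafka_broker_.*"""
    return (rtype == "Prometheus::Jmx_exporter_instance"
            and re.search(r"kafka_broker_.*", title) is not None)


def hosts_matching(predicate):
    """Return the set of hosts owning at least one row that satisfies predicate."""
    return {cert for cert, rtype, title in RESOURCES if predicate(rtype, title)}


# One query: both conditions must hold on the same row, which never happens.
single_query = hosts_matching(
    lambda rtype, title: is_monitoring_class(rtype, title) and is_jmx_exporter(rtype, title)
)

# Two queries: each yields a host set, and the sets are intersected afterwards.
two_queries = hosts_matching(is_monitoring_class) & hosts_matching(is_jmx_exporter)

print(sorted(single_query))
print(sorted(two_queries))
```

Under that assumption the first set is empty and the second contains both kafka hosts, which matches the behaviour seen in the channel.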
[19:20:30] yes [19:21:02] this works [19:21:06] :) [19:21:30] I wonder what query_resources is actually running against puppet db then and how it differs from what I've got cumin to show [19:21:39] anyway [19:21:40] thanks volans [19:21:59] Krenair: to see the puppetdb query run cumin with -d [19:22:08] and check the last line in /var/log/cumin/cumin.log [19:23:28] ["and", ["=", "type", "Class"], ["=", "title", "Profile::Kafka::Broker::Monitoring"]] [19:23:34] ["and", ["=", "type", "Prometheus::Jmx_exporter_instance"], ["~", "title", "kafka_broker_.*"]] [19:23:36] is what cumin does [19:24:06] 10Operations, 10Maps (Tilerator), 10Patch-For-Review, 10Reading-Infrastructure-Team-Backlog (Kanban): investigate tilerator crash on maps eqiad - https://phabricator.wikimedia.org/T204047 (10MSantos) [19:24:07] so I guess you want to generate the same in query_resources :) [19:24:13] gotta dig into what query_resources is up to [19:24:17] yeah [19:24:25] yeah don't recall by memory, sorry [19:24:29] no worries [19:24:34] I did it once before I'll figure it out again [19:25:47] 10Operations, 10Traffic, 10Maps (Tilerator), 10Patch-For-Review, 10Reading-Infrastructure-Team-Backlog (Kanban): Tilerator should purge Varnish cache - https://phabricator.wikimedia.org/T109776 (10MSantos) [19:25:56] 10Operations, 10Maps, 10Traffic, 10Reading-Infrastructure-Team-Backlog (Kanban): Decide on Cache-Control headers for map tiles - https://phabricator.wikimedia.org/T186732 (10MSantos) [19:28:18] 10Operations, 10ops-codfw, 10Maps, 10Reading-Infrastructure-Team-Backlog, and 2 others: Decommission maps-test cluster - https://phabricator.wikimedia.org/T202898 (10MSantos) [19:30:10] 10Operations, 10Maps, 10Reading-Infrastructure-Team-Backlog, 10Patch-For-Review: migrate maps servers to stretch with the current style - https://phabricator.wikimedia.org/T198622 (10MSantos) [19:31:51] 10Operations, 10Reading-Infrastructure-Team-Backlog, 10Maps (Tilerator): Increase frequency of OSM replication - 
https://phabricator.wikimedia.org/T137939 (10MSantos) [19:37:07] 10Operations, 10MediaWiki-extensions-CodeReview, 10Patch-For-Review, 10Wikimedia-production-error: Exec error "Possibly missing executable file: svn diff" from Special:Code - https://phabricator.wikimedia.org/T204801 (10ArielGlenn) See T116948: this extension is long gone. It had a wonderful life, and now... [19:41:27] 10Operations, 10MediaWiki-extensions-CodeReview, 10Patch-For-Review, 10Wikimedia-production-error: Exec error "Possibly missing executable file: svn diff" from Special:Code - https://phabricator.wikimedia.org/T204801 (10Krinkle) @Reedy Indeed, they didn't have a full clone for CodeReview purposes, but from... [19:43:03] 10Operations, 10ops-eqiad, 10Operations-Software-Development, 10Patch-For-Review: rack/setup/install clustermgmt1001.eqiad.wmnet (new cumin master) - https://phabricator.wikimedia.org/T201346 (10Volans) [19:43:49] Can I not do a text search on pastes in phab? [19:45:36] 10Operations, 10ops-eqiad, 10Operations-Software-Development, 10Patch-For-Review: rack/setup/install clustermgmt1001.eqiad.wmnet (new cumin master) - https://phabricator.wikimedia.org/T201346 (10Volans) Host installed, keyholder armed and a quick test with cumin worked as expected. @MoritzMuehlenhoff we n... [19:47:45] 10Operations, 10MediaWiki-extensions-CodeReview, 10Patch-For-Review, 10Wikimedia-production-error: Exec error "Possibly missing executable file: svn diff" from Special:Code - https://phabricator.wikimedia.org/T204801 (10ArielGlenn) @Krinkle Of course. But let's not reinstall svn everywhere just to get diff... [19:48:39] 10Operations, 10MediaWiki-extensions-CodeReview, 10Patch-For-Review, 10Wikimedia-production-error: Exec error "Possibly missing executable file: svn diff" from Special:Code - https://phabricator.wikimedia.org/T204801 (10Krinkle) Agreed. If that's easier, fine by me. 
[19:52:15] irb(main):029:0* resquery [19:52:16] => ["and", ["=", "type", "Prometheus::Jmx_exporter_instance"], ["~", "title", "kafka_broker_.*"], ["=", "exported", false]] [19:52:16] irb(main):030:0> nodequery [19:52:16] => ["in", "certname", ["extract", "certname", ["select_resources", ["and", ["=", "type", "Class"], ["=", "title", "Profile::Kafka::Broker::Monitoring"], ["=", "exported", false]]]]] [19:52:29] the exported = false is interesting [19:53:24] (03PS2) 10Cwhite: naggen2: restrict generated defines to valid options [puppet] - 10https://gerrit.wikimedia.org/r/462791 (https://phabricator.wikimedia.org/T202782) [19:58:20] (03PS3) 10Dzahn: icinga: remove nsca::firewall class, use ferm::service in profile [puppet] - 10https://gerrit.wikimedia.org/r/462607 (https://phabricator.wikimedia.org/T202782) [19:58:22] (03PS43) 10Gehel: Split instance define out of elasticsearch class [puppet] - 10https://gerrit.wikimedia.org/r/441338 (https://phabricator.wikimedia.org/T198351) (owner: 10EBernhardson) [19:58:24] (03PS70) 10Gehel: Allow multiple elasticsearch instances per host [puppet] - 10https://gerrit.wikimedia.org/r/440049 (https://phabricator.wikimedia.org/T198351) (owner: 10EBernhardson) [19:59:29] (03CR) 10jerkins-bot: [V: 04-1] Allow multiple elasticsearch instances per host [puppet] - 10https://gerrit.wikimedia.org/r/440049 (https://phabricator.wikimedia.org/T198351) (owner: 10EBernhardson) [20:02:27] (03CR) 10Dzahn: "it needs to allow connections from "frack" the fundraising rack, and that would be a different domain, so domain_networks probably not goo" [puppet] - 10https://gerrit.wikimedia.org/r/462607 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn) [20:03:54] ottomata, well, to me it looks like query_resources should be returning results to us [20:04:06] (03PS14) 10Bstorm: wiki replicas - prepare for refactored actor storage [puppet] - 10https://gerrit.wikimedia.org/r/431823 (https://phabricator.wikimedia.org/T195747) [20:05:28] say come the 
thought of it [20:05:36] this template it's used to populate [20:05:44] relies on get_clusters returning stuff [20:06:08] (03CR) 10Bstorm: [C: 032] wiki replicas - prepare for refactored actor storage [puppet] - 10https://gerrit.wikimedia.org/r/431823 (https://phabricator.wikimedia.org/T195747) (owner: 10Bstorm) [20:06:24] I wonder if that's working [20:07:15] (03CR) 10Smalyshev: "> Patch Set 3:" [puppet] - 10https://gerrit.wikimedia.org/r/461862 (https://phabricator.wikimedia.org/T202830) (owner: 10Smalyshev) [20:08:37] (03CR) 10Smalyshev: "> Patch Set 3:" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/461862 (https://phabricator.wikimedia.org/T202830) (owner: 10Smalyshev) [20:08:40] ottomata, oooh yeah here we are [20:08:57] I went onto the puppetmaster and changed modules/prometheus/templates/jmx_exporter_config.erb to have this at the bottom: [20:09:03] # sites: <%= scope.function_ordered_yaml([@site_clusters]) %> [20:09:06] (03CR) 10Bstorm: "Are the comments on this still valid? It looks as if the referenced change was resolved." [puppet] - 10https://gerrit.wikimedia.org/r/437640 (https://phabricator.wikimedia.org/T194962) (owner: 10Alex Monk) [20:09:10] ran puppet on deployment-prometheus01 [20:09:16] +# sites: [20:09:17] + [20:09:48] (03CR) 10Alex Monk: [C: 04-1] "We never figured out why prod behaviour was different." 
[puppet] - 10https://gerrit.wikimedia.org/r/437640 (https://phabricator.wikimedia.org/T194962) (owner: 10Alex Monk) [20:10:23] (03PS4) 10Dzahn: icinga: remove nsca::firewall class, use ferm::service in profile [puppet] - 10https://gerrit.wikimedia.org/r/462607 (https://phabricator.wikimedia.org/T202782) [20:10:43] (03CR) 10Bstorm: "Ah ok, thanks" [puppet] - 10https://gerrit.wikimedia.org/r/437640 (https://phabricator.wikimedia.org/T194962) (owner: 10Alex Monk) [20:13:41] (03CR) 10Gehel: "puppet compiler looks reasonable: https://puppet-compiler.wmflabs.org/compiler1002/12613/" [puppet] - 10https://gerrit.wikimedia.org/r/441338 (https://phabricator.wikimedia.org/T198351) (owner: 10EBernhardson) [20:14:54] Error: Could not retrieve catalog from remote server: Error 502 on SERVER:

Incomplete response received from application

[20:14:59] today's useful puppet syntax error [20:17:15] (03CR) 10Dzahn: "from the compiler output. this is what DOMAIN_NETWORKS resolves to:" [puppet] - 10https://gerrit.wikimedia.org/r/462607 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn) [20:21:08] ottomata, found the problem [20:21:18] get_clusters checks everything with class profile::cumin::target [20:21:25] but guess what includes that [20:21:39] this block at the top of modules/standard/manifests/init.pp [20:21:43] if $::realm == 'production' { [20:21:44] include ::profile::cumin::target [20:22:46] Krenair: labs instances have their own: include ::profile::openstack::main::cumin::target [20:22:54] in role::labs::instance [20:23:15] (03CR) 10Dzahn: [C: 032] "https://puppet-compiler.wmflabs.org/compiler1002/12615/einsteinium.wikimedia.org/change.einsteinium.wikimedia.org.pson" [puppet] - 10https://gerrit.wikimedia.org/r/462607 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn) [20:23:32] but yeah, get_clusters is for prod only IIRC [20:24:16] (03PS5) 10Dzahn: icinga: remove nsca::firewall class, use ferm::service in profile [puppet] - 10https://gerrit.wikimedia.org/r/462607 (https://phabricator.wikimedia.org/T202782) [20:24:50] 10Operations, 10Puppet, 10Analytics-Kanban, 10Beta-Cluster-Infrastructure, and 2 others: exported puppet resources are not queryable: cannot create grafana graphs of EventLogging running in beta cluster - https://phabricator.wikimedia.org/T204088 (10Krenair) (for context: `modules/prometheus/manifests/jmx_... 
[20:28:15] (03CR) 10Dzahn: [C: 032] "only FRACK uses passive checks with NSCA afaict, but PROD networks are also allowed, as before" [puppet] - 10https://gerrit.wikimedia.org/r/462607 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn) [20:28:30] trying this [20:28:31] - function_query_resources([false, 'Class["Profile::Cumin::Target"]', false]) [20:28:31] + function_query_resources([false, 'Class["Profile::Cumin::Target"] or Class["Profile::Openstack::Main::Cumin::Target"]', false]) [20:28:49] (03CR) 10Dzahn: [C: 032] "what isn't allowed anymore since yesterday .. labs networks.. and they shouldn't" [puppet] - 10https://gerrit.wikimedia.org/r/462607 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn) [20:38:03] yeah that wasn't gonna work [20:39:30] the stuff underneath that line looks at the site and cluster parameters [20:39:41] which aren't in the labs on [20:39:42] one [20:45:17] ohohhhhh Krenair nice find [20:45:18] ottomata, take a look at deployment-prometheus01:/srv/prometheus/beta/targets/jmx_kafka_mirrormaker_beta_eqiad.yaml now [20:45:33] this is with a couple of live hacks on the puppetmaster [20:45:45] ok [20:45:47] (03PS71) 10EBernhardson: Allow multiple elasticsearch instances per host [puppet] - 10https://gerrit.wikimedia.org/r/440049 (https://phabricator.wikimedia.org/T198351) [20:45:50] dealing with the interaction between get_clusters and Profile::Openstack::Main::Cumin::Target [20:48:20] 10Operations, 10Puppet, 10Analytics-Kanban, 10Beta-Cluster-Infrastructure, and 2 others: exported puppet resources are not queryable: cannot create grafana graphs of EventLogging running in beta cluster - https://phabricator.wikimedia.org/T204088 (10Krenair) With these puppet changes: ```diff --git a/modul... 
[20:50:12] Krenair: might be able to also work around that by changing jmx_exporter_config.erb and/or jmx_exporter_config.pp and vary on $::realm [20:50:18] not sure which is better [20:50:28] ew [20:50:34] i think probably jmx_exporter_config.erb shouldn't use get_clusters in labs [20:50:50] something like [20:50:57] if labs site_clusters = undef [20:50:59] and in .erb [20:51:07] don't bother filtering if not site_clusters [20:51:08] I feel like get_clusters should be able to work like prod within deployment-prep [20:51:13] just use all the resources [20:51:17] that would be good [20:51:25] but clusters aren't defined in labs? [20:51:31] but we can define them [20:51:32] or, maybe cluster == project? [20:52:07] (03CR) 1020after4: [C: 032] Unlink the Unix domain socket when exiting [software/keyholder] - 10https://gerrit.wikimedia.org/r/458247 (owner: 10Faidon Liambotis) [20:52:39] $cluster = hiera('cluster', $::labsproject), [20:52:41] ? [20:54:35] (03CR) 1020after4: [C: 031] Add compatibility with Construct 2.8.22 and 2.9.45 [software/keyholder] - 10https://gerrit.wikimedia.org/r/458245 (owner: 10Faidon Liambotis) [20:54:54] ottomata, does the concept of a cluster in prod really line up with a labs project? [20:54:57] yeah Krenair your solution is better [20:54:59] Krenair: no not really [20:55:15] but its better than misc? for a default? or i guess that is the prod default cluster? [20:55:21] I'm leaning towards the default being 'misc' and letting people override [20:55:26] oh it is in profile::cumin::target [20:55:27] o yeah [20:55:28] ok yeah [20:55:28] makes sense [20:55:55] Krenair: you want to make a patch or shall I? [20:56:37] I will [20:56:44] k [20:56:48] thanks so much! 
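The idea discussed above (select nodes that include either cumin target class, then group them by their cluster and site parameters, falling back to 'misc' when no cluster is set) can be sketched as follows. This is an illustrative model only, not the code of get_clusters.rb: the resource dictionaries, the cluster -> site -> hosts return shape, the DEFAULT_SITE value and all host names are assumptions made for the example.

```python
from collections import defaultdict

# Classes whose presence marks a node as a target (both names appear in the log).
TARGET_CLASSES = {
    "Profile::Cumin::Target",
    "Profile::Openstack::Main::Cumin::Target",
}
DEFAULT_CLUSTER = "misc"   # default discussed in the channel
DEFAULT_SITE = "eqiad"     # hypothetical default, chosen only for this sketch


def get_clusters(resources):
    """Group target hosts as {cluster: {site: [hosts]}} (assumed shape)."""
    clusters = defaultdict(lambda: defaultdict(list))
    for res in resources:
        if res["type"] != "Class" or res["title"] not in TARGET_CLASSES:
            continue  # not a cumin target class, ignore
        params = res.get("parameters", {})
        cluster = params.get("cluster", DEFAULT_CLUSTER)
        site = params.get("site", DEFAULT_SITE)
        clusters[cluster][site].append(res["certname"])
    # Convert to plain dicts with sorted host lists for stable output.
    return {c: {s: sorted(h) for s, h in sites.items()} for c, sites in clusters.items()}


# Hypothetical sample data: one prod-style node, one labs-style node, one unrelated class.
sample = [
    {"certname": "kafka1001.example", "type": "Class", "title": "Profile::Cumin::Target",
     "parameters": {"cluster": "kafka_jumbo", "site": "eqiad"}},
    {"certname": "deployment-kafka-main-1.example", "type": "Class",
     "title": "Profile::Openstack::Main::Cumin::Target", "parameters": {}},
    {"certname": "other.example", "type": "Class", "title": "Profile::Base", "parameters": {}},
]

result = get_clusters(sample)
print(result)
```

With a lookup like this, a labs node that carries no cluster or site parameter still ends up in the result under the default cluster instead of being dropped, which is the behaviour the patch is trying to achieve.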
[20:57:08] ottomata: keep in mind that profile::cumin::target is not included and should not be included in labs [20:57:53] (03CR) 1020after4: [C: 031] Implement SSH_AGENTC_LOCK/SSH_AGENTC_UNLOCK [software/keyholder] - 10https://gerrit.wikimedia.org/r/458242 (owner: 10Faidon Liambotis) [20:58:38] (03PS1) 10Alex Monk: Try to make get_clusters work inside labs [puppet] - 10https://gerrit.wikimedia.org/r/462810 (https://phabricator.wikimedia.org/T204088) [20:58:39] volans: aye, Krenair's change is just selecting nodes that have either included, making the get_clusters.rb function work in labs [20:58:46] it isn't actually including anything new [20:59:38] ack, I thought you were planning to change the $cluster var in there [20:59:58] (03CR) 1020after4: [C: 031] Add permission checks for various commands [software/keyholder] - 10https://gerrit.wikimedia.org/r/458240 (owner: 10Faidon Liambotis) [21:00:12] naw, its mostly this: [21:00:13] - function_query_resources([false, 'Class["Profile::Cumin::Target"]', false]) [21:00:13] + function_query_resources([false, 'Class["Profile::Cumin::Target"] or Class["Profile::Openstack::Main::Cumin::Target"]', false]) [21:00:21] in get_clusters.rb [21:00:34] PROBLEM - Check systemd state on analytics1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [21:00:38] oooo [21:00:39] looking [21:02:08] jouncebot: now [21:02:09] No deployments scheduled for the next 1 hour(s) and 57 minute(s) [21:03:30] (03CR) 10Ottomata: [C: 031] Try to make get_clusters work inside labs [puppet] - 10https://gerrit.wikimedia.org/r/462810 (https://phabricator.wikimedia.org/T204088) (owner: 10Alex Monk) [21:04:10] 10Operations, 10Puppet, 10Analytics-Kanban, 10Beta-Cluster-Infrastructure, and 2 others: exported puppet resources are not queryable: cannot create grafana graphs of EventLogging running in beta cluster - https://phabricator.wikimedia.org/T204088 (10Ottomata) @fgiunchedi Alex's change ^ should do it. Coul... 
[21:05:54] RECOVERY - Check systemd state on analytics1003 is OK: OK - running: The system is fully operational [21:12:47] (03CR) 10Alex Monk: "Note this patch doesn't include the defaults for cluster and site inside get_clusters itself like the version currently on deployment-pupp" [puppet] - 10https://gerrit.wikimedia.org/r/462810 (https://phabricator.wikimedia.org/T204088) (owner: 10Alex Monk) [21:14:17] 10Operations, 10Puppet, 10Analytics-Kanban, 10Beta-Cluster-Infrastructure, and 2 others: Prometheus resources in deployment-prep to create grafana graphs of EventLogging - https://phabricator.wikimedia.org/T204088 (10Krenair) [21:24:15] (03PS1) 10Ottomata: Fix input regex and capture group configs for Refine jobs [puppet] - 10https://gerrit.wikimedia.org/r/462816 (https://phabricator.wikimedia.org/T203804) [21:24:28] (03PS2) 10Ottomata: Fix input regex and capture group configs for Refine jobs [puppet] - 10https://gerrit.wikimedia.org/r/462816 (https://phabricator.wikimedia.org/T203804) [21:26:54] (03CR) 10Ottomata: [C: 032] Fix input regex and capture group configs for Refine jobs [puppet] - 10https://gerrit.wikimedia.org/r/462816 (https://phabricator.wikimedia.org/T203804) (owner: 10Ottomata) [21:34:36] (03PS3) 10Cwhite: naggen2: restrict generated defines to valid options [puppet] - 10https://gerrit.wikimedia.org/r/462791 (https://phabricator.wikimedia.org/T202782) [21:35:52] (03PS4) 10Dzahn: icinga: (if on stretch) don't let puppet manage user/group [puppet] - 10https://gerrit.wikimedia.org/r/462593 (https://phabricator.wikimedia.org/T202782) [21:41:36] robh, mutante: groceryheist's SWAP work is blocked on LDAP access, any chance we could expedite https://phabricator.wikimedia.org/T205454 ? 
[21:42:03] (03CR) 10Dzahn: "comments at https://phabricator.wikimedia.org/T204801#4616591 should be addressed" [puppet] - 10https://gerrit.wikimedia.org/r/462673 (https://phabricator.wikimedia.org/T204801) (owner: 10Muehlenhoff) [21:45:24] HaeB: would it be ok if i raise the priority and ping Moritz so he sees it in the European morning. he is handling them this week, it's a rotation [21:47:55] mutante: ok, thanks, i think that's fine (CC groceryheist ) [21:50:36] (03PS1) 10Imarlier: Merge branch 'master' of ssh://gerrit.wikimedia.org/operations/mediawiki-config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462818 [21:50:38] (03PS1) 10Imarlier: Beta: enable MobileFrontend and move some config in to labs settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462819 (https://phabricator.wikimedia.org/T205495) [21:51:51] (03CR) 10jerkins-bot: [V: 04-1] Beta: enable MobileFrontend and move some config in to labs settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462819 (https://phabricator.wikimedia.org/T205495) (owner: 10Imarlier) [21:53:14] (03CR) 10Dzahn: [C: 032] "compiler shows it's noop on einsteinium/tegmen and only affects icinga1001 https://puppet-compiler.wmflabs.org/compiler1002/12616/" [puppet] - 10https://gerrit.wikimedia.org/r/462593 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn) [21:53:23] (03Abandoned) 10Imarlier: Merge branch 'master' of ssh://gerrit.wikimedia.org/operations/mediawiki-config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462818 (owner: 10Imarlier) [21:53:44] (03PS2) 10Imarlier: Beta: enable MobileFrontend and move some config in to labs settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462819 (https://phabricator.wikimedia.org/T205495) [21:55:00] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: nathante/groceryheist shell request for researchers, statistics-privatedata-users, analytics-privatedata-users - https://phabricator.wikimedia.org/T204790 (10Dzahn) [21:55:07] 
(03CR) 10jerkins-bot: [V: 04-1] Beta: enable MobileFrontend and move some config in to labs settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462819 (https://phabricator.wikimedia.org/T205495) (owner: 10Imarlier) [21:55:41] (03CR) 10Krinkle: [C: 04-1] "Yeah, looks like we'll accept this not working for now. Instead, to generate the one-off html dumps, probably someone will install it loca" [puppet] - 10https://gerrit.wikimedia.org/r/462673 (https://phabricator.wikimedia.org/T204801) (owner: 10Muehlenhoff) [21:55:46] (03PS3) 10Imarlier: Beta: enable MobileFrontend and move some config in to labs settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462819 (https://phabricator.wikimedia.org/T205495) [21:57:34] (03CR) 10Krinkle: [C: 04-1] sitemaps: Generalize varnish rule for sitemaps, to apply to all domains [puppet] - 10https://gerrit.wikimedia.org/r/456169 (https://phabricator.wikimedia.org/T198965) (owner: 10Imarlier) [22:30:29] (03PS3) 10Dzahn: icinga: make icinga user configurable [puppet] - 10https://gerrit.wikimedia.org/r/462604 (https://phabricator.wikimedia.org/T202782) [22:30:54] (03CR) 10jerkins-bot: [V: 04-1] icinga: make icinga user configurable [puppet] - 10https://gerrit.wikimedia.org/r/462604 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn) [22:35:53] (03PS4) 10Dzahn: icinga: make icinga user configurable [puppet] - 10https://gerrit.wikimedia.org/r/462604 (https://phabricator.wikimedia.org/T202782) [22:36:30] (03CR) 10jerkins-bot: [V: 04-1] icinga: make icinga user configurable [puppet] - 10https://gerrit.wikimedia.org/r/462604 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn) [22:42:05] (03PS5) 10Dzahn: icinga: make icinga user configurable [puppet] - 10https://gerrit.wikimedia.org/r/462604 (https://phabricator.wikimedia.org/T202782) [22:42:26] (03PS2) 10Dzahn: rsync::server: add parameter to use IPv6 [puppet] - 10https://gerrit.wikimedia.org/r/456522 [22:42:32] (03CR) 10Dzahn: [C: 032] rsync::server: add 
parameter to use IPv6 [puppet] - 10https://gerrit.wikimedia.org/r/456522 (owner: 10Dzahn) [22:42:37] (03CR) 10jerkins-bot: [V: 04-1] icinga: make icinga user configurable [puppet] - 10https://gerrit.wikimedia.org/r/462604 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn) [22:43:28] 10Operations, 10Community-Tech, 10MediaWiki-Parser, 10Traffic: Show SVGs in page language if available - https://phabricator.wikimedia.org/T205040 (10MaxSem) [22:44:24] (03PS6) 10Dzahn: icinga: make icinga user configurable [puppet] - 10https://gerrit.wikimedia.org/r/462604 (https://phabricator.wikimedia.org/T202782) [22:45:48] (03CR) 10Dzahn: [C: 032] trafficserver: replace mwmaint1001 with 1002 as noc.wm.org backend [puppet] - 10https://gerrit.wikimedia.org/r/461490 (https://phabricator.wikimedia.org/T201343) (owner: 10Dzahn) [22:45:57] (03PS3) 10Dzahn: trafficserver: replace mwmaint1001 with 1002 as noc.wm.org backend [puppet] - 10https://gerrit.wikimedia.org/r/461490 (https://phabricator.wikimedia.org/T201343) [22:46:08] (03CR) 10Dzahn: [C: 032] "this is still in testing, not affecting production "noc"" [puppet] - 10https://gerrit.wikimedia.org/r/461490 (https://phabricator.wikimedia.org/T201343) (owner: 10Dzahn) [22:52:59] !log ppchelko@deploy1001 Started deploy [restbase/deploy@0fa695e] (dev-cluster): Deployinng content negotiation to dev cluster for ab testing [22:53:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:55:52] !log ppchelko@deploy1001 Finished deploy [restbase/deploy@0fa695e] (dev-cluster): Deployinng content negotiation to dev cluster for ab testing (duration: 02m 53s) [22:55:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:58:48] (03CR) 10Dzahn: [C: 031] naggen2: restrict generated defines to valid options [puppet] - 10https://gerrit.wikimedia.org/r/462791 (https://phabricator.wikimedia.org/T202782) (owner: 10Cwhite) [22:59:29] (03PS7) 10Dzahn: icinga: make icinga user configurable 
[puppet] - 10https://gerrit.wikimedia.org/r/462604 (https://phabricator.wikimedia.org/T202782) [23:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Your horoscope predicts another unfortunate Evening SWAT (Max 6 patches) deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180925T2300). [23:00:04] Smalyshev, Jdlrobson, James_F, and MaxSem: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [23:00:07] * James_F waves. [23:00:22] * MaxSem waves too [23:01:06] \o here [23:01:15] I'll do the deed [23:02:07] (03CR) 10Dzahn: "https://puppet-compiler.wmflabs.org/compiler1002/12617/einsteinium.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/462604 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn) [23:02:22] SMalyshev: yt? [23:03:15] skipping for now [23:04:10] (03CR) 10MaxSem: [C: 032] [GovernanceWiki] Enable BotPasswords [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462799 (https://phabricator.wikimedia.org/T205368) (owner: 10Jforrester) [23:04:22] MaxSem: here [23:04:33] sorry was typing in other window :) [23:04:45] MaxSem: Nothing to test for the GovWiki config change. [23:05:23] O RLY? [23:05:36] not even bot password creation? :P [23:05:43] MaxSem: Well, fine. [23:06:07] Pull it to mwdebug2001? [23:06:10] (03Merged) 10jenkins-bot: [GovernanceWiki] Enable BotPasswords [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462799 (https://phabricator.wikimedia.org/T205368) (owner: 10Jforrester) [23:06:35] pulled, James_F [23:07:20] Hmm. [23:08:03] MaxSem: I get an MW fatal. [23:09:51] Did you remember to create the table? [23:10:09] bawolff: Currently we're scouring MW.org for the documentation. 
[23:10:39] Well if you're running the installer, you don't have to create the table ;) [23:10:50] so mw.org is probably going to be different from wikimedia [23:10:51] In production? [23:11:38] Basically, most of production uses the table at metawiki (So bot passwords are global) [23:11:47] Yeah, but we can't do that for non-SUL wikis. [23:11:51] (03CR) 10Dzahn: [C: 032] "being extra careful and just applying on tegmen first" [puppet] - 10https://gerrit.wikimedia.org/r/462604 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn) [23:12:01] And migrating governance wiki to SUL is going to take a while. [23:12:01] so wikis don't have the table [23:12:01] but foundationwiki doesn't, so it's going to need the table created [23:12:50] So normally, adding the table is easy. But I don't know if anything special is going on due to db maintenance during the switchover [23:13:08] Creating new empty tables is meant to be fine, it's just major data changes that are blocked. [23:13:20] grrrmbl, 2 patches to run [23:13:28] Sorry, MaxSem. [23:13:35] bawolff: Thanks. :-) [23:14:28] so yeah, maintenance/archives/patch-bot_passwords.sql is the file [23:14:53] (03CR) 10jenkins-bot: [GovernanceWiki] Enable BotPasswords [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462799 (https://phabricator.wikimedia.org/T205368) (owner: 10Jforrester) [23:15:16] https://wikitech.wikimedia.org/wiki/Creating_new_tables is impressively useless. [23:16:02] i think you just run it through the sql.php maintenance script (which is different from sql tool) [23:16:11] lol, at that page [23:16:33] Database is read-only: You can't edit now. This is because of maintenance. Copy and save your text and try again in a few minutes. 
[23:16:43] So I think you'd do something like mwscript foundationwiki sql.php < /path/to/patch
[23:17:16] MaxSem: mine will require a bit of testing (10 mins minimum) once it's live, so please ping me when you're ready to do that one
[23:17:49] (PS1) MaxSem: Revert "[GovernanceWiki] Enable BotPasswords" [mediawiki-config] - https://gerrit.wikimedia.org/r/462828
[23:17:53] (CR) MaxSem: [C: 2] Revert "[GovernanceWiki] Enable BotPasswords" [mediawiki-config] - https://gerrit.wikimedia.org/r/462828 (owner: MaxSem)
[23:18:09] https://wikitech.wikimedia.org/wiki/How_to_do_a_schema_change#sql.php
[23:18:16] It's why I created the WikimediaMaintenance createExtensionTables.php
[23:18:36] but it's not an exteeeension!
[23:18:44] I know
[23:18:51] But the point that running sql files is awkward
[23:19:08] Such is life.
[23:19:13] Reedy: Update the docs! :P
[23:19:20] (Merged) jenkins-bot: Revert "[GovernanceWiki] Enable BotPasswords" [mediawiki-config] - https://gerrit.wikimedia.org/r/462828 (owner: MaxSem)
[23:19:48] (PS3) MaxSem: Add phrase rescoring to config [mediawiki-config] - https://gerrit.wikimedia.org/r/462347 (https://phabricator.wikimedia.org/T163642) (owner: Smalyshev)
[23:19:53] (CR) MaxSem: [C: 2] Add phrase rescoring to config [mediawiki-config] - https://gerrit.wikimedia.org/r/462347 (https://phabricator.wikimedia.org/T163642) (owner: Smalyshev)
[23:22:57] (Merged) jenkins-bot: Add phrase rescoring to config [mediawiki-config] - https://gerrit.wikimedia.org/r/462347 (https://phabricator.wikimedia.org/T163642) (owner: Smalyshev)
[23:23:42] SMalyshev: pulled on mwdebug2001, is it testable?
[23:23:55] kinda... let me see what I can see there
[23:24:14] it defines config not enables it but it may be possible to see something with debug options
[23:25:40] (PS1) Dzahn: icinga: use $icinga::icinga_user in all classes [puppet] - https://gerrit.wikimedia.org/r/462829 (https://phabricator.wikimedia.org/T202782)
[23:27:11] ok, the config seems to be loading fine, but the values are for some reason not what I expected... Since it's not enabled yet, I think it's ok
[23:27:24] so sync?
[23:27:28] I'll dig into where the values coming from and probably will need another config tweak
[23:27:50] MaxSem: yes you can sync this one, since it's passive config it's ok
[23:28:02] I'll figure the value before activating it
[23:29:25] !log maxsem@deploy1001 Synchronized wmf-config/WikibaseSearchSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/462347/ (duration: 00m 57s)
[23:29:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:29:35] (CR) Dzahn: "let me know if you think i should add it as a parameter to each of the classes instead or this is fine. i just wanted to be consistent abo" [puppet] - https://gerrit.wikimedia.org/r/462829 (https://phabricator.wikimedia.org/T202782) (owner: Dzahn)
[23:29:49] (CR) Dzahn: [C: 2] "https://puppet-compiler.wmflabs.org/compiler1002/12618/einsteinium.wikimedia.org/" [puppet] - https://gerrit.wikimedia.org/r/462829 (https://phabricator.wikimedia.org/T202782) (owner: Dzahn)
[23:30:49] James_F: I think it might've been ok as long it was run from mwmain2001
[23:31:06] Reedy: That's what I told MaxSem but he wasn't having it.
[23:31:13] heh
[23:31:21] definitely wouldn't work from deploy1001
[23:31:27] And as always Deployer Decides™.
[23:31:49] we have a busy window, no time to investigate
[23:31:55] Yeah.
[23:32:23] jdlrobson: on mwdebug2001
[23:35:45] looking
[23:36:28] sync away MaxSem
[23:36:36] (CR) Dzahn: [C: 2] "first noop on tegmen & icinga1001..and, step 2, also on einsteinium" [puppet] - https://gerrit.wikimedia.org/r/462604 (https://phabricator.wikimedia.org/T202782) (owner: Dzahn)
[23:37:36] (CR) Dzahn: [C: 2] "follow-up https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/462829/" [puppet] - https://gerrit.wikimedia.org/r/462604 (https://phabricator.wikimedia.org/T202782) (owner: Dzahn)
[23:38:05] (CR) jenkins-bot: Revert "[GovernanceWiki] Enable BotPasswords" [mediawiki-config] - https://gerrit.wikimedia.org/r/462828 (owner: MaxSem)
[23:38:07] (CR) jenkins-bot: Add phrase rescoring to config [mediawiki-config] - https://gerrit.wikimedia.org/r/462347 (https://phabricator.wikimedia.org/T163642) (owner: Smalyshev)
[23:39:00] !log maxsem@deploy1001 Synchronized php-1.32.0-wmf.22/skins/MinervaNeue/: https://gerrit.wikimedia.org/r/#/c/mediawiki/skins/MinervaNeue/+/462705/ (duration: 00m 57s)
[23:39:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:39:09] jdlrobson: ^
[23:39:42] \o/
[23:41:15] James_F: the wmf.23 change is on mwdebug2001
[23:41:33] Ta.
[23:43:42] MaxSem: Working, please sync.
[23:46:08] !log maxsem@deploy1001 Synchronized php-1.32.0-wmf.23/extensions/UploadWizard/: https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/UploadWizard/+/462780/ (duration: 00m 56s)
[23:46:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:49:56] !log maxsem@deploy1001 Synchronized php-1.32.0-wmf.22/extensions/UploadWizard/: https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/UploadWizard/+/462781/ (duration: 00m 56s)
[23:50:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:50:40] Thanks, MaxSem.
[23:52:52] (CR) Dzahn: [C: 2] "noop on all 3 icinga servers" [puppet] - https://gerrit.wikimedia.org/r/462829 (https://phabricator.wikimedia.org/T202782) (owner: Dzahn)
[23:53:27] thanks MaxSem!
[23:53:33] things are looking okay
[23:55:03] (PS1) Dzahn: icinga: make group configurable, make user configurable pt2 [puppet] - https://gerrit.wikimedia.org/r/462833 (https://phabricator.wikimedia.org/T202782)
[23:55:31] (CR) jerkins-bot: [V: -1] icinga: make group configurable, make user configurable pt2 [puppet] - https://gerrit.wikimedia.org/r/462833 (https://phabricator.wikimedia.org/T202782) (owner: Dzahn)
[23:56:54] (PS2) Dzahn: icinga: make group configurable, make user configurable pt2 [puppet] - https://gerrit.wikimedia.org/r/462833 (https://phabricator.wikimedia.org/T202782)
[23:58:54] Operations, DNS, Traffic, WMF-Communications, and 4 others: Move Foundation Wiki to new URL when new Wikimedia Foundation website launches - https://phabricator.wikimedia.org/T188776 (Jdforrester-WMF) Open>Resolved
[23:59:41] Operations, DNS, Traffic, WMF-Communications, and 4 others: Move Foundation Wiki to new URL when new Wikimedia Foundation website launches - https://phabricator.wikimedia.org/T188776 (Jdforrester-WMF)
[23:59:46] !log maxsem@deploy1001 Synchronized php-1.32.0-wmf.22/extensions/GlobalPreferences/: https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/GlobalPreferences/+/462822/ (duration: 00m 56s)
[23:59:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log