[03:38:41] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 984.11 seconds
[04:21:21] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 204.80 seconds
[04:27:58] (PS2) Bstorm: toolforge: disable lighttpd service on webgrid nodes [puppet] - https://gerrit.wikimedia.org/r/481344 (https://phabricator.wikimedia.org/T105059) (owner: BryanDavis)
[04:29:13] (CR) Bstorm: [C: +2] toolforge: disable lighttpd service on webgrid nodes [puppet] - https://gerrit.wikimedia.org/r/481344 (https://phabricator.wikimedia.org/T105059) (owner: BryanDavis)
[06:15:14] !log Fix last chunks on db1124:338 - T212574
[06:15:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:15:17] T212574: Recheck wikidatawiki.pagelinks table across all hosts in s8 - https://phabricator.wikimedia.org/T212574
[06:22:28] Operations: Please import php-xdebug to apt.wm.o thirdparty/php72 - https://phabricator.wikimedia.org/T212757 (Legoktm)
[06:22:48] Operations: Please import php-xdebug to apt.wm.o thirdparty/php72 - https://phabricator.wikimedia.org/T212757 (Legoktm)
[06:22:55] (PS1) Giuseppe Lavagetto: stdlib: use Stdlib::Port in place of our local definition [puppet] - https://gerrit.wikimedia.org/r/481818
[06:25:38] !log Drop valid_tag from s6 - T212254
[06:25:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:25:41] T212254: Drop valid_tag table - https://phabricator.wikimedia.org/T212254
[06:26:59] Operations: Please import php-xdebug to apt.wm.o thirdparty/php72 - https://phabricator.wikimedia.org/T212757 (Joe) p:Triage→Normal a:Joe
[06:28:31] PROBLEM - Check systemd state on netmon1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
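The dbstore1002 alerts flip from CRITICAL at 984.11 seconds to OK at 204.80 seconds, so the check compares the measured lag against thresholds. The real threshold values are not shown in this log. The sketch below assumes a warning level of 300 s and a critical level of 400 s, chosen only because they agree with every slave_sql_lag reading that appears here. The exit codes are the standard Nagios/Icinga plugin convention.

```python
from typing import Optional

# Hypothetical thresholds: picked only so that every slave_sql_lag reading in
# this log maps to the state the bot reported. Not the production values.
WARNING_SECONDS = 300.0
CRITICAL_SECONDS = 400.0

# Standard Nagios/Icinga plugin exit codes.
EXIT_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def classify_lag(lag_seconds: Optional[float]) -> str:
    """Map a replication lag reading (in seconds) to a Nagios-style state."""
    if lag_seconds is None:
        return "UNKNOWN"  # the lag could not be measured at all
    if lag_seconds >= CRITICAL_SECONDS:
        return "CRITICAL"
    if lag_seconds >= WARNING_SECONDS:
        return "WARNING"
    return "OK"


def format_alert(lag_seconds: float) -> str:
    """Render the status line in the same shape as the alerts in this log."""
    state = classify_lag(lag_seconds)
    return f"{state} slave_sql_lag Replication lag: {lag_seconds:.2f} seconds"


if __name__ == "__main__":
    for lag in (984.11, 204.80):
        print(format_alert(lag))
```

Two thresholds give a WARNING band before CRITICAL. With the values assumed here, none of the readings in this log fall into that band.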
[06:28:33] PROBLEM - netbox HTTPS on netmon1002 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 547 bytes in 0.009 second response time
[06:33:43] PROBLEM - puppet last run on cp2024 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:38:11] RECOVERY - Check systemd state on netmon1002 is OK: OK - running: The system is fully operational
[06:38:15] RECOVERY - netbox HTTPS on netmon1002 is OK: HTTP OK: HTTP/1.1 302 Found - 348 bytes in 0.548 second response time
[06:49:21] !log Drop empty valid_tag table from s5 - T212254
[06:49:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:49:23] T212254: Drop valid_tag table - https://phabricator.wikimedia.org/T212254
[06:54:29] !log Drop empty valid_tag table from labswiki labtestwiki - T212254
[06:54:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:54:31] T212254: Drop valid_tag table - https://phabricator.wikimedia.org/T212254
[06:59:43] RECOVERY - puppet last run on cp2024 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[07:00:59] !log Deploy schema change on s1 codfw master (lag will be generated on s1 codfw) - T202167 T86338
[07:01:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:01:02] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338
[07:01:03] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167
[07:26:12] (PS1) Marostegui: db-eqiad.php: Depool db1078 [mediawiki-config] - https://gerrit.wikimedia.org/r/481820 (https://phabricator.wikimedia.org/T212692)
[07:28:30] (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1078 [mediawiki-config] - https://gerrit.wikimedia.org/r/481820 (https://phabricator.wikimedia.org/T212692) (owner: Marostegui)
[07:29:38] (Merged) jenkins-bot: db-eqiad.php: Depool db1078 [mediawiki-config] - https://gerrit.wikimedia.org/r/481820 (https://phabricator.wikimedia.org/T212692) (owner: Marostegui)
[07:30:46] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1078 - T212692 (duration: 00m 47s)
[07:30:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:30:49] !log Fix login.logging table on db1078 - T212692
[07:30:49] T212692: DBQueryTimeoutError on Wikimedia Login's Special:CheckUser - https://phabricator.wikimedia.org/T212692
[07:30:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:34:41] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1078" [mediawiki-config] - https://gerrit.wikimedia.org/r/481821
[07:35:54] (CR) Marostegui: [C: +2] Revert "db-eqiad.php: Depool db1078" [mediawiki-config] - https://gerrit.wikimedia.org/r/481821 (owner: Marostegui)
[07:36:57] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1078" [mediawiki-config] - https://gerrit.wikimedia.org/r/481821 (owner: Marostegui)
[07:37:29] (CR) jenkins-bot: db-eqiad.php: Depool db1078 [mediawiki-config] - https://gerrit.wikimedia.org/r/481820 (https://phabricator.wikimedia.org/T212692) (owner: Marostegui)
[07:37:31] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1078" [mediawiki-config] - https://gerrit.wikimedia.org/r/481821 (owner: Marostegui)
[07:38:24] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1078 - T212692 (duration: 00m 46s)
[07:38:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:38:27] T212692: DBQueryTimeoutError on Wikimedia Login's Special:CheckUser - https://phabricator.wikimedia.org/T212692
[08:14:39] (PS2) KartikMistry: Add Google Translate MT config [puppet] - https://gerrit.wikimedia.org/r/471698 (https://phabricator.wikimedia.org/T90208)
[08:54:12] log Drop valid_tag from s8 - T212254
[08:54:12] T212254: Drop valid_tag table - https://phabricator.wikimedia.org/T212254
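The db1078 work above follows a fixed cycle. A "Depool" change is merged and synced, the login.logging table is fixed, and then a Revert is merged and synced to repool the host. The production config is the PHP file db-eqiad.php. The Python below is only a hypothetical model of that cycle. The section name, the weights, the host "replica-b" and the `depooled` helper are all illustrative and are not taken from the real configuration.

```python
from contextlib import contextmanager
from typing import Dict, Iterator

# Hypothetical replica weights for one section; NOT the real db-eqiad.php data.
section_loads: Dict[str, Dict[str, int]] = {
    "example-section": {"db1078": 300, "replica-b": 200},
}


@contextmanager
def depooled(loads: Dict[str, Dict[str, int]], section: str, host: str) -> Iterator[None]:
    """Remove `host` from `section` for the duration of the block, then restore it.

    Models the log's sequence: merge and sync a "Depool" change, do the
    maintenance, then merge and sync a Revert that repools the host.
    """
    weight = loads[section].pop(host)  # depool: no new queries go to this host
    try:
        yield
    finally:
        loads[section][host] = weight  # repool with the weight it had before


if __name__ == "__main__":
    with depooled(section_loads, "example-section", "db1078"):
        # maintenance would happen here, e.g. the login.logging fix
        print(sorted(section_loads["example-section"]))
    print(sorted(section_loads["example-section"]))
```

The `finally` clause always repools the host, even when the maintenance step raises an exception. That is a simplification. In practice an operator might deliberately leave a broken host depooled.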
[08:56:33] !log installing libarchive security updates
[08:56:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:01:26] (PS1) Muehlenhoff: Add library hint for libarchive [puppet] - https://gerrit.wikimedia.org/r/481824
[09:01:32] Operations, MediaWiki-Cache, MW-1.33-notes (1.33.0-wmf.12; 2019-01-08), Patch-For-Review, and 3 others: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (elukey) Did a quick test this morning to have an...
[09:04:13] (CR) Muehlenhoff: [C: +2] Add library hint for libarchive [puppet] - https://gerrit.wikimedia.org/r/481824 (owner: Muehlenhoff)
[09:06:03] !log eqiad-prod: final weight for ms-be10[44-50].eqiad.wmnet - T209618
[09:06:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:06:05] T209618: rack/setup/install ms-be10[44-50].eqiad.wmnet - https://phabricator.wikimedia.org/T209618
[09:07:15] marostegui: You forgot to add !
[09:07:27] !log Drop valid_tag from s8 by Marostegui - T212254
[09:07:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:07:29] T212254: Drop valid_tag table - https://phabricator.wikimedia.org/T212254
[09:07:29] Zoranzoki21: Ups! True! Thanks
[09:07:30] :)
[09:07:35] marostegui: I done it
[09:07:39] Thank you!
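The "You forgot to add !" exchange follows from how the bots behave in this log. Only messages that begin with "!log" get a "Logged the message at ..." reply. Any Phabricator task ID that is mentioned (T followed by digits) is echoed back with its title, which is why "log Drop valid_tag from s8" got a T212254 line but no Server_Admin_Log entry. The logging bot's actual code does not appear here. This minimal sketch models only the observable behaviour, and both function names are hypothetical.

```python
import re
from typing import List, Optional

LOG_PREFIX = "!log "
# Phabricator task IDs as they appear in this log, e.g. T212254.
TASK_ID = re.compile(r"\bT\d+\b")


def log_entry(message: str) -> Optional[str]:
    """Return the text to record if `message` is a !log command, else None."""
    if message.startswith(LOG_PREFIX):
        return message[len(LOG_PREFIX):].strip()
    return None


def task_ids(message: str) -> List[str]:
    """Return the task IDs mentioned in `message`, in order, without duplicates."""
    found: List[str] = []
    for tid in TASK_ID.findall(message):
        if tid not in found:
            found.append(tid)
    return found


if __name__ == "__main__":
    for msg in ("log Drop valid_tag from s8 - T212254",
                "!log Drop valid_tag from s8 by Marostegui - T212254"):
        print(repr(log_entry(msg)), task_ids(msg))
```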
[09:07:55] marostegui: Your welcome, no problems :)
[09:17:31] (PS1) Filippo Giunchedi: Onboard group1 to new logging infrastructure [mediawiki-config] - https://gerrit.wikimedia.org/r/481825 (https://phabricator.wikimedia.org/T211124)
[09:18:47] !log installing c3p0 security updates
[09:18:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:20:27] PROBLEM - Hadoop NodeManager on analytics1028 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[09:20:54] this is me --^
[09:21:03] (PS2) Filippo Giunchedi: logstash: output webrequest 5xx metrics [puppet] - https://gerrit.wikimedia.org/r/480943 (https://phabricator.wikimedia.org/T205870)
[09:25:50] (CR) Filippo Giunchedi: [C: -1] "http_status needs conversion to number first" [puppet] - https://gerrit.wikimedia.org/r/480943 (https://phabricator.wikimedia.org/T205870) (owner: Filippo Giunchedi)
[09:34:24] (CR) DCausse: [C: +1] elasticsearch: allow cross cluster communication [puppet] - https://gerrit.wikimedia.org/r/481125 (https://phabricator.wikimedia.org/T212434) (owner: Mathew.onipe)
[09:42:53] (CR) Filippo Giunchedi: [C: +1] swift: new cert for ms-fe.svc.codfw.wmnet [puppet] - https://gerrit.wikimedia.org/r/481136 (https://phabricator.wikimedia.org/T212215) (owner: Ema)
[09:46:07] (PS1) Marostegui: db-codfw.php: Depool db2096 [mediawiki-config] - https://gerrit.wikimedia.org/r/481828
[09:47:29] (CR) Marostegui: [C: +2] db-codfw.php: Depool db2096 [mediawiki-config] - https://gerrit.wikimedia.org/r/481828 (owner: Marostegui)
[09:48:36] (Merged) jenkins-bot: db-codfw.php: Depool db2096 [mediawiki-config] - https://gerrit.wikimedia.org/r/481828 (owner: Marostegui)
[09:48:50] (CR) jenkins-bot: db-codfw.php: Depool db2096 [mediawiki-config] - https://gerrit.wikimedia.org/r/481828 (owner: Marostegui)
[09:48:52] !log marostegui@deploy1001 sync-file aborted: Depool db2096 (duration: 00m 01s)
[09:48:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:49:44] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Depool db2096 (duration: 00m 45s)
[09:49:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:50:29] Operations, Traffic, netops: ideen.wikimedia.de can’t be reached - https://phabricator.wikimedia.org/T212764 (Smarti)
[09:50:37] !log Stop MySQL on db2096 for kernel and mysql upgrade
[09:50:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:58:00] (PS1) Marostegui: Revert "db-codfw.php: Depool db2096" [mediawiki-config] - https://gerrit.wikimedia.org/r/481829
[09:59:41] (PS1) Banyek: mariadb: depool db1086 [mediawiki-config] - https://gerrit.wikimedia.org/r/481830 (https://phabricator.wikimedia.org/T85757)
[09:59:51] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Scrapes sample page) timed out before a response was received
[10:00:57] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[10:01:37] (CR) Marostegui: [C: +2] Revert "db-codfw.php: Depool db2096" [mediawiki-config] - https://gerrit.wikimedia.org/r/481829 (owner: Marostegui)
[10:02:41] (Merged) jenkins-bot: Revert "db-codfw.php: Depool db2096" [mediawiki-config] - https://gerrit.wikimedia.org/r/481829 (owner: Marostegui)
[10:02:56] (PS1) Banyek: mariadb: depool db1090:3317 [mediawiki-config] - https://gerrit.wikimedia.org/r/481831 (https://phabricator.wikimedia.org/T85757)
[10:03:42] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Repool db2096 (duration: 00m 44s)
[10:03:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:06:51] (PS1) Banyek: mariadb: depool db1094 [mediawiki-config] - https://gerrit.wikimedia.org/r/481832 (https://phabricator.wikimedia.org/T85757)
[10:07:13] (PS3) Filippo Giunchedi: apache basic auth: Use term "developer account" [puppet] - https://gerrit.wikimedia.org/r/467723 (https://phabricator.wikimedia.org/T179461) (owner: BryanDavis)
[10:09:26] (CR) Filippo Giunchedi: [C: +2] apache basic auth: Use term "developer account" [puppet] - https://gerrit.wikimedia.org/r/467723 (https://phabricator.wikimedia.org/T179461) (owner: BryanDavis)
[10:10:51] (CR) Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/compiler1002/14091/" [puppet] - https://gerrit.wikimedia.org/r/467723 (https://phabricator.wikimedia.org/T179461) (owner: BryanDavis)
[10:13:24] Operations, CirrusSearch, Discovery-Search (Current work): Add chi, psi and omega selector to the elasticsearch dashboards in grafana - https://phabricator.wikimedia.org/T211956 (dcausse) Thanks! this selector needs to be added to all other elasticsearch dashboards: - https://grafana.wikimedia.org/d/...
[10:14:55] (CR) jenkins-bot: Revert "db-codfw.php: Depool db2096" [mediawiki-config] - https://gerrit.wikimedia.org/r/481829 (owner: Marostegui)
[10:16:27] (CR) Volans: "Change looks ok, but maybe it will be nicer to use the Privileged and Unprivileged version of the Port type when applicable. Totally optio" [puppet] - https://gerrit.wikimedia.org/r/481818 (owner: Giuseppe Lavagetto)
[10:17:36] Operations, Wikimedia-Logstash, Patch-For-Review: logstash HTTP Basic Auth prompt says "WMF Labs" - https://phabricator.wikimedia.org/T207178 (Tgr) Note that Chrome does not display a basic auth prompt. Maybe the same message could be put in the error shown after authentication failure? (Although tha...
[10:17:39] (PS1) Volans: zone_validator: catch parse errors [dns] - https://gerrit.wikimedia.org/r/481833
[10:21:56] (CR) Volans: [C: +2] "Tested on boron." [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/480885 (owner: Volans)
[10:28:05] (Merged) jenkins-bot: Upstream release v0.0.10 [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/480885 (owner: Volans)
[10:34:03] (PS1) Banyek: mariadb: depool db1098:3317 [mediawiki-config] - https://gerrit.wikimedia.org/r/481837 (https://phabricator.wikimedia.org/T85757)
[10:38:58] (PS1) Banyek: mariadb: depool db1101:3317 [mediawiki-config] - https://gerrit.wikimedia.org/r/481839 (https://phabricator.wikimedia.org/T85757)
[10:40:44] Operations, User-fgiunchedi: Upgrade mysqld_exporter to 0.10.0 - https://phabricator.wikimedia.org/T161296 (fgiunchedi) Buster will likely ship with >= 0.11, changelog here: https://github.com/prometheus/mysqld_exporter/blob/master/CHANGELOG.md#0110--2018-06-29
[10:41:07] spambots :-(
[10:42:01] !log uploaded spicerack_0.0.10-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
[10:42:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:42:08] Operations, User-fgiunchedi: Upgrade mysqld_exporter in production - https://phabricator.wikimedia.org/T161296 (fgiunchedi)
[10:42:12] (PS1) Banyek: mariadb: depool db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/481840 (https://phabricator.wikimedia.org/T85757)
[10:43:25] !log removed labvirt1013 from debmonitor, got renamed in T212513
[10:43:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:43:28] T212513: Rename labvirt1013 to cloudvirt1013, move to eqiad1 - https://phabricator.wikimedia.org/T212513
[10:45:51] !log ms-be2018 Flashing Smart Array P840 in Slot 3 [ 3.00 -> 6.60 ]
[10:45:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:48:23] !log testing the new spicerack package on cumin2001, in the unlikely event you need to use spicerack cookbooks today please use cumin1001
[10:48:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:48:45] (CR) Marostegui: [C: -1] mariadb: depool db1086 (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/481830 (https://phabricator.wikimedia.org/T85757) (owner: Banyek)
[10:48:57] (CR) Marostegui: [C: +1] mariadb: depool db1101:3317 [mediawiki-config] - https://gerrit.wikimedia.org/r/481839 (https://phabricator.wikimedia.org/T85757) (owner: Banyek)
[10:49:31] (CR) Marostegui: [C: -1] mariadb: depool db1079 (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/481840 (https://phabricator.wikimedia.org/T85757) (owner: Banyek)
[10:49:44] (CR) Marostegui: [C: +1] mariadb: depool db1098:3317 [mediawiki-config] - https://gerrit.wikimedia.org/r/481837 (https://phabricator.wikimedia.org/T85757) (owner: Banyek)
[10:50:05] (CR) Marostegui: [C: -1] mariadb: depool db1094 (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/481832 (https://phabricator.wikimedia.org/T85757) (owner: Banyek)
[10:50:21] (PS2) Banyek: mariadb: depool db1086 [mediawiki-config] - https://gerrit.wikimedia.org/r/481830 (https://phabricator.wikimedia.org/T85757)
[10:51:02] (CR) Marostegui: [C: -1] mariadb: depool db1090:3317 (4 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/481831 (https://phabricator.wikimedia.org/T85757) (owner: Banyek)
[10:51:16] (PS2) Banyek: mariadb: depool db1094 [mediawiki-config] - https://gerrit.wikimedia.org/r/481832 (https://phabricator.wikimedia.org/T85757)
[10:51:20] (CR) Marostegui: [C: +1] mariadb: depool db1086 [mediawiki-config] - https://gerrit.wikimedia.org/r/481830 (https://phabricator.wikimedia.org/T85757) (owner: Banyek)
[10:51:41] (CR) Marostegui: [C: +1] mariadb: depool db1094 [mediawiki-config] - https://gerrit.wikimedia.org/r/481832 (https://phabricator.wikimedia.org/T85757) (owner: Banyek)
[10:52:25] (PS2) Banyek: mariadb: depool db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/481840 (https://phabricator.wikimedia.org/T85757)
[10:52:30] !log rebooting centrallog1001 for kernel security update
[10:52:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:52:45] (CR) Marostegui: [C: +1] mariadb: depool db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/481840 (https://phabricator.wikimedia.org/T85757) (owner: Banyek)
[10:54:20] (PS2) Banyek: mariadb: depool db1090:3317 [mediawiki-config] - https://gerrit.wikimedia.org/r/481831 (https://phabricator.wikimedia.org/T85757)
[10:55:18] (CR) Marostegui: [C: +1] mariadb: depool db1090:3317 [mediawiki-config] - https://gerrit.wikimedia.org/r/481831 (https://phabricator.wikimedia.org/T85757) (owner: Banyek)
[10:56:23] (Abandoned) Mathew.onipe: Revert "Revoke ladsgroup access due to lost laptop" [puppet] - https://gerrit.wikimedia.org/r/481604 (owner: Mathew.onipe)
[10:59:39] !log replace TLS certificates on ms-fe codfw hosts T212215
[10:59:40] Operations, ops-codfw, media-storage: audit / test / upgrade hp smartarray P840 firmware - https://phabricator.wikimedia.org/T141756 (fgiunchedi)
[10:59:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:59:42] T212215: Update Subject Alternative Name field in TLS certificates for swift - https://phabricator.wikimedia.org/T212215
[11:03:27] PROBLEM - Check systemd state on ms-be1028 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
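The -1 on the logstash 5xx-metrics change says "http_status needs conversion to number first". The underlying issue is that a status field parsed out of a log line is a string, and a string cannot be range-checked as a number. Depending on the language, the comparison either fails outright or falls back to lexicographic ordering, where "60" sorts after "500". The real fix would live in the Logstash pipeline configuration. This Python sketch only illustrates the conversion step, and its function names are hypothetical.

```python
from typing import Optional


def status_code(raw: object) -> Optional[int]:
    """Convert a log field such as "503" to an int; return None if it is not numeric."""
    try:
        return int(str(raw).strip())
    except ValueError:
        return None  # e.g. "-" for requests that never got a response


def is_5xx(raw: object) -> bool:
    """True only for numeric HTTP status codes in the 500-599 range."""
    code = status_code(raw)
    return code is not None and 500 <= code <= 599


if __name__ == "__main__":
    # Lexicographic comparison of strings gives a misleading answer here.
    print("60" >= "500", is_5xx("60"))
    print(is_5xx("503"), is_5xx("302"), is_5xx("-"))
```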
[11:05:51] (PS2) Ema: swift: new cert for ms-fe.svc.codfw.wmnet [puppet] - https://gerrit.wikimedia.org/r/481136 (https://phabricator.wikimedia.org/T212215)
[11:06:25] (CR) Ema: [C: +2] swift: new cert for ms-fe.svc.codfw.wmnet [puppet] - https://gerrit.wikimedia.org/r/481136 (https://phabricator.wikimedia.org/T212215) (owner: Ema)
[11:10:45] RECOVERY - Check systemd state on ms-be1028 is OK: OK - running: The system is fully operational
[11:10:56] !log ema@puppetmaster1001 conftool action : set/pooled=no; selector: name=ms-fe2006.codfw.wmnet
[11:10:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:12:45] Operations: Track remaining trusty servers in production - https://phabricator.wikimedia.org/T212772 (MoritzMuehlenhoff)
[11:17:33] !log ema@puppetmaster1001 conftool action : set/pooled=yes; selector: name=ms-fe2006.codfw.wmnet
[11:17:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:18:02] (Abandoned) Filippo Giunchedi: WIP: temporary workaround co-installability of two roles [puppet] - https://gerrit.wikimedia.org/r/470346 (owner: Filippo Giunchedi)
[11:20:21] (PS1) Marostegui: db-eqiad.php: Depool db1099:3311 [mediawiki-config] - https://gerrit.wikimedia.org/r/481842 (https://phabricator.wikimedia.org/T202167)
[11:21:35] (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1099:3311 [mediawiki-config] - https://gerrit.wikimedia.org/r/481842 (https://phabricator.wikimedia.org/T202167) (owner: Marostegui)
[11:22:39] (Merged) jenkins-bot: db-eqiad.php: Depool db1099:3311 [mediawiki-config] - https://gerrit.wikimedia.org/r/481842 (https://phabricator.wikimedia.org/T202167) (owner: Marostegui)
[11:23:42] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1099:3311 T86338 T202167 (duration: 00m 45s)
[11:23:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:23:46] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338
[11:23:46] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167
[11:24:34] Operations, Core Platform Team (PHP7 (TEC4)), Core Platform Team Kanban (Doing), HHVM, and 3 others: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370 (Joe)
[11:24:36] Operations, Core Platform Team Backlog (Watching / External), HHVM, Patch-For-Review, and 3 others: Correctly collect logs from php-fpm pools - https://phabricator.wikimedia.org/T211184 (Joe) Open→Resolved
[11:24:37] !log Deploy schema change on db1099:3311 - T86338 T202167
[11:24:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:25:26] Operations, Release-Engineering-Team, Scap, Patch-For-Review, User-ArielGlenn: Make scap and opcache work consistently together - https://phabricator.wikimedia.org/T211964 (Joe) a:Joe
[11:26:12] (CR) jenkins-bot: db-eqiad.php: Depool db1099:3311 [mediawiki-config] - https://gerrit.wikimedia.org/r/481842 (https://phabricator.wikimedia.org/T202167) (owner: Marostegui)
[11:27:16] akosiaris: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/481341/ <--- ping
[11:27:27] (CR) Arturo Borrero Gonzalez: [C: +1] wmcs: Add postgres maps users for eqiad1-r region [puppet] - https://gerrit.wikimedia.org/r/481341 (https://phabricator.wikimedia.org/T212596) (owner: BryanDavis)
[11:29:18] (CR) Filippo Giunchedi: [C: +2] Fix Build-Depends [debs/python-etcd] - https://gerrit.wikimedia.org/r/478985 (https://phabricator.wikimedia.org/T209136) (owner: Filippo Giunchedi)
[11:41:23] !log rebooting labtestweb2001 for kernel security update
[11:41:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:43:36] (CR) Filippo Giunchedi: [C: +1] swift: new cert for ms-fe.svc.eqiad.wmnet [puppet] - https://gerrit.wikimedia.org/r/481137 (https://phabricator.wikimedia.org/T212215) (owner: Ema)
[11:46:18] (PS2) Ema: swift: new cert for ms-fe.svc.eqiad.wmnet [puppet] - https://gerrit.wikimedia.org/r/481137 (https://phabricator.wikimedia.org/T212215)
[11:46:39] !log replace TLS certificates on ms-fe eqiad hosts T212215
[11:46:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:46:41] T212215: Update Subject Alternative Name field in TLS certificates for swift - https://phabricator.wikimedia.org/T212215
[11:47:33] Operations, Operations-Software-Development, Patch-For-Review: python3-etcd needs python3-dnspython - https://phabricator.wikimedia.org/T209136 (fgiunchedi) Open→Resolved a:fgiunchedi Updated and uploaded python-etcd to jessie/stretch: ` root@install1002:~# apt-cache show python-etcd | g...
[11:48:32] (CR) Ema: [C: +2] swift: new cert for ms-fe.svc.eqiad.wmnet [puppet] - https://gerrit.wikimedia.org/r/481137 (https://phabricator.wikimedia.org/T212215) (owner: Ema)
[11:49:59] !log ema@puppetmaster1001 conftool action : set/pooled=no; selector: name=ms-fe1006.codfw.wmnet
[11:50:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:50:50] ha, almost!
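The "ha, almost!" refers to the selector name=ms-fe1006.codfw.wmnet: the host number and the site domain disagree. Host names in this log follow the pattern that 1xxx hosts are in eqiad and 2xxx hosts are in codfw (db1078 in db-eqiad.php, db2096 in db-codfw.php, ms-fe2006.codfw.wmnet). A pre-flight check could therefore flag the mismatch before conftool runs. This is a hypothetical sketch. The digit-to-site mapping is an assumption drawn only from this log, and the checker handles only FQDNs of the form name + four digits + site + ".wmnet".

```python
import re

# Assumed mapping, inferred from the host names in this log; extend as needed.
SITE_BY_DIGIT = {"1": "eqiad", "2": "codfw"}

# e.g. "ms-fe1006.eqiad.wmnet" -> name "ms-fe", number "1006", site "eqiad"
FQDN = re.compile(r"^(?P<name>[a-z][a-z-]*?)(?P<number>\d{4})\.(?P<site>[a-z0-9]+)\.wmnet$")


def selector_looks_valid(fqdn: str) -> bool:
    """True if the leading digit of the host number agrees with the site in the FQDN."""
    match = FQDN.match(fqdn)
    if match is None:
        return False  # not a host FQDN this sketch understands
    expected = SITE_BY_DIGIT.get(match.group("number")[0])
    return expected == match.group("site")


if __name__ == "__main__":
    for name in ("ms-fe1006.codfw.wmnet", "ms-fe1006.eqiad.wmnet", "ms-fe2006.codfw.wmnet"):
        print(name, selector_looks_valid(name))
```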
[11:51:06] !log ema@puppetmaster1001 conftool action : set/pooled=no; selector: name=ms-fe1006.eqiad.wmnet
[11:51:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:53:23] !log ema@puppetmaster1001 conftool action : set/pooled=yes; selector: name=ms-fe1006.eqiad.wmnet
[11:53:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:00:54] !log rebooting labtestpuppetmaster2001 for kernel security update
[12:00:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:02:13] Operations, Traffic, Patch-For-Review: ATS production-ready as a backend cache layer - https://phabricator.wikimedia.org/T207048 (ema)
[12:02:17] Operations, Traffic, media-storage, Patch-For-Review: Update Subject Alternative Name field in TLS certificates for swift - https://phabricator.wikimedia.org/T212215 (ema) Open→Resolved New certificates deployed both in codfw and in eqiad.
[12:09:51] (PS2) Muehlenhoff: Clarify expected format of service name in wmf-auto-restart [puppet] - https://gerrit.wikimedia.org/r/480520 (https://phabricator.wikimedia.org/T212219)
[12:11:51] (CR) jerkins-bot: [V: -1] Clarify expected format of service name in wmf-auto-restart [puppet] - https://gerrit.wikimedia.org/r/480520 (https://phabricator.wikimedia.org/T212219) (owner: Muehlenhoff)
[12:18:33] (PS3) Muehlenhoff: Clarify expected format of service name in wmf-auto-restart [puppet] - https://gerrit.wikimedia.org/r/480520 (https://phabricator.wikimedia.org/T212219)
[12:19:17] (CR) jerkins-bot: [V: -1] Clarify expected format of service name in wmf-auto-restart [puppet] - https://gerrit.wikimedia.org/r/480520 (https://phabricator.wikimedia.org/T212219) (owner: Muehlenhoff)
[12:23:46] (PS2) Muehlenhoff: Remove obsolete Hiera entries for labstore1001/labstore1002 [puppet] - https://gerrit.wikimedia.org/r/481159 (https://phabricator.wikimedia.org/T187456)
[12:33:55] (PS4) Muehlenhoff: Clarify expected format of service name in wmf-auto-restart [puppet] - https://gerrit.wikimedia.org/r/480520 (https://phabricator.wikimedia.org/T212219)
[12:34:33] (CR) jerkins-bot: [V: -1] Clarify expected format of service name in wmf-auto-restart [puppet] - https://gerrit.wikimedia.org/r/480520 (https://phabricator.wikimedia.org/T212219) (owner: Muehlenhoff)
[12:36:30] (PS5) Muehlenhoff: Clarify expected format of service name in wmf-auto-restart [puppet] - https://gerrit.wikimedia.org/r/480520 (https://phabricator.wikimedia.org/T212219)
[12:40:29] (CR) Ema: [C: +1] Clarify expected format of service name in wmf-auto-restart [puppet] - https://gerrit.wikimedia.org/r/480520 (https://phabricator.wikimedia.org/T212219) (owner: Muehlenhoff)
[12:41:36] (CR) Muehlenhoff: [C: +2] Remove obsolete Hiera entries for labstore1001/labstore1002 [puppet] - https://gerrit.wikimedia.org/r/481159 (https://phabricator.wikimedia.org/T187456) (owner: Muehlenhoff)
[12:50:26] (PS1) Muehlenhoff: Record extended access for jsamra [puppet] - https://gerrit.wikimedia.org/r/481844
[12:52:06] (CR) Muehlenhoff: [C: +2] Record extended access for jsamra [puppet] - https://gerrit.wikimedia.org/r/481844 (owner: Muehlenhoff)
[12:57:37] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1099:3311" [mediawiki-config] - https://gerrit.wikimedia.org/r/481846
[12:59:07] (CR) Marostegui: [C: +2] Revert "db-eqiad.php: Depool db1099:3311" [mediawiki-config] - https://gerrit.wikimedia.org/r/481846 (owner: Marostegui)
[13:00:13] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1099:3311" [mediawiki-config] - https://gerrit.wikimedia.org/r/481846 (owner: Marostegui)
[13:01:12] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1099:3311 T86338 T202167 (duration: 00m 47s)
[13:01:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:01:16] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338
[13:01:17] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167
[13:02:15] (PS1) Marostegui: db-eqiad.php: Depool db1119 [mediawiki-config] - https://gerrit.wikimedia.org/r/481847 (https://phabricator.wikimedia.org/T86338)
[13:03:35] (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1119 [mediawiki-config] - https://gerrit.wikimedia.org/r/481847 (https://phabricator.wikimedia.org/T86338) (owner: Marostegui)
[13:03:41] (PS5) Arturo Borrero Gonzalez: openstack: virt: introduce per-component per-openstack per-distro classes [puppet] - https://gerrit.wikimedia.org/r/481194 (https://phabricator.wikimedia.org/T209948)
[13:04:09] (CR) jerkins-bot: [V: -1] openstack: virt: introduce per-component per-openstack per-distro classes [puppet] - https://gerrit.wikimedia.org/r/481194 (https://phabricator.wikimedia.org/T209948) (owner: Arturo Borrero Gonzalez)
[13:04:39] (Merged) jenkins-bot: db-eqiad.php: Depool db1119 [mediawiki-config] - https://gerrit.wikimedia.org/r/481847 (https://phabricator.wikimedia.org/T86338) (owner: Marostegui)
[13:05:48] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1119 T86338 T202167 (duration: 00m 45s)
[13:05:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:06:47] !log Deploy schema change on db1119 - T86338 T202167
[13:06:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:06:51] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338
[13:06:51] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167
[13:08:44] (PS6) Arturo Borrero Gonzalez: openstack: virt: introduce per-component per-openstack per-distro classes [puppet] - https://gerrit.wikimedia.org/r/481194 (https://phabricator.wikimedia.org/T209948)
[13:09:11] (CR) jerkins-bot: [V: -1] openstack: virt: introduce per-component per-openstack per-distro classes [puppet] - https://gerrit.wikimedia.org/r/481194 (https://phabricator.wikimedia.org/T209948) (owner: Arturo Borrero Gonzalez)
[13:11:07] (PS7) Arturo Borrero Gonzalez: openstack: virt: introduce per-component per-openstack per-distro classes [puppet] - https://gerrit.wikimedia.org/r/481194 (https://phabricator.wikimedia.org/T209948)
[13:11:16] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1099:3311" [mediawiki-config] - https://gerrit.wikimedia.org/r/481846 (owner: Marostegui)
[13:11:18] (CR) jenkins-bot: db-eqiad.php: Depool db1119 [mediawiki-config] - https://gerrit.wikimedia.org/r/481847 (https://phabricator.wikimedia.org/T86338) (owner: Marostegui)
[13:11:33] (CR) jerkins-bot: [V: -1] openstack: virt: introduce per-component per-openstack per-distro classes [puppet] - https://gerrit.wikimedia.org/r/481194 (https://phabricator.wikimedia.org/T209948) (owner: Arturo Borrero Gonzalez)
[13:13:39] !log stopping replication on db2077 prior to executing schema change on codfw s7 master (db2040) - T85757
[13:13:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:13:42] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757
[13:17:25] !log executing schema change on db2040 (s7 codfw master) replication lag could be expected on codfw - T85757
[13:17:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:17:34] (PS8) Arturo Borrero Gonzalez: openstack: virt: introduce per-component per-openstack per-distro classes [puppet] - https://gerrit.wikimedia.org/r/481194 (https://phabricator.wikimedia.org/T209948)
[13:17:42] !log installing openjpeg2 security updates
[13:17:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:18:10] (CR) jerkins-bot: [V: -1] openstack: virt: introduce per-component per-openstack per-distro classes [puppet] - https://gerrit.wikimedia.org/r/481194 (https://phabricator.wikimedia.org/T209948) (owner: Arturo Borrero Gonzalez)
[13:24:20] (PS9) Arturo Borrero Gonzalez: openstack: virt: introduce per-component per-openstack per-distro classes [puppet] - https://gerrit.wikimedia.org/r/481194 (https://phabricator.wikimedia.org/T209948)
[13:27:21] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1119" [mediawiki-config] - https://gerrit.wikimedia.org/r/481848
[13:28:33] (CR) Marostegui: [C: +2] Revert "db-eqiad.php: Depool db1119" [mediawiki-config] - https://gerrit.wikimedia.org/r/481848 (owner: Marostegui)
[13:29:43] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1119" [mediawiki-config] - https://gerrit.wikimedia.org/r/481848 (owner: Marostegui)
[13:30:41] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1119 T86338 T202167 (duration: 00m 44s)
[13:30:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:30:45] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338
[13:30:45] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167
[13:33:53] !log installing libav security updates
[13:33:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:37:06] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1119" [mediawiki-config] - https://gerrit.wikimedia.org/r/481848 (owner: Marostegui)
[13:39:10] !log installing ghostscript update for stretch
[13:39:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:44:59] PROBLEM - MariaDB Slave Lag: s7 on db2095 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1243.17 seconds
[13:45:35] PROBLEM - MariaDB Slave SQL: s7 on db2095 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1054, Errmsg: Could not execute Update_rows_v1 event on table arwiki.user: Unknown column user_options in NEW, Error_code: 1054: handler error HA_ERR_GENERIC: the events master log db2077-bin.001819, end_log_pos 591716980
[13:47:17] PROBLEM - MariaDB Slave Lag: s7 on db2054 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 438.04 seconds
[13:47:25] PROBLEM - MariaDB Slave Lag: s7 on db2068 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 434.03 seconds
[13:47:29] banyek: ^
[13:47:50] I logged the lags will happen
[13:48:04] the trigger fix also expected
[13:48:23] banyek: I thought you'd downtime the hosts to avoid noise
[13:48:31] RECOVERY - MariaDB Slave Lag: s7 on db2054 is OK: OK slave_sql_lag Replication lag: 0.21 seconds
[13:48:39] RECOVERY - MariaDB Slave Lag: s7 on db2068 is OK: OK slave_sql_lag Replication lag: 26.42 seconds
[13:48:43] the script is loaded, I'll fire it when the current run finished
[13:49:14] marostegui: downtime all hosts in codfw s7?
[13:49:21] I guess so
[13:55:15] ACKNOWLEDGEMENT - MariaDB Slave Lag: s7 on db2095 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1823.15 seconds Banyek T85757
[13:55:15] ACKNOWLEDGEMENT - MariaDB Slave SQL: s7 on db2095 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1054, Errmsg: Could not execute Update_rows_v1 event on table arwiki.user: Unknown column user_options in NEW, Error_code: 1054: handler error HA_ERR_GENERIC: the events master log db2077-bin.001819, end_log_pos 591716980 Banyek T85757
[13:58:42] Operations, Operations-Software-Development: Upgrade Cumin masters to stretch - https://phabricator.wikimedia.org/T177385 (Volans) Open→Resolved a:Volans Migration has been completed and `cumin[12]001` are fully in service since few weeks. Resolving.
[14:04:51] (03PS2) 10Giuseppe Lavagetto: stdlib: use Stdlib::Port in place of our local definition [puppet] - 10https://gerrit.wikimedia.org/r/481818 [14:09:45] RECOVERY - MariaDB Slave SQL: s7 on db2095 is OK: OK slave_sql_state Slave_SQL_Running: Yes [14:15:13] !log cp hosts: upgrade OpenSSL from 1.1.0f to 1.1.0j [14:15:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:17:39] PROBLEM - MariaDB Slave Lag: s7 on db2087 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 716.31 seconds [14:17:43] PROBLEM - MariaDB Slave Lag: s7 on db2068 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 719.53 seconds [14:17:43] PROBLEM - MariaDB Slave Lag: s7 on db2077 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 719.54 seconds [14:17:44] (03PS10) 10Arturo Borrero Gonzalez: openstack: virt: introduce per-component per-openstack per-distro classes [puppet] - 10https://gerrit.wikimedia.org/r/481194 (https://phabricator.wikimedia.org/T209948) [14:18:11] (03CR) 10jerkins-bot: [V: 04-1] openstack: virt: introduce per-component per-openstack per-distro classes [puppet] - 10https://gerrit.wikimedia.org/r/481194 (https://phabricator.wikimedia.org/T209948) (owner: 10Arturo Borrero Gonzalez) [14:18:35] ACKNOWLEDGEMENT - MariaDB Slave Lag: s7 on db2054 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 711.16 seconds Banyek T85757 [14:18:35] ACKNOWLEDGEMENT - MariaDB Slave Lag: s7 on db2068 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 719.53 seconds Banyek T85757 [14:18:35] ACKNOWLEDGEMENT - MariaDB Slave Lag: s7 on db2077 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 719.54 seconds Banyek T85757 [14:18:35] ACKNOWLEDGEMENT - MariaDB Slave Lag: s7 on db2086 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 700.56 seconds Banyek T85757 [14:18:35] ACKNOWLEDGEMENT - MariaDB Slave Lag: s7 on db2087 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 716.31 seconds Banyek T85757 [14:21:08] (03PS11) 10Arturo Borrero Gonzalez: 
openstack: virt: introduce per-component per-openstack per-distro classes [puppet] - 10https://gerrit.wikimedia.org/r/481194 (https://phabricator.wikimedia.org/T209948) [14:21:35] (03CR) 10jerkins-bot: [V: 04-1] openstack: virt: introduce per-component per-openstack per-distro classes [puppet] - 10https://gerrit.wikimedia.org/r/481194 (https://phabricator.wikimedia.org/T209948) (owner: 10Arturo Borrero Gonzalez) [14:24:23] (03PS1) 10Volans: admin_reason: fix default value for task [software/spicerack] - 10https://gerrit.wikimedia.org/r/481854 [14:24:25] (03PS1) 10Volans: dns: include NXDOMAIN in the DnsNotFound exception [software/spicerack] - 10https://gerrit.wikimedia.org/r/481855 [14:24:27] (03PS1) 10Volans: puppet: fix subprocess call to check_output() [software/spicerack] - 10https://gerrit.wikimedia.org/r/481856 [14:24:29] (03PS1) 10Volans: icinga: fix command_file property [software/spicerack] - 10https://gerrit.wikimedia.org/r/481857 [14:24:31] (03PS1) 10Volans: remote: suppress Cumin's output [software/spicerack] - 10https://gerrit.wikimedia.org/r/481858 (https://phabricator.wikimedia.org/T212783) [14:24:59] (03PS12) 10Arturo Borrero Gonzalez: openstack: virt: introduce per-component per-openstack per-distro classes [puppet] - 10https://gerrit.wikimedia.org/r/481194 (https://phabricator.wikimedia.org/T209948) [14:26:09] RECOVERY - MariaDB Slave Lag: s7 on db2087 is OK: OK slave_sql_lag Replication lag: 0.44 seconds [14:29:49] RECOVERY - MariaDB Slave Lag: s7 on db2077 is OK: OK slave_sql_lag Replication lag: 0.37 seconds [14:30:50] (03PS13) 10Arturo Borrero Gonzalez: openstack: virt: introduce per-component per-openstack per-distro classes [puppet] - 10https://gerrit.wikimedia.org/r/481194 (https://phabricator.wikimedia.org/T209948) [14:34:41] RECOVERY - MariaDB Slave Lag: s7 on db2068 is OK: OK slave_sql_lag Replication lag: 17.02 seconds [14:35:49] RECOVERY - MariaDB Slave Lag: s7 on db2095 is OK: OK slave_sql_lag Replication lag: 0.32 seconds [14:35:56] 
(03PS14) 10Arturo Borrero Gonzalez: openstack: virt: introduce per-component per-openstack per-distro classes [puppet] - 10https://gerrit.wikimedia.org/r/481194 (https://phabricator.wikimedia.org/T209948) [14:38:13] !log depooling db1086 for schema change (T85757) [14:38:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:38:16] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 [14:40:06] (03CR) 10Banyek: [C: 03+2] mariadb: depool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481830 (https://phabricator.wikimedia.org/T85757) (owner: 10Banyek) [14:41:40] (03CR) 10Arturo Borrero Gonzalez: "Compilation result looks sane: https://puppet-compiler.wmflabs.org/compiler1001/14098/" [puppet] - 10https://gerrit.wikimedia.org/r/481194 (https://phabricator.wikimedia.org/T209948) (owner: 10Arturo Borrero Gonzalez) [14:41:50] (03PS5) 10Ottomata: Adjust params for Analytics data_purge EventLoggingSanitization job [puppet] - 10https://gerrit.wikimedia.org/r/478129 (https://phabricator.wikimedia.org/T202429) (owner: 10Mforns) [14:42:58] (03PS3) 10Banyek: mariadb: depool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481830 (https://phabricator.wikimedia.org/T85757) [14:44:22] (03CR) 10Ottomata: [C: 03+2] Adjust params for Analytics data_purge EventLoggingSanitization job [puppet] - 10https://gerrit.wikimedia.org/r/478129 (https://phabricator.wikimedia.org/T202429) (owner: 10Mforns) [14:44:46] (03CR) 10Ottomata: [V: 03+2 C: 03+2] Adjust params for Analytics data_purge EventLoggingSanitization job [puppet] - 10https://gerrit.wikimedia.org/r/478129 (https://phabricator.wikimedia.org/T202429) (owner: 10Mforns) [14:47:23] shdubsh: O glorious on-duty SRE-er, can I bring your attention to https://gerrit.wikimedia.org/r/c/operations/dns/+/481795 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/481796 (CC Reedy). 
[14:47:55] !log banyek@deploy1001 Synchronized wmf-config/db-eqiad.php: depool db1086 for schema change - T85757 (duration: 00m 45s) [14:47:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:47:58] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 [14:48:52] !log installing ghostscript security update for jessie [14:48:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:49:24] (03PS1) 10Elukey: hadoop::namenode: add the possibility to add hosts to hosts.exclude [puppet/cdh] - 10https://gerrit.wikimedia.org/r/481861 [14:49:34] !log executing schema change on db1086 - T85757 [14:49:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:50:33] (03CR) 10Giuseppe Lavagetto: [C: 04-1] Add test-commons.wikimedia.org to prod_sites.pp (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/481796 (https://phabricator.wikimedia.org/T197616) (owner: 10Reedy) [14:50:47] <_joe_> James_F: I didn't realize you already had patches [14:51:52] (03PS2) 10Elukey: hadoop::namenode: add the possibility to add hosts to hosts.exclude [puppet/cdh] - 10https://gerrit.wikimedia.org/r/481861 [14:52:10] (03PS4) 10Paladox: wmflib: Add support for puppet6 in require_package [puppet] - 10https://gerrit.wikimedia.org/r/481271 [14:52:33] (03PS3) 10Paladox: php: Add support for puppet6 [puppet] - 10https://gerrit.wikimedia.org/r/481269 [14:52:43] (03PS6) 10Paladox: ircecho: Migrate from OptionParser to ArgumentParser [puppet] - 10https://gerrit.wikimedia.org/r/480760 [14:52:55] (03PS3) 10Paladox: ircecho: Drop sysvinit support [puppet] - 10https://gerrit.wikimedia.org/r/480789 [14:53:04] (03CR) 10Elukey: "no op for pcc https://puppet-compiler.wmflabs.org/compiler1002/14100/" [puppet/cdh] - 10https://gerrit.wikimedia.org/r/481861 (owner: 10Elukey) [14:53:13] (03PS7) 10Paladox: wmlib: Fix support for puppet6 in php_ini.rb, ini.rb and ordered_yaml.rb [puppet] - 
10https://gerrit.wikimedia.org/r/481254 [14:54:11] _joe_: Ha, sorry for not being clear. [14:54:49] <_joe_> paladox: why fix puppet6 incompatibilities when we're still on 4.8? [14:55:05] <_joe_> I'd rather get rid of ordered_yaml, for instance [14:55:30] _joe_ i've been upgrading to puppet6 at some where else (we use those modules) [14:55:47] (03CR) 10jenkins-bot: mariadb: depool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481830 (https://phabricator.wikimedia.org/T85757) (owner: 10Banyek) [14:55:53] <_joe_> well I'm not even sure what you're trying to do there [14:56:08] <_joe_> I'd rather convert those functions to the new api [14:56:27] oh [14:56:29] (03CR) 10GTirloni: [C: 03+2] ircecho: Drop sysvinit support [puppet] - 10https://gerrit.wikimedia.org/r/480789 (owner: 10Paladox) [14:56:31] <_joe_> also I don't think we need to have our code more complex for supporting a platform we don't use [14:57:01] (03PS4) 10Jforrester: Add test-commons.wikimedia.org to prod_sites.pp [puppet] - 10https://gerrit.wikimedia.org/r/481796 (https://phabricator.wikimedia.org/T197616) (owner: 10Reedy) [14:57:28] _joe_: Not sure if that fix of mine is right. [14:57:39] (03CR) 10Jforrester: Add test-commons.wikimedia.org to prod_sites.pp (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/481796 (https://phabricator.wikimedia.org/T197616) (owner: 10Reedy) [14:57:58] _joe_ i was only trying to help :) [14:58:22] <_joe_> paladox: I know, and thanks for that :) [14:58:25] (03CR) 10GTirloni: [C: 03+1] ircecho: Drop sysvinit support [puppet] - 10https://gerrit.wikimedia.org/r/480789 (owner: 10Paladox) [14:58:49] <_joe_> I'm just saying I'd go in a different direction (the new api that should work well on all puppet versions AIUI) [14:59:04] (03CR) 10GTirloni: [C: 03+1] "Any idea why the rebase didn't fix the merge conflict?" 
[puppet] - 10https://gerrit.wikimedia.org/r/480789 (owner: 10Paladox) [14:59:08] <_joe_> wmflib is very delicate for us, so I'm a bit conservative in general [14:59:19] ah ok [15:00:34] (03CR) 10Paladox: "> Any idea why the rebase didn't fix the merge conflict?" [puppet] - 10https://gerrit.wikimedia.org/r/480789 (owner: 10Paladox) [15:00:35] (03CR) 10Andrew Bogott: "It's blocked by https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/480760/ because it's the second in a two-patch set" [puppet] - 10https://gerrit.wikimedia.org/r/480789 (owner: 10Paladox) [15:07:45] (03CR) 10Elukey: hadoop::namenode: add the possibility to add hosts to hosts.exclude (031 comment) [puppet/cdh] - 10https://gerrit.wikimedia.org/r/481861 (owner: 10Elukey) [15:07:51] !log repooling db1086 after schema change (T85757) [15:07:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:07:53] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 [15:07:58] (03CR) 10Muehlenhoff: [C: 03+1] "Looks fine and PCC also agrees. (considering the underlying change for the modified arg parser). 
BTW, if you make this implicitly systemd-" [puppet] - 10https://gerrit.wikimedia.org/r/480789 (owner: 10Paladox) [15:08:20] (03CR) 10Ottomata: hadoop::namenode: add the possibility to add hosts to hosts.exclude (032 comments) [puppet/cdh] - 10https://gerrit.wikimedia.org/r/481861 (owner: 10Elukey) [15:08:32] (03PS1) 10Banyek: Revert "mariadb: depool db1086" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481863 [15:08:36] (03PS4) 10Paladox: ircecho: Drop sysvinit support [puppet] - 10https://gerrit.wikimedia.org/r/480789 [15:10:58] (03CR) 10Banyek: [C: 03+2] Revert "mariadb: depool db1086" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481863 (owner: 10Banyek) [15:11:40] (03CR) 10Elukey: hadoop::namenode: add the possibility to add hosts to hosts.exclude (032 comments) [puppet/cdh] - 10https://gerrit.wikimedia.org/r/481861 (owner: 10Elukey) [15:12:04] (03Merged) 10jenkins-bot: Revert "mariadb: depool db1086" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481863 (owner: 10Banyek) [15:13:36] !log banyek@deploy1001 Synchronized wmf-config/db-eqiad.php: repool db1086 after schema change - T85757 (duration: 00m 44s) [15:13:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:13:39] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 [15:14:28] (03PS3) 10Elukey: hadoop::namenode: add the possibility to add hosts to hosts.exclude [puppet/cdh] - 10https://gerrit.wikimedia.org/r/481861 [15:20:13] (03CR) 10Ottomata: [C: 03+1] "Oh riiiight, I forgot about that. Hm. I dunno which is better...I think this is fine." 
[puppet/cdh] - 10https://gerrit.wikimedia.org/r/481861 (owner: 10Elukey) [15:20:28] (03CR) 10Andrew Bogott: "In case you mean me, I definitely don't know what wgNoticeProjects is" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/471663 (https://phabricator.wikimedia.org/T208694) (owner: 10MacFan4000) [15:20:31] 10Operations, 10Wikimedia-Mailing-lists, 10Serbian-Sites: Mailing list for Wikinews in Serbian - https://phabricator.wikimedia.org/T17648 (10Liuxinyu970226) [15:21:37] (03CR) 10jenkins-bot: Revert "mariadb: depool db1086" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481863 (owner: 10Banyek) [15:22:28] 10Operations, 10Wikimedia-Site-requests, 10Patch-For-Review, 10Serbian-Sites, and 2 others: Logo for sr.wikiquote.org - https://phabricator.wikimedia.org/T168444 (10Liuxinyu970226) [15:25:21] (03PS2) 10Giuseppe Lavagetto: profile::mediawiki::jobrunner: convert to use class httpd [puppet] - 10https://gerrit.wikimedia.org/r/475770 [15:25:23] (03PS1) 10Giuseppe Lavagetto: jobrunner: check health-check.php from nagios [puppet] - 10https://gerrit.wikimedia.org/r/481864 [15:25:25] (03PS1) 10Giuseppe Lavagetto: jobrunner: enable php-fpm [puppet] - 10https://gerrit.wikimedia.org/r/481865 [15:25:27] (03PS1) 10Giuseppe Lavagetto: jobrunner: support php7 [puppet] - 10https://gerrit.wikimedia.org/r/481866 [15:31:34] !log depooling db1090:3317 for schema change (T85757) [15:31:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:31:36] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 [15:31:52] (03CR) 10Banyek: [C: 03+2] mariadb: depool db1090:3317 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481831 (https://phabricator.wikimedia.org/T85757) (owner: 10Banyek) [15:34:57] !log installing OpenSSL security updates [15:34:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:36:15] (03PS3) 10Banyek: mariadb: depool db1090:3317 [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/481831 (https://phabricator.wikimedia.org/T85757) [15:39:41] !log banyek@deploy1001 Synchronized wmf-config/db-eqiad.php: depool db1090:3317 for schema change - T85757 (duration: 00m 44s) [15:39:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:39:44] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 [15:42:30] (03PS5) 10Jforrester: Add test-commons.wikimedia.org to prod_sites.pp [puppet] - 10https://gerrit.wikimedia.org/r/481796 (https://phabricator.wikimedia.org/T197616) (owner: 10Reedy) [15:47:04] (03CR) 10jenkins-bot: mariadb: depool db1090:3317 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481831 (https://phabricator.wikimedia.org/T85757) (owner: 10Banyek) [15:49:14] (03PS2) 10Andrew Bogott: wmcs: Add a cli script for managing dynamicproxy entries [puppet] - 10https://gerrit.wikimedia.org/r/478377 (https://phabricator.wikimedia.org/T211367) (owner: 10BryanDavis) [15:50:09] (03CR) 10Andrew Bogott: [C: 03+2] wmcs: Add a cli script for managing dynamicproxy entries [puppet] - 10https://gerrit.wikimedia.org/r/478377 (https://phabricator.wikimedia.org/T211367) (owner: 10BryanDavis) [15:54:20] (03CR) 10Andrew Bogott: [C: 03+2] "Sorry, I should've read to the end of your comment. It looks like you checked all the things I would check." 
[puppet] - 10https://gerrit.wikimedia.org/r/470726 (https://phabricator.wikimedia.org/T162070) (owner: 10Dzahn) [15:54:24] (03PS4) 10Elukey: hadoop::namenode: add the possibility to add hosts to hosts.exclude [puppet/cdh] - 10https://gerrit.wikimedia.org/r/481861 [15:54:27] (03PS2) 10Andrew Bogott: hieradata/labs: remove mysql::server::use_apparmor: false [puppet] - 10https://gerrit.wikimedia.org/r/470726 (https://phabricator.wikimedia.org/T162070) (owner: 10Dzahn) [15:56:19] (03PS1) 10Bmansurov: Recommendation API: increase mysql connection limit for service [puppet] - 10https://gerrit.wikimedia.org/r/481871 (https://phabricator.wikimedia.org/T212154) [16:01:14] (03CR) 10Ottomata: [C: 03+2] 0.214-2 - remove --data-dir from systemd service unit [debs/presto] (debian) - 10https://gerrit.wikimedia.org/r/479268 (owner: 10Ottomata) [16:03:46] 10Operations, 10Traffic, 10Wikidata, 10wikiba.se website, and 2 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (10Addshore) Part 1 of this should be actioned by WMDE either this or next week. 
[16:08:27] (03PS1) 10Kosta Harlan: Beta labs: Enable help panel logging on enwiki and kowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481874 (https://phabricator.wikimedia.org/T211942) [16:10:06] (03PS5) 10Elukey: hadoop::namenode: add the possibility to add hosts to hosts.exclude [puppet/cdh] - 10https://gerrit.wikimedia.org/r/481861 [16:10:27] (03PS15) 10Arturo Borrero Gonzalez: openstack: virt: introduce per-component per-openstack per-distro classes [puppet] - 10https://gerrit.wikimedia.org/r/481194 (https://phabricator.wikimedia.org/T209948) [16:11:34] !log T212302 disable puppet in all {cloud,lab}virt* servers to merge https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/481194/ [16:11:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:11:37] T212302: CloudVPS: upgrade: jessie -> stretch & mitaka -> newton - https://phabricator.wikimedia.org/T212302 [16:11:49] (03CR) 10Filippo Giunchedi: "diff looks like the one below, I'm assuming we'll be achieving the same effects of "net none" and "rlimit-fsize" via systemd service optio" [puppet] - 10https://gerrit.wikimedia.org/r/481142 (owner: 10Muehlenhoff) [16:13:14] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] openstack: virt: introduce per-component per-openstack per-distro classes [puppet] - 10https://gerrit.wikimedia.org/r/481194 (https://phabricator.wikimedia.org/T209948) (owner: 10Arturo Borrero Gonzalez) [16:16:15] (03PS2) 10Ottomata: Allow pull based rsync between stat & notebook boxes only [puppet] - 10https://gerrit.wikimedia.org/r/476920 (https://phabricator.wikimedia.org/T205157) [16:17:12] 10Operations, 10Wikimedia-Mailing-lists: Administrator password recovery for wmfaliens@lists.wikimedia.org - https://phabricator.wikimedia.org/T212525 (10colewhite) 05Open→03Resolved [16:18:49] (03CR) 10Volans: cumin: Allow Puppet DB backend to be used within Labs projects that use it (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/437052 (owner: 10Alex Monk) 
[16:18:55] (03PS1) 10Dzahn: admins: add absented ladsgroup to absent group [puppet] - 10https://gerrit.wikimedia.org/r/481875 [16:19:25] (03PS3) 10Ottomata: Allow pull based rsync between stat & notebook boxes only [puppet] - 10https://gerrit.wikimedia.org/r/476920 (https://phabricator.wikimedia.org/T205157) [16:20:40] (03CR) 10Muehlenhoff: "With all of 481141, 481140 and 481139 merged, the PCC diff will/should look identical, we can doublecheck when those are merged." [puppet] - 10https://gerrit.wikimedia.org/r/481142 (owner: 10Muehlenhoff) [16:21:36] (03CR) 10Dzahn: [C: 03+2] admins: add absented ladsgroup to absent group [puppet] - 10https://gerrit.wikimedia.org/r/481875 (owner: 10Dzahn) [16:22:16] 10Operations, 10Operations-Software-Development, 10Goal: Expand Netbox usage - Q2 2018-19 Goal - https://phabricator.wikimedia.org/T205868 (10Volans) 05Open→03Resolved Resolving as the quarter is over and the goal-specific tasks have been completed. [16:22:40] 10Operations, 10Operations-Software-Development, 10Goal: Expand Netbox usage - Q2 2018-19 Goal - https://phabricator.wikimedia.org/T205868 (10Volans) [16:22:40] !log repooling db1090:3317 after schema change (T85757) [16:22:41] 10Operations: Netbox: fill network topology - https://phabricator.wikimedia.org/T205897 (10Volans) [16:22:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:22:42] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 [16:22:51] (03CR) 10Filippo Giunchedi: [C: 03+1] "Ack, thanks! LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/481142 (owner: 10Muehlenhoff) [16:22:54] 10Operations, 10Wikimedia-General-or-Unknown, 10Security: Massive spambot registrations at dinwiki - https://phabricator.wikimedia.org/T212519 (10sbassett) There's a private ticket (T212679) where similar issues on a couple of other wikis were being addressed over the holiday break. It looks like this issue... 
[16:23:09] !log T212302 re-enable puppet in all {cloud,lab}virt* servers, all was fine [16:23:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:23:12] T212302: CloudVPS: upgrade: jessie -> stretch & mitaka -> newton - https://phabricator.wikimedia.org/T212302 [16:23:26] 10Operations, 10Operations-Software-Development: Cumin: add backend for Netbox - https://phabricator.wikimedia.org/T205900 (10Volans) [16:23:28] 10Operations, 10Operations-Software-Development, 10Goal: Expand Netbox usage - Q2 2018-19 Goal - https://phabricator.wikimedia.org/T205868 (10Volans) [16:24:00] 10Operations, 10Operations-Software-Development: Cumin: add backend for Netbox - https://phabricator.wikimedia.org/T205900 (10Volans) Removed the parent task as the stretch part of the goal was not done, but keeping the task around as we want to create this backend anyway. [16:24:11] 10Operations, 10Wikimedia-General-or-Unknown, 10Security: Massive spambot registrations at dinwiki - https://phabricator.wikimedia.org/T212519 (10sbassett) p:05Triage→03Normal [16:24:16] 10Operations, 10Operations-Software-Development, 10Goal: Expand Netbox usage - Q2 2018-19 Goal - https://phabricator.wikimedia.org/T205868 (10Volans) [16:24:36] (03PS6) 10Elukey: hadoop::namenode: add the possibility to add hosts to hosts.exclude [puppet/cdh] - 10https://gerrit.wikimedia.org/r/481861 [16:25:09] !log restarting Hue to pick up openssl security update [16:25:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:25:54] volans, are you saying I should hiera('puppetdb_host') ? 
[16:26:14] (03PS3) 10Giuseppe Lavagetto: profile::mediawiki::jobrunner: convert to use class httpd [puppet] - 10https://gerrit.wikimedia.org/r/475770 [16:27:05] (03CR) 10Banyek: [C: 03+1] "As we were talking on the ticket this seems good to me, but I won't mind if there would be an another +1 from Jaime or Manuel" [puppet] - 10https://gerrit.wikimedia.org/r/481871 (https://phabricator.wikimedia.org/T212154) (owner: 10Bmansurov) [16:27:42] (03CR) 10Dzahn: [C: 03+2] langlist: Add hyw [dns] - 10https://gerrit.wikimedia.org/r/481335 (https://phabricator.wikimedia.org/T212597) (owner: 10MarcoAurelio) [16:27:56] (03PS1) 10Banyek: Revert "mariadb: depool db1090:3317" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481878 [16:28:09] Krenair: I guess so, labs projects that use puppetdb should be able to set it if they don't have it already [16:28:14] (03PS7) 10Elukey: hadoop::namenode: add the possibility to add hosts to hosts.exclude [puppet/cdh] - 10https://gerrit.wikimedia.org/r/481861 [16:28:28] (03CR) 10Dzahn: [C: 03+2] "Western Armenian, also see "[Wikimedia-l] New language code for Western Armenian language"" [dns] - 10https://gerrit.wikimedia.org/r/481335 (https://phabricator.wikimedia.org/T212597) (owner: 10MarcoAurelio) [16:28:40] volans, they should, I'm more concerned about what the stylelint will have to say about it [16:29:04] !log restarting Superset to pick up openssl security update [16:29:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:29:06] I'll give it a go but if the stylelint says no I will revert [16:29:22] Krenair: also I think it might have some conflict rebasing [16:30:10] !log reimaging cloudvirt1030 with stretch, server cleanup after puppet refactoring [16:30:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:30:14] (03CR) 10Dzahn: [C: 03+2] "since https://phabricator.wikimedia.org/T97051 is resolved this should not need any further action anymore :)" [dns] - 
10https://gerrit.wikimedia.org/r/481335 (https://phabricator.wikimedia.org/T212597) (owner: 10MarcoAurelio) [16:30:24] (03PS17) 10Alex Monk: cumin: Allow Puppet DB backend to be used within Labs projects that use it [puppet] - 10https://gerrit.wikimedia.org/r/437052 [16:30:56] (03CR) 10jerkins-bot: [V: 04-1] cumin: Allow Puppet DB backend to be used within Labs projects that use it [puppet] - 10https://gerrit.wikimedia.org/r/437052 (owner: 10Alex Monk) [16:31:05] (03CR) 10Elukey: [V: 03+2 C: 03+2] "Finally works https://puppet-compiler.wmflabs.org/compiler1002/14115/" [puppet/cdh] - 10https://gerrit.wikimedia.org/r/481861 (owner: 10Elukey) [16:32:07] (03CR) 10Banyek: [C: 03+2] Revert "mariadb: depool db1090:3317" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481878 (owner: 10Banyek) [16:33:24] (03PS3) 10Dzahn: hieradata/labs: remove mysql::server::use_apparmor: false [puppet] - 10https://gerrit.wikimedia.org/r/470726 (https://phabricator.wikimedia.org/T162070) [16:33:30] !log banyek@deploy1001 Synchronized wmf-config/db-eqiad.php: repool db1090:3317 after schema change - T85757 (duration: 00m 46s) [16:33:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:33:33] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 [16:33:39] (03CR) 10Dzahn: "thanks Andrew! 
merging it" [puppet] - 10https://gerrit.wikimedia.org/r/470726 (https://phabricator.wikimedia.org/T162070) (owner: 10Dzahn) [16:35:44] (03PS18) 10Alex Monk: cumin: Allow Puppet DB backend to be used within Labs projects that use it [puppet] - 10https://gerrit.wikimedia.org/r/437052 [16:35:59] (03CR) 10Dzahn: [C: 03+1] "let me know once cherry-picked" [puppet] - 10https://gerrit.wikimedia.org/r/481201 (https://phabricator.wikimedia.org/T209361) (owner: 10Hashar) [16:36:35] (03PS2) 10Dzahn: ci: Permit ES traffic from jenkins masters to relforge [puppet] - 10https://gerrit.wikimedia.org/r/479567 (https://phabricator.wikimedia.org/T78705) (owner: 10Dduvall) [16:38:44] (03CR) 10jenkins-bot: Revert "mariadb: depool db1090:3317" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481878 (owner: 10Banyek) [16:39:01] 10Operations, 10ops-eqiad: Interface errors on cr1-eqiad:xe-3/3/1 - https://phabricator.wikimedia.org/T212791 (10ayounsi) p:05Triage→03Normal [16:39:14] (03PS2) 10Bmansurov: Recommendation API: increase mysql connection limit for service [puppet] - 10https://gerrit.wikimedia.org/r/481871 (https://phabricator.wikimedia.org/T212154) [16:39:21] (03PS1) 10Elukey: hadoop::master: add support for excluded hosts [puppet/cdh] - 10https://gerrit.wikimedia.org/r/481880 [16:39:49] (03CR) 10Marostegui: [C: 04-1] "Let's discuss on the ticket." [puppet] - 10https://gerrit.wikimedia.org/r/481871 (https://phabricator.wikimedia.org/T212154) (owner: 10Bmansurov) [16:41:05] (03CR) 10Elukey: [V: 03+2 C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/14117/" [puppet/cdh] - 10https://gerrit.wikimedia.org/r/481880 (owner: 10Elukey) [16:41:58] (03CR) 10Volans: [C: 04-1] "One small change required, see inline." 
(1 comment) [puppet] - https://gerrit.wikimedia.org/r/437052 (owner: Alex Monk)
[16:43:58] !log remove BGP sessions to AS6866 in AMS-IX (leaving the IX)
[16:43:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:46:45] !log remove BGP sessions to AS42949 in AMS-IX (leaving the IX)
[16:46:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:47:55] (PS1) Elukey: profile::hadoop:master/standby: support a custom hosts.exclude file [puppet] - https://gerrit.wikimedia.org/r/481882 (https://phabricator.wikimedia.org/T209929)
[16:50:10] (CR) Elukey: "https://puppet-compiler.wmflabs.org/compiler1002/14118/" [puppet] - https://gerrit.wikimedia.org/r/481882 (https://phabricator.wikimedia.org/T209929) (owner: Elukey)
[16:50:26] !log create BGP sessions to AS3214 in AMS-IX
[16:50:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:50:59] Operations, Wikimedia-Mailing-lists: Update Wikimedia logo on Mailman web pages from colored version to black and white version - https://phabricator.wikimedia.org/T212674 (Ladsgroup) >>! In T212674#4845557, @jrbs wrote: > The black and white logo is just for the Foundation. The coloured logo is still us...
[16:52:07] (CR) Volans: [C: -1] "I think that this is not the right approach, because a user might want to query hosts only in one specific region for example." [software/cumin] - https://gerrit.wikimedia.org/r/477811 (https://phabricator.wikimedia.org/T208861) (owner: Andrew Bogott)
[16:52:23] Operations, Wikimedia-Mailing-lists: Update Wikimedia logo on Mailman web pages from colored version to black and white version - https://phabricator.wikimedia.org/T212674 (Dzahn)
[16:52:35] (CR) Elukey: [C: +2] profile::hadoop:master/standby: support a custom hosts.exclude file [puppet] - https://gerrit.wikimedia.org/r/481882 (https://phabricator.wikimedia.org/T209929) (owner: Elukey)
[16:52:45] Operations, Wikimedia-Mailing-lists, Design: Update Wikimedia logo on Mailman web pages from colored version to black and white version - https://phabricator.wikimedia.org/T212674 (Dzahn)
[16:53:16] Operations, DBA, Patch-For-Review: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases - https://phabricator.wikimedia.org/T162070 (Dzahn)
[16:53:55] Operations, Wikimedia-Mailing-lists, Design: Update Wikimedia logo on Mailman web pages from colored version to black and white version - https://phabricator.wikimedia.org/T212674 (jrbs) I believe keeping RGB around was a compromise made by Comms when they did the redesign, so I don't think there is...
[16:54:26] Operations, WMF-Communications, Wikimedia-Mailing-lists, Design: Update Wikimedia logo on Mailman web pages from colored version to black and white version - https://phabricator.wikimedia.org/T212674 (jrbs)
[16:58:44] (PS1) Elukey: role::analytics_cluster::hadoop:master/standby: remove an1028 [puppet] - https://gerrit.wikimedia.org/r/481885 (https://phabricator.wikimedia.org/T209929)
[16:59:30] PROBLEM - puppet last run on an-master1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:01:08] this is me --^
[17:01:52] (PS1) Arturo Borrero Gonzalez: openstack: nova: mitaka/stretch: require packages before installing nova-common [puppet] - https://gerrit.wikimedia.org/r/481887 (https://phabricator.wikimedia.org/T212302)
[17:02:19] (CR) jerkins-bot: [V: -1] openstack: nova: mitaka/stretch: require packages before installing nova-common [puppet] - https://gerrit.wikimedia.org/r/481887 (https://phabricator.wikimedia.org/T212302) (owner: Arturo Borrero Gonzalez)
[17:03:04] (CR) Elukey: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1002/14119/" [puppet] - https://gerrit.wikimedia.org/r/481885 (https://phabricator.wikimedia.org/T209929) (owner: Elukey)
[17:03:47] (CR) Dzahn: [C: +2] ci: Permit ES traffic from jenkins masters to relforge [puppet] - https://gerrit.wikimedia.org/r/479567 (https://phabricator.wikimedia.org/T78705) (owner: Dduvall)
[17:03:58] (PS3) Dzahn: ci: Permit ES traffic from jenkins masters to relforge [puppet] - https://gerrit.wikimedia.org/r/479567 (https://phabricator.wikimedia.org/T78705) (owner: Dduvall)
[17:07:26] PROBLEM - puppet last run on an-master1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:08:29] this is me again --^
[17:08:40] (PS1) Elukey: hadoop::namenode: fix invalid exec relationship [puppet/cdh] - https://gerrit.wikimedia.org/r/481888
[17:08:50] puppet is making me pay for those days of vacation
[17:08:54] :D
[17:09:11] lol
[17:10:49] (CR) Elukey: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1002/14121/" [puppet/cdh] - https://gerrit.wikimedia.org/r/481888 (owner: Elukey)
[17:11:52] (PS1) Elukey: Update cdh module to latest sha [puppet] - https://gerrit.wikimedia.org/r/481889
[17:14:40] (CR) Elukey: [C: +2] Update cdh module to latest sha [puppet] - https://gerrit.wikimedia.org/r/481889 (owner: Elukey)
[17:17:52] RECOVERY - puppet last run on an-master1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:19:59] (PS1) Elukey: hadoop::namenode: fix erb template [puppet/cdh] - https://gerrit.wikimedia.org/r/481890
[17:20:56] (CR) Dzahn: [C: +2] "talked about the ferm part with Moritz, it's ok" [puppet] - https://gerrit.wikimedia.org/r/479567 (https://phabricator.wikimedia.org/T78705) (owner: Dduvall)
[17:20:57] (PS4) Dzahn: ci: Permit ES traffic from jenkins masters to relforge [puppet] - https://gerrit.wikimedia.org/r/479567 (https://phabricator.wikimedia.org/T78705) (owner: Dduvall)
[17:25:00] (CR) Catrope: [C: +2] Beta labs: Enable help panel logging on enwiki and kowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/481874 (https://phabricator.wikimedia.org/T211942) (owner: Kosta Harlan)
[17:26:05] (CR) Dzahn: [C: +2] "on relforge1001:" [puppet] - https://gerrit.wikimedia.org/r/479567 (https://phabricator.wikimedia.org/T78705) (owner: Dduvall)
[17:26:06] (Merged) jenkins-bot: Beta labs: Enable help panel logging on enwiki and kowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/481874 (https://phabricator.wikimedia.org/T211942) (owner: Kosta Harlan)
[17:26:08] (PS2) Elukey: hadoop::namenode: fix erb template [puppet/cdh] - https://gerrit.wikimedia.org/r/481890
[17:26:29] (PS3) Elukey: hadoop::namenode: fix erb template [puppet/cdh] - https://gerrit.wikimedia.org/r/481890
[17:28:27] (PS1) ArielGlenn: Check for truncated file content in certain circumstances [dumps] - https://gerrit.wikimedia.org/r/481893 (https://phabricator.wikimedia.org/T212462)
[17:29:21] (PS4) Elukey: hadoop::namenode: fix erb template [puppet/cdh] - https://gerrit.wikimedia.org/r/481890
[17:29:40] (CR) Volans: "Re-thinking about it I've decided that this is overkill (and has the drawback to show the whole icinga config when run due to Cumin's limi" [software/spicerack] - https://gerrit.wikimedia.org/r/481857 (owner: Volans)
[17:31:14] (CR) Volans: "I'm not convinced at all of this change, mostly sent out in response to an annoying issue with Ieafe2e9a49310c999eef29466b184e5515011efc b" [software/spicerack] - https://gerrit.wikimedia.org/r/481858 (https://phabricator.wikimedia.org/T212783) (owner: Volans)
[17:31:33] (CR) jenkins-bot: Beta labs: Enable help panel logging on enwiki and kowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/481874 (https://phabricator.wikimedia.org/T211942) (owner: Kosta Harlan)
[17:32:11] (PS2) Arturo Borrero Gonzalez: openstack: nova: mitaka/stretch: require packages before installing nova [puppet] - https://gerrit.wikimedia.org/r/481887 (https://phabricator.wikimedia.org/T212302)
[17:32:21] (CR) Elukey: [V: +2 C: +2] "https://puppet-compiler.wmflabs.org/compiler1002/14125/" [puppet/cdh] - https://gerrit.wikimedia.org/r/481890 (owner: Elukey)
[17:32:39] (CR) jerkins-bot: [V: -1] openstack: nova: mitaka/stretch: require packages before installing nova [puppet] - https://gerrit.wikimedia.org/r/481887 (https://phabricator.wikimedia.org/T212302) (owner: Arturo Borrero Gonzalez)
[17:32:40] (CR) Dzahn: [C: -1] "per comments from joe on the parent change that adds this module" [puppet] - https://gerrit.wikimedia.org/r/479580 (owner: Paladox)
[17:33:35] (PS1) Elukey: Update cdh module to latest SHA [puppet] - https://gerrit.wikimedia.org/r/481897
[17:33:36] (PS3) Arturo Borrero Gonzalez: openstack: nova: mitaka/stretch: require packages before installing nova [puppet] - https://gerrit.wikimedia.org/r/481887 (https://phabricator.wikimedia.org/T212302)
[17:33:54] (CR) Elukey: [V: +2 C: +2] Update cdh module to latest SHA [puppet] - https://gerrit.wikimedia.org/r/481897 (owner: Elukey)
[17:33:58] Operations, Wikimedia-Logstash: logstash stuck on its persistent queue - https://phabricator.wikimedia.org/T212640 (herron) p:Triage→High
[17:34:12] (CR) Dzahn: [C: -1] "+1 to Elukey for using "WMF LDAP" and not "WMF Labs", consistency and also "labs" is an outdated term" [puppet] - https://gerrit.wikimedia.org/r/480869 (owner: Framawiki)
[17:34:33] Operations: Track remaining trusty servers in production - https://phabricator.wikimedia.org/T212772 (herron) p:Triage→Normal
[17:35:09] (PS4) Arturo Borrero Gonzalez: openstack: nova: mitaka/stretch: require packages before installing nova [puppet] - https://gerrit.wikimedia.org/r/481887 (https://phabricator.wikimedia.org/T212302)
[17:36:01] (CR) Arturo Borrero Gonzalez: [C: +2] openstack: nova: mitaka/stretch: require packages before installing nova [puppet] - https://gerrit.wikimedia.org/r/481887 (https://phabricator.wikimedia.org/T212302) (owner: Arturo Borrero Gonzalez)
[17:36:41] (CR) Dzahn: [C: -1] "Paladox, can you start the discussion on "request tracing"? Do we need and want that? Also, can we separate it from this change please and" [puppet] - https://gerrit.wikimedia.org/r/463519 (https://phabricator.wikimedia.org/T200739) (owner: Paladox)
[17:37:40] Operations, Traffic: HTTP/2 requests fail with too-long URLs - https://phabricator.wikimedia.org/T209590 (herron) p:Triage→Normal
[17:37:51] (CR) Hashar: [C: +1] "Cherry picked on labs, the end result seems ok to me :) Thanks!" [puppet] - https://gerrit.wikimedia.org/r/481201 (https://phabricator.wikimedia.org/T209361) (owner: Hashar)
[17:38:11] (CR) Effie Mouzeli: [C: -1] ""net none" on deployment-imagescaler03 (stretch) didn't render thumbnails, I am -1 this to investigate further" [puppet] - https://gerrit.wikimedia.org/r/481139 (owner: Muehlenhoff)
[17:39:58] (PS4) Dzahn: contint: remove unused classes [puppet] - https://gerrit.wikimedia.org/r/481201 (https://phabricator.wikimedia.org/T209361) (owner: Hashar)
[17:40:16] (PS1) Elukey: Exclude two Analytics Hadoop worker nodes for decom [puppet] - https://gerrit.wikimedia.org/r/481899 (https://phabricator.wikimedia.org/T209929)
[17:40:32] RECOVERY - puppet last run on an-master1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[17:40:39] Operations: uwsgi's logsocket_plugin.so causes segfaults during log rotation - https://phabricator.wikimedia.org/T212697 (herron) p:Triage→High a:Volans
[17:42:01] Operations, MediaWiki-Database, Wikidata, Wikimedia-production-error: DBQueryTimeoutError on Wikidata's Special:Nuke - https://phabricator.wikimedia.org/T212690 (herron) p:Triage→Normal
[17:43:20] Operations, monitoring: icinga doesn't log ampersand in notes_url links - https://phabricator.wikimedia.org/T212669 (herron) p:Triage→Normal
[17:44:11] Operations, Kubernetes: kubernetes1001 cronspam - https://phabricator.wikimedia.org/T212648 (herron)
[17:44:13] Operations: /dev/log symlink to /run/systemd/journal/dev-log disappeared on kubernetes1001 - https://phabricator.wikimedia.org/T212681 (herron)
[17:44:20] (CR) Dzahn: [C: +2] contint: remove unused classes [puppet] - https://gerrit.wikimedia.org/r/481201 (https://phabricator.wikimedia.org/T209361) (owner: Hashar)
[17:44:46] (CR) Effie Mouzeli: [V: -1] "Blacklisting /sbin worked on deployment-imagescaler03 though" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/481141 (owner: Muehlenhoff)
[17:45:12] (CR) Effie Mouzeli: [V: -1] "Fixing the score :)" [puppet] - https://gerrit.wikimedia.org/r/481139 (owner: Muehlenhoff)
[17:48:56] Operations, monitoring: icinga doesn't log ampersand in notes_url links - https://phabricator.wikimedia.org/T212669 (Volans) a:Volans
[17:53:34] (PS1) Volans: icinga: re-apply change for URL ampersends [puppet] - https://gerrit.wikimedia.org/r/481901 (https://phabricator.wikimedia.org/T212669)
[17:54:48] (CR) Muehlenhoff: [C: +1] ircecho: Drop sysvinit support [puppet] - https://gerrit.wikimedia.org/r/480789 (owner: Paladox)
[17:55:03] herron: ^^^ FYI ;)
[17:55:17] (CR) Dzahn: [C: -1] ""template" is used as source but the file is not an .erb file. content of template is directly inserted as "source" -> https://puppet-comp" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/478052 (owner: Paladox)
[17:56:04] (PS2) Volans: icinga: fix command_file property [software/spicerack] - https://gerrit.wikimedia.org/r/481857
[17:56:07] (PS2) Volans: remote: suppress Cumin's output [software/spicerack] - https://gerrit.wikimedia.org/r/481858 (https://phabricator.wikimedia.org/T212783)
[17:56:13] volans: 👍
[17:59:40] (PS10) Paladox: profile::phabricator::httpd: Update's worker config [puppet] - https://gerrit.wikimedia.org/r/478052
[18:00:10] (CR) Paladox: profile::phabricator::httpd: Update's worker config (2 comments) [puppet] - https://gerrit.wikimedia.org/r/478052 (owner: Paladox)
[18:00:18] (PS11) Paladox: profile::phabricator::httpd: Update's worker config [puppet] - https://gerrit.wikimedia.org/r/478052
[18:00:23] (CR) Paladox: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/478052 (owner: Paladox)
[18:00:51] hey robh akosiaris moritzm could I ask one of you to please update the topic with my nick on clinic duty?
[18:03:12] (PS2) Elukey: Exclude two Analytics Hadoop worker nodes for decom [puppet] - https://gerrit.wikimedia.org/r/481899 (https://phabricator.wikimedia.org/T209929)
[18:03:31] (PS1) Muehlenhoff: Enable base::service_auto_restart for uwsgi-certcentral [puppet] - https://gerrit.wikimedia.org/r/481902 (https://phabricator.wikimedia.org/T135991)
[18:04:09] (CR) Elukey: [C: +2] Exclude two Analytics Hadoop worker nodes for decom [puppet] - https://gerrit.wikimedia.org/r/481899 (https://phabricator.wikimedia.org/T209929) (owner: Elukey)
[18:04:20] herron: done
[18:04:28] thanks!
[18:08:10] Hi, I tracking https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-hhvm-docker/29307/console ~20 minutes. Can anyone check what happening?
[18:09:23] (PS12) Paladox: profile::phabricator::httpd: Update's worker config [puppet] - https://gerrit.wikimedia.org/r/478052
[18:09:30] (CR) Paladox: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/478052 (owner: Paladox)
[18:13:48] (CR) Dzahn: [C: +2] profile::phabricator::httpd: Update's worker config [puppet] - https://gerrit.wikimedia.org/r/478052 (owner: Paladox)
[18:14:30] Operations, ops-eqiad: wtp1028 unresponsive - https://phabricator.wikimedia.org/T212624 (herron) Interestingly this host logged a "system firmware progress post error 0Fh" error on the 27th ` ID | Date | Time | Name | Type | Event 2 | Dec-27-2018 | 07:38:15...
[18:14:34] (PS13) Dzahn: profile::phabricator::httpd: Update's worker config [puppet] - https://gerrit.wikimedia.org/r/478052 (owner: Paladox)
[18:23:53] herron: if you can have a look at the above CR for Icinga I can merge it right away ;)
[18:24:20] legoktm: could you get wikibugs to re-join #wikimedia-serviceops ? it quit with flood message earlier
[18:24:33] probably because somebody mass-edited tasks (?)
[18:26:16] (CR) Herron: [C: +1] icinga: re-apply change for URL ampersends [puppet] - https://gerrit.wikimedia.org/r/481901 (https://phabricator.wikimedia.org/T212669) (owner: Volans)
[18:26:26] volans: sounds good!
[18:26:31] thx
[18:26:47] (PS2) Volans: icinga: re-apply change for URL ampersends [puppet] - https://gerrit.wikimedia.org/r/481901 (https://phabricator.wikimedia.org/T212669)
[18:27:39] (CR) Volans: [C: +2] icinga: re-apply change for URL ampersends [puppet] - https://gerrit.wikimedia.org/r/481901 (https://phabricator.wikimedia.org/T212669) (owner: Volans)
[18:31:27] (PS1) Paladox: httpd::mpm: Add missing condition to "if $source {" [puppet] - https://gerrit.wikimedia.org/r/481907
[18:31:46] (PS2) Paladox: httpd::mpm: Add missing condition to "if $source {" [puppet] - https://gerrit.wikimedia.org/r/481907
[18:32:02] (PS3) Paladox: httpd::mpm: Add missing condition to "if $source {" [puppet] - https://gerrit.wikimedia.org/r/481907
[18:32:07] (CR) Paladox: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/481907 (owner: Paladox)
[18:34:30] PROBLEM - carbon-local-relay metric drops on graphite1004 is CRITICAL: TEST - ignore -volans https://grafana.wikimedia.org/dashboard/db/graphite-eqiad?orgId=1panelId=29fullscreen https://grafana.wikimedia.org/dashboard/db/graphite-codfw?orgId=1panelId=29fullscreen
[18:34:37] This is me testing Icinga links ^^^
[18:35:11] (PS1) Elukey: role::analytics_cluster::hadoop::ui: update documentation [puppet] - https://gerrit.wikimedia.org/r/481910
[18:35:29] herron: mmmh IIRC it needed a restart... doing that before digging into the rabbit hole ;)
[18:35:57] !log restarting icinga on icinga1001 T212669
[18:35:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:36:00] T212669: icinga doesn't log ampersand in notes_url links - https://phabricator.wikimedia.org/T212669
[18:36:48] (PS2) Elukey: role::analytics_cluster::hadoop::ui: update documentation [puppet] - https://gerrit.wikimedia.org/r/481910
[18:37:49] RECOVERY - carbon-local-relay metric drops on graphite1004 is OK: OK: Less than 80.00% above the threshold [25.0] https://grafana.wikimedia.org/dashboard/db/graphite-eqiad?orgId=1&panelId=29&fullscreen https://grafana.wikimedia.org/dashboard/db/graphite-codfw?orgId=1&panelId=29&fullscreen
[18:37:58] yay, it worked :)
[18:38:47] Operations, monitoring, Patch-For-Review: icinga doesn't log ampersand in notes_url links - https://phabricator.wikimedia.org/T212669 (Volans) Open→Resolved Fix applied, it required an Icinga restart. See the commit message for more details. ` Wed 19:37:49 icinga-wm| RECOVERY - carbon-loca...
[18:40:01] nice volans !
[18:41:38] Operations, ops-eqiad: wtp1028 unresponsive - https://phabricator.wikimedia.org/T212624 (Cmjohnson) @herron @fgiunchedi I went to the data center on the 27th and powercycled the server. I thought I updated task but I don't see my update.
[18:41:51] (PS1) Volans: tests: fix Pytest RemovedInPytest4Warning [software/cumin] - https://gerrit.wikimedia.org/r/481913
[18:41:53] (PS1) Volans: tests: test also with Python 3.7 [software/cumin] - https://gerrit.wikimedia.org/r/481914
[18:42:39] (PS1) Paladox: Revert "profile::phabricator::httpd: Update's worker config" [puppet] - https://gerrit.wikimedia.org/r/481915
[18:53:59] Operations, TechCom-RFC, Wikidata, Wikidata-Termbox-Hike, and 5 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (Milimetric) I agree that we can't go back on decisions that are 3 years in the making. But I do like Timo's point that we should state th...
[18:54:04] (CR) jerkins-bot: [V: -1] tests: test also with Python 3.7 [software/cumin] - https://gerrit.wikimedia.org/r/481914 (owner: Volans)
[18:56:17] Operations, monitoring, Patch-For-Review, User-CDanis, User-fgiunchedi: Better organization for SRE grafana dashboards - https://phabricator.wikimedia.org/T178690 (CDanis) I've generated a list of Grafana dashboards sorted by their last modification time. Details below. While this isn't a p...
[18:58:49] (PS1) Dzahn: httpd/phabricator: revert using template for mpm config, adjust values for phab [puppet] - https://gerrit.wikimedia.org/r/481923
[18:59:01] (CR) Volans: "CI failed because of timeout, fix is I7072f7f184c63f6412339c7026af1179e9aefe0a" [software/cumin] - https://gerrit.wikimedia.org/r/481914 (owner: Volans)
[18:59:08] (PS7) MacFan4000: Set wgNoticeProjects for wikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/471663 (https://phabricator.wikimedia.org/T208694)
[18:59:15] (CR) jerkins-bot: [V: -1] httpd/phabricator: revert using template for mpm config, adjust values for phab [puppet] - https://gerrit.wikimedia.org/r/481923 (owner: Dzahn)
[18:59:23] (PS8) MacFan4000: Set wgNoticeProjects for wikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/471663 (https://phabricator.wikimedia.org/T208694)
[19:00:27] (CR) MacFan4000: "I’ve added a comment" [mediawiki-config] - https://gerrit.wikimedia.org/r/471663 (https://phabricator.wikimedia.org/T208694) (owner: MacFan4000)
[19:02:50] (PS2) Dzahn: httpd/phabricator: revert using template for mpm config, adjust values for phab [puppet] - https://gerrit.wikimedia.org/r/481923
[19:04:33] (CR) Dzahn: "you don't need to revert the whole thing, i am doing https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/481923/ instead" [puppet] - https://gerrit.wikimedia.org/r/481915 (owner: Paladox)
[19:10:17] (PS3) Dzahn: httpd/phabricator: revert using template for mpm config, adjust values for phab [puppet] - https://gerrit.wikimedia.org/r/481923
[19:10:58] (CR) jerkins-bot: [V: -1] httpd/phabricator: revert using template for mpm config, adjust values for phab [puppet] - https://gerrit.wikimedia.org/r/481923 (owner: Dzahn)
[19:12:06] (PS4) Dzahn: httpd/phabricator: revert using template for mpm config, adjust values for phab [puppet] - https://gerrit.wikimedia.org/r/481923
[19:14:05] (CR) Paladox: [C: +1] httpd/phabricator: revert using template for mpm config, adjust values for phab [puppet] - https://gerrit.wikimedia.org/r/481923 (owner: Dzahn)
[19:14:16] (Abandoned) Paladox: Revert "profile::phabricator::httpd: Update's worker config" [puppet] - https://gerrit.wikimedia.org/r/481915 (owner: Paladox)
[19:14:58] cdanis: if you want you can use {P7951} inside a Phab ticket comment. then you get inclusion of the text from a pastebin with scroll bars
[19:15:23] oh neat
[19:15:25] ty
[19:15:36] yw
[19:16:23] phabricator is pretty impressive software
[19:17:41] (CR) Dzahn: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1002/14130/" [puppet] - https://gerrit.wikimedia.org/r/481923 (owner: Dzahn)
[19:18:17] Operations, Release Pipeline, Release-Engineering-Team, Core Platform Team Backlog (Watching / External), and 2 others: TEC3:O3:O3.1:Q2 Goal - Move Blubberoid, ZoteroV2, and Graphoid through the production CD Pipeline - https://phabricator.wikimedia.org/T205919 (thcipriani)
[19:18:22] Operations, Release Pipeline, Release-Engineering-Team, Core Platform Team Backlog (Watching / External), and 2 others: Migrate production services to kubernetes using the pipeline - https://phabricator.wikimedia.org/T198901 (thcipriani)
[19:18:27] Operations, Release Pipeline, Core Platform Team Backlog (Watching / External), Release-Engineering-Team (Watching / External), Services (watching): Revisit the logging work done on Q1 2017-2018 for the standard pod setup - https://phabricator.wikimedia.org/T207200 (thcipriani)
[19:18:46] Operations, Release Pipeline, Release-Engineering-Team, Core Platform Team Backlog (Watching / External), and 2 others: Migrate production services to kubernetes using the pipeline - https://phabricator.wikimedia.org/T198901 (thcipriani)
[19:18:53] Operations, Release Pipeline, Release-Engineering-Team, Core Platform Team Backlog (Watching / External), and 2 others: TEC3:O3:O3.1:Q2 Goal - Move Blubberoid, ZoteroV2, and Graphoid through the production CD Pipeline - https://phabricator.wikimedia.org/T205919 (thcipriani) Open→Resolved...
[19:19:06] Operations, Release Pipeline, Release-Engineering-Team, Core Platform Team Backlog (Watching / External), and 2 others: Migrate production services to kubernetes using the pipeline - https://phabricator.wikimedia.org/T198901 (greg)
[19:21:24] Operations, WMF-Communications, Wikimedia-Mailing-lists, Design: Update Wikimedia logo on Mailman web pages from colored version to black and white version - https://phabricator.wikimedia.org/T212674 (Quiddity) I believe (and personally hope) the plan is to keep the RGB version for community logo...
[19:21:56] Operations, ORES, Scoring-platform-team, Release Pipeline (Blubber): Build blubber file for ORES - https://phabricator.wikimedia.org/T210268 (thcipriani)
[19:27:19] (PS1) Paladox: httpd::mpm: add support for a content param [puppet] - https://gerrit.wikimedia.org/r/481932
[19:29:25] (PS2) Paladox: httpd::mpm: add support for a content param [puppet] - https://gerrit.wikimedia.org/r/481932
[19:33:17] (CR) Dzahn: "maybe we should just always use a template and never a file? instead of having to deal with 2 separate parameters. or we could have a sin" [puppet] - https://gerrit.wikimedia.org/r/481932 (owner: Paladox)
[19:39:40] Operations, WMF-Communications, Wikimedia-Mailing-lists, Design: Update Wikimedia logo on Mailman web pages from colored version to black and white version - https://phabricator.wikimedia.org/T212674 (Dzahn) I feel this ticket is more general than a mailman issue. Maybe rename to figure out a gen...
[19:41:30] Operations, DBA, Jade, TechCom-RFC, and 2 others: Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (Krinkle) > For instance, in recent changes patrolling when a change is marked as "patrolled", we can assume it is also "non dama...
[19:43:15] Operations, DBA, Patch-For-Review: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases - https://phabricator.wikimedia.org/T162070 (Dzahn) The only remaining uses of the mysql module are now limited to the statistics module and wikimetrics which i think are maintai...
[19:44:08] Operations, Analytics, Analytics-Cluster, DBA: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases - https://phabricator.wikimedia.org/T162070 (Dzahn)
[19:46:51] Operations, DBA, Jade, TechCom-RFC, and 2 others: Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (awight) >>! In T200297#4850136, @Krinkle wrote: >> For instance, in recent changes patrolling when a change is marked as "patrol...
[19:52:52] (PS2) Dzahn: mediawiki: rename the maintenance role to match other roles [puppet] - https://gerrit.wikimedia.org/r/479131
[19:56:49] Operations, DBA, Jade, TechCom-RFC, and 2 others: Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (Krinkle) > The benefit of these integrations is that it will allow work deduplication. I would very much like the patrolling w...
[20:04:36] !log mwmaint1002: foreachwikiindblist s2 deleteEqualMessages.php
[20:04:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:05:05] !log mwmaint1002: foreachwikiindblist s5 deleteEqualMessages.php
[20:05:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:09:41] Operations, DBA, Jade, TechCom-RFC, and 2 others: Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (awight) >>! In T200297#4850213, @Krinkle wrote: > >> The benefit of these integrations is that it will allow work deduplication...
[20:16:46] Operations, DBA, Jade, TechCom-RFC, and 2 others: Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (awight)
[20:18:45] (PS2) Dzahn: interface: use data types for Ipv4 and Ipv6 addresses [puppet] - https://gerrit.wikimedia.org/r/478114
[20:33:21] (PS1) MarcoAurelio: Clear expired throttle rules [mediawiki-config] - https://gerrit.wikimedia.org/r/481940
[20:48:42] mutante: it will rejoin whenever it needs to send a message
[20:49:34] legoktm: aha! thanks. and yes, it joined again. all good
[20:50:22] :)
[20:52:29] !log rebooting wtp1028 — looking for POST errors T212624
[20:52:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:52:33] T212624: wtp1028 unresponsive - https://phabricator.wikimedia.org/T212624
[20:53:35] (PS1) MarcoAurelio: [WIP] Initial configuration for hyw.wikipedia (1/2) [mediawiki-config] - https://gerrit.wikimedia.org/r/481943 (https://phabricator.wikimedia.org/T212597)
[20:54:21] (CR) jerkins-bot: [V: -1] [WIP] Initial configuration for hyw.wikipedia (1/2) [mediawiki-config] - https://gerrit.wikimedia.org/r/481943 (https://phabricator.wikimedia.org/T212597) (owner: MarcoAurelio)
[20:54:25] (PS2) Bmansurov: Enable reader trust survey [mediawiki-config] - https://gerrit.wikimedia.org/r/476368 (https://phabricator.wikimedia.org/T209882)
[20:55:32] (PS2) Bmansurov: Disable reader trust survey [mediawiki-config] - https://gerrit.wikimedia.org/r/476370 (https://phabricator.wikimedia.org/T209882)
[20:58:57] Operations, ops-eqiad: wtp1028 unresponsive - https://phabricator.wikimedia.org/T212624 (herron) Not seeing any errors on the console. Host boots up without issues. Repooling.
[20:59:11] !log repooling wtp1028 T212624
[20:59:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:59:13] T212624: wtp1028 unresponsive - https://phabricator.wikimedia.org/T212624
[20:59:14] !log herron@puppetmaster1001 conftool action : set/pooled=yes; selector: dc=eqiad,cluster=parsoid,service=parsoid,name=wtp1028.eqiad.wmnet
[20:59:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:00:54] Operations, ops-eqiad: wtp1028 unresponsive - https://phabricator.wikimedia.org/T212624 (herron) Open→Resolved a:herron ` !log herron@puppetmaster1001 conftool action : set/pooled=yes; selector: dc=eqiad,cluster=parsoid,service=parsoid,name=wtp1028.eqiad.wmnet`
[21:05:03] Operations, RESTBase-Cassandra, Services: restbase cassandra driver excessive logging when cassandra hosts are down - https://phabricator.wikimedia.org/T212424 (herron) p:Triage→Normal
[21:05:36] Operations, Traffic: varnishreqstats sends truncated statsd traffic - https://phabricator.wikimedia.org/T212310 (herron) p:Triage→Normal
[21:06:17] Operations, TechCom-RFC, Wikidata, Wikidata-Termbox-Hike, and 5 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (herron) p:Triage→Normal
[21:11:11] (PS1) MarcoAurelio: Initial configuration for hyw.wikipedia (2/2) [mediawiki-config] - https://gerrit.wikimedia.org/r/481945 (https://phabricator.wikimedia.org/T212597)
[21:12:01] (CR) jerkins-bot: [V: -1] Initial configuration for hyw.wikipedia (2/2) [mediawiki-config] - https://gerrit.wikimedia.org/r/481945 (https://phabricator.wikimedia.org/T212597) (owner: MarcoAurelio)
[21:13:01] (CR) MarcoAurelio: [WIP] Initial configuration for hyw.wikipedia (1/2) (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/481943 (https://phabricator.wikimedia.org/T212597) (owner: MarcoAurelio)
[21:13:29] (PS2) MarcoAurelio: [WIP] Initial configuration for hyw.wikipedia (1/2) [mediawiki-config] - https://gerrit.wikimedia.org/r/481943 (https://phabricator.wikimedia.org/T212597)
[21:14:23] (CR) jerkins-bot: [V: -1] [WIP] Initial configuration for hyw.wikipedia (1/2) [mediawiki-config] - https://gerrit.wikimedia.org/r/481943 (https://phabricator.wikimedia.org/T212597) (owner: MarcoAurelio)
[21:17:29] (PS3) MarcoAurelio: [WIP] Initial configuration for hyw.wikipedia (1/2) [mediawiki-config] - https://gerrit.wikimedia.org/r/481943 (https://phabricator.wikimedia.org/T212597)
[21:18:23] (CR) jerkins-bot: [V: -1] [WIP] Initial configuration for hyw.wikipedia (1/2) [mediawiki-config] - https://gerrit.wikimedia.org/r/481943 (https://phabricator.wikimedia.org/T212597) (owner: MarcoAurelio)
[21:18:54] (CR) Smalyshev: [C: +1] "+1 for wdqs parts" [puppet] - https://gerrit.wikimedia.org/r/481818 (owner: Giuseppe Lavagetto)
[21:18:59] (Abandoned) MarcoAurelio: Initial configuration for hyw.wikipedia (2/2) [mediawiki-config] - https://gerrit.wikimedia.org/r/481945 (https://phabricator.wikimedia.org/T212597) (owner: MarcoAurelio)
[21:34:31] (PS4) MarcoAurelio: Initial configuration for hyw.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/481943 (https://phabricator.wikimedia.org/T212597)
[21:44:13] Operations, Developer-Advocacy, Discourse, Epic: Bring a discourse instance for technical questions to production - https://phabricator.wikimedia.org/T180853 (bd808)
[21:47:09] Operations, Puppet: puppet (systemd::service) attempts to start manually masked units - https://phabricator.wikimedia.org/T211027 (herron) p:Triage→Normal
[21:49:44] (CR) Legoktm: [C: -1] "In general I would prefer to minimize the changes in this repo and the upstream debianization (https://salsa.debian.org/mediawiki-team/php" [debs/php-excimer] (debian/stretch-wikimedia) - https://gerrit.wikimedia.org/r/481615 (owner: Hashar)
[21:55:17] (CR) MarcoAurelio: [C: -1] "+ wikidata dblists, etc." (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/481943 (https://phabricator.wikimedia.org/T212597) (owner: MarcoAurelio)
[21:57:28] (PS5) MarcoAurelio: Initial configuration for hyw.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/481943 (https://phabricator.wikimedia.org/T212597)
[22:29:34] Operations, Developer-Advocacy, Discourse, Epic: Bring a discourse instance for technical questions to production - https://phabricator.wikimedia.org/T180853 (bd808)
[22:40:10] (PS1) Dzahn: add fake dns/dnscookies.key secret to allow puppet compiling on authdns/cp hosts [labs/private] - https://gerrit.wikimedia.org/r/481960
[23:12:29] (CR) Dzahn: [V: +2 C: +2] add fake dns/dnscookies.key secret to allow puppet compiling on authdns/cp hosts [labs/private] - https://gerrit.wikimedia.org/r/481960 (owner: Dzahn)
[23:27:33] PROBLEM - DPKG on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[23:27:33] PROBLEM - configured eth on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[23:28:03] PROBLEM - Disk space on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[23:28:17] PROBLEM - dhclient process on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[23:28:25] PROBLEM - MD RAID on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[23:28:29] PROBLEM - Check systemd state on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[23:29:17] PROBLEM - puppet last run on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[23:39:31] RECOVERY - DPKG on notebook1004 is OK: All packages OK
[23:39:31] RECOVERY - configured eth on notebook1004 is OK: OK - interfaces up
[23:39:35] !log notebook1004 - systemctl status nagios-nrpe-server
[23:39:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:39:39] RECOVERY - puppet last run on notebook1004 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[23:39:42] !log notebook1004 - systemctl start nagios-nrpe-server
[23:39:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:40:17] RECOVERY - dhclient process on notebook1004 is OK: PROCS OK: 0 processes with command name dhclient
[23:40:25] RECOVERY - MD RAID on notebook1004 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0
[23:40:27] RECOVERY - Check systemd state on notebook1004 is OK: OK - running: The system is fully operational
[23:45:04] (CR) Dzahn: [C: -1] "some places already use these parameters with non-matching values. perhaps good we found them.. but can't be merged like this" [puppet] - https://gerrit.wikimedia.org/r/478114 (owner: Dzahn)
[23:50:08] (PS3) Dzahn: interface: use data types for Ipv4 and Ipv6 addresses [puppet] - https://gerrit.wikimedia.org/r/478114
[23:50:40] (CR) jerkins-bot: [V: -1] interface: use data types for Ipv4 and Ipv6 addresses [puppet] - https://gerrit.wikimedia.org/r/478114 (owner: Dzahn)
[23:52:17] PROBLEM - dhclient process on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[23:52:25] PROBLEM - MD RAID on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[23:52:27] PROBLEM - puppet last run on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[23:52:27] PROBLEM - Check systemd state on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[23:52:43] PROBLEM - configured eth on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[23:52:43] PROBLEM - DPKG on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[23:54:45] PROBLEM - Check the NTP synchronisation status of timesyncd on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[23:57:37] PROBLEM - puppet last run on mw1229 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:59:19] !log notebook1004 still keeps running out of memory from some user actions and that kills nagios-nrpe-server and that causes a bunch of Icinga alerts
[23:59:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:59:57] RECOVERY - configured eth on notebook1004 is OK: OK - interfaces up
[23:59:57] RECOVERY - DPKG on notebook1004 is OK: All packages OK