[00:01:07] !log disable BGP to Zayo on cr2-codfw for intrusive testing - T215193 [00:01:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:01:11] T215193: Fix codfw x-connect 65373 - https://phabricator.wikimedia.org/T215193 [00:29:41] (03CR) 10Bstorm: "Yup" [puppet] - 10https://gerrit.wikimedia.org/r/491189 (https://phabricator.wikimedia.org/T216373) (owner: 10Bstorm) [00:29:57] (03Abandoned) 10Bstorm: maintain_dbusers: Reverting to the old location to save git history [puppet] - 10https://gerrit.wikimedia.org/r/491189 (https://phabricator.wikimedia.org/T216373) (owner: 10Bstorm) [00:40:14] 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission: decom promethium/WMF3571 - https://phabricator.wikimedia.org/T191362 (10ayounsi) a:05ayounsi→03Andrew This is because `ge-3/0/19` is the last interface in the `interface-range vlan-cloud-instances1-b-eqiad`. @Andrew is vlan `cloud-instances1-b-eqiad... [01:40:45] (03PS3) 10BryanDavis: Set custom mime-types [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/489409 (https://phabricator.wikimedia.org/T178601) [01:40:47] (03PS1) 10BryanDavis: Mount /mnt/nfs into Kuberntes pods [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/491397 (https://phabricator.wikimedia.org/T193646) [02:13:33] (03PS1) 10MSantos: Restore privileges to admin table after script [puppet] - 10https://gerrit.wikimedia.org/r/491399 (https://phabricator.wikimedia.org/T216466) [02:14:07] (03CR) 10jerkins-bot: [V: 04-1] Restore privileges to admin table after script [puppet] - 10https://gerrit.wikimedia.org/r/491399 (https://phabricator.wikimedia.org/T216466) (owner: 10MSantos) [02:16:23] (03PS2) 10MSantos: Restore privileges to admin table after script [puppet] - 10https://gerrit.wikimedia.org/r/491399 (https://phabricator.wikimedia.org/T216466) [02:52:41] PROBLEM - puppet last run on mw1281 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [02:55:33] 10Operations: Netbox: fill network topology - https://phabricator.wikimedia.org/T205897 (10ayounsi) >>! In T205897#4953769, @faidon wrote: > So, I don't think we can reasonably expect our on-site techs to look at a box and say "oh this port is `enp4s0f0p1`" and record it as such :) Indeed, some parts of the int... [03:18:43] RECOVERY - puppet last run on mw1281 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [03:24:32] (03CR) 10Zhuyifei1999: "I've never tested, but hopefully mounts are recursive." (031 comment) [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/491397 (https://phabricator.wikimedia.org/T193646) (owner: 10BryanDavis) [03:26:09] 10Operations: Netbox: cable termination names report - https://phabricator.wikimedia.org/T216469 (10ayounsi) p:05Triage→03Lowest [03:55:34] (03PS2) 10BryanDavis: Mount /mnt/nfs into Kuberntes pods [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/491397 (https://phabricator.wikimedia.org/T193646) [03:55:36] (03PS4) 10BryanDavis: Set custom mime-types [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/489409 (https://phabricator.wikimedia.org/T178601) [03:58:14] (03CR) 10BryanDavis: "> I've never tested, but hopefully mounts are recursive." (031 comment) [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/491397 (https://phabricator.wikimedia.org/T193646) (owner: 10BryanDavis) [04:56:49] (03CR) 10BBlack: "Nice! 
On my particular laptop Debian setup (stretch + misc backports and testing packages, python3 is 3.5 and tox is 2.5), just installing" [dns] - 10https://gerrit.wikimedia.org/r/491280 (owner: 10Volans) [05:09:05] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 229, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [05:09:11] PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 71, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [05:16:44] ^ This is planned Level3 maintenance [05:17:35] !log deleted previously deactivated BGP_community_actions terms - T204281 [05:17:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:17:38] T204281: Stop prioritizing peering over transit - https://phabricator.wikimedia.org/T204281 [05:30:40] 10Operations, 10netops, 10Performance-Team (Radar): Stop prioritizing peering over transit - https://phabricator.wikimedia.org/T204281 (10ayounsi) Doing the change in ulsfo: `lang=diff [edit policy-options policy-statement BGP_community_actions] - term peer-private-peer { - from community PEER_PRIV... [05:31:11] !log delete local pref for peering sessions in ulsfo - T204281 [05:31:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:31:14] T204281: Stop prioritizing peering over transit - https://phabricator.wikimedia.org/T204281 [06:04:09] 10Operations, 10Traffic, 10netops, 10Patch-For-Review: Offload pings to dedicated server - https://phabricator.wikimedia.org/T190090 (10ayounsi) a:05faidon→03ayounsi Discussed it with Faidon, Updated the previous diff to not redirect pings coming from our infra. Some questions, concerns raised: * The... [06:28:37] PROBLEM - puppet last run on cloudvirt1017 is CRITICAL: CRITICAL: Puppet has 1 failures. 
Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/prometheus-puppet-agent-stats] [06:30:21] PROBLEM - Check systemd state on netmon2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [06:39:01] RECOVERY - Check systemd state on netmon2001 is OK: OK - running: The system is fully operational [06:49:02] (03PS1) 10Marostegui: WIP dbstore_multiinstance.pp: Specify read-only for staging [puppet] - 10https://gerrit.wikimedia.org/r/491408 [06:49:36] (03CR) 10jerkins-bot: [V: 04-1] WIP dbstore_multiinstance.pp: Specify read-only for staging [puppet] - 10https://gerrit.wikimedia.org/r/491408 (owner: 10Marostegui) [06:50:17] (03PS1) 10Marostegui: db-eqiad.php: Depool db1106 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491409 (https://phabricator.wikimedia.org/T210713) [06:52:33] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Depool db1106 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491409 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [06:53:19] (03CR) 10Elukey: Remove stray packages after dist-upgrade on buster (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/491275 (owner: 10Muehlenhoff) [06:53:45] (03PS2) 10Marostegui: WIP dbstore_multiinstance.pp: Specify read-only for staging [puppet] - 10https://gerrit.wikimedia.org/r/491408 [06:54:01] (03PS2) 10Elukey: Deployment-prep: add cassandra/twcs scap repository [puppet] - 10https://gerrit.wikimedia.org/r/491276 (https://phabricator.wikimedia.org/T210706) [06:54:58] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1106 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491409 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [06:56:03] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1106 T210713 (duration: 00m 52s) [06:56:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:56:07] T210713: Drop change_tag.ct_tag column in 
production - https://phabricator.wikimedia.org/T210713 [06:56:21] !log Deploy schema change on db1106 - T210713 [06:56:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:56:45] !log Deploy schema change on db1106 - this will generate lag on labsdb:s1 T210713 [06:56:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:57:49] 10Operations, 10netops, 10Performance-Team (Radar): Stop prioritizing peering over transit - https://phabricator.wikimedia.org/T204281 (10ayounsi) This caused a ~80Mbps traffic drop on the peering link. [06:57:51] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1106 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491409 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [06:59:57] RECOVERY - puppet last run on cloudvirt1017 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [07:00:18] (03PS3) 10Marostegui: WIP dbstore_multiinstance.pp: Specify read-only for staging [puppet] - 10https://gerrit.wikimedia.org/r/491408 [07:14:07] (03PS4) 10Marostegui: WIP dbstore_multiinstance.pp: Specify read-only for staging [puppet] - 10https://gerrit.wikimedia.org/r/491408 [07:15:16] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1106" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491410 [07:20:43] (03PS5) 10Marostegui: WIP dbstore_multiinstance.pp: Specify read-only for staging [puppet] - 10https://gerrit.wikimedia.org/r/491408 [07:21:17] !log Drop ep_* tables on s3 - T174802 [07:21:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:21:20] T174802: Archive and drop education program (ep_*) tables on all wikis - https://phabricator.wikimedia.org/T174802 [07:30:25] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 231, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [07:30:35] RECOVERY - Router 
interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 73, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [07:38:46] (downtimed stat1005 but I am rebooting it, might alert) [07:45:17] (03PS6) 10Marostegui: WIP dbstore_multiinstance.pp: Specify read-only for staging [puppet] - 10https://gerrit.wikimedia.org/r/491408 [07:46:13] (03CR) 10jerkins-bot: [V: 04-1] WIP dbstore_multiinstance.pp: Specify read-only for staging [puppet] - 10https://gerrit.wikimedia.org/r/491408 (owner: 10Marostegui) [07:46:32] !log Reboot db1106 for kernel upgrade (and remove debug from kernel) T216240 T216273 [07:46:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:46:37] T216273: New cronspam from db clusters - https://phabricator.wikimedia.org/T216273 [07:46:37] T216240: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 [07:48:25] (03PS7) 10Marostegui: WIP dbstore_multiinstance.pp: Specify read-only for staging [puppet] - 10https://gerrit.wikimedia.org/r/491408 [07:50:32] !log installing systemd security updates on stretch [07:50:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:51:40] !log Drop ep_* tables on s1 - T174802 [07:51:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:51:43] T174802: Archive and drop education program (ep_*) tables on all wikis - https://phabricator.wikimedia.org/T174802 [07:55:28] (03PS8) 10Marostegui: WIP dbstore_multiinstance.pp: Specify read-only for staging [puppet] - 10https://gerrit.wikimedia.org/r/491408 [07:56:23] 10Operations, 10Analytics, 10Discovery, 10Research: Workflow to be able to move data files computed in jobs from analytics cluster to production - https://phabricator.wikimedia.org/T213976 (10JAllemandou) One note about hadoop blobs: HDFS stores files split in chunks, with those not collocated. 
If we use... [07:56:29] 10Operations: New cronspam from db clusters - https://phabricator.wikimedia.org/T216273 (10Marostegui) I have rebooted db1106, I will give it sometime to confirm the spam is gone before closing this task. [07:56:36] 10Operations, 10ops-codfw, 10DBA: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 (10Marostegui) [07:56:48] 10Operations, 10ops-codfw, 10DBA: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 (10Marostegui) db1106 has been rebooted (and kernel was upgraded) [07:58:16] (03PS9) 10Marostegui: WIP dbstore_multiinstance.pp: Specify read-only for staging [puppet] - 10https://gerrit.wikimedia.org/r/491408 [08:00:02] (03CR) 10Marostegui: [C: 03+2] Revert "db-eqiad.php: Depool db1106" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491410 (owner: 10Marostegui) [08:02:03] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1106" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491410 (owner: 10Marostegui) [08:03:18] 10Operations: New cronspam from db clusters - https://phabricator.wikimedia.org/T216273 (10MoritzMuehlenhoff) Sounds good, on db2085 there's been no further occasion after the reboot. 
[08:04:45] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1106" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491410 (owner: 10Marostegui) [08:05:13] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1106 T210713 (duration: 00m 49s) [08:05:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:05:16] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713 [08:08:04] (03PS10) 10Marostegui: dbstore_multiinstance.pp: Specify read-only for staging [puppet] - 10https://gerrit.wikimedia.org/r/491408 (https://phabricator.wikimedia.org/T210478) [08:08:31] (03CR) 10Marostegui: "Puppet compiler looks good: https://puppet-compiler.wmflabs.org/compiler1001/14729/" [puppet] - 10https://gerrit.wikimedia.org/r/491408 (https://phabricator.wikimedia.org/T210478) (owner: 10Marostegui) [08:09:00] (03CR) 10jerkins-bot: [V: 04-1] dbstore_multiinstance.pp: Specify read-only for staging [puppet] - 10https://gerrit.wikimedia.org/r/491408 (https://phabricator.wikimedia.org/T210478) (owner: 10Marostegui) [08:13:18] (03PS11) 10Marostegui: dbstore_multiinstance.pp: Specify read-only for staging [puppet] - 10https://gerrit.wikimedia.org/r/491408 (https://phabricator.wikimedia.org/T210478) [08:15:38] (03PS2) 10Muehlenhoff: Remove stray packages after dist-upgrade on buster [puppet] - 10https://gerrit.wikimedia.org/r/491275 [08:15:49] (03CR) 10Muehlenhoff: Remove stray packages after dist-upgrade on buster (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/491275 (owner: 10Muehlenhoff) [08:25:41] (03PS1) 10Marostegui: filtered_tables.txt: Remove the ep_* tables [puppet] - 10https://gerrit.wikimedia.org/r/491413 (https://phabricator.wikimedia.org/T174802) [08:29:50] (03PS1) 10Joal: Add analytics purge job for xmldumps on HDFS [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) [08:30:46] (03CR) 10jerkins-bot: [V: 04-1] Add analytics purge job 
for xmldumps on HDFS [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal) [08:32:40] (03PS2) 10Joal: Add analytics purge job for xmldumps on HDFS [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) [08:33:37] (03CR) 10jerkins-bot: [V: 04-1] Add analytics purge job for xmldumps on HDFS [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal) [08:35:40] (03PS3) 10Joal: Add analytics purge job for xmldumps on HDFS [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) [08:43:21] (03PS1) 10Marostegui: db-eqiad.php: Depool db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491416 (https://phabricator.wikimedia.org/T210713) [08:46:41] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Depool db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491416 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [08:47:06] PROBLEM - DPKG on db2058 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [08:48:38] PROBLEM - puppet last run on wdqs2004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[initramfs-tools] [08:49:08] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491416 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [08:49:44] 10Operations, 10Analytics, 10Research-management, 10Patch-For-Review, 10User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (10elukey) https://github.com/RadeonOpenCompute/ROCm/issues/482 is a very similar problem, so I tried a couple of suggestions in here: * export H... 
[08:49:52] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491416 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [08:50:06] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1089 T210713 (duration: 00m 46s) [08:50:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:50:10] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713 [08:50:12] PROBLEM - puppet last run on logstash2003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[initramfs-tools] [08:50:29] !log Deploy schema change on db1089 - T210713 [08:50:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:52:10] PROBLEM - puppet last run on ms-be2034 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[initramfs-tools] [08:52:30] (03PS1) 10Marostegui: db-eqiad.php: Depool db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491419 [08:53:38] RECOVERY - puppet last run on wdqs2004 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [08:53:58] PROBLEM - puppet last run on ms-be2036 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[initramfs-tools] [08:54:05] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Depool db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491419 (owner: 10Marostegui) [08:55:12] RECOVERY - puppet last run on logstash2003 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [08:55:43] !log Cleaning contint1001 / partition [08:55:44] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491419 (owner: 10Marostegui) [08:55:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:56:41] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1093 for kernel upgrade (duration: 00m 45s) [08:56:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:56:46] <_joe_> !log experimenting with php-fpm configuration on mwdebug1001 for T176916 [08:56:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:56:50] T176916: Set up sampling profiler for PHP 7 (alternative to HHVM Xenon) - https://phabricator.wikimedia.org/T176916 [08:57:10] RECOVERY - puppet last run on ms-be2034 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:00:31] (03PS1) 10Marostegui: db-eqiad.php: Slowly repool db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491423 [09:01:21] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491419 (owner: 10Marostegui) [09:02:45] (03CR) 10Mforns: [C: 03+1] "LGTM! 
Just left a minor suggestion" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal) [09:05:59] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Slowly repool db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491423 (owner: 10Marostegui) [09:06:36] 10Operations, 10Analytics, 10Research-management, 10Patch-For-Review, 10User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (10MoritzMuehlenhoff) >>! In T148843#4963670, @elukey wrote: > What do you think about opening a GH issue to ROCm first to (hopefully) get some fe... [09:07:40] (03Merged) 10jenkins-bot: db-eqiad.php: Slowly repool db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491423 (owner: 10Marostegui) [09:08:45] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Slowly repool db1093 after kernel upgrade (duration: 00m 46s) [09:08:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:10:08] PROBLEM - DPKG on analytics1061 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [09:12:43] (03CR) 10jenkins-bot: db-eqiad.php: Slowly repool db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491423 (owner: 10Marostegui) [09:15:31] (03PS1) 10Marostegui: db-eqiad.php: Repool db1093 in API [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491425 [09:19:34] RECOVERY - puppet last run on ms-be2036 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:20:31] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Repool db1093 in API [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491425 (owner: 10Marostegui) [09:20:34] PROBLEM - DPKG on db1123 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [09:20:36] (03CR) 10Jcrespo: [C: 03+1] filtered_tables.txt: Remove the ep_* tables [puppet] - 10https://gerrit.wikimedia.org/r/491413 (https://phabricator.wikimedia.org/T174802) (owner: 10Marostegui) [09:21:53] (03CR) 
10Marostegui: [C: 03+2] filtered_tables.txt: Remove the ep_* tables [puppet] - 10https://gerrit.wikimedia.org/r/491413 (https://phabricator.wikimedia.org/T174802) (owner: 10Marostegui) [09:22:44] (03Merged) 10jenkins-bot: db-eqiad.php: Repool db1093 in API [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491425 (owner: 10Marostegui) [09:23:47] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1093 on API after kernel upgrade (duration: 00m 46s) [09:23:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:23:50] (03CR) 10jenkins-bot: db-eqiad.php: Repool db1093 in API [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491425 (owner: 10Marostegui) [09:23:53] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1089" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491427 [09:24:35] (03PS2) 10Marostegui: Revert "db-eqiad.php: Depool db1089" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491427 [09:26:34] (03CR) 10Marostegui: [C: 03+2] Revert "db-eqiad.php: Depool db1089" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491427 (owner: 10Marostegui) [09:28:30] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1089" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491427 (owner: 10Marostegui) [09:28:45] <_joe_> marostegui: can you wait 1 minute before deploying? [09:28:53] <_joe_> or, are you just doing sync-file, right? 
[09:29:19] gah [09:29:21] too late [09:29:24] I am deploying already [09:29:35] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1089 T210713 (duration: 00m 45s) [09:29:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:29:40] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713 [09:29:58] <_joe_> marostegui: it's ok actually [09:30:27] _joe_: but yeah, we only use sync-file [09:31:02] RECOVERY - DPKG on db2058 is OK: All packages OK [09:33:04] RECOVERY - DPKG on analytics1061 is OK: All packages OK [09:33:42] RECOVERY - DPKG on db1123 is OK: All packages OK [09:34:31] !log mforns@deploy1001 Started deploy [analytics/refinery@0d7ec19]: deploying refinery to update EL sanitization whitelist [09:34:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:34:52] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1089" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491427 (owner: 10Marostegui) [09:35:09] 10Operations, 10Analytics, 10Research-management, 10Patch-For-Review, 10User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (10elukey) Created https://github.com/RadeonOpenCompute/ROCm/issues/714 [09:35:49] (03CR) 10Elukey: [C: 03+1] Remove stray packages after dist-upgrade on buster [puppet] - 10https://gerrit.wikimedia.org/r/491275 (owner: 10Muehlenhoff) [09:39:02] (03PS3) 10Elukey: Update sqoop launchers used by timers [puppet] - 10https://gerrit.wikimedia.org/r/491246 (https://phabricator.wikimedia.org/T205940) (owner: 10Joal) [09:40:42] PROBLEM - etcd request latencies on argon is CRITICAL: instance=10.64.32.133:6443 operation={get,list} https://grafana.wikimedia.org/dashboard/db/kubernetes-api [09:41:10] PROBLEM - DPKG on planet1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [09:41:14] PROBLEM - etcd request latencies on chlorine is CRITICAL: instance=10.64.0.45:6443 operation=list 
https://grafana.wikimedia.org/dashboard/db/kubernetes-api [09:41:38] PROBLEM - Request latencies on chlorine is CRITICAL: instance=10.64.0.45:6443 verb=LIST https://grafana.wikimedia.org/dashboard/db/kubernetes-api [09:42:20] !log mforns@deploy1001 Finished deploy [analytics/refinery@0d7ec19]: deploying refinery to update EL sanitization whitelist (duration: 07m 49s) [09:42:22] RECOVERY - DPKG on planet1001 is OK: All packages OK [09:42:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:42:38] PROBLEM - Request latencies on argon is CRITICAL: instance=10.64.32.133:6443 verb={GET,LIST} https://grafana.wikimedia.org/dashboard/db/kubernetes-api [09:43:20] PROBLEM - puppet last run on webperf1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[initramfs-tools] [09:43:52] RECOVERY - Request latencies on argon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [09:45:36] RECOVERY - etcd request latencies on argon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [09:46:02] PROBLEM - puppet last run on planet1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[initramfs-tools] [09:46:08] RECOVERY - etcd request latencies on chlorine is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [09:46:14] (03PS1) 10Marostegui: db-eqiad.php: More weight to db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491428 [09:46:25] (03CR) 10Elukey: [C: 03+2] Update sqoop launchers used by timers [puppet] - 10https://gerrit.wikimedia.org/r/491246 (https://phabricator.wikimedia.org/T205940) (owner: 10Joal) [09:46:32] RECOVERY - Request latencies on chlorine is OK: All metrics within thresholds. 
https://grafana.wikimedia.org/dashboard/db/kubernetes-api [09:47:08] PROBLEM - puppet last run on ms-be1038 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[initramfs-tools] [09:47:26] PROBLEM - puppet last run on ms-be1017 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[initramfs-tools] [09:48:23] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: More weight to db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491428 (owner: 10Marostegui) [09:50:36] (03Merged) 10jenkins-bot: db-eqiad.php: More weight to db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491428 (owner: 10Marostegui) [09:51:45] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: More traffic to db1093 after kernel upgrade (duration: 00m 46s) [09:51:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:56:46] (03CR) 10jenkins-bot: db-eqiad.php: More weight to db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491428 (owner: 10Marostegui) [10:04:27] (03PS1) 10Marostegui: db-eqiad.php: Fully repool db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491430 [10:05:29] (03PS1) 10Jcrespo: mariadb: Depool db1064 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491431 [10:05:31] (03PS4) 10Joal: Add analytics purge job for xmldumps on HDFS [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) [10:06:16] (03CR) 10Joal: "Thanks for the comments mforns" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal) [10:09:16] (03CR) 10Nikerabbit: [C: 04-1] WIP: Cron to run script to purge old CX drafts (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/486454 (https://phabricator.wikimedia.org/T189091) (owner: 10KartikMistry) [10:09:27] (03CR) 
10Marostegui: [C: 03+2] db-eqiad.php: Fully repool db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491430 (owner: 10Marostegui) [10:12:04] RECOVERY - puppet last run on planet1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:12:20] (03Merged) 10jenkins-bot: db-eqiad.php: Fully repool db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491430 (owner: 10Marostegui) [10:12:27] (03PS1) 10Urbanecm: Add new throttle rule for WikiProject Women in red, enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491432 (https://phabricator.wikimedia.org/T215295) [10:13:04] RECOVERY - puppet last run on ms-be1038 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:13:22] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Fully repool db1093 after kernel upgrade (duration: 00m 46s) [10:13:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:13:24] RECOVERY - puppet last run on ms-be1017 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:14:06] (03CR) 10Mathew.onipe: Restore privileges to admin table after script (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/491399 (https://phabricator.wikimedia.org/T216466) (owner: 10MSantos) [10:14:36] RECOVERY - puppet last run on webperf1002 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [10:18:51] (03CR) 10jenkins-bot: db-eqiad.php: Fully repool db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491430 (owner: 10Marostegui) [10:19:07] (03PS1) 10Marostegui: db-eqiad.php: Depool db1083 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491433 (https://phabricator.wikimedia.org/T210713) [10:21:08] (03PS3) 10Volans: Add tox configuration to run the tests [dns] - 10https://gerrit.wikimedia.org/r/491280 [10:21:10] (03PS2) 10Volans: Removed run-tests.sh script [dns] - 10https://gerrit.wikimedia.org/r/491286 [10:21:19] 
(03CR) 10jerkins-bot: [V: 04-1] Removed run-tests.sh script [dns] - 10https://gerrit.wikimedia.org/r/491286 (owner: 10Volans) [10:21:45] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Depool db1083 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491433 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [10:23:37] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1083 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491433 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [10:23:53] (03CR) 10Volans: "> Patch Set 2:" [dns] - 10https://gerrit.wikimedia.org/r/491280 (owner: 10Volans) [10:25:34] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1083 T210713 (duration: 00m 46s) [10:25:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:25:37] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713 [10:25:43] !log Deploy schema change on db1083 - T210713 [10:25:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:28:06] 10Operations, 10Operations-Software-Development: Netbox: cable termination names report - https://phabricator.wikimedia.org/T216469 (10Volans) [10:29:44] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1083 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491433 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [10:30:31] (03PS2) 10Jcrespo: mariadb: Depool db1064 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491431 [10:32:31] (03CR) 10Marostegui: [C: 03+1] mariadb: Depool db1064 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491431 (owner: 10Jcrespo) [10:34:30] 10Operations, 10vm-requests, 10Patch-For-Review: eqiad: (1) Ganeti VM for testing Kerberos in Production - https://phabricator.wikimedia.org/T216238 (10elukey) Started makevm on a tmux session on ganeti1003 [10:37:51] 10Operations, 10Analytics, 10Analytics-Kanban, 
10Product-Analytics, 10Patch-For-Review: dbstore1002 Mysql errors - https://phabricator.wikimedia.org/T213670 (10Marostegui) For what is worth, dbstore1002 is now lagging behind on s8 (wikidatawiki) 7 days and it keeps lagging, I doubt it will ever catch up.... [10:38:33] (03CR) 10Jcrespo: [C: 03+2] mariadb: Depool db1064 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491431 (owner: 10Jcrespo) [10:38:45] 10Operations, 10Cloud-VPS, 10cloud-services-team, 10Discovery-Search (Current work), 10Patch-For-Review: Setup elasticsearch on cloudelastic100[1-4] - https://phabricator.wikimedia.org/T214921 (10Mathew.onipe) [10:39:14] 10Operations, 10Analytics, 10Analytics-Kanban, 10Product-Analytics, 10Patch-For-Review: dbstore1002 Mysql errors - https://phabricator.wikimedia.org/T213670 (10Marostegui) p:05High→03Low Reducing priority as the errors on dbstore1002 are not too important anymore as this host shouldn't be used anymor... [10:41:45] (03Merged) 10jenkins-bot: mariadb: Depool db1064 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491431 (owner: 10Jcrespo) [10:44:02] 10Operations, 10ops-eqiad, 10Analytics: Decommission dbstore1002 - https://phabricator.wikimedia.org/T216491 (10Marostegui) [10:44:10] 10Operations, 10ops-eqiad, 10Analytics: Decommission dbstore1002 - https://phabricator.wikimedia.org/T216491 (10Marostegui) 05Open→03Stalled p:05Triage→03Normal [10:44:27] (03CR) 10jenkins-bot: mariadb: Depool db1064 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491431 (owner: 10Jcrespo) [10:44:49] 10Operations, 10ops-eqiad, 10Analytics: Decommission dbstore1002 - https://phabricator.wikimedia.org/T216491 (10Marostegui) [10:46:23] (03CR) 10Alexandros Kosiaris: [C: 03+1] raid_handler: fix reported executed script [puppet] - 10https://gerrit.wikimedia.org/r/490337 (owner: 10Volans) [10:48:59] 10Operations, 10Proton: Proton fails with Chromium 72 - https://phabricator.wikimedia.org/T216493 
(10MoritzMuehlenhoff) [10:55:14] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1083" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491441 [10:57:37] (03CR) 10Alexandros Kosiaris: [C: 04-1] raid: improve megacli get raid script (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/490338 (owner: 10Volans) [11:00:29] 10Operations, 10vm-requests, 10Patch-For-Review: eqiad: (1) Ganeti VM for testing Kerberos in Production - https://phabricator.wikimedia.org/T216238 (10elukey) Full log: ` elukey@ganeti1003:~$ makevm This is an interactive script to make it easier to create a Ganeti VM. Please see https://wikitech.wikimedia... [11:02:28] (03PS1) 10Elukey: Add kerberos1001 to DHCP and partman configs [puppet] - 10https://gerrit.wikimedia.org/r/491442 (https://phabricator.wikimedia.org/T216238) [11:04:44] (03PS2) 10Marostegui: Revert "db-eqiad.php: Depool db1083" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491441 [11:05:25] !log Deploy schema change on dbstore1002 [11:05:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:05:59] marostegui: last moments of joy? [11:06:10] Indeed! [11:06:18] It might crash anyways while running it [11:07:30] (03CR) 10Elukey: [C: 03+2] Add kerberos1001 to DHCP and partman configs [puppet] - 10https://gerrit.wikimedia.org/r/491442 (https://phabricator.wikimedia.org/T216238) (owner: 10Elukey) [11:07:56] (03CR) 10Marostegui: [C: 03+2] Revert "db-eqiad.php: Depool db1083" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491441 (owner: 10Marostegui) [11:08:13] 10Operations, 10Proton: Proton fails with Chromium 72 - https://phabricator.wikimedia.org/T216493 (10MoritzMuehlenhoff) To exclude firejail as a source of error, I disabled puppet on deployment-chromium01, remove Firejail from the service unit and restarted proton.service, same effect, Proton still fails: `... 
[11:10:19] !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1064 (duration: 00m 46s) [11:10:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:11:21] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1083" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491441 (owner: 10Marostegui) [11:12:24] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1083 T210713 (duration: 00m 46s) [11:12:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:12:28] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713 [11:18:11] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1083" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491441 (owner: 10Marostegui) [11:18:51] (03PS2) 10Gilles: Launch performance perception survey on eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491229 (https://phabricator.wikimedia.org/T187299) [11:21:27] (03CR) 10Gilles: [C: 03+2] Launch performance perception survey on eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491229 (https://phabricator.wikimedia.org/T187299) (owner: 10Gilles) [11:21:32] 10Operations, 10ops-eqiad, 10cloud-services-team (Kanban): Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T215892 (10Volans) It seems that also PD: 8 is failed now: ` PD: 8 Information Enclosure Device ID: 32 Slot Number: 8 Drive's position: DiskGroup: 0, Span: 0, Arm: 8 Me... 
[11:22:45] !log stop and restart db1064 [11:22:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:23:42] (03Merged) 10jenkins-bot: Launch performance perception survey on eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491229 (https://phabricator.wikimedia.org/T187299) (owner: 10Gilles) [11:24:23] (03PS1) 10Elukey: Add role::spare::system to kerberos1001 [puppet] - 10https://gerrit.wikimedia.org/r/491445 (https://phabricator.wikimedia.org/T216238) [11:26:13] !log gilles@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T187299 Launch performance perception survey on eswiki (duration: 00m 46s) [11:26:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:26:16] T187299: User-perceived page load performance study - https://phabricator.wikimedia.org/T187299 [11:26:25] (03CR) 10Elukey: [C: 03+2] Add role::spare::system to kerberos1001 [puppet] - 10https://gerrit.wikimedia.org/r/491445 (https://phabricator.wikimedia.org/T216238) (owner: 10Elukey) [11:27:18] 10Operations, 10Core Platform Team Backlog (Later), 10Patch-For-Review, 10Services (next): Migrate node-based services in production to node10 - https://phabricator.wikimedia.org/T210704 (10jijiki) [11:27:42] 10Operations, 10serviceops, 10Core Platform Team Backlog (Later), 10Patch-For-Review, 10Services (next): Migrate node-based services in production to node10 - https://phabricator.wikimedia.org/T210704 (10jijiki) [11:28:39] 10Operations, 10ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T212990 (10Volans) megacli is failing to report data from one disk, with the new version of the script we get: ` === RaidStatus (does not include components in optimal state) name: Adapter #0 Virtual Drive: 0 (Target Id:... 
[11:28:55] (03CR) 10jenkins-bot: Launch performance perception survey on eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491229 (https://phabricator.wikimedia.org/T187299) (owner: 10Gilles) [11:29:33] 10Operations, 10ops-eqiad: Degraded RAID on sodium - https://phabricator.wikimedia.org/T212010 (10Volans) It seems that one disk has failed in a way that is not even reported by megacli. The new version of the script reports: ` === RaidStatus (does not include components in optimal state) name: Adapter #0 Vir... [11:29:57] (03PS1) 10Jcrespo: Revert "mariadb: Depool db1064 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491447 [11:30:03] (03PS2) 10Jcrespo: Revert "mariadb: Depool db1064 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491447 [11:31:09] 10Operations, 10ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T212990 (10Volans) 05Open→03Resolved It seems all good from megacli: ` $ sudo /usr/local/lib/nagios/plugins/get-raid-status-megacli === RaidStatus (does not include components in optimal state) === RaidStatus completed `... 
[11:32:59] (03PS2) 10Volans: raid_handler: fix reported executed script [puppet] - 10https://gerrit.wikimedia.org/r/490337 [11:33:01] (03PS2) 10Volans: raid: improve megacli get raid script [puppet] - 10https://gerrit.wikimedia.org/r/490338 [11:33:29] (03CR) 10Volans: "replies inline" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/490338 (owner: 10Volans) [11:37:31] 10Operations, 10Multimedia, 10Thumbor, 10serviceops: Deploy 3d2png to thumbor servers (stretch) - https://phabricator.wikimedia.org/T216494 (10jijiki) p:05Triage→03Normal [11:38:02] 10Operations, 10Thumbor, 10serviceops, 10Patch-For-Review, 10User-jijiki: Thumbor upgrade to stretch plan - https://phabricator.wikimedia.org/T214597 (10jijiki) [11:38:09] 10Operations, 10Thumbor, 10serviceops, 10Patch-For-Review, and 2 others: Upgrade Thumbor servers to Stretch - https://phabricator.wikimedia.org/T170817 (10jijiki) [11:38:11] 10Operations, 10Multimedia, 10Thumbor, 10serviceops: Deploy 3d2png to thumbor servers (stretch) - https://phabricator.wikimedia.org/T216494 (10jijiki) [11:39:50] 10Operations, 10Proton: Proton fails with Chromium 72 - https://phabricator.wikimedia.org/T216493 (10MoritzMuehlenhoff) p:05Triage→03High [11:41:52] 10Operations, 10Proton, 10Security-Team, 10Reading-Infrastructure-Team-Backlog (Kanban): [2 hrs] Decide on handling system updates for Proton - https://phabricator.wikimedia.org/T213366 (10MoritzMuehlenhoff) >>! In T213366#4906913, @Tgr wrote: > @MoritzMuehlenhoff thanks, that's good to know. How would the... [11:43:55] (03CR) 10Mforns: [C: 03+1] "LGTM!" 
[puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal) [11:45:34] 10Operations, 10vm-requests, 10Patch-For-Review: eqiad: (1) Ganeti VM for testing Kerberos in Production - https://phabricator.wikimedia.org/T216238 (10elukey) 05Open→03Resolved a:03elukey ` Linux kerberos1001 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3 (2019-02-02) x86_64 elukey@kerberos1001:~$ ` All good! [11:47:04] (03CR) 10Jcrespo: [C: 03+2] Revert "mariadb: Depool db1064 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491447 (owner: 10Jcrespo) [11:47:25] (03PS1) 10Marostegui: db-eqiad.php: Depool db1118 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491449 (https://phabricator.wikimedia.org/T210713) [11:49:21] !log installing ruby-rack security updates [11:49:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:50:10] (03Merged) 10jenkins-bot: Revert "mariadb: Depool db1064 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491447 (owner: 10Jcrespo) [11:51:36] (03CR) 10jenkins-bot: Revert "mariadb: Depool db1064 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491447 (owner: 10Jcrespo) [11:52:20] (03PS2) 10Marostegui: db-eqiad.php: Depool db1118 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491449 (https://phabricator.wikimedia.org/T210713) [11:53:12] !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1064 (duration: 00m 46s) [11:53:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:54:08] (03CR) 10Elukey: Add analytics purge job for xmldumps on HDFS (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal) [11:55:18] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Depool db1118 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491449 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [11:56:52] 
10Operations, 10serviceops, 10Core Platform Team Backlog (Later), 10Patch-For-Review, 10Services (next): Migrate node-based services in production to node10 - https://phabricator.wikimedia.org/T210704 (10elukey) [11:58:28] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1118 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491449 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [11:59:29] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1118 T210713 (duration: 00m 46s) [11:59:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:59:32] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713 [11:59:33] (03CR) 10BBlack: [C: 04-1] Add tox configuration to run the tests (031 comment) [dns] - 10https://gerrit.wikimedia.org/r/491280 (owner: 10Volans) [11:59:55] !log Deploy schema change on db1118 - T210713 [11:59:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) European Mid-day SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190219T1200). [12:00:04] WQL, Zoranzoki21, and Urbanecm: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [12:00:44] . [12:00:56] I can SWAT today [12:01:19] WQL, Zoranzoki21, and Urbanecm: around for SWAT? [12:01:44] I am waiting for deploy [12:02:24] (03PS4) 10Volans: Add tox configuration to run the tests [dns] - 10https://gerrit.wikimedia.org/r/491280 [12:02:26] (03PS3) 10Volans: Removed run-tests.sh script [dns] - 10https://gerrit.wikimedia.org/r/491286 [12:02:35] (03CR) 10Volans: "good catch!" 
(031 comment) [dns] - 10https://gerrit.wikimedia.org/r/491280 (owner: 10Volans) [12:02:37] (03CR) 10jerkins-bot: [V: 04-1] Removed run-tests.sh script [dns] - 10https://gerrit.wikimedia.org/r/491286 (owner: 10Volans) [12:02:42] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1118 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491449 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [12:03:10] (03PS3) 10Muehlenhoff: Remove stray packages after dist-upgrade on buster [puppet] - 10https://gerrit.wikimedia.org/r/491275 [12:03:48] WQL: cool, I'll let you know when your patch is at mwdebug1002, ready for testing. Do you know how to test there? Do you need help with that? [12:04:11] Using the chrome extension, right? [12:04:33] yes, browser extension, works for at least chrome and firefox [12:04:48] (03PS10) 10Zfilipin: Modifying configuration about Chinese Wikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482261 (https://phabricator.wikimedia.org/T212919) (owner: 10Wangql) [12:07:15] RECOVERY - Long running screen/tmux on an-coord1001 is OK: OK: SCREEN detected but not long running. [12:08:47] (03CR) 10Muehlenhoff: [C: 03+2] Remove stray packages after dist-upgrade on buster [puppet] - 10https://gerrit.wikimedia.org/r/491275 (owner: 10Muehlenhoff) [12:08:47] zeljkof, just fyi, I'm here now [12:08:57] Urbanecm: great, you're next [12:09:01] ack [12:09:08] (03CR) 10Zfilipin: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482261 (https://phabricator.wikimedia.org/T212919) (owner: 10Wangql) [12:11:10] (03Merged) 10jenkins-bot: Modifying configuration about Chinese Wikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482261 (https://phabricator.wikimedia.org/T212919) (owner: 10Wangql) [12:11:43] (03CR) 10BBlack: [C: 03+2] "Thanks for working on this!" 
[dns] - 10https://gerrit.wikimedia.org/r/491280 (owner: 10Volans) [12:11:51] (03PS5) 10BBlack: Add tox configuration to run the tests [dns] - 10https://gerrit.wikimedia.org/r/491280 (owner: 10Volans) [12:12:45] WQL: the patch is at mwdebug1002, please test and let me know if I can deploy it [12:12:48] jenkins? [12:12:56] oh there it goes [12:13:49] (03CR) 10jenkins-bot: Modifying configuration about Chinese Wikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482261 (https://phabricator.wikimedia.org/T212919) (owner: 10Wangql) [12:15:36] (03PS1) 10GTirloni: wikireplicas: depool labsdb1009 for updates [puppet] - 10https://gerrit.wikimedia.org/r/491454 (https://phabricator.wikimedia.org/T211939) [12:15:50] WQL: the patch is at mwdebug1002, please test and let me know if I can deploy it [12:16:08] I am checking Special:Import [12:16:11] 10Operations, 10Gerrit, 10serviceops: Gerrit loads very slowly - https://phabricator.wikimedia.org/T215855 (10hashar) From P8071 , it is often helpful to split by queues with `--by-queue` which would shows the number of worker threads and items for each queues. Example showing `ReplicateTo-slaves` and `WorkQ... [12:16:33] (03PS2) 10GTirloni: wikireplicas: depool labsdb1009 for updates [puppet] - 10https://gerrit.wikimedia.org/r/491454 (https://phabricator.wikimedia.org/T211939) [12:16:36] 10Operations: Integrate Stretch 9.8 point update - https://phabricator.wikimedia.org/T216384 (10MoritzMuehlenhoff) [12:17:11] OK it's fine. No bugs found. [12:17:17] 10Operations, 10Core Platform Team, 10Multimedia, 10Thumbor, 10serviceops: Deploy 3d2png to thumbor servers (stretch) - https://phabricator.wikimedia.org/T216494 (10jijiki) [12:17:20] (03CR) 10GTirloni: [C: 03+2] wikireplicas: depool labsdb1009 for updates [puppet] - 10https://gerrit.wikimedia.org/r/491454 (https://phabricator.wikimedia.org/T211939) (owner: 10GTirloni) [12:17:28] Though Requires `namespaceDupes.php --wiki=zhwikiversity --fix` after deployment. 
[12:18:23] WQL: ok, I'll deploy and run the script [12:19:42] !log zfilipin@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:482261|Modifying configuration about Chinese Wikiversity (T212919)]] (duration: 00m 48s) [12:19:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:19:45] T212919: Adding transwiki source, Adding ailias to namespaces and creating "Experiment" and "Lesson" namespaces in Chinese Wikiversity - https://phabricator.wikimedia.org/T212919 [12:19:51] WQL: deployed, running script [12:21:05] WQL: the script is finished https://phabricator.wikimedia.org/T212919#4964351 [12:21:16] ack [12:21:19] thanks [12:21:20] WQL: please test and thanks for deploying with #releng :) [12:21:43] Zoranzoki21: around for swat? [12:21:56] Urbanecm: I'll let you know when your patch is deployed [12:22:02] ok, thanks [12:22:11] if Zoranzoki21 is not around, I can take over his patches if you want me to zeljkof [12:22:19] Urbanecm: sure, please do [12:22:57] (03CR) 10Zfilipin: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491432 (https://phabricator.wikimedia.org/T215295) (owner: 10Urbanecm) [12:25:54] (03Merged) 10jenkins-bot: Add new throttle rule for WikiProject Women in red, enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491432 (https://phabricator.wikimedia.org/T215295) (owner: 10Urbanecm) [12:28:21] !log zfilipin@deploy1001 Synchronized wmf-config/throttle.php: SWAT: [[gerrit:491432|Add new throttle rule for WikiProject Women in red, enwiki (T215295)]] (duration: 00m 47s) [12:28:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:28:24] T215295: Wikipedia edit-a-thon IP account creation cap lift on February 20 - https://phabricator.wikimedia.org/T215295 [12:28:38] Urbanecm: 491432 deployed [12:28:44] ack [12:28:59] I'll continue with zoran's patches [12:29:25] Urbanecm: merge conflict for 489819 :) [12:29:28] <_joe_> !log creating gerrit repo 
operations/debs/tideways-xhprof T176916 [12:29:29] will rebase [12:29:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:29:31] T176916: Set up sampling profiler for PHP 7 (alternative to HHVM Xenon) - https://phabricator.wikimedia.org/T176916 [12:30:32] 10Operations, 10Gerrit, 10serviceops: Gerrit loads very slowly - https://phabricator.wikimedia.org/T215855 (10hashar) tools.ci first request was on 2019-02-12T01:12:07 . Maybe it is doing the requests too fast for Gerrit :/ Maybe https://gerrit.wikimedia.org/r/monitoring can give us some hint. Maybe we ca... [12:31:14] (03PS2) 10Zfilipin: Add namespace Додатак on srwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491024 (https://phabricator.wikimedia.org/T216343) (owner: 10Zoranzoki21) [12:32:20] (03CR) 10Zfilipin: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491024 (https://phabricator.wikimedia.org/T216343) (owner: 10Zoranzoki21) [12:32:35] Urbanecm: should I run a script for 491024? [12:33:12] namespaceDupes.php is needed, updateArticleCount.php should be run automatically within a week or two (IIRC) [12:33:44] ok, so just namespaceDupes now? 
[12:34:10] (03PS2) 10Muehlenhoff: uwsgi: Remove support for trusty [puppet] - 10https://gerrit.wikimedia.org/r/486223 [12:34:40] (03Merged) 10jenkins-bot: Add namespace Додатак on srwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491024 (https://phabricator.wikimedia.org/T216343) (owner: 10Zoranzoki21) [12:35:23] (03CR) 10Muehlenhoff: [C: 03+2] uwsgi: Remove support for trusty [puppet] - 10https://gerrit.wikimedia.org/r/486223 (owner: 10Muehlenhoff) [12:35:43] Urbanecm: 491024 is at mwdebug [12:35:43] zeljkof, yes [12:35:47] testing [12:35:52] (03CR) 10jenkins-bot: Add new throttle rule for WikiProject Women in red, enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491432 (https://phabricator.wikimedia.org/T215295) (owner: 10Urbanecm) [12:35:54] (03CR) 10jenkins-bot: Add namespace Додатак on srwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491024 (https://phabricator.wikimedia.org/T216343) (owner: 10Zoranzoki21) [12:36:39] zeljkof, please deploy [12:36:43] Urbanecm: ok [12:37:48] !log zfilipin@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:491024|Add namespace Додатак on srwiktionary (T216343)]] (duration: 00m 46s) [12:37:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:37:52] T216343: Add namespace Додатак on srwiktionary - https://phabricator.wikimedia.org/T216343 [12:38:01] Urbanecm: deployed, running script [12:38:05] ack [12:40:14] Urbanecm: script done https://phabricator.wikimedia.org/T216343#4964414 [12:40:17] thx [12:40:57] Urbanecm: you didn't rebase 489819? 
[12:41:02] ah, sorry, I forgot [12:41:10] (03PS2) 10Zfilipin: Set $wgArticleCountMethod = 'any' on fiwikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491025 (https://phabricator.wikimedia.org/T216333) (owner: 10Zoranzoki21) [12:41:19] no problemo [12:41:22] zeljkof, let me do that immediately [12:42:23] (03PS8) 10Urbanecm: Add new throttle rule for T215839 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489819 (https://phabricator.wikimedia.org/T215839) (owner: 10Zoranzoki21) [12:42:24] here you are ^^^ zeljkof [12:42:37] (03CR) 10Zfilipin: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491025 (https://phabricator.wikimedia.org/T216333) (owner: 10Zoranzoki21) [12:42:45] Urbanecm: thanks :) [12:42:47] yw [12:42:52] Urbanecm: is script needed for 491025? [12:43:27] for immediate effect, yes, but the script should be scheduled to run automatically within a week or two [12:44:35] ok, then no script :) [12:44:43] if it's not urgent, and I guess it's not [12:44:44] you'll have people complaining [12:45:13] "omg patch was merged still incorrect article count omg omg!!11" [12:46:00] (03Merged) 10jenkins-bot: Set $wgArticleCountMethod = 'any' on fiwikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491025 (https://phabricator.wikimedia.org/T216333) (owner: 10Zoranzoki21) [12:46:30] (03PS1) 10Muehlenhoff: nagios_common::commands: Remove support for trusty/jessie [puppet] - 10https://gerrit.wikimedia.org/r/491460 [12:46:55] (03CR) 10jenkins-bot: Set $wgArticleCountMethod = 'any' on fiwikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491025 (https://phabricator.wikimedia.org/T216333) (owner: 10Zoranzoki21) [12:46:57] Urbanecm: is there anything to test for 491025? 
[12:47:30] Urbanecm: 491025 is at mwdebug [12:49:17] if you didn't run the script, then no zeljkof [12:49:27] Urbanecm: ok, deploying [12:49:30] thx [12:50:23] !log zfilipin@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:491025|Set $wgArticleCountMethod = any on fiwikinews (T216333)]] (duration: 00m 45s) [12:50:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:50:26] T216333: Set $wgArticleCountMethod = 'any' on fiwikinews and run updateArticleCount.php - https://phabricator.wikimedia.org/T216333 [12:50:31] Urbanecm: 491025 deployed [12:50:33] thx [12:52:00] (03PS9) 10Zfilipin: Add new throttle rule for Kickstarter Edit-a-thon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489819 (https://phabricator.wikimedia.org/T215839) (owner: 10Zoranzoki21) [12:52:23] (03CR) 10Zfilipin: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489819 (https://phabricator.wikimedia.org/T215839) (owner: 10Zoranzoki21) [12:52:31] (03CR) 10Joal: "Thanks for comments elukey :)" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal) [12:52:50] (03PS5) 10Joal: Add analytics purge job for xmldumps on HDFS [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) [12:54:17] (03Merged) 10jenkins-bot: Add new throttle rule for Kickstarter Edit-a-thon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489819 (https://phabricator.wikimedia.org/T215839) (owner: 10Zoranzoki21) [12:56:39] !log zfilipin@deploy1001 Synchronized wmf-config/throttle.php: SWAT: [[gerrit:489819|Add new throttle rule for Kickstarter Edit-a-thon (T215839)]] (duration: 00m 43s) [12:56:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:56:42] T215839: Lift IP cap for account creation for Kickstarter Edit-a-thon on March 3rd - https://phabricator.wikimedia.org/T215839 [12:57:15] (03CR) 10Elukey: 
"Let's do the bash file to be super sure!" [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal) [12:57:29] Urbanecm: 489819 deployed, I think it's the last one, thanks for deploying with #releng ;) [12:57:38] yw and thanks for all the deploys [12:57:48] (03CR) 10jenkins-bot: Add new throttle rule for Kickstarter Edit-a-thon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489819 (https://phabricator.wikimedia.org/T215839) (owner: 10Zoranzoki21) [12:57:53] !log EU SWAT finished [12:57:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190219T1300) [13:06:48] (03CR) 10Brian Wolff: [C: 03+1] "Nice to see tools adopting CSP." [puppet] - 10https://gerrit.wikimedia.org/r/491377 (https://phabricator.wikimedia.org/T214637) (owner: 10Framawiki) [13:15:00] (03PS1) 10Muehlenhoff: Absent libpcre3-dbg from hhvm::debug [puppet] - 10https://gerrit.wikimedia.org/r/491462 (https://phabricator.wikimedia.org/T176370) [13:17:16] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1118" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491463 [13:21:21] (03CR) 10Marostegui: [C: 03+2] Revert "db-eqiad.php: Depool db1118" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491463 (owner: 10Marostegui) [13:23:04] !log running `maintain-views --all-databases --replace-all --clean --debug` on labsdb1009 (T216481) [13:23:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:23:07] T216481: Remove views on ep_* tables on the wikireplicas hosts - https://phabricator.wikimedia.org/T216481 [13:24:49] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1118" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491463 (owner: 10Marostegui) [13:25:45] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1118 T210713 
(duration: 00m 46s) [13:25:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:25:48] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713 [13:27:05] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1118" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491463 (owner: 10Marostegui) [13:31:40] !log installing rssh update for jessie [13:31:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:39:22] PROBLEM - puppet last run on mw2167 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[libpcre3-dbg] [13:42:26] (03PS1) 10GTirloni: Revert "imagemagick: Unconditionally use /etc/ImageMagick-6/" [puppet] - 10https://gerrit.wikimedia.org/r/491475 (https://phabricator.wikimedia.org/T216506) [13:42:43] (03PS1) 10Jcrespo: dbproxy: Reload automatically haproxy on configuration update [puppet] - 10https://gerrit.wikimedia.org/r/491476 [13:42:54] (03CR) 10Jcrespo: [C: 04-1] dbproxy: Reload automatically haproxy on configuration update [puppet] - 10https://gerrit.wikimedia.org/r/491476 (owner: 10Jcrespo) [13:43:21] !log ladsgroup@mwmaint1002:~$ mwscript maintenance/createAndPromote.php --wiki=testwikidatawiki --force --sysop Ladsgroup (T215919) [13:43:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:43:25] T215919: Investigate 4h: Wikidata Tours don't load correctly - https://phabricator.wikimedia.org/T215919 [13:43:41] (03CR) 10Mforns: [C: 03+1] Add analytics purge job for xmldumps on HDFS (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal) [13:43:46] (03PS2) 10Jcrespo: dbproxy: Reload automatically haproxy on configuration update [puppet] - 10https://gerrit.wikimedia.org/r/491476 [13:44:13] !log mwscript maintenance/createAndPromote.php --wiki=testwikidatawiki --force --interface-admin 
Ladsgroup [13:44:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:47:27] (03CR) 10GTirloni: [C: 04-1] "Thanks for showing how it'd be implemented if we wanted this. I completely defer to your judgment on this matter." [puppet] - 10https://gerrit.wikimedia.org/r/491476 (owner: 10Jcrespo) [13:49:44] (03CR) 10Jcrespo: "But I understand your complain/needs, and maybe a cumin/spicerack script could be done instead? I am open to suggestions." [puppet] - 10https://gerrit.wikimedia.org/r/491476 (owner: 10Jcrespo) [13:51:02] (03PS1) 10GTirloni: Revert "wikireplicas: depool labsdb1009 for updates" [puppet] - 10https://gerrit.wikimedia.org/r/491478 (https://phabricator.wikimedia.org/T211939) [13:51:49] (03CR) 10GTirloni: [C: 03+2] Revert "wikireplicas: depool labsdb1009 for updates" [puppet] - 10https://gerrit.wikimedia.org/r/491478 (https://phabricator.wikimedia.org/T211939) (owner: 10GTirloni) [13:52:46] moritzm: I see we have two commits waiting to be merged :) [13:54:52] gtirloni: ah, I'll merge both [13:55:13] moritzm: thanks :) [13:56:53] (03CR) 10Gehel: Add wdqs data transfer cookbook (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/488256 (https://phabricator.wikimedia.org/T213401) (owner: 10Mathew.onipe) [14:00:04] Deploy window MediaWiki train - European version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190219T1400) [14:01:26] (03PS1) 10GTirloni: wiki replicas: depool labsdb1010 for changes [puppet] - 10https://gerrit.wikimedia.org/r/491481 (https://phabricator.wikimedia.org/T216481) [14:02:07] (03CR) 10GTirloni: [C: 03+2] wiki replicas: depool labsdb1010 for changes [puppet] - 10https://gerrit.wikimedia.org/r/491481 (https://phabricator.wikimedia.org/T216481) (owner: 10GTirloni) [14:04:07] (03PS1) 10Gehel: elasticsearch: relforge now uses elastic56 apt component [puppet] - 10https://gerrit.wikimedia.org/r/491482 (https://phabricator.wikimedia.org/T215931) [14:04:10] !log running `maintain-views 
--all-databases --replace-all --clean --debug` on labsdb1010 (T216481) [14:04:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:04:13] T216481: Remove views on ep_* tables on the wikireplicas hosts - https://phabricator.wikimedia.org/T216481 [14:04:32] (03CR) 10Muehlenhoff: "Given that this is now only used in modules/toollabs (which will eventually be removed in total), better inline it's use in toollabs::exec" [puppet] - 10https://gerrit.wikimedia.org/r/491475 (https://phabricator.wikimedia.org/T216506) (owner: 10GTirloni) [14:05:20] (03CR) 10Mathew.onipe: [C: 03+1] elasticsearch: relforge now uses elastic56 apt component [puppet] - 10https://gerrit.wikimedia.org/r/491482 (https://phabricator.wikimedia.org/T215931) (owner: 10Gehel) [14:05:40] (03PS2) 10Gehel: elasticsearch: relforge now uses elastic56 apt component [puppet] - 10https://gerrit.wikimedia.org/r/491482 (https://phabricator.wikimedia.org/T215931) [14:06:01] (03CR) 10DCausse: [C: 03+1] elasticsearch: relforge now uses elastic56 apt component [puppet] - 10https://gerrit.wikimedia.org/r/491482 (https://phabricator.wikimedia.org/T215931) (owner: 10Gehel) [14:06:04] (03CR) 10GTirloni: "Agreed. I'll submit a separate change. Thanks for reviewing!" 
[puppet] - 10https://gerrit.wikimedia.org/r/491475 (https://phabricator.wikimedia.org/T216506) (owner: 10GTirloni) [14:06:07] (03Abandoned) 10GTirloni: Revert "imagemagick: Unconditionally use /etc/ImageMagick-6/" [puppet] - 10https://gerrit.wikimedia.org/r/491475 (https://phabricator.wikimedia.org/T216506) (owner: 10GTirloni) [14:06:17] (03CR) 10Gehel: [C: 03+2] elasticsearch: relforge now uses elastic56 apt component [puppet] - 10https://gerrit.wikimedia.org/r/491482 (https://phabricator.wikimedia.org/T215931) (owner: 10Gehel) [14:06:25] (03CR) 10Muehlenhoff: [C: 03+1] admin: add Angela Muigai to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/490932 (https://phabricator.wikimedia.org/T216101) (owner: 10Cwhite) [14:07:02] (03PS2) 10Fsero: admin: add Angela Muigai to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/490932 (https://phabricator.wikimedia.org/T216101) (owner: 10Cwhite) [14:07:33] (03CR) 10Fsero: [V: 03+2 C: 03+2] admin: add Angela Muigai to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/490932 (https://phabricator.wikimedia.org/T216101) (owner: 10Cwhite) [14:08:53] (03PS1) 10Ladsgroup: Set wmgWikibaseRepoIdGeneratorSeparateDbConnection to true for wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491484 (https://phabricator.wikimedia.org/T215147) [14:09:10] 10Operations, 10Patch-For-Review: Tracking and Reducing cron-spam to root@ - https://phabricator.wikimedia.org/T132324 (10Marostegui) [14:09:12] 10Operations: New cronspam from db clusters - https://phabricator.wikimedia.org/T216273 (10Marostegui) 05Open→03Resolved Nothing has arrived since the restart without debug, so I think we are good [14:11:21] (03PS1) 10Gehel: elasticsearch/relforge: fix typo in hiera param for elasticsearch version [puppet] - 10https://gerrit.wikimedia.org/r/491485 (https://phabricator.wikimedia.org/T215931) [14:11:47] (03CR) 10DCausse: [C: 03+1] elasticsearch/relforge: fix typo in hiera param for elasticsearch version 
[puppet] - 10https://gerrit.wikimedia.org/r/491485 (https://phabricator.wikimedia.org/T215931) (owner: 10Gehel) [14:12:32] 10Operations, 10SRE-Access-Requests: Requesting access to stat1007 for kharlan - https://phabricator.wikimedia.org/T216258 (10fsero) Hi @kostajh or @marcella could you identify the specific group we should grant? Thanks [14:12:59] (03CR) 10Gehel: [C: 03+2] elasticsearch/relforge: fix typo in hiera param for elasticsearch version [puppet] - 10https://gerrit.wikimedia.org/r/491485 (https://phabricator.wikimedia.org/T215931) (owner: 10Gehel) [14:16:04] !log stop db2090 for reboot testing T216240 [14:16:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:16:07] T216240: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 [14:23:24] (03PS3) 10MSantos: Restore privileges to admin table after script [puppet] - 10https://gerrit.wikimedia.org/r/491399 (https://phabricator.wikimedia.org/T216466) [14:25:10] (03CR) 10MSantos: Restore privileges to admin table after script (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/491399 (https://phabricator.wikimedia.org/T216466) (owner: 10MSantos) [14:29:57] !log rolling upgrade of elasticsearch on relforge - T215931 [14:30:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:30:04] T215931: Upgrade elasticsearch to 5.6.14 - https://phabricator.wikimedia.org/T215931 [14:30:24] ^ this is the first real world use of the new cookbooks for elasticsearch, please cross fingers [14:30:26] !log gehel@cumin1001 START - Cookbook sre.elasticsearch.rolling-upgrade [14:30:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:31:47] !log gehel@cumin1001 END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99) [14:31:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:32:39] gehel: finger crossed and I'm around [14:33:12] 
volans: this is only relforge, no big deal if I break it (I'm still trying to not break it) [14:33:21] ehehehe [14:33:49] hehehehe [14:35:05] !log gehel@cumin1001 START - Cookbook sre.elasticsearch.rolling-upgrade [14:35:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:35:28] nice! [14:36:14] bd808: and thanks to you SAL tool now interprets them correctly ;() [14:36:25] !log gehel@cumin1001 END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99) [14:36:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:36:28] * ;) [14:38:44] (03CR) 10Elukey: ">" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal) [14:40:35] !log gehel@cumin1001 START - Cookbook sre.elasticsearch.rolling-upgrade [14:40:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:41:37] crossing finger harder [14:41:49] stupid argument parsing :/ [14:41:55] !log gehel@cumin1001 END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99) [14:41:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:41:57] who needs argument parsing [14:42:39] gehel: didn't we move the argparse step before the START? [14:43:01] it shouldn't fail, unless is in the "post" argument validation [14:43:15] the args are parsed without error, I'm just learning about argparse and booleans [14:43:34] lol :) [14:44:02] it looked so obviously correct :) [14:47:38] 10Operations, 10ops-codfw, 10DBA: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 (10Papaul) Can db2089 be depool please if it is not yet? 
Thanks [14:49:21] 10Operations, 10ops-codfw, 10DBA: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 (10jcrespo) Rebooting db2090: ` PowerEdge R630 BIOS Version: 2.4.3 ` ` 1st reboot: OK 2nd reboot: FAIL 3rd reboot: OK 4th reboot: OK 5th reboot: OK 6th reboot:... [14:49:45] 10Operations, 10ops-codfw, 10DBA: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 (10jcrespo) Preparing db2089 for you, @Papaul give me 5 minutes. [14:51:34] (03CR) 10Alexandros Kosiaris: [C: 03+1] "Better" [puppet] - 10https://gerrit.wikimedia.org/r/490338 (owner: 10Volans) [14:53:11] !log stopping db2089 for hw maintenance T216240 [14:53:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:53:14] T216240: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 [14:57:02] (03PS1) 10Gehel: elasticsearch: invert the "--nodes-has-lvs" argument [cookbooks] - 10https://gerrit.wikimedia.org/r/491502 [14:59:57] (03CR) 10Ottomata: "This is good! What would be best overall is if the entire Hadoop/Yarn data hierarchy lived in a sub chroot in ZK, like we do for Kafka. " [puppet/cdh] - 10https://gerrit.wikimedia.org/r/490572 (owner: 10Elukey) [15:00:50] (03CR) 10Ottomata: "Great!" [puppet] - 10https://gerrit.wikimedia.org/r/490877 (owner: 10Elukey) [15:00:52] (03PS2) 10Gehel: elasticsearch: invert the "--nodes-has-lvs" argument [cookbooks] - 10https://gerrit.wikimedia.org/r/491502 [15:02:03] (03PS1) 10Ladsgroup: Drop obsolete Wikibase configs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491506 (https://phabricator.wikimedia.org/T213713) [15:02:58] (03CR) 10Volans: [C: 03+1] "LGTM, nitpick inline, no need to change the code." 
(031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/491502 (owner: 10Gehel) [15:04:43] 10Operations, 10ops-codfw, 10DBA: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 (10jcrespo) Rebooting db2090: ` PowerEdge R630 BIOS Version: 2.4.3 ` ` 1st reboot: OK 2nd reboot: FAIL 3rd reboot: OK 4th reboot: OK 5th reboot: OK 6th reboot:... [15:05:15] (03CR) 10Mathew.onipe: elasticsearch: invert the "--nodes-has-lvs" argument (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/491502 (owner: 10Gehel) [15:05:30] (03CR) 10Mathew.onipe: [C: 03+1] elasticsearch: invert the "--nodes-has-lvs" argument [cookbooks] - 10https://gerrit.wikimedia.org/r/491502 (owner: 10Gehel) [15:06:27] (03CR) 10Mforns: "I understand your concern Luca. I also think it is likely to fail due to weird characters." [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal) [15:07:06] (03CR) 10Gehel: [C: 03+2] elasticsearch: invert the "--nodes-has-lvs" argument [cookbooks] - 10https://gerrit.wikimedia.org/r/491502 (owner: 10Gehel) [15:07:12] 10Operations, 10Multimedia, 10Performance-Team, 10Thumbor, 10serviceops: Deploy 3d2png to thumbor servers (stretch) - https://phabricator.wikimedia.org/T216494 (10kchapman) [15:09:44] !log depooled labsdb1010 T216481 [15:09:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:09:48] T216481: Remove views on ep_* tables on the wikireplicas hosts - https://phabricator.wikimedia.org/T216481 [15:10:53] <_joe_> !log uploading tideways-xhprof_5.0.0~beta3 to reprepro T176916 [15:10:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:10:56] T176916: Set up sampling profiler for PHP 7 (alternative to HHVM Xenon) - https://phabricator.wikimedia.org/T176916 [15:11:20] (03PS1) 10Gehel: elasticsearch: remove a double negation from the arguments [cookbooks] - 
10https://gerrit.wikimedia.org/r/491509 [15:11:37] (03PS2) 10Elukey: statistics: Add configs for new analytics db hosts [puppet] - 10https://gerrit.wikimedia.org/r/490085 (https://phabricator.wikimedia.org/T213894) (owner: 10Ladsgroup) [15:11:57] (03CR) 10Addshore: [C: 03+1] statistics: Add configs for new analytics db hosts [puppet] - 10https://gerrit.wikimedia.org/r/490085 (https://phabricator.wikimedia.org/T213894) (owner: 10Ladsgroup) [15:12:00] (03CR) 10Volans: [C: 03+1] "LGTM!" [cookbooks] - 10https://gerrit.wikimedia.org/r/491509 (owner: 10Gehel) [15:12:45] (03CR) 10Elukey: [C: 03+2] statistics: Add configs for new analytics db hosts [puppet] - 10https://gerrit.wikimedia.org/r/490085 (https://phabricator.wikimedia.org/T213894) (owner: 10Ladsgroup) [15:13:50] (03CR) 10Gehel: [C: 03+2] elasticsearch: remove a double negation from the arguments [cookbooks] - 10https://gerrit.wikimedia.org/r/491509 (owner: 10Gehel) [15:15:40] (03CR) 10Tchanders: [C: 03+1] Enable partial blocks on Meta Wiki and MediaWiki.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491291 (https://phabricator.wikimedia.org/T216065) (owner: 10Dbarratt) [15:22:53] 10Operations, 10Wikimedia-Logstash: Retire udp2log: onboard its producers and consumers to the logging pipeline - https://phabricator.wikimedia.org/T205856 (10Ottomata) > It might be a reasonable compromise to build a shell pipeline compatible utility that can be used to reformat JSON log event records kafkat... [15:25:47] !log Started instance compiler1002.puppet-diffs.eqiad.wmflabs via Horizon. It was in shutoff state | T216513 [15:25:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:25:50] T216513: compiler1002.puppet-diffs.eqiad.wmflabs instance is down - https://phabricator.wikimedia.org/T216513 [15:27:29] !log gehel@cumin1001 START - Cookbook sre.elasticsearch.rolling-upgrade [15:27:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:27:59] there we go! 
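Note on the argparse and booleans exchange above: the log does not say which mistake caused the failed runs, so what follows is only a sketch of one common argparse pitfall with boolean options, not the cookbook's actual code. The option names `--with-lvs-typed` and `--without-lvs` are hypothetical and chosen for illustration.

```python
import argparse


def build_parser():
    """Build a parser contrasting two ways of declaring a boolean option."""
    parser = argparse.ArgumentParser(description="argparse boolean sketch")
    # Pitfall: type=bool calls bool() on the raw string, and any non-empty
    # string (including "False") is truthy.
    parser.add_argument("--with-lvs-typed", type=bool, default=True)
    # Safer: a switch that stores False when present; argparse defaults the
    # destination to True for action="store_false".
    parser.add_argument("--without-lvs", dest="with_lvs", action="store_false")
    return parser


if __name__ == "__main__":
    parser = build_parser()
    print(parser.parse_args(["--with-lvs-typed", "False"]).with_lvs_typed)
    print(parser.parse_args([]).with_lvs)
    print(parser.parse_args(["--without-lvs"]).with_lvs)
```

Declaring the option as a switch avoids parsing a string into a boolean at all, which is also why a positively named destination tends to read better than a double negation.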
[15:28:24] \o/ [15:28:27] (03PS1) 10Giuseppe Lavagetto: Allow building for php 7.2 too [debs/tideways-xhprof] - 10https://gerrit.wikimedia.org/r/491515 [15:29:28] (03CR) 10Giuseppe Lavagetto: [V: 03+2 C: 03+2] Allow building for php 7.2 too [debs/tideways-xhprof] - 10https://gerrit.wikimedia.org/r/491515 (owner: 10Giuseppe Lavagetto) [15:29:51] twentyafterfour: I can deploy the T216200 fix in a bit if you want [15:29:52] T216200: includes/specials/pagers/ActiveUsersPager.php: PHP Notice: Undefined index: dir - https://phabricator.wikimedia.org/T216200 [15:30:54] 10Operations, 10Multimedia, 10Performance-Team, 10Thumbor, 10serviceops: Deploy 3d2png to thumbor servers (stretch) - https://phabricator.wikimedia.org/T216494 (10Jdforrester-WMF) [15:32:19] 10Operations, 10Cassandra, 10RESTBase, 10RESTBase-Cassandra, and 2 others: secure Cassandra/RESTBase cluster - https://phabricator.wikimedia.org/T94329 (10EvanProdromou) [15:32:23] !log apt-get upgrade on compiler1001 and compiler1002.puppet-diffs.eqiad.wmflabs [15:32:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:37:42] 10Operations, 10ops-codfw, 10DBA: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 (10Papaul) db2089 upgrade complete Upgrade BIOS from 2.4.3 to 2.9.1 IDRAC from 2.40. to 2.61 [15:38:37] 10Operations, 10ops-codfw, 10serviceops, 10User-jijiki: Degraded RAID on thumbor2002 - https://phabricator.wikimedia.org/T214813 (10Papaul) The network cable was plugged back in after the disk replacement. Should be good now. 
[15:38:55] !log gehel@cumin1001 END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99) [15:38:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:39:16] PROBLEM - ElasticSearch health check for shards on 9200 on relforge1001 is CRITICAL: CRITICAL - elasticsearch http://10.64.4.13:9200/_cluster/health error while fetching: HTTPConnectionPool(host=10.64.4.13, port=9200): Max retries exceeded with url: /_cluster/health (Caused by NewConnectionError(requests.packages.urllib3.connection.HTTPConnection object at 0x7f862e7cc510: Failed to establish a new connection: [Errno 111] Connecti [15:39:28] PROBLEM - Check systemd state on relforge1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [15:39:44] ^ that's expected, re-adding downtime [15:39:59] 10Operations, 10ops-codfw, 10DBA: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 (10jcrespo) a:05Papaul→03jcrespo Thanks, will ping you when/if tested more issues on that and other servers. 
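Note on the ElasticSearch health alert above: the check fetches `/_cluster/health`, whose JSON response includes fields such as `status` and `unassigned_shards`. As a rough sketch only, this is how such a response could be mapped to an Icinga-style result. The state mapping, the function name `evaluate_health` and the sample body are assumptions for illustration, not the actual check used on these hosts.

```python
import json

# Assumed mapping from cluster status to (exit code, label); illustrative only.
STATES = {"green": (0, "OK"), "yellow": (1, "WARNING"), "red": (2, "CRITICAL")}


def evaluate_health(body):
    """Map a /_cluster/health JSON body to an (exit_code, message) pair."""
    try:
        health = json.loads(body)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError.
        return 2, "CRITICAL - response is not valid JSON"
    status = health.get("status")
    code, label = STATES.get(status, (3, "UNKNOWN"))
    message = "{} - status: {}, unassigned_shards: {}".format(
        label, status, health.get("unassigned_shards")
    )
    return code, message


if __name__ == "__main__":
    # Hypothetical sample body, not taken from a real host.
    sample = '{"status": "green", "number_of_nodes": 2, "unassigned_shards": 0}'
    print(evaluate_health(sample))
```

A connection failure like the one in the PROBLEM line would be handled before this step, by whatever code performs the HTTP request.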
[15:40:03] (03PS3) 10Herron: logstash: apply role::logstash to new logstash101[0-2] hardware hosts [puppet] - 10https://gerrit.wikimedia.org/r/490695 (https://phabricator.wikimedia.org/T214608) [15:40:12] (03PS4) 10Herron: logstash: apply role::logstash to new logstash101[0-2] hardware hosts [puppet] - 10https://gerrit.wikimedia.org/r/490695 (https://phabricator.wikimedia.org/T214608) [15:40:28] 10Operations, 10ops-codfw, 10DBA: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 (10jcrespo) [15:44:22] RECOVERY - Check systemd state on relforge1001 is OK: OK - running: The system is fully operational [15:44:53] 10Operations, 10Wikimedia-Mailing-lists, 10Patch-For-Review, 10User-herron: Ban recurrent spam to Wikimedia mailing lists (January 2019) - https://phabricator.wikimedia.org/T215251 (10herron) 05Open→03Resolved a:03herron Great! Glad to hear it. Resolving [15:45:20] RECOVERY - ElasticSearch health check for shards on 9200 on relforge1001 is OK: OK - elasticsearch status relforge-eqiad: status: green, number_of_nodes: 2, unassigned_shards: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, timed_out: False, active_primary_shards: 83, task_max_waiting_in_queue_millis: 0, cluster_name: relforge-eqiad, relocating_shards: 0, active_shards_percent_as_number: 100.0, active_shards: 104, in [15:45:20] : 0, number_of_data_nodes: 2, delayed_unassigned_shards: 0 [15:46:00] godog: q about https://phabricator.wikimedia.org/T205856#4957430 [15:46:53] are all of those currently sent to udp2log via Monolog in mediawiki? [15:47:23] and, will there be only one 'mwlog' kafka topic, or will there be multiple per 'channel'? 
[15:47:29] !log Reimaging thumbor2002 to stretch - T214597 [15:47:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:47:32] T214597: Thumbor upgrade to stretch plan - https://phabricator.wikimedia.org/T214597 [15:51:07] hey ottomata, go.dog is sick today so best to discuss on the task [15:53:46] 10Operations, 10Thumbor, 10serviceops, 10Patch-For-Review, 10User-jijiki: Thumbor upgrade to stretch plan - https://phabricator.wikimedia.org/T214597 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin2001.codfw.wmnet for hosts: ` ['thumbor2002.codfw.wmnet'] ` The log can be foun... [15:53:51] ottomata but yeah on mwlog there is an egrep '^(scap|scholarships|iegreview) ' | /usr/bin/log2udp -h logstash.svc.eqiad.wmnet -p 8324 [15:54:21] and regarding topics the current thought was to create a topic prefix of ‘mwlog’ with log severity suffixes [15:56:04] (03CR) 10Milimetric: "everything looked good to me, thanks for being more careful than me and testing :)" [puppet] - 10https://gerrit.wikimedia.org/r/491246 (https://phabricator.wikimedia.org/T205940) (owner: 10Joal) [15:58:39] (03PS1) 10Giuseppe Lavagetto: profile: use register_shutdown_function [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491518 (https://phabricator.wikimedia.org/T176916) [16:03:03] k thanks herron [16:04:49] !log gehel@cumin1001 START - Cookbook sre.elasticsearch.rolling-upgrade [16:04:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:04:56] (03CR) 10Krinkle: [C: 03+1] profile: use register_shutdown_function [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491518 (https://phabricator.wikimedia.org/T176916) (owner: 10Giuseppe Lavagetto) [16:07:17] (03PS1) 10Hashar: contint: restore packages::java [puppet] - 10https://gerrit.wikimedia.org/r/491522 (https://phabricator.wikimedia.org/T216517) [16:07:48] !log gehel@cumin1001 END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99) [16:07:49] 
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:07:58] (03Abandoned) 10Hashar: contint: restore packages::java [puppet] - 10https://gerrit.wikimedia.org/r/491522 (https://phabricator.wikimedia.org/T216517) (owner: 10Hashar) [16:08:31] (03PS1) 10Alexandros Kosiaris: Introduce citoid helm chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/491523 (https://phabricator.wikimedia.org/T213194) [16:12:53] 10Operations, 10ops-codfw, 10DBA: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 (10jcrespo) Rebooting db2089: ` PowerEdge R630 BIOS Version: 2.9.1 ` ` 1st reboot: OK 2nd reboot: OK 3rd reboot: OK 4th reboot: OK 5th reboot: OK 6th reboot: O... [16:15:23] (03PS1) 10GTirloni: Revert "wiki replicas: depool labsdb1010 for changes" [puppet] - 10https://gerrit.wikimedia.org/r/491525 (https://phabricator.wikimedia.org/T216481) [16:16:04] (03CR) 10GTirloni: [C: 03+2] Revert "wiki replicas: depool labsdb1010 for changes" [puppet] - 10https://gerrit.wikimedia.org/r/491525 (https://phabricator.wikimedia.org/T216481) (owner: 10GTirloni) [16:16:12] (03PS2) 10GTirloni: Revert "wiki replicas: depool labsdb1010 for changes" [puppet] - 10https://gerrit.wikimedia.org/r/491525 (https://phabricator.wikimedia.org/T216481) [16:18:54] !log re-pooled labsdb1010 T216481 [16:18:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:18:57] T216481: Remove views on ep_* tables on the wikireplicas hosts - https://phabricator.wikimedia.org/T216481 [16:20:46] PROBLEM - puppet last run on notebook1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[cdh::hadoop::directory /user/spark] [16:20:46] 10Operations, 10Operations-Software-Development: Netbox: cable termination names report - https://phabricator.wikimedia.org/T216469 (10crusnov) Presumably you'd like to surface the particular parent device on that termination point ? [16:24:06] 10Operations, 10Maps, 10Reading-Infrastructure-Team-Backlog, 10Services: Create Debian packages for Node.js 8 upgrade for Maps - https://phabricator.wikimedia.org/T216521 (10MSantos) [16:24:27] 10Operations: Integrate Stretch 9.8 point update - https://phabricator.wikimedia.org/T216384 (10MoritzMuehlenhoff) [16:26:00] !log enabling elasticsearch on new eqiad hosts logstash101[0-2] [16:26:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:26:35] (03PS5) 10Herron: logstash: apply role::logstash to new logstash101[0-2] hardware hosts [puppet] - 10https://gerrit.wikimedia.org/r/490695 (https://phabricator.wikimedia.org/T214608) [16:27:30] (03CR) 10Herron: [C: 03+2] logstash: apply role::logstash to new logstash101[0-2] hardware hosts [puppet] - 10https://gerrit.wikimedia.org/r/490695 (https://phabricator.wikimedia.org/T214608) (owner: 10Herron) [16:28:44] (03PS1) 10Gehel: Add remove_on_error parameter to icinga.hosts_downtimed() [software/spicerack] - 10https://gerrit.wikimedia.org/r/491526 [16:28:52] !log jforrester@deploy1001 Synchronized php-1.33.0-wmf.17/includes/specials/pagers/ActiveUsersPager.php: T216200 Hot deploy variable name fix for ActiveUsersPager query (duration: 00m 48s) [16:28:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:28:54] T216200: includes/specials/pagers/ActiveUsersPager.php: PHP Notice: Undefined index: dir - https://phabricator.wikimedia.org/T216200 [16:32:14] 10Operations, 10Maps, 10Reading-Infrastructure-Team-Backlog, 10Services: Create Debian packages for Node.js 8 upgrade for Maps - https://phabricator.wikimedia.org/T216521 (10MoritzMuehlenhoff) 
We can't easily maintain nodejs 8 packages in parallel, that adds a substantive maintenance overhead. Plus, 8 will... [16:32:23] (03PS1) 10Elukey: profile::analytics::refinery: add a wrapper for analytics-mysql [puppet] - 10https://gerrit.wikimedia.org/r/491528 (https://phabricator.wikimedia.org/T212386) [16:33:44] !log installing libssh update from stretch point release [16:33:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:33:48] (03PS1) 10GTirloni: wiki replicas: depool labsdb1011 for changes [puppet] - 10https://gerrit.wikimedia.org/r/491530 (https://phabricator.wikimedia.org/T216481) [16:34:00] (03PS1) 10Gehel: Make retries less verbose. [software/spicerack] - 10https://gerrit.wikimedia.org/r/491531 [16:34:55] (03CR) 10jerkins-bot: [V: 04-1] wiki replicas: depool labsdb1011 for changes [puppet] - 10https://gerrit.wikimedia.org/r/491530 (https://phabricator.wikimedia.org/T216481) (owner: 10GTirloni) [16:35:47] (03PS2) 10GTirloni: wiki replicas: depool labsdb1011 for changes [puppet] - 10https://gerrit.wikimedia.org/r/491530 (https://phabricator.wikimedia.org/T216481) [16:36:43] (03CR) 10GTirloni: [C: 03+2] wiki replicas: depool labsdb1011 for changes [puppet] - 10https://gerrit.wikimedia.org/r/491530 (https://phabricator.wikimedia.org/T216481) (owner: 10GTirloni) [16:38:17] 10Operations, 10Analytics, 10Discovery, 10Research: Workflow to be able to move data files computed in jobs from analytics cluster to production - https://phabricator.wikimedia.org/T213976 (10mmodell) FWIW I found it fairly easy to work with swift from a development point of view but getting that experimen... [16:38:48] 10Operations, 10Analytics, 10Discovery, 10Research: Workflow to be able to move data files computed in jobs from analytics cluster to production - https://phabricator.wikimedia.org/T213976 (10Nuria) I think transferring data *seems* that could be taken care of with hadoop's copytolocal right? Issue we wan... 
[16:39:26] !log depooled labsdb1011 T216481 [16:39:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:39:28] T216481: Remove views on ep_* tables on the wikireplicas hosts - https://phabricator.wikimedia.org/T216481 [16:40:17] (03CR) 10jerkins-bot: [V: 04-1] Make retries less verbose. [software/spicerack] - 10https://gerrit.wikimedia.org/r/491531 (owner: 10Gehel) [16:43:07] (03PS1) 10Giuseppe Lavagetto: profile::mediawiki::php: install tideways-xhprof, remove tideways [puppet] - 10https://gerrit.wikimedia.org/r/491533 (https://phabricator.wikimedia.org/T176916) [16:43:42] (03CR) 10Mathew.onipe: Add remove_on_error parameter to icinga.hosts_downtimed() (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491526 (owner: 10Gehel) [16:44:52] 10Operations: Integrate Stretch 9.8 point update - https://phabricator.wikimedia.org/T216384 (10MoritzMuehlenhoff) [16:47:06] hashar: o/ [16:47:22] I noticed a strange pcc failure https://puppet-compiler.wmflabs.org/compiler1002/14733/stat1007.eqiad.wmnet/change.stat1007.eqiad.wmnet.err [16:47:39] could it be related to what you worked on today? [16:47:53] passwords::puppet::database seems not a recent change [16:48:24] and I can see it in modules/passwords/manifests/init.pp (labs_private repo) [16:49:40] 10Operations, 10SRE-Access-Requests: Requesting access to stat1007 for kharlan - https://phabricator.wikimedia.org/T216258 (10kostajh) After consulting with @nettrom_WMF, it seems like `analytics-privatedata-users` is the level we want, but, would it be possible for someone to clarify what exactly is meant by... [16:50:26] also there are a lot of other logs/output [16:51:25] (03CR) 10Eevans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/491276 (https://phabricator.wikimedia.org/T210706) (owner: 10Elukey) [16:51:38] (03CR) 10Mathew.onipe: Make retries less verbose. 
(031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491531 (owner: 10Gehel) [16:52:24] (03CR) 10Fsero: [C: 04-1] Introduce citoid helm chart (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/491523 (https://phabricator.wikimedia.org/T213194) (owner: 10Alexandros Kosiaris) [16:53:34] 10Operations, 10Mail, 10Phabricator: DomainKeys Identified Mail (DKIM) for phabricator.wikimedia.org - https://phabricator.wikimedia.org/T116805 (10Niedzielski) This is only a single datapoint but I noticed a Phab comment email notification from gerritbot was mistakenly marked as spam on Friday in my Gmail. [16:56:39] (03CR) 10Addshore: [C: 03+1] Set wmgWikibaseRepoIdGeneratorSeparateDbConnection to true for wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491484 (https://phabricator.wikimedia.org/T215147) (owner: 10Ladsgroup) [17:00:04] godog and _joe_: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Puppet SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190219T1700). [17:00:04] No GERRIT patches in the queue for this window AFAICS. [17:00:18] !log Offlined compiler1002.puppet-diffs.eqiad.wmflabs from Jenkins. Its disk is corrupt | T216513 [17:00:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:00:21] T216513: compiler1002.puppet-diffs.eqiad.wmflabs instance is down - https://phabricator.wikimedia.org/T216513 [17:04:45] 10Operations, 10Wikimedia-Logstash: Retire udp2log: onboard its producers and consumers to the logging pipeline - https://phabricator.wikimedia.org/T205856 (10Ottomata) Qs: Are the logs sent using Monolog? Is there just one topic 'mwlog', or multiple, one per channel? I'm asking just in case we should consi... 
[17:06:22] (03CR) 10Fsero: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/491533 (https://phabricator.wikimedia.org/T176916) (owner: 10Giuseppe Lavagetto) [17:14:14] (03CR) 10Ottomata: [WIP]: Switch kafka logging to EventBus logging. (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490668 (https://phabricator.wikimedia.org/T216163) (owner: 10Ppchelko) [17:16:34] 10Operations, 10Thumbor, 10serviceops, 10Patch-For-Review, and 2 others: Upgrade Thumbor servers to Stretch - https://phabricator.wikimedia.org/T170817 (10jijiki) [17:16:36] 10Operations, 10ops-codfw, 10serviceops, 10User-jijiki: Degraded RAID on thumbor2002 - https://phabricator.wikimedia.org/T214813 (10jijiki) 05Open→03Resolved @Papaul Thank you, it works now:) [17:17:09] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler1001/14740/stat1007.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/491528 (https://phabricator.wikimedia.org/T212386) (owner: 10Elukey) [17:18:51] (03CR) 10Ottomata: [WIP]: Switch kafka logging to EventBus logging. (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490668 (https://phabricator.wikimedia.org/T216163) (owner: 10Ppchelko) [17:22:20] 10Operations, 10ops-eqiad, 10cloud-services-team (Kanban): Degraded RAID on cloudvirt1018 - https://phabricator.wikimedia.org/T216004 (10Cmjohnson) @GTirloni The disks in slots 2 and 3 have been replaced. 
Return shipping info USPS 9202 3946 5301 2441 0207 95 FEDEX 9611918 2393026 77770812 [17:24:27] 10Operations, 10ops-eqiad, 10cloud-services-team (Kanban): Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T215892 (10Cmjohnson) 05Open→03Resolved a:03Cmjohnson @GTirloni The disk has been replaced Return Shipping Info USPS 9202 3946 5301 2441 0201 84 FEDEX 9611918 2393026 77770201 [17:24:35] ACKNOWLEDGEMENT - MegaRAID on cloudvirt1018 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T216526 [17:24:38] 10Operations, 10ops-eqiad: Degraded RAID on cloudvirt1018 - https://phabricator.wikimedia.org/T216526 (10ops-monitoring-bot) [17:26:51] 10Operations, 10ops-eqiad, 10cloud-services-team: Degraded RAID on cloudvirt1018 - https://phabricator.wikimedia.org/T216526 (10Marostegui) [17:29:28] !ops You are part of the biased liberal Fake News Media that caused Flakka to be made illegal. Shame on you, and may your mothers be raped by the beasts of the jungle. [17:30:37] !ops [17:31:10] lol [17:31:24] thanks foks [17:31:48] np [17:31:50] ah right [17:31:52] :) [17:31:55] not sure why the ban took so long [17:34:38] 10Operations, 10ops-eqiad, 10DC-Ops: icinga1001 mysterious reboots - https://phabricator.wikimedia.org/T210108 (10Cmjohnson) [17:34:43] 10Operations, 10ops-eqiad, 10monitoring, 10Patch-For-Review: icinga1001 crashed - https://phabricator.wikimedia.org/T214760 (10Cmjohnson) 05Open→03Resolved CPU2 was replaced Shipping Info USPS 9202 3946 5301 2441 0151 11 FEDEX 9611918 2393026 77765139 [17:36:13] 10Operations, 10Analytics, 10Analytics-Kanban, 10hardware-requests: GPU upgrade for stat1005 - https://phabricator.wikimedia.org/T216226 (10RobH) >>! In T216226#4961096, @elukey wrote: > Thanks all for all the detailed info! > > One thought: I found this interesting use case https://www.amd.com/en/case-st... [17:37:16] (03PS2) 10Gehel: Make retries less verbose. 
[software/spicerack] - 10https://gerrit.wikimedia.org/r/491531 [17:38:09] (03CR) 10Gehel: Add remove_on_error parameter to icinga.hosts_downtimed() (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491526 (owner: 10Gehel) [17:38:22] (03CR) 10Gehel: Make retries less verbose. (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491531 (owner: 10Gehel) [17:39:11] 10Operations, 10Analytics, 10Analytics-Kanban, 10hardware-requests: GPU upgrade for stat1005 - https://phabricator.wikimedia.org/T216226 (10EBernhardson) Unfortunately the rx 550 and 560 mentioned have 4GB of memory, which is basically a show stopper. [17:41:49] (03CR) 10Volans: [C: 04-1] "It seems a weird construct to me." (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491531 (owner: 10Gehel) [17:43:34] (03CR) 10Gehel: Make retries less verbose. (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491531 (owner: 10Gehel) [17:44:57] (03PS3) 10Volans: raid_handler: fix reported executed script [puppet] - 10https://gerrit.wikimedia.org/r/490337 [17:46:13] (03CR) 10Volans: [C: 03+2] raid_handler: fix reported executed script [puppet] - 10https://gerrit.wikimedia.org/r/490337 (owner: 10Volans) [17:46:30] (03CR) 10Volans: [C: 03+2] raid: improve megacli get raid script [puppet] - 10https://gerrit.wikimedia.org/r/490338 (owner: 10Volans) [17:46:41] (03PS3) 10Volans: raid: improve megacli get raid script [puppet] - 10https://gerrit.wikimedia.org/r/490338 [17:47:50] 10Operations, 10ops-codfw, 10DBA: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 (10jcrespo) 05Open→03Stalled p:05Triage→03Low So I believe this is still an ongoing issue, but the remaining hosts may have a lower probability of failing... 
[17:50:14] 10Operations, 10Cloud-VPS, 10Traffic, 10netops, 10cloud-services-team (Kanban): Evaluate the possibility to add Juniper images to Openstack - https://phabricator.wikimedia.org/T180179 (10aborrero) >>! In T180179#4953415, @ayounsi wrote: > Bumping this task, now that WMCS is on Neutron. I'm not sure whic... [17:50:26] (03PS1) 10Effie Mouzeli: Apply -R 200 to memcached on mc1027 [puppet] - 10https://gerrit.wikimedia.org/r/491541 (https://phabricator.wikimedia.org/T208844) [17:51:08] (03CR) 10Effie Mouzeli: [C: 03+2] Apply -R 200 to memcached on mc1027 [puppet] - 10https://gerrit.wikimedia.org/r/491541 (https://phabricator.wikimedia.org/T208844) (owner: 10Effie Mouzeli) [17:52:43] !log Restarting memcache on mc1027 - T208844 [17:52:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:52:46] T208844: Apply -R 200 to all the memcached mw object cache instances running in eqiad/codfw - https://phabricator.wikimedia.org/T208844 [17:53:23] (03PS2) 10CRusnov: Add ganeti read-only user deployment [puppet] - 10https://gerrit.wikimedia.org/r/490397 (https://phabricator.wikimedia.org/T215229) [17:54:22] (03CR) 10jerkins-bot: [V: 04-1] Add ganeti read-only user deployment [puppet] - 10https://gerrit.wikimedia.org/r/490397 (https://phabricator.wikimedia.org/T215229) (owner: 10CRusnov) [17:54:26] 10Operations, 10MediaWiki-Cache, 10serviceops, 10Patch-For-Review, and 3 others: Apply -R 200 to all the memcached mw object cache instances running in eqiad/codfw - https://phabricator.wikimedia.org/T208844 (10jijiki) [17:55:23] 10Operations, 10ops-eqiad, 10Analytics, 10Analytics-Kanban: confirm gpu form factor in stat1005 - https://phabricator.wikimedia.org/T216528 (10RobH) p:05Triage→03Normal [17:55:31] !log temporarily increased eqiad logstash elasticsearch low disk watermark to 87% (will restore to 85% when eqiad expansion hosts are fully online) [17:55:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:00:04] 
cscott, arlolra, subbu, halfak, and Amir1: Time to snap out of that daydream and deploy Services – Graphoid / Parsoid / Citoid / ORES. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190219T1800). [18:01:56] PROBLEM - Host stat1005.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [18:04:52] (03PS1) 10Gehel: elasticsearch: wait a bit for elasticsearch servers to be up [cookbooks] - 10https://gerrit.wikimedia.org/r/491544 [18:08:53] (03CR) 10Volans: [C: 03+1] "syntactically correct, looks mostly based on wisdom ;)" [cookbooks] - 10https://gerrit.wikimedia.org/r/491544 (owner: 10Gehel) [18:09:13] (03CR) 10Gehel: Make retries less verbose. (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491531 (owner: 10Gehel) [18:09:30] (03CR) 10Gehel: [C: 03+2] elasticsearch: wait a bit for elasticsearch servers to be up [cookbooks] - 10https://gerrit.wikimedia.org/r/491544 (owner: 10Gehel) [18:12:44] RECOVERY - Host stat1005.mgmt is UP: PING OK - Packet loss = 0%, RTA = 36.74 ms [18:17:02] 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): cloudvirt1009: evaluate upgrading to 10G - https://phabricator.wikimedia.org/T216324 (10Andrew) [18:20:39] !log starting branch-cut for 1.33.0-wmf.18 [18:20:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:21:11] 10Operations, 10ops-eqiad, 10cloud-services-team (Kanban): Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T215892 (10GTirloni) 05Resolved→03Open [18:21:28] 10Operations, 10Cloud-VPS, 10Traffic, 10netops, 10cloud-services-team (Kanban): Evaluate the possibility to add Juniper images to Openstack - https://phabricator.wikimedia.org/T180179 (10bd808) There are a few other issues beyond the Neutron constraints that still exist (which kind of boil down to a lack... 
[18:21:55] 10Operations, 10ops-eqiad, 10Analytics, 10Analytics-Kanban: confirm gpu form factor in stat1005 - https://phabricator.wikimedia.org/T216528 (10Cmjohnson) {F28247119} {F28247120} {F28247121} {F28247122} {F28247124} {F28247123} {F28247126} {F28247125} {F28247127} [18:23:46] 10Operations, 10ops-eqiad, 10Analytics, 10Analytics-Kanban: confirm gpu form factor in stat1005 - https://phabricator.wikimedia.org/T216528 (10Cmjohnson) There appears to be power already connected to the GPU The dimensions are 12"L 4" Width 2" Depth. The pictures have the measurements as well [18:31:29] 10Operations, 10Elasticsearch, 10Discovery-Search (Current work): Test spicerack elasticsearch module - https://phabricator.wikimedia.org/T207920 (10Mathew.onipe) [18:31:55] 10Operations, 10Wikimedia-Logstash, 10Discovery-Search (Current work): upgrade logstash and the logstash elasticsearch cluster to 5.6.14 - https://phabricator.wikimedia.org/T216052 (10Gehel) [18:32:05] 10Operations, 10Discovery-Search, 10Wikimedia-Logstash: upgrade logstash and the logstash elasticsearch cluster to 5.6.14 - https://phabricator.wikimedia.org/T216052 (10Gehel) [18:33:26] 10Operations, 10Cloud-VPS, 10cloud-services-team, 10Discovery-Search (Current work): Setup elasticsearch on cloudelastic100[1-4] - https://phabricator.wikimedia.org/T214921 (10Mathew.onipe) [18:33:37] 10Operations, 10ops-eqiad, 10cloud-services-team (Kanban): Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T215892 (10GTirloni) 05Open→03Resolved [18:40:16] 10Operations, 10Research-Programs, 10SRE-Access-Requests, 10Patch-For-Review: access to analytics-privatedata-users for @toddleroux, @Afandian, & @RyanSteinberg - https://phabricator.wikimedia.org/T209298 (10toddleroux) I'm having trouble logging in using my public key. toddleroux@toddleroux-UX310UA:~/.s... 
[18:50:42] RECOVERY - puppet last run on notebook1003 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [18:55:00] (03PS1) 10CRusnov: Add dummy password for ganeti readonly user. [labs/private] - 10https://gerrit.wikimedia.org/r/491552 [19:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190219T1900) [19:01:22] (03CR) 10Volans: [C: 03+1] "LGTM" [labs/private] - 10https://gerrit.wikimedia.org/r/491552 (owner: 10CRusnov) [19:04:34] 10Operations, 10Proton: Proton fails with Chromium 72.0.3626.96 - https://phabricator.wikimedia.org/T216493 (10MSantos) [19:07:04] 10Operations, 10Cloud-VPS, 10cloud-services-team, 10Discovery-Search (Current work): Setup elasticsearch on cloudelastic100[1-4] - https://phabricator.wikimedia.org/T214921 (10Mathew.onipe) Looking at this, It would be nice to know what we should enable for cloudelastic and what not. This will help move cl... [19:07:43] (03PS3) 10CRusnov: Add ganeti read-only user deployment [puppet] - 10https://gerrit.wikimedia.org/r/490397 (https://phabricator.wikimedia.org/T215229) [19:08:49] (03CR) 10jerkins-bot: [V: 04-1] Add ganeti read-only user deployment [puppet] - 10https://gerrit.wikimedia.org/r/490397 (https://phabricator.wikimedia.org/T215229) (owner: 10CRusnov) [19:09:23] !log rebooting cloudvirt1009 to poke around in the bios [19:09:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:11:12] (03CR) 10Volans: [C: 04-1] "Nice! Few minor things inline." 
(035 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491526 (owner: 10Gehel) [19:17:07] (03PS1) 10Arturo Borrero Gonzalez: aptrepo: pull openstack mitaka packages into reprepro [puppet] - 10https://gerrit.wikimedia.org/r/491558 (https://phabricator.wikimedia.org/T216497) [19:17:11] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10cloud-services-team (Kanban): Degraded RAID on cloudvirt1020 - https://phabricator.wikimedia.org/T194855 (10Cmjohnson) I believe the supposed failed disk was a result of me working inside the server last week and I put it back together quickly. The cables... [19:19:17] (03PS2) 10Gehel: Add remove_on_error parameter to icinga.hosts_downtimed() [software/spicerack] - 10https://gerrit.wikimedia.org/r/491526 [19:19:58] (03PS3) 10Gehel: Add remove_on_error parameter to icinga.hosts_downtimed() [software/spicerack] - 10https://gerrit.wikimedia.org/r/491526 [19:20:08] (03CR) 10Gehel: Add remove_on_error parameter to icinga.hosts_downtimed() (036 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491526 (owner: 10Gehel) [19:20:47] (03CR) 10EBernhardson: cloudelastic: Add cloudelastic configs (036 comments) [puppet] - 10https://gerrit.wikimedia.org/r/487129 (https://phabricator.wikimedia.org/T214921) (owner: 10Mathew.onipe) [19:22:12] 10Operations, 10Cloud-VPS, 10cloud-services-team, 10Discovery-Search (Current work): Setup elasticsearch on cloudelastic100[1-4] - https://phabricator.wikimedia.org/T214921 (10EBernhardson) Added a review. If opening up kafka access is problematic there is no hard requirement to read kafka on these machin... 
[19:28:27] (03CR) 10Cwhite: [C: 03+1] nagios_common::commands: Remove support for trusty/jessie [puppet] - 10https://gerrit.wikimedia.org/r/491460 (owner: 10Muehlenhoff) [19:30:05] 10Operations, 10Thumbor, 10serviceops, 10Patch-For-Review, 10User-jijiki: Thumbor upgrade to stretch plan - https://phabricator.wikimedia.org/T214597 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['thumbor2002.codfw.wmnet'] ` Of which those **FAILED**: ` ['thumbor2002.codfw.wmnet'] ` [19:32:28] (03CR) 10Zhuyifei1999: [C: 03+1] "LGTM. Shall I build + deploy this?" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/491397 (https://phabricator.wikimedia.org/T193646) (owner: 10BryanDavis) [19:33:46] (03PS1) 10GTirloni: Revert "wiki replicas: depool labsdb1011 for changes" [puppet] - 10https://gerrit.wikimedia.org/r/491560 (https://phabricator.wikimedia.org/T216481) [19:34:24] 10Operations, 10Cloud-VPS, 10cloud-services-team, 10Discovery-Search (Current work): Setup elasticsearch on cloudelastic100[1-4] - https://phabricator.wikimedia.org/T214921 (10Ottomata) > If opening up kafka access is problematic It's going to be problematic! :) [19:34:54] (03CR) 10GTirloni: [C: 03+2] Revert "wiki replicas: depool labsdb1011 for changes" [puppet] - 10https://gerrit.wikimedia.org/r/491560 (https://phabricator.wikimedia.org/T216481) (owner: 10GTirloni) [19:34:59] 10Operations, 10Cloud-VPS, 10cloud-services-team, 10Discovery-Search (Current work): Setup elasticsearch on cloudelastic100[1-4] - https://phabricator.wikimedia.org/T214921 (10Ottomata) Oh, maybe it isn't...If these are nodes in production networks then it could be fine. 
[19:39:12] !log re-pooled labsdb1011 T216481 [19:39:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:39:15] T216481: Remove views on ep_* tables on the wikireplicas hosts - https://phabricator.wikimedia.org/T216481 [19:39:22] RECOVERY - Check systemd state on cloudvirt1024 is OK: OK - running: The system is fully operational [19:40:39] 10Operations, 10Cloud-VPS, 10cloud-services-team, 10Discovery-Search (Current work): Setup elasticsearch on cloudelastic100[1-4] - https://phabricator.wikimedia.org/T214921 (10EBernhardson) >>! In T214921#4965991, @Ottomata wrote: > Oh, maybe it isn't...If these are nodes in production networks then it cou... [19:42:47] (03PS1) 10Paladox: Upgrade zuul-status plugin [software/gerrit] (wmf/stable-2.16) - 10https://gerrit.wikimedia.org/r/491562 [19:43:06] (03CR) 10Paladox: [V: 03+2 C: 03+2] Upgrade zuul-status plugin [software/gerrit] (wmf/stable-2.16) - 10https://gerrit.wikimedia.org/r/491562 (owner: 10Paladox) [19:43:50] (03PS1) 10Paladox: Update wikimedia plugin [software/gerrit] (wmf/stable-2.16) - 10https://gerrit.wikimedia.org/r/491563 [19:44:14] (03CR) 10Paladox: [V: 03+2 C: 03+2] Update wikimedia plugin [software/gerrit] (wmf/stable-2.16) - 10https://gerrit.wikimedia.org/r/491563 (owner: 10Paladox) [19:45:30] RECOVERY - MegaRAID on cloudvirt1024 is OK: OK: optimal, 1 logical, 8 physical [19:48:39] 10Operations, 10Cloud-VPS, 10cloud-services-team, 10Discovery-Search (Current work): Setup elasticsearch on cloudelastic100[1-4] - https://phabricator.wikimedia.org/T214921 (10Ottomata) Ah, you're still going to have problems then. In {T207321} we were told that the network hole for the replicas was bad a... 
[19:49:20] !log thcipriani@deploy1001 Pruned MediaWiki: 1.33.0-wmf.13 (duration: 11m 52s) [19:49:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:50:07] 10Operations, 10Cloud-VPS, 10cloud-services-team, 10Discovery-Search (Current work): Setup elasticsearch on cloudelastic100[1-4] - https://phabricator.wikimedia.org/T214921 (10Ottomata) > The servers live in the production network and have a port opened up to the cloud network somehow. Actually, this is e... [19:53:01] (03PS1) 10Hashar: build: on CI only lint changed files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491564 [19:53:13] 10Operations, 10Analytics, 10Wikimedia-Stream, 10Services (watching): Eventstreams build is broken - https://phabricator.wikimedia.org/T216184 (10Ottomata) Hm, not sure why EventStreams is requiring node-rdkafka@2.5.1. EventStreams itself doesn't require node-rdkafka, its KafkaSSE dependency does. [[ h... [19:57:04] !log restarting ci-jenkins for plugin update [19:57:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:58:11] 10Operations, 10Analytics, 10Wikimedia-Stream, 10Services (watching): Eventstreams build is broken - https://phabricator.wikimedia.org/T216184 (10Pchelolo) > KafkaSSE requires ^2.3.4. 2.5.1 satisfies `^2.3.4` :) I think we should lock the node-rdkafka dependency either by removing the `^` or by adding a p... [20:00:04] thcipriani: Dear deployers, time to do the MediaWiki train - Americas version deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190219T2000). 
[20:00:52] * thcipriani does train [20:01:40] !log gehel@cumin1001 START - Cookbook sre.elasticsearch.rolling-upgrade [20:01:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:04:14] !log thcipriani@deploy1001 Started scap: testwiki to php-1.33.0-wmf.18 and rebuild l10n cache [20:04:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:06:51] (03CR) 10Volans: [C: 03+1] "LGTM, thanks for the patch!" (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491526 (owner: 10Gehel) [20:07:23] !log gehel@cumin1001 END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0) [20:07:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:11:11] (03PS6) 10Joal: Add analytics purge job for xmldumps on HDFS [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) [20:11:53] 10Operations, 10Multimedia, 10Performance-Team, 10Thumbor, 10serviceops: Deploy 3d2png to thumbor servers (stretch) - https://phabricator.wikimedia.org/T216494 (10jijiki) p:05Normal→03High [20:12:04] (03CR) 10jerkins-bot: [V: 04-1] Add analytics purge job for xmldumps on HDFS [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal) [20:12:46] Is this the proper channel for network related issues? 
[20:13:21] (03PS7) 10Joal: Add analytics purge job for xmldumps on HDFS [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) [20:14:26] (03CR) 10jerkins-bot: [V: 04-1] Add analytics purge job for xmldumps on HDFS [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal) [20:14:29] stp121: https://wikitech.wikimedia.org/wiki/Reporting_a_connectivity_issue [20:16:16] (03PS8) 10Joal: Add analytics purge job for xmldumps on HDFS [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) [20:21:10] 10Operations, 10Thumbor, 10serviceops, 10Patch-For-Review, 10User-jijiki: Thumbor upgrade to stretch plan - https://phabricator.wikimedia.org/T214597 (10Gilles) [20:21:33] (03CR) 10Mobrovac: Introduce citoid helm chart (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/491523 (https://phabricator.wikimedia.org/T213194) (owner: 10Alexandros Kosiaris) [20:22:04] p858snake|L: I'm having a connectivity issue to eqiad. All 3 other sites work fine from my network. [20:28:41] 10Operations, 10Analytics, 10Wikimedia-Stream, 10Services (watching): Eventstreams build is broken - https://phabricator.wikimedia.org/T216184 (10Ottomata) Can we do a package-lock in the EventStreams repo? [20:28:45] (03CR) 10Joal: "Using a batch file instead of in-lined command" [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal) [20:30:38] PROBLEM - High CPU load on API appserver on mw1233 is CRITICAL: CRITICAL - load average: 48.18, 23.58, 15.42 [20:30:56] 10Operations, 10Analytics, 10Wikimedia-Stream, 10Services (watching): Eventstreams build is broken - https://phabricator.wikimedia.org/T216184 (10Pchelolo) It's still undecided what to do with package-lock (T179229), so maybe let's just freeze the version?
[20:31:46] RECOVERY - High CPU load on API appserver on mw1233 is OK: OK - load average: 23.35, 21.34, 15.24 [20:34:09] (03PS1) 10Hashar: contint: phaseout android slave [puppet] - 10https://gerrit.wikimedia.org/r/491574 (https://phabricator.wikimedia.org/T198495) [20:34:44] !log thcipriani@deploy1001 Finished scap: testwiki to php-1.33.0-wmf.18 and rebuild l10n cache (duration: 30m 31s) [20:34:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:34:57] (03CR) 10Hashar: "Spotted by Mholloway on https://gerrit.wikimedia.org/r/#/c/integration/config/+/490492/ ;-)" [puppet] - 10https://gerrit.wikimedia.org/r/491574 (https://phabricator.wikimedia.org/T198495) (owner: 10Hashar) [20:35:50] PROBLEM - puppet last run on thumbor2002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 19 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[3d2png/deploy] [20:38:18] PROBLEM - MariaDB Slave Lag: s5 on dbstore2001 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 353.62 seconds [20:39:45] (03PS1) 10Thcipriani: Group0 to 1.33.0-wmf.18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491578 [20:42:44] (03CR) 10Thcipriani: [C: 03+2] Group0 to 1.33.0-wmf.18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491578 (owner: 10Thcipriani) [20:43:48] (03Merged) 10jenkins-bot: Group0 to 1.33.0-wmf.18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491578 (owner: 10Thcipriani) [20:48:04] (03CR) 10jenkins-bot: Group0 to 1.33.0-wmf.18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491578 (owner: 10Thcipriani) [20:49:09] !log thcipriani@deploy1001 rebuilt and synchronized wikiversions files: group0 to 1.33.0-wmf.18 [20:49:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:49:14] (03PS4) 10CRusnov: Add ganeti read-only user deployment [puppet] - 10https://gerrit.wikimedia.org/r/490397 (https://phabricator.wikimedia.org/T215229) [20:49:48] (03CR) 10jerkins-bot: [V: 04-1] Add ganeti 
read-only user deployment [puppet] - 10https://gerrit.wikimedia.org/r/490397 (https://phabricator.wikimedia.org/T215229) (owner: 10CRusnov) [20:51:24] PROBLEM - puppet last run on notebook1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[cdh::hadoop::directory /user/spark] [20:54:36] (03CR) 10Mholloway: [C: 03+1] contint: phaseout android slave [puppet] - 10https://gerrit.wikimedia.org/r/491574 (https://phabricator.wikimedia.org/T198495) (owner: 10Hashar) [20:58:20] (03CR) 10Umherirrender: [C: 03+1] build: on CI only lint changed files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491564 (owner: 10Hashar) [21:06:42] 10Operations, 10Multimedia, 10Thumbor, 10serviceops, 10Performance-Team (Radar): Deploy 3d2png to thumbor servers (stretch) - https://phabricator.wikimedia.org/T216494 (10kchapman) [21:08:47] PROBLEM - ensure kvm processes are running on cloudvirt1009 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 [21:13:20] 10Operations, 10MediaWiki-Database, 10Performance-Team, 10Wikimedia-Logstash, and 4 others: MediaWiki errors overloading logstash - https://phabricator.wikimedia.org/T215611 (10Krinkle) a:03jcrespo @jcrespo I believe the issue is resolved, but leaving it open for you to confirm and/or test as you wish. F... [21:13:39] RECOVERY - ensure kvm processes are running on cloudvirt1009 is OK: PROCS OK: 1 process with regex args qemu-system-x86_64 [21:19:22] RECOVERY - MariaDB Slave Lag: s5 on dbstore2001 is OK: OK slave_sql_lag Replication lag: 0.00 seconds [21:20:04] 10Operations, 10Patch-For-Review: rack/setup/install logstash101[012].eqiad.wmnet - https://phabricator.wikimedia.org/T214608 (10herron) `logstash101[0-2]` have been added to the logging eqiad elasticsearch cluster, and data is now being relocated from the old `logstash100[4-6]` hosts onto `logstash101[0-2]`.... 
[21:26:24] 10Operations, 10Analytics, 10Wikimedia-Stream, 10Services (watching): Eventstreams build is broken - https://phabricator.wikimedia.org/T216184 (10mobrovac) +1 on freezing the version in package.json in this instance, as this is what we really need. [21:35:03] 10Operations, 10Proton, 10Core Platform Team Backlog (Watching / External), 10Services (watching): Proton fails with Chromium 72.0.3626.96 - https://phabricator.wikimedia.org/T216493 (10mobrovac) It looks like Chromium is trying to write some PulseAudio config. @MSantos I'd say to first try to upgrade to t... [21:47:41] 10Operations, 10Traffic, 10Performance-Team (Radar): Determine cause of upload.wikimedia.org requests routed to text-lb (404 Not Found) - https://phabricator.wikimedia.org/T207340 (10kchapman) [21:51:28] 10Operations, 10MediaWiki-Debug-Logger, 10Performance-Team: Set up request profiling for PHP 7 - https://phabricator.wikimedia.org/T206152 (10Krinkle) >>! In T176916#4964977, @Joe wrote: > I did manually install php7.2-tideways-xhprof on `mwdebug1001` and I now see the following error: > > ` > Fatal error:... [21:51:37] 10Operations, 10MediaWiki-Debug-Logger, 10Performance-Team: Set up request debug profiling for PHP 7 - https://phabricator.wikimedia.org/T206152 (10Krinkle) [21:54:12] 10Operations, 10Core Platform Team (PHP7 (TEC4)), 10Core Platform Team Kanban (Doing), 10HHVM, and 3 others: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370 (10Krinkle) [22:05:07] 10Operations, 10Maps, 10Reading-Infrastructure-Team-Backlog, 10Core Platform Team Backlog (Watching / External), 10Services (watching): Create Debian packages for Node.js 8 upgrade for Maps - https://phabricator.wikimedia.org/T216521 (10mobrovac) [22:27:23] hi AaronSchulz - would you have a couple of minutes to review https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Renameuser/+/491400/ for jobqueue, etc. issues? Thanks! 
[22:30:32] 10Operations, 10Scoring-platform-team, 10Release-Engineering-Team (Watching / External): Contact number of some WMDE staff should be avalible to SRE/RelEng - https://phabricator.wikimedia.org/T210721 (10Halfak) p:05High→03Triage [22:35:17] 10Operations, 10ORES, 10Scoring-platform-team: [Epic] Deploy ORES in kubernetes cluster - https://phabricator.wikimedia.org/T182331 (10Halfak) p:05Low→03High [22:36:33] 10Operations, 10ORES, 10Scoring-platform-team: Investigate memory usage of ORES in kubernetes - https://phabricator.wikimedia.org/T210264 (10Halfak) [22:42:05] 10Operations, 10ORES, 10Scoring-platform-team: Build helm charts for ORES - https://phabricator.wikimedia.org/T210269 (10Halfak) Is this something that will be part of the common infrastructure for services in kubernetes? [22:42:10] 10Operations, 10ORES, 10Scoring-platform-team: Build helm charts for ORES - https://phabricator.wikimedia.org/T210269 (10Halfak) [22:48:19] 10Operations, 10ORES, 10Scoring-platform-team, 10Performance: Stress test ORES/kubernetes (above 4.5k scores/second) - https://phabricator.wikimedia.org/T214054 (10Halfak) p:05Low→03High [22:51:05] 10Operations, 10ops-eqsin, 10Traffic: Degraded RAID on cp5010 - https://phabricator.wikimedia.org/T214274 (10RobH) Ok, there has been multiple back and forth on this via email with both Dell SG and DHL. We've advised DHL of the SG3 inbound shipment ticket to refer to when attempting to deliver this package.... [23:01:49] 10Operations, 10Cloud-VPS, 10cloud-services-team, 10Discovery-Search (Current work): Setup elasticsearch on cloudelastic100[1-4] - https://phabricator.wikimedia.org/T214921 (10bd808) >>! In T214921#4966002, @EBernhardson wrote: >>>! In T214921#4965991, @Ottomata wrote: >> Oh, maybe it isn't...If these are... 
[23:02:41] 10Operations, 10Cloud-VPS, 10cloud-services-team, 10Discovery-Search (Current work): Setup elasticsearch on cloudelastic100[1-4] - https://phabricator.wikimedia.org/T214921 (10EBernhardson) I was under the impression we would do the same as relforge (T142211) which is in prod and accessible from labs. Look... [23:08:03] 10Operations, 10Mail, 10WMF-Legal: Tracking down gary@ and redirecting it to trustandsafety@ - https://phabricator.wikimedia.org/T210464 (10bcampbell) @jrbs Okay, so I think I figured this out... trustandsafety is currently an alias for the Google Group susa@wikimedia.org. tsops@ is a Google Group that was... [23:09:00] 10Operations, 10Cloud-VPS, 10cloud-services-team, 10Discovery-Search (Current work): Setup elasticsearch on cloudelastic100[1-4] - https://phabricator.wikimedia.org/T214921 (10EBernhardson) >>! In T214921#4967027, @bd808 wrote: >>>! In T214921#4966002, @EBernhardson wrote: >>>>! In T214921#4965991, @Ottoma... [23:16:25] 10Operations, 10Mail, 10WMF-Legal: Tracking down gary@ and redirecting it to trustandsafety@ - https://phabricator.wikimedia.org/T210464 (10jrbs) Whew. Thanks for unravelling that! I think that is a remnant of a move we were intending to make internally (i.e. having one email for Ops and another for Policy,... [23:16:40] 10Operations, 10Mail, 10WMF-Legal: Tracking down gary@ and redirecting it to tsops@ - https://phabricator.wikimedia.org/T210464 (10jrbs) [23:20:51] RECOVERY - puppet last run on notebook1003 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [23:37:17] 10Operations, 10Cloud-VPS, 10cloud-services-team, 10Discovery-Search (Current work): Setup elasticsearch on cloudelastic100[1-4] - https://phabricator.wikimedia.org/T214921 (10EBernhardson) Actually, looking back in otto's task, this was said by one of our network engineers:: >>! In T207321#4882980, @ayou... 
[23:50:52] !log temporarily stop ferm on relforge1001 to test where a connection is being blocked [23:50:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:51:25] !log restarted ferm on relforge1001 [23:51:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:54:17] (03PS5) 10CRusnov: Add ganeti read-only user deployment [puppet] - 10https://gerrit.wikimedia.org/r/490397 (https://phabricator.wikimedia.org/T215229)