[00:02:01] <icinga-wm>	 PROBLEM - snapshot of s4 in codfw on db1115 is CRITICAL: snapshot for s4 at codfw taken more than 4 days ago: Most recent backup 2019-10-03 23:32:11 https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[00:33:23] <logmsgbot>	 !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
[00:33:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:54:21] <wikibugs>	 Operations, ops-eqiad, DC-Ops, decommission: decommission restbase10(0[7-9]|1[0-5]) - https://phabricator.wikimedia.org/T226715 (Papaul) ` papaul@asw2-a-eqiad# show | compare  [edit interfaces] -   ge-3/0/16 { -       description "restbase1010 1G"; -   } -   ge-3/0/17 { -       description "restb...
[00:55:40] <wikibugs>	 Operations, ops-eqiad, DC-Ops, decommission: decommission restbase10(0[7-9]|1[0-5]) - https://phabricator.wikimedia.org/T226715 (Papaul)
[00:58:34] <wikibugs>	 Operations, ops-eqiad, DC-Ops, decommission: decommission restbase10(0[7-9]|1[0-5]) - https://phabricator.wikimedia.org/T226715 (Papaul)
[02:10:36] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db2098 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 948.43 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[02:38:24] <icinga-wm>	 PROBLEM - snapshot of s5 in eqiad on db1115 is CRITICAL: snapshot for s5 at eqiad taken more than 4 days ago: Most recent backup 2019-10-04 02:11:25 https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[02:55:10] <icinga-wm>	 PROBLEM - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting%23Nova-fullstack
[02:57:56] <icinga-wm>	 ACKNOWLEDGEMENT - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project andrew bogott Im investigating https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting%23Nova-fullstack
[03:03:59] <andrewbogott>	 !log restarted nova-conductor on cloudcontrol1003 and cloudcontrol1004 — experimental band-aid for T234876
[03:04:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:04:03] <stashbot>	 T234876: nova-conductor running out of mysql connections - https://phabricator.wikimedia.org/T234876
[03:06:30] <icinga-wm>	 RECOVERY - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is OK: 0 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting%23Nova-fullstack
[03:17:29] <wikibugs>	 (PS1) Andrew Bogott: nova: try to reduce the number of db connections [puppet] - https://gerrit.wikimedia.org/r/541407 (https://phabricator.wikimedia.org/T234876)
[03:21:38] <wikibugs>	 (CR) Andrew Bogott: [C: +2] "I don't know if this is a good idea, but at least the patch does what I intended." [puppet] - https://gerrit.wikimedia.org/r/541407 (https://phabricator.wikimedia.org/T234876) (owner: Andrew Bogott)
[03:21:53] <wikibugs>	 Operations, ops-eqiad, DC-Ops, decommission: decommission restbase10(0[7-9]|1[0-5]) - https://phabricator.wikimedia.org/T226715 (Papaul)
[03:22:24] <wikibugs>	 Operations, ops-eqiad, DC-Ops, decommission: decommission restbase10(0[7-9]|1[0-5]) - https://phabricator.wikimedia.org/T226715 (Papaul) ` papaul@asw2-a-eqiad# show | compare  [edit interfaces] -   ge-4/0/30 { -       description restbase1007; -   }
[03:23:24] <wikibugs>	 (CR) Mathew.onipe: query_service: prepare query_service for reusbility (9 comments) [puppet] - https://gerrit.wikimedia.org/r/537138 (https://phabricator.wikimedia.org/T232297) (owner: Mathew.onipe)
[03:25:11] <wikibugs>	 Operations, ops-eqiad, decommission: Decommission rhenium - https://phabricator.wikimedia.org/T224268 (Papaul) ` papaul@asw2-a-eqiad# show | compare  [edit interfaces] -   ge-4/0/17 { -       description rhenium; -   }
[03:25:29] <wikibugs>	 Operations, ops-eqiad, decommission: Decommission rhenium - https://phabricator.wikimedia.org/T224268 (Papaul)
[03:28:42] <wikibugs>	 Operations, ops-eqiad, DC-Ops, decommission: decommission lithium - https://phabricator.wikimedia.org/T229557 (Papaul) ` papaul@asw2-c-eqiad# show | compare  [edit interfaces] -   ge-7/0/34 { -       description lithium; -   }
[03:29:11] <wikibugs>	 Operations, ops-eqiad, DC-Ops, decommission: decommission lithium - https://phabricator.wikimedia.org/T229557 (Papaul)
[03:31:12] <wikibugs>	 Operations, ops-eqiad, Analytics, Analytics-EventLogging, decommission: Decommission dbproxy1004 and dbproxy1009 - https://phabricator.wikimedia.org/T228768 (Papaul) ` papaul@asw2-c-eqiad# show | compare  [edit interfaces] -   ge-7/0/4 { -       description dbproxy1009; -   }
[03:31:50] <wikibugs>	 Operations, ops-eqiad, Analytics, Analytics-EventLogging, decommission: Decommission dbproxy1004 and dbproxy1009 - https://phabricator.wikimedia.org/T228768 (Papaul)
[03:34:06] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db2098 is OK: OK slave_sql_lag Replication lag: 0.45 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[03:34:48] <wikibugs>	 (PS13) Mathew.onipe: query_service: rename wdqs module to query_service [puppet] - https://gerrit.wikimedia.org/r/538572 (https://phabricator.wikimedia.org/T232297)
[03:34:50] <wikibugs>	 (PS17) Mathew.onipe: query_service: prepare query_service for reusbility [puppet] - https://gerrit.wikimedia.org/r/537138 (https://phabricator.wikimedia.org/T232297)
[03:34:52] <wikibugs>	 (PS14) Mathew.onipe: query_service: rename profile/wdqs to profile/query_service [puppet] - https://gerrit.wikimedia.org/r/538849 (https://phabricator.wikimedia.org/T232297)
[03:34:54] <wikibugs>	 (PS9) Mathew.onipe: query_service: separate categories from main blazegraph profile [puppet] - https://gerrit.wikimedia.org/r/539285 (https://phabricator.wikimedia.org/T232297)
[03:34:56] <wikibugs>	 (PS9) Mathew.onipe: query_service: properly adapt query_service profile [puppet] - https://gerrit.wikimedia.org/r/539513 (https://phabricator.wikimedia.org/T232297)
[03:34:58] <wikibugs>	 (PS9) Mathew.onipe: query_service: properly adapt hiera configs [puppet] - https://gerrit.wikimedia.org/r/539998 (https://phabricator.wikimedia.org/T232297)
[03:36:00] <wikibugs>	 (CR) jerkins-bot: [V: -1] query_service: prepare query_service for reusbility [puppet] - https://gerrit.wikimedia.org/r/537138 (https://phabricator.wikimedia.org/T232297) (owner: Mathew.onipe)
[03:38:00] <wikibugs>	 (CR) jerkins-bot: [V: -1] query_service: separate categories from main blazegraph profile [puppet] - https://gerrit.wikimedia.org/r/539285 (https://phabricator.wikimedia.org/T232297) (owner: Mathew.onipe)
[03:40:06] <wikibugs>	 Operations, ops-eqiad, Analytics, DC-Ops, decommission: Decommission old Kafka analytics brokers: kafka1012,kafka1013,kafka1014,kafka1020,kafka1022,kafka1023 - https://phabricator.wikimedia.org/T226517 (Papaul) No switch port reference for kafka1014 and kafka1022 on asw2-c-eqiad  or asw-c-eqaid
[03:41:24] <wikibugs>	 Operations, ops-eqiad, Analytics, DC-Ops, decommission: Decommission old Kafka analytics brokers: kafka1012,kafka1013,kafka1014,kafka1020,kafka1022,kafka1023 - https://phabricator.wikimedia.org/T226517 (Papaul)
[03:46:17] <wikibugs>	 (PS18) Mathew.onipe: query_service: prepare query_service for reusbility [puppet] - https://gerrit.wikimedia.org/r/537138 (https://phabricator.wikimedia.org/T232297)
[03:46:19] <wikibugs>	 (PS15) Mathew.onipe: query_service: rename profile/wdqs to profile/query_service [puppet] - https://gerrit.wikimedia.org/r/538849 (https://phabricator.wikimedia.org/T232297)
[03:46:21] <wikibugs>	 (PS10) Mathew.onipe: query_service: separate categories from main blazegraph profile [puppet] - https://gerrit.wikimedia.org/r/539285 (https://phabricator.wikimedia.org/T232297)
[03:46:23] <wikibugs>	 (PS10) Mathew.onipe: query_service: properly adapt query_service profile [puppet] - https://gerrit.wikimedia.org/r/539513 (https://phabricator.wikimedia.org/T232297)
[03:46:25] <wikibugs>	 (PS10) Mathew.onipe: query_service: properly adapt hiera configs [puppet] - https://gerrit.wikimedia.org/r/539998 (https://phabricator.wikimedia.org/T232297)
[03:51:35] <wikibugs>	 (CR) Mathew.onipe: "> Patch Set 12: Code-Review+1" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/538572 (https://phabricator.wikimedia.org/T232297) (owner: Mathew.onipe)
[03:59:04] <wikibugs>	 (PS19) Mathew.onipe: query_service: prepare query_service for reusbility [puppet] - https://gerrit.wikimedia.org/r/537138 (https://phabricator.wikimedia.org/T232297)
[03:59:06] <wikibugs>	 (PS16) Mathew.onipe: query_service: rename profile/wdqs to profile/query_service [puppet] - https://gerrit.wikimedia.org/r/538849 (https://phabricator.wikimedia.org/T232297)
[03:59:08] <wikibugs>	 (PS11) Mathew.onipe: query_service: separate categories from main blazegraph profile [puppet] - https://gerrit.wikimedia.org/r/539285 (https://phabricator.wikimedia.org/T232297)
[03:59:10] <wikibugs>	 (PS11) Mathew.onipe: query_service: properly adapt query_service profile [puppet] - https://gerrit.wikimedia.org/r/539513 (https://phabricator.wikimedia.org/T232297)
[03:59:12] <wikibugs>	 (PS11) Mathew.onipe: query_service: properly adapt hiera configs [puppet] - https://gerrit.wikimedia.org/r/539998 (https://phabricator.wikimedia.org/T232297)
[04:23:54] <icinga-wm>	 PROBLEM - snapshot of s7 in eqiad on db1115 is CRITICAL: snapshot for s7 at eqiad taken more than 4 days ago: Most recent backup 2019-10-04 04:11:37 https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[05:00:33] <wikibugs>	 Operations, ops-eqiad, Analytics, Analytics-Cluster, DC-Ops: analytics1045 - RAID failure and /var/lib/hadoop/data/j can't be mounted - https://phabricator.wikimedia.org/T232069 (elukey) Open→Resolved >>! In T232069#5553714, @wiki_willy wrote: > Thanks @elukey .  Should we ignore/reso...
[05:03:21] <wikibugs>	 (PS3) Marostegui: wikireplica_analytics: Change query killer from 4h to 1h [puppet] - https://gerrit.wikimedia.org/r/541257 (https://phabricator.wikimedia.org/T233986)
[05:07:44] <marostegui>	 !log Deploy schema change on db1097:3315 - T233625
[05:07:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:07:54] <stashbot>	 T233625: Change PK and remove partitions from the logging table - https://phabricator.wikimedia.org/T233625
[05:08:35] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1097:3315 T233625', diff saved to https://phabricator.wikimedia.org/P9252 and previous config saved to /var/cache/conftool/dbconfig/20191008-050833-marostegui.json
[05:08:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:09:11] <wikibugs>	 Operations, ops-eqiad, Traffic: cp1085 - IPMI not working - https://phabricator.wikimedia.org/T231525 (Vgutierrez) we can depool it just before shutting it down, just let us know when you want to do it
[05:09:31] <wikibugs>	 (CR) Marostegui: [C: +2] "As per this comment T233986#5553960 from bd808 I am going to merge this" [puppet] - https://gerrit.wikimedia.org/r/541257 (https://phabricator.wikimedia.org/T233986) (owner: Marostegui)
[05:10:35] <marostegui>	 !log Reload query killer on labsdb1011
[05:10:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:14:36] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1131 for schema change', diff saved to https://phabricator.wikimedia.org/P9253 and previous config saved to /var/cache/conftool/dbconfig/20191008-051435-marostegui.json
[05:14:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:23:45] <wikibugs>	 (PS1) Marostegui: dbproxy1010: Depool labsdb1011 [puppet] - https://gerrit.wikimedia.org/r/541411 (https://phabricator.wikimedia.org/T233986)
[05:24:33] <wikibugs>	 (CR) Marostegui: [C: +2] dbproxy1010: Depool labsdb1011 [puppet] - https://gerrit.wikimedia.org/r/541411 (https://phabricator.wikimedia.org/T233986) (owner: Marostegui)
[05:25:33] <marostegui>	 !log Depool labsdb1011 for mysql upgrade
[05:25:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:30:26] <icinga-wm>	 PROBLEM - haproxy failover on dbproxy1010 is CRITICAL: CRITICAL check_failover servers up 1 down 1 https://wikitech.wikimedia.org/wiki/HAProxy
[05:30:38] <marostegui>	 ^ expected
[05:31:48] <icinga-wm>	 PROBLEM - haproxy failover on dbproxy1018 is CRITICAL: CRITICAL check_failover servers up 1 down 1 https://wikitech.wikimedia.org/wiki/HAProxy
[05:31:54] <marostegui>	 ^ expected too
[05:33:38] <icinga-wm>	 RECOVERY - haproxy failover on dbproxy1010 is OK: OK check_failover servers up 2 down 0 https://wikitech.wikimedia.org/wiki/HAProxy
[05:35:20] <elukey>	 !log drop CitationUsage tables from the log database on db1107/db1108 (the ones listed in the task) - T233893
[05:35:25] <elukey>	 marostegui: --^
[05:35:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:35:25] <elukey>	 o/
[05:35:26] <stashbot>	 T233893: drop CitatitionUsage data on mysql  - https://phabricator.wikimedia.org/T233893
[05:35:30] <marostegui>	 elukey: <3!!!!
[05:35:47] <wikibugs>	 (PS1) Marostegui: Revert "dbproxy1010: Depool labsdb1011" [puppet] - https://gerrit.wikimedia.org/r/541412
[05:40:32] <icinga-wm>	 PROBLEM - Check systemd state on db1108 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:41:26] <icinga-wm>	 PROBLEM - eventlogging_sync processes on db1108 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), args /bin/bash /usr/local/bin/eventlogging_sync.sh https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging
[05:41:28] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1082 db1081 db1080 db1079 db1075 db1074 for PDU maintenance T227138', diff saved to https://phabricator.wikimedia.org/P9254 and previous config saved to /var/cache/conftool/dbconfig/20191008-054127-marostegui.json
[05:41:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:41:32] <stashbot>	 T227138: a2-eqiad pdu refresh (Tuesday 10/8 @11am UTC) - https://phabricator.wikimedia.org/T227138
[05:41:44] <marostegui>	 elukey: I guess db1108 is also you? ^
[05:42:45] <elukey>	 marostegui: checking
[05:43:35] <elukey>	 yep my bad sorry
[05:43:46] <icinga-wm>	 RECOVERY - Check systemd state on db1108 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:43:53] <marostegui>	 no worries!
[05:43:57] <marostegui>	 thanks for fixing it
[05:44:02] <elukey>	 !log drop PageCreation_7481635 table from the log db on db1107/db1108 - T233892
[05:44:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:44:06] <stashbot>	 T233892: Drop page create event data on mysql - https://phabricator.wikimedia.org/T233892
[05:44:40] <icinga-wm>	 RECOVERY - eventlogging_sync processes on db1108 is OK: PROCS OK: 1 process with UID = 0 (root), args /bin/bash /usr/local/bin/eventlogging_sync.sh https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging
[05:48:18] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.decommission
[05:48:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:48:23] <wikibugs>	 (PS1) Marostegui: site.pp: Remove references to db2058 [puppet] - https://gerrit.wikimedia.org/r/541413 (https://phabricator.wikimedia.org/T229543)
[05:48:30] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[05:48:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:48:36] <wikibugs>	 Operations, ops-codfw, DC-Ops, decommission, Patch-For-Review: Decommission db2058.codfw.wmnet - https://phabricator.wikimedia.org/T229543 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: `db2058.codfw.wmnet` -  db2058.codfw.wmnet (**PASS**)...
[05:49:49] <wikibugs>	 (CR) Marostegui: [C: +2] site.pp: Remove references to db2058 [puppet] - https://gerrit.wikimedia.org/r/541413 (https://phabricator.wikimedia.org/T229543) (owner: Marostegui)
[05:49:59] <wikibugs>	 (PS1) Marostegui: wmnet: Remove db2058 production DNS [dns] - https://gerrit.wikimedia.org/r/541414 (https://phabricator.wikimedia.org/T229543)
[05:50:58] <wikibugs>	 (CR) Marostegui: [C: +2] wmnet: Remove db2058 production DNS [dns] - https://gerrit.wikimedia.org/r/541414 (https://phabricator.wikimedia.org/T229543) (owner: Marostegui)
[05:51:06] <icinga-wm>	 RECOVERY - haproxy failover on dbproxy1018 is OK: OK check_failover servers up 2 down 0 https://wikitech.wikimedia.org/wiki/HAProxy
[05:51:09] <wikibugs>	 Operations, MW-1.34-notes (1.34.0-wmf.24; 2019-09-24), Patch-For-Review, User-Ladsgroup, and 2 others: Create Wikisource Hindi - https://phabricator.wikimedia.org/T218155 (Ajit_Kumar_Tiwari) @Dcljr: no pages will be overwritten when importing is done because all are new. We are already keeping th...
[05:51:41] <wikibugs>	 Operations, ops-codfw, DC-Ops, decommission: Decommission db2058.codfw.wmnet - https://phabricator.wikimedia.org/T229543 (Marostegui) a:RobH→Papaul
[05:52:01] <wikibugs>	 Operations, ops-codfw, DC-Ops, decommission: Decommission db2058.codfw.wmnet - https://phabricator.wikimedia.org/T229543 (Marostegui) Host ready final decommissioning steps + switch disablement
[06:07:47] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "dbproxy1010: Depool labsdb1011" [puppet] - https://gerrit.wikimedia.org/r/541412 (owner: Marostegui)
[06:07:59] <wikibugs>	 (PS2) Marostegui: Revert "dbproxy1010: Depool labsdb1011" [puppet] - https://gerrit.wikimedia.org/r/541412
[06:09:01] <marostegui>	 !log Repool labsdb1011 for mysql upgrade
[06:09:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:14:53] <wikibugs>	 Operations, netops: BGP sessions down on cr2-esams - https://phabricator.wikimedia.org/T232617 (elukey) Resolved→Open ` elukey@re0.cr2-esams> show bgp summary | match 12871 80.249.208.32         12871          0          0       0       1     1w3d20h Active 2001:7f8:1::a501:2871:2       12871...
[06:18:44] <wikibugs>	 Operations, Release-Engineering-Team-TODO, serviceops, wikitech.wikimedia.org, and 3 others: switch wikitech to PHP 7.2 - https://phabricator.wikimedia.org/T223393 (Joe) >>! In T223393#5553766, @Jdforrester-WMF wrote: > If this isn't done before tomorrow, the train rollout will break wikitechwiki...
[06:22:36] <wikibugs>	 (PS2) Elukey: reportupdater:manifests:job.pp: fix typo in config-file param [puppet] - https://gerrit.wikimedia.org/r/541324 (https://phabricator.wikimedia.org/T223414) (owner: Mforns)
[06:22:54] <wikibugs>	 (PS3) Elukey: reportupdater: fix typo in config-file param [puppet] - https://gerrit.wikimedia.org/r/541324 (https://phabricator.wikimedia.org/T223414) (owner: Mforns)
[06:23:01] <wikibugs>	 (PS4) Elukey: reportupdater: fix typo in config-file param [puppet] - https://gerrit.wikimedia.org/r/541324 (https://phabricator.wikimedia.org/T223414) (owner: Mforns)
[06:23:12] <icinga-wm>	 RECOVERY - Check systemd state on stat1006 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:35:47] <wikibugs>	 (CR) Elukey: "This seems safe to merge: https://puppet-compiler.wmflabs.org/compiler1002/18774/" [puppet] - https://gerrit.wikimedia.org/r/541324 (https://phabricator.wikimedia.org/T223414) (owner: Mforns)
[06:48:45] <marostegui>	 !log Stop MySQL on es011 db1082 db1081 db1080 db1079 db1075 db1074 (replication lag will appear on labs for s5) for on-site maintenance T227138
[06:48:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:48:49] <wikibugs>	 (PS1) DCausse: [cirrus] drop support for HHVM connection pooling [mediawiki-config] - https://gerrit.wikimedia.org/r/541425
[06:48:49] <stashbot>	 T227138: a2-eqiad pdu refresh (Tuesday 10/8 @11am UTC) - https://phabricator.wikimedia.org/T227138
[07:00:56] <wikibugs>	 Operations, ops-eqiad, DC-Ops: a2-eqiad pdu refresh (Tuesday 10/8 @11am UTC) - https://phabricator.wikimedia.org/T227138 (Marostegui) @Cmjohnson the following hosts are good to go: db1082 db1081 db1080 db1079 db1075 db1074 es1011 Please note: - db1074 has been powered off as it has a broken PSU, so p...
[07:04:04] <icinga-wm>	 PROBLEM - Check systemd state on stat1006 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:06:08] <wikibugs>	 (PS1) Alexandros Kosiaris: admin: Add view clusterrolebinding [deployment-charts] - https://gerrit.wikimedia.org/r/541426
[07:10:33] <moritzm>	 !log draining ganeti1002 for upcoming reboot (combined kernel/qemu security updates)
[07:10:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:11:49] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +1] "+1, but I am curious, how is the dev tag updated?" [deployment-charts] - https://gerrit.wikimedia.org/r/541371 (https://phabricator.wikimedia.org/T234578) (owner: Jeena Huneidi)
[07:13:15] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +1] Update wikifeeds chart to 0.0.4 [deployment-charts] - https://gerrit.wikimedia.org/r/540967 (https://phabricator.wikimedia.org/T170455) (owner: Mholloway)
[07:15:53] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P9255 and previous config saved to /var/cache/conftool/dbconfig/20191008-071551-marostegui.json
[07:15:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:17:43] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/540832 (https://phabricator.wikimedia.org/T226089) (owner: Elukey)
[07:19:01] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1088 for schema change', diff saved to https://phabricator.wikimedia.org/P9256 and previous config saved to /var/cache/conftool/dbconfig/20191008-071859-marostegui.json
[07:19:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:20:32] <wikibugs>	 (PS2) Alexandros Kosiaris: admin: Add view clusterrolebinding [deployment-charts] - https://gerrit.wikimedia.org/r/541426
[07:21:11] <wikibugs>	 (PS3) Alexandros Kosiaris: admin: Add view clusterrolebinding [deployment-charts] - https://gerrit.wikimedia.org/r/541426 (https://phabricator.wikimedia.org/T207200)
[07:23:20] <wikibugs>	 (PS6) Elukey: profile::kerberos::kdc: add support for bacula backups [puppet] - https://gerrit.wikimedia.org/r/540832 (https://phabricator.wikimedia.org/T226089)
[07:25:29] <wikibugs>	 (CR) Elukey: [C: +2] profile::kerberos::kdc: add support for bacula backups [puppet] - https://gerrit.wikimedia.org/r/540832 (https://phabricator.wikimedia.org/T226089) (owner: Elukey)
[07:29:04] <logmsgbot>	 !log jmm@cumin2001 START - Cookbook sre.hosts.downtime
[07:29:05] <logmsgbot>	 !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[07:29:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:29:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:30:51] <logmsgbot>	 !log jmm@cumin2001 START - Cookbook sre.hosts.downtime
[07:30:52] <logmsgbot>	 !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[07:30:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:30:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:35:46] <wikibugs>	 (CR) Mobrovac: [C: +1] Update wikifeeds chart to 0.0.4 [deployment-charts] - https://gerrit.wikimedia.org/r/540967 (https://phabricator.wikimedia.org/T170455) (owner: Mholloway)
[07:38:58] <wikibugs>	 Operations, Traffic: ATS-tls nodes on the text cluster have a slightly higher rate of failed fetches on varnish-fe - https://phabricator.wikimedia.org/T234887 (Vgutierrez)
[07:39:18] <wikibugs>	 Operations, Traffic: ATS-tls nodes on the text cluster have a slightly higher rate of failed fetches on varnish-fe - https://phabricator.wikimedia.org/T234887 (Vgutierrez) p:Triage→Normal
[07:39:26] <icinga-wm>	 PROBLEM - k8s API server requests latencies on neon is CRITICAL: instance=10.64.0.40:6443 verb={GET,PATCH} https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[07:39:44] <icinga-wm>	 PROBLEM - etcd request latencies on neon is CRITICAL: instance=10.64.0.40:6443 operation=get https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[07:41:02] <icinga-wm>	 RECOVERY - k8s API server requests latencies on neon is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[07:41:20] <icinga-wm>	 RECOVERY - etcd request latencies on neon is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[07:44:28] <wikibugs>	 (CR) Effie Mouzeli: [C: +2] Remove tmpreaper from mediawiki servers [puppet] - https://gerrit.wikimedia.org/r/538884 (https://phabricator.wikimedia.org/T151304) (owner: Muehlenhoff)
[07:44:37] <wikibugs>	 (PS3) Effie Mouzeli: Remove tmpreaper from mediawiki servers [puppet] - https://gerrit.wikimedia.org/r/538884 (https://phabricator.wikimedia.org/T151304) (owner: Muehlenhoff)
[07:49:25] <akosiaris>	 !log update OTRS to 5.0.38
[07:49:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:51:14] <moritzm>	 !log draining ganeti1003 for upcoming reboot (combined kernel/qemu security updates)
[07:51:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:57:10] <wikibugs>	 (CR) Gehel: [C: -1] "Much cleaner! Thanks! A few more comments inline." (10 comments) [cookbooks] - https://gerrit.wikimedia.org/r/540153 (https://phabricator.wikimedia.org/T230588) (owner: Mathew.onipe)
[07:58:36] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +2] admin: Add view clusterrolebinding [deployment-charts] - https://gerrit.wikimedia.org/r/541426 (https://phabricator.wikimedia.org/T207200) (owner: Alexandros Kosiaris)
[07:58:48] <wikibugs>	 (Merged) jenkins-bot: admin: Add view clusterrolebinding [deployment-charts] - https://gerrit.wikimedia.org/r/541426 (https://phabricator.wikimedia.org/T207200) (owner: Alexandros Kosiaris)
[08:03:37] <icinga-wm>	 ACKNOWLEDGEMENT - snapshot of s4 in codfw on db1115 is CRITICAL: snapshot for s4 at codfw taken more than 4 days ago: Most recent backup 2019-10-03 23:32:11 Jcrespo rerunning backups/prepare - The acknowledgement expires at: 2019-10-09 08:03:07. https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[08:03:37] <icinga-wm>	 ACKNOWLEDGEMENT - snapshot of s5 in eqiad on db1115 is CRITICAL: snapshot for s5 at eqiad taken more than 4 days ago: Most recent backup 2019-10-04 02:11:25 Jcrespo rerunning backups/prepare - The acknowledgement expires at: 2019-10-09 08:03:07. https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[08:03:37] <icinga-wm>	 ACKNOWLEDGEMENT - snapshot of s7 in eqiad on db1115 is CRITICAL: snapshot for s7 at eqiad taken more than 4 days ago: Most recent backup 2019-10-04 04:11:37 Jcrespo rerunning backups/prepare - The acknowledgement expires at: 2019-10-09 08:03:07. https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[08:05:09] <logmsgbot>	 !log akosiaris@ helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
[08:05:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:07:53] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] mediawiki::packages: remove packages for math rendering [puppet] - https://gerrit.wikimedia.org/r/540154 (https://phabricator.wikimedia.org/T195847) (owner: Giuseppe Lavagetto)
[08:07:55] <wikibugs>	 (PS1) Alexandros Kosiaris: admin: Fix typo with Group definition [deployment-charts] - https://gerrit.wikimedia.org/r/541510
[08:08:07] <wikibugs>	 (PS2) Giuseppe Lavagetto: mediawiki::packages: remove packages for math rendering [puppet] - https://gerrit.wikimedia.org/r/540154 (https://phabricator.wikimedia.org/T195847)
[08:08:20] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +2] admin: Fix typo with Group definition [deployment-charts] - https://gerrit.wikimedia.org/r/541510 (owner: Alexandros Kosiaris)
[08:08:32] <wikibugs>	 (Merged) jenkins-bot: admin: Fix typo with Group definition [deployment-charts] - https://gerrit.wikimedia.org/r/541510 (owner: Alexandros Kosiaris)
[08:09:48] <logmsgbot>	 !log akosiaris@ helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
[08:09:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:10:05] <logmsgbot>	 !log akosiaris@ helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
[08:10:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:10:25] <logmsgbot>	 !log akosiaris@ helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
[08:10:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:25:46] <wikibugs>	 (PS1) Jon Harald Søby: Enable more transwiki import sources for hiwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/541511 (https://phabricator.wikimedia.org/T234892)
[08:31:05] <wikibugs>	 (PS1) Elukey: role::druid::analytics::worker: increase query timeout to 10s [puppet] - https://gerrit.wikimedia.org/r/541512 (https://phabricator.wikimedia.org/T234684)
[08:31:38] <wikibugs>	 (CR) Elukey: [C: +2] role::druid::analytics::worker: increase query timeout to 10s [puppet] - https://gerrit.wikimedia.org/r/541512 (https://phabricator.wikimedia.org/T234684) (owner: Elukey)
[08:33:08] <elukey>	 !log roll restart druid historicals and brokers on druid100[1-3] to pick up new settings - T234684
[08:33:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:33:12] <stashbot>	 T234684: Superset not able to load a  reading dashboard  - https://phabricator.wikimedia.org/T234684
[08:38:05] <logmsgbot>	 !log mobrovac@deploy1001 Started deploy [restbase/deploy@83fcc0c]: Minor updates to VE logging
[08:38:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:38:21] <wikibugs>	 10Operations, 10Traffic: ATS-tls nodes on the text cluster have a slightly higher rate of failed fetches on varnish-fe - https://phabricator.wikimedia.org/T234887 (10Vgutierrez) It looks to me like this is some kind of timeout issue with POST requests, checking the output of `varnishlog -n frontend -q "FetchEr...
[08:45:14] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2018 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[08:45:44] <icinga-wm>	 PROBLEM - mobileapps endpoints health on scb2002 is CRITICAL: /{domain}/v1/page/most-read/{year}/{month}/{day} (retrieve the most read articles for January 1, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[08:45:52] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2017 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[08:45:54] <icinga-wm>	 PROBLEM - mobileapps endpoints health on scb2004 is CRITICAL: /{domain}/v1/page/most-read/{year}/{month}/{day} (retrieve the most-read articles for January 1, 2016 (with aggregated=true)) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[08:46:02] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2015 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[08:46:09] <logmsgbot>	 !log mobrovac@deploy1001 Finished deploy [restbase/deploy@83fcc0c]: Minor updates to VE logging (duration: 08m 05s)
[08:46:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:46:48] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2018 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[08:47:20] <icinga-wm>	 RECOVERY - mobileapps endpoints health on scb2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[08:47:24] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2017 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[08:47:26] <icinga-wm>	 RECOVERY - mobileapps endpoints health on scb2004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[08:47:36] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2015 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[08:53:42] <wikibugs>	 (03CR) 10Gehel: [C: 04-1] "PCC in error: https://puppet-compiler.wmflabs.org/compiler1001/18775/wdqs1004.eqiad.wmnet/change.wdqs1004.eqiad.wmnet.err" [puppet] - 10https://gerrit.wikimedia.org/r/537138 (https://phabricator.wikimedia.org/T232297) (owner: 10Mathew.onipe)
[08:57:15] <logmsgbot>	 !log jmm@cumin2001 START - Cookbook sre.hosts.downtime
[08:57:16] <logmsgbot>	 !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[08:57:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:57:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:58:31] <wikibugs>	 10Operations, 10Traffic: ATS-tls nodes on the text cluster have a slightly higher rate of failed fetches on varnish-fe - https://phabricator.wikimedia.org/T234887 (10Vgutierrez) on the ATS side it doesn't look like there is any timeout set to 60 seconds though: `vgutierrez@cp4027:~$ sudo -i traffic_ctl --run-r...
[09:05:30] <wikibugs>	 10Operations, 10ops-eqiad: Move YHSM from auth1001 to auth1002 - https://phabricator.wikimedia.org/T233821 (10MoritzMuehlenhoff) I see in dmesg that it got removed from auth1001, but I don't see it in the logs for auth1002, is the USB slot in question maybe inactive? Could you try moving it to a different slot?
[09:06:17] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1097:3315 T233625', diff saved to https://phabricator.wikimedia.org/P9257 and previous config saved to /var/cache/conftool/dbconfig/20191008-090616-marostegui.json
[09:06:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:06:23] <stashbot>	 T233625: Change PK and remove partitions from the logging table - https://phabricator.wikimedia.org/T233625
[09:09:39] <moritzm>	 !log draining ganeti1004 for upcoming reboot (combined kernel/qemu security updates)
[09:09:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:20:20] <marostegui>	 !log Compress logging table on db2088:3312 for idwiki,plwiki,ptwiki,zhwiki
[09:20:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:24:05] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: ORES: Make redis AOF configurable [puppet] - 10https://gerrit.wikimedia.org/r/540912 (https://phabricator.wikimedia.org/T233831)
[09:26:28] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1088 after schema change', diff saved to https://phabricator.wikimedia.org/P9258 and previous config saved to /var/cache/conftool/dbconfig/20191008-092627-marostegui.json
[09:26:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:29:22] <icinga-wm>	 RECOVERY - snapshot of s7 in eqiad on db1115 is OK: snapshot for s7 at eqiad taken less than 4 days ago and larger than 90 GB: Last one 2019-10-08 07:55:13 from db1116.eqiad.wmnet:3317 (866 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[09:31:59] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] ORES: Make redis AOF configurable [puppet] - 10https://gerrit.wikimedia.org/r/540912 (https://phabricator.wikimedia.org/T233831) (owner: 10Alexandros Kosiaris)
[09:33:11] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1093 for schema change', diff saved to https://phabricator.wikimedia.org/P9259 and previous config saved to /var/cache/conftool/dbconfig/20191008-093309-marostegui.json
[09:33:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:34:59] <wikibugs>	 (03PS1) 10Jcrespo: bacula: Change pool/storage names for new bacula director [puppet] - 10https://gerrit.wikimedia.org/r/541517 (https://phabricator.wikimedia.org/T229209)
[09:36:57] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] bacula: Change pool/storage names for new bacula director [puppet] - 10https://gerrit.wikimedia.org/r/541517 (https://phabricator.wikimedia.org/T229209) (owner: 10Jcrespo)
[09:40:16] <wikibugs>	 (03PS2) 10Jcrespo: bacula: Change pool/storage names for new bacula director [puppet] - 10https://gerrit.wikimedia.org/r/541517 (https://phabricator.wikimedia.org/T229209)
[09:44:42] <wikibugs>	 (03PS3) 10Jcrespo: bacula: Change pool/storage names for new bacula director [puppet] - 10https://gerrit.wikimedia.org/r/541517 (https://phabricator.wikimedia.org/T229209)
[09:44:53] <wikibugs>	 (03PS4) 10Jcrespo: bacula: Change pool/storage names for new bacula director [puppet] - 10https://gerrit.wikimedia.org/r/541517 (https://phabricator.wikimedia.org/T229209)
[09:45:16] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+1] bacula: Change pool/storage names for new bacula director [puppet] - 10https://gerrit.wikimedia.org/r/541517 (https://phabricator.wikimedia.org/T229209) (owner: 10Jcrespo)
[09:46:40] <icinga-wm>	 RECOVERY - snapshot of s5 in eqiad on db1115 is OK: snapshot for s5 at eqiad taken less than 4 days ago and larger than 90 GB: Last one 2019-10-08 08:59:18 from db1102.eqiad.wmnet:3315 (666 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[09:56:41] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] bacula: Change pool/storage names for new bacula director [puppet] - 10https://gerrit.wikimedia.org/r/541517 (https://phabricator.wikimedia.org/T229209) (owner: 10Jcrespo)
[10:05:47] <wikibugs>	 (03CR) 10Muehlenhoff: "Some comments inline (on PS1, PS3 and this PS4) :-)" (035 comments) [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/506188 (owner: 10Jbond)
[10:08:46] <logmsgbot>	 !log jmm@cumin2001 START - Cookbook sre.hosts.downtime
[10:08:47] <logmsgbot>	 !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[10:08:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:08:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:08:58] <logmsgbot>	 !log jmm@cumin2001 START - Cookbook sre.hosts.downtime
[10:08:58] <logmsgbot>	 !log mobrovac@deploy1001 Started deploy [restbase/deploy@00eda0b]: Parsoid VE logging: log if the etags differ
[10:08:59] <logmsgbot>	 !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[10:09:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:09:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:09:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:09:07] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: RBAC: Add an api-metrics ClusterRole and binding [deployment-charts] - 10https://gerrit.wikimedia.org/r/541520
[10:13:38] <icinga-wm>	 PROBLEM - mobileapps endpoints health on scb2005 is CRITICAL: /{domain}/v1/page/most-read/{year}/{month}/{day} (retrieve the most read articles for January 1, 2016) timed out before a response was received: /{domain}/v1/page/most-read/{year}/{month}/{day} (retrieve the most-read articles for January 1, 2016 (with aggregated=true)) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/mob
[10:13:38] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2016 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:13:48] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] RBAC: Add an api-metrics ClusterRole and binding [deployment-charts] - 10https://gerrit.wikimedia.org/r/541520 (owner: 10Alexandros Kosiaris)
[10:14:00] <wikibugs>	 (03Merged) 10jenkins-bot: RBAC: Add an api-metrics ClusterRole and binding [deployment-charts] - 10https://gerrit.wikimedia.org/r/541520 (owner: 10Alexandros Kosiaris)
[10:14:32] <icinga-wm>	 PROBLEM - mobileapps endpoints health on scb2002 is CRITICAL: /{domain}/v1/page/most-read/{year}/{month}/{day} (retrieve the most read articles for January 1, 2016) timed out before a response was received: /{domain}/v1/page/most-read/{year}/{month}/{day} (retrieve the most-read articles for January 1, 2016 (with aggregated=true)) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/mob
[10:15:08] <icinga-wm>	 PROBLEM - mobileapps endpoints health on scb2001 is CRITICAL: /{domain}/v1/page/most-read/{year}/{month}/{day} (retrieve the most-read articles for January 1, 2016 (with aggregated=true)) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[10:15:08] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2020 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:15:30] <logmsgbot>	 !log mobrovac@deploy1001 Finished deploy [restbase/deploy@00eda0b]: Parsoid VE logging: log if the etags differ (duration: 06m 32s)
[10:15:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:15:38] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: lvs::monitor_service: partial refactoring [puppet] - 10https://gerrit.wikimedia.org/r/541522
[10:16:04] <icinga-wm>	 RECOVERY - mobileapps endpoints health on scb2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[10:16:36] <icinga-wm>	 RECOVERY - mobileapps endpoints health on scb2001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[10:16:38] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2020 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:16:46] <icinga-wm>	 RECOVERY - mobileapps endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[10:16:48] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2016 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:16:52] <icinga-wm>	 PROBLEM - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is CRITICAL: /{domain}/v1/page/mobile-html/{title} (Get page content HTML for test page) timed out before a response was received https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29
[10:16:53] <logmsgbot>	 !log akosiaris@ helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
[10:16:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:18:28] <icinga-wm>	 RECOVERY - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29
[10:19:10] <moritzm>	 !log draining ganeti1005 for upcoming reboot (combined kernel/qemu security updates)
[10:19:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:20:26] <icinga-wm>	 PROBLEM - etcd request latencies on argon is CRITICAL: instance=10.64.32.133:6443 operation=create https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[10:21:15] <logmsgbot>	 !log akosiaris@ helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
[10:21:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:22:04] <icinga-wm>	 RECOVERY - etcd request latencies on argon is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[10:22:05] <akosiaris>	 argon complaining about etcd request latencies is probably because of the ganeti moves
[10:22:16] <logmsgbot>	 !log akosiaris@ helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
[10:22:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:25:17] <wikibugs>	 (03PS5) 10Jbond: refactor: Refactor script and use the PyYAML [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/506188
[10:26:09] <wikibugs>	 (03CR) 10Jbond: refactor: Refactor script and use the PyYAML (035 comments) [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/506188 (owner: 10Jbond)
[10:30:09] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" (032 comments) [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/506188 (owner: 10Jbond)
[10:41:18] <wikibugs>	 (03CR) 10Jbond: "about to merge however i just a note for history, i noticed this package already has a pyyaml dependency due to debdeploy_updatespec.py" (032 comments) [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/506188 (owner: 10Jbond)
[10:41:27] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] refactor: Refactor script and use the PyYAML [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/506188 (owner: 10Jbond)
[10:47:23] <wikibugs>	 (03PS6) 10Jbond: refactor: Refactor script and use the PyYAML [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/506188
[10:47:35] <wikibugs>	 10Operations, 10observability, 10Availability, 10Goal: Setup bacula backup monitoring - https://phabricator.wikimedia.org/T234900 (10jcrespo)
[10:47:44] <wikibugs>	 10Operations, 10observability, 10Availability, 10Goal: Setup bacula backup monitoring - https://phabricator.wikimedia.org/T234900 (10jcrespo) p:05Triage→03High
[10:48:27] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] refactor: Refactor script and use the PyYAML [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/506188 (owner: 10Jbond)
[10:49:11] <wikibugs>	 (03PS1) 10Jcrespo: bacula: Force install bacula-director, not a dependency on buster [puppet] - 10https://gerrit.wikimedia.org/r/541523 (https://phabricator.wikimedia.org/T229209)
[10:49:47] <wikibugs>	 (03PS2) 10Jcrespo: bacula: Force install bacula-director, not a dependency on buster [puppet] - 10https://gerrit.wikimedia.org/r/541523 (https://phabricator.wikimedia.org/T229209)
[10:50:17] <wikibugs>	 (03CR) 10Jcrespo: "¯\_(ツ)_/¯" [puppet] - 10https://gerrit.wikimedia.org/r/541523 (https://phabricator.wikimedia.org/T229209) (owner: 10Jcrespo)
[10:54:13] <wikibugs>	 (03PS1) 10Vgutierrez: ATS: Match HTTP transaction activity timeout and TTFB timeouts [puppet] - 10https://gerrit.wikimedia.org/r/541524 (https://phabricator.wikimedia.org/T234887)
[10:56:46] <jclark-ctr>	 Starting PDU swap eqiad A2 in 5 minutes https://phabricator.wikimedia.org/T227138
[10:57:11] <logmsgbot>	 !log jbond@cumin1001 START - Cookbook sre.hosts.ipmi-password-reset
[10:57:11] <logmsgbot>	 !log jbond@cumin1001 END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
[10:57:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:57:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:57:19] <logmsgbot>	 !log jbond@cumin1001 START - Cookbook sre.hosts.ipmi-password-reset
[10:57:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:57:50] <jbond42>	 !log testing ipmi reset cookbook, using the current pass for both old and new so no reset actually occurs
[10:57:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:58:00] <logmsgbot>	 !log jbond@cumin1001 Updating IPMI password on 1253 hosts - jbond@cumin1001
[10:58:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:58:30] <logmsgbot>	 !log jbond@cumin1001 END (ERROR) - Cookbook sre.hosts.ipmi-password-reset (exit_code=97)
[10:58:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:58:41] <logmsgbot>	 !log jbond@cumin1001 START - Cookbook sre.hosts.ipmi-password-reset
[10:58:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:58:47] <logmsgbot>	 !log jbond@cumin1001 Updating IPMI password on 1253 hosts - jbond@cumin1001
[10:58:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:00:04] <jouncebot>	 Amir1, Lucas_WMDE, awight, and Urbanecm: It is that lovely time of the day again! You are hereby commanded to deploy European Mid-day SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191008T1100).
[11:00:04] <jouncebot>	 Jhs: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[11:00:11] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+1] "pcc seems happy: https://puppet-compiler.wmflabs.org/compiler1002/18781/" [puppet] - 10https://gerrit.wikimedia.org/r/541524 (https://phabricator.wikimedia.org/T234887) (owner: 10Vgutierrez)
[11:00:14] <Jhs>	 I'm here!
[11:00:25] <Urbanecm>	 I can SWAT today!
[11:01:29] <Urbanecm>	 Jhs: was T234892 discussed on-wiki?
[11:01:29] <stashbot>	 T234892: Enable more import sources for hiwikisource - https://phabricator.wikimedia.org/T234892
[11:01:44] <Jhs>	 Urbanecm, no
[11:02:08] <Jhs>	 it just doesn't make sense that a new wikisource should not be able to import from oldwikisource, where it used to be located
[11:03:35] <Jhs>	 Urbanecm, come to think of it, maybe a better change would be to add oldwikisource to the generic one for Wikisource, listed at the top of $wgImportSources?
[11:04:23] <Lucas_WMDE>	 o/
[11:05:15] <Urbanecm>	 Jhs: well, makes sense, +2'ing.
[11:05:17] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] Enable more transwiki import sources for hiwikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541511 (https://phabricator.wikimedia.org/T234892) (owner: 10Jon Harald Søby)
[11:05:45] <Jhs>	 thanks Urbanecm :)
[11:06:07] <wikibugs>	 (03Merged) 10jenkins-bot: Enable more transwiki import sources for hiwikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541511 (https://phabricator.wikimedia.org/T234892) (owner: 10Jon Harald Søby)
[11:08:02] <logmsgbot>	 !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: fb49404: Enable more transwiki import sources for hiwikisource (T234892) (duration: 00m 55s)
[11:08:02] <wikibugs>	 (03PS1) 10Vgutierrez: ATS: Honour 180 secs timeout on backend instances [puppet] - 10https://gerrit.wikimedia.org/r/541525 (https://phabricator.wikimedia.org/T234887)
[11:08:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:08:06] <stashbot>	 T234892: Enable more import sources for hiwikisource - https://phabricator.wikimedia.org/T234892
[11:08:13] <Urbanecm>	 Jhs: done
[11:08:18] <Urbanecm>	 Lucas_WMDE: you want to do your stuff?
[11:09:19] <Jhs>	 thank you Urbanecm 
[11:09:23] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: lvs::monitor_service: partial refactoring [puppet] - 10https://gerrit.wikimedia.org/r/541522
[11:09:27] <Urbanecm>	 yw Jhs 
[11:10:02] <Lucas_WMDE>	 I have nothing to do
[11:10:12] <Lucas_WMDE>	 o/ is just my signal that I’m here and available :)
[11:10:12] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] ATS: Honour 180 secs timeout on backend instances [puppet] - 10https://gerrit.wikimedia.org/r/541525 (https://phabricator.wikimedia.org/T234887) (owner: 10Vgutierrez)
[11:10:17] <Lucas_WMDE>	 (I was a bit late this time, sorry)
[11:10:28] <Lucas_WMDE>	 Urbanecm: feel free to close the SWAT
[11:10:57] <Urbanecm>	 !log EU SWAT done
[11:10:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:11:00] <Urbanecm>	 thanks Lucas_WMDE 
[11:12:31] <wikibugs>	 (03PS2) 10Vgutierrez: ATS: Honour 180 secs timeout on backend instances [puppet] - 10https://gerrit.wikimedia.org/r/541525 (https://phabricator.wikimedia.org/T234887)
[11:12:41] <wikibugs>	 10Operations, 10DBA, 10serviceops, 10Goal, 10Patch-For-Review: Strengthen backup infrastructure and support - https://phabricator.wikimedia.org/T229209 (10jcrespo) I got finally the director running, but sadly it won't start with no devices or clients provisioned, so I created a duplicate of the ones pup...
[11:14:34] <icinga-wm>	 RECOVERY - snapshot of s4 in codfw on db1115 is OK: snapshot for s4 at codfw taken less than 4 days ago and larger than 90 GB: Last one 2019-10-08 09:38:56 from db2099.codfw.wmnet:3314 (1087 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[11:19:15] <wikibugs>	 10Operations, 10serviceops, 10HHVM, 10Performance-Team (Radar): Remove HHVM from production - https://phabricator.wikimedia.org/T229792 (10jijiki)
[11:22:24] <wikibugs>	 (03PS3) 10Giuseppe Lavagetto: lvs::monitor_service: partial refactoring [puppet] - 10https://gerrit.wikimedia.org/r/541522
[11:23:50] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: CloudVPS: use wikimediacloud.org domain for Neutron-related IP addresses [dns] - 10https://gerrit.wikimedia.org/r/541526 (https://phabricator.wikimedia.org/T234836)
[11:26:24] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+1] "pcc seems happy https://puppet-compiler.wmflabs.org/compiler1001/18786/" [puppet] - 10https://gerrit.wikimedia.org/r/541525 (https://phabricator.wikimedia.org/T234887) (owner: 10Vgutierrez)
[11:29:08] <wikibugs>	 (03PS4) 10Giuseppe Lavagetto: lvs::monitor_service: partial refactoring [puppet] - 10https://gerrit.wikimedia.org/r/541522
[11:33:46] <wikibugs>	 10Operations, 10serviceops: tmpreaper possible race condition - https://phabricator.wikimedia.org/T151304 (10jijiki) 05Open→03Resolved a:03jijiki I think we can mark this as resolved, tmpreaper will be going away as we are reimaging mediawiki servers
[11:33:50] <wikibugs>	 10Operations, 10Patch-For-Review: Tracking and Reducing cron-spam to root@ - https://phabricator.wikimedia.org/T132324 (10jijiki)
[11:37:58] <wikibugs>	 10Operations, 10serviceops, 10HHVM, 10Performance-Team (Radar): Remove HHVM from production - https://phabricator.wikimedia.org/T229792 (10jijiki)
[11:38:26] <wikibugs>	 (03Abandoned) 10Vgutierrez: ATS: Honour 180 secs timeout on backend instances [puppet] - 10https://gerrit.wikimedia.org/r/541525 (https://phabricator.wikimedia.org/T234887) (owner: 10Vgutierrez)
[11:39:52] <logmsgbot>	 !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
[11:39:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:48:47] <wikibugs>	 10Operations, 10ops-eqiad, 10Cloud-Services, 10cloud-services-team (Kanban): rack/setup/install cloudcephmon100[123] - https://phabricator.wikimedia.org/T228102 (10Cmjohnson)
[11:54:19] <wikibugs>	 (03PS2) 10Vgutierrez: ATS: Adjust timeouts in ats-tls and ats-backend instances [puppet] - 10https://gerrit.wikimedia.org/r/541524 (https://phabricator.wikimedia.org/T234887)
[11:54:50] <wikibugs>	 10Operations, 10serviceops: tmpreaper possible race condition - https://phabricator.wikimedia.org/T151304 (10MoritzMuehlenhoff) 05Resolved→03Open See earlier discussion on task, this is still used by Toolforge, so WMCS SREs might still want to tweak the log spam.
[11:54:53] <wikibugs>	 10Operations, 10Patch-For-Review: Tracking and Reducing cron-spam to root@ - https://phabricator.wikimedia.org/T132324 (10MoritzMuehlenhoff)
[11:56:19] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] ATS: Adjust timeouts in ats-tls and ats-backend instances [puppet] - 10https://gerrit.wikimedia.org/r/541524 (https://phabricator.wikimedia.org/T234887) (owner: 10Vgutierrez)
[11:56:40] <wikibugs>	 10Operations, 10Code-Stewardship-Reviews, 10Graphoid, 10Core Platform Team Legacy (Watching / External), and 3 others: graphoid: Code stewardship request - https://phabricator.wikimedia.org/T211881 (10dr0ptp4kt) @Milimetric the visual treatment depends on a few factors, although yes, I think we'll want a p...
[11:57:33] <wikibugs>	 (03PS3) 10Vgutierrez: ATS: Adjust timeouts in ats-tls and ats-backend instances [puppet] - 10https://gerrit.wikimedia.org/r/541524 (https://phabricator.wikimedia.org/T234887)
[11:58:39] <wikibugs>	 (03PS4) 10Vgutierrez: ATS: Adjust timeouts in ats-tls and ats-backend instances [puppet] - 10https://gerrit.wikimedia.org/r/541524 (https://phabricator.wikimedia.org/T234887)
[11:59:33] <wikibugs>	 (03PS1) 10Urbanecm: Initial configuration for banwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541527 (https://phabricator.wikimedia.org/T234768)
[11:59:47] <wikibugs>	 (03PS1) 10Muehlenhoff: Add a minimal setup.py and switch to dh-python [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/541528
[12:00:10] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Initial configuration for banwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541527 (https://phabricator.wikimedia.org/T234768) (owner: 10Urbanecm)
[12:00:14] <icinga-wm>	 PROBLEM - Host an-worker1079 is DOWN: PING CRITICAL - Packet loss = 100%
[12:00:30] <marostegui>	 ^ I guess that's from the PDU maintenance?
[12:01:08] <cmjohnson1>	 marostegui it was plugged in but it looks like it has a bad PSU
[12:01:14] <icinga-wm>	 PROBLEM - Host ps1-a2-eqiad is DOWN: PING CRITICAL - Packet loss = 100%
[12:01:39] <marostegui>	 elukey: ^
[12:01:45] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+1] "pcc seems happy: https://puppet-compiler.wmflabs.org/compiler1002/18789/" [puppet] - 10https://gerrit.wikimedia.org/r/541524 (https://phabricator.wikimedia.org/T234887) (owner: 10Vgutierrez)
[12:01:53] <wikibugs>	 (03PS2) 10Muehlenhoff: Add a minimal setup.py and switch to dh-python [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/541528
[12:02:09] <wikibugs>	 (03PS2) 10Urbanecm: Initial configuration for banwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541527 (https://phabricator.wikimedia.org/T234768)
[12:02:27] <elukey>	 marostegui: thanks :)
[12:02:51] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Initial configuration for banwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541527 (https://phabricator.wikimedia.org/T234768) (owner: 10Urbanecm)
[12:03:49] <wikibugs>	 (03PS3) 10Urbanecm: Initial configuration for banwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541527 (https://phabricator.wikimedia.org/T234768)
[12:03:54] <icinga-wm>	 RECOVERY - Host an-worker1079 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms
[12:07:01] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists, 10Wikispore: Creation of Wikispore mailing list - https://phabricator.wikimedia.org/T232961 (10Pharos) Can we create it this week? This will be a vital tool in building up community discussion and participation, and we want to do a kind of public launch for the projec...
[12:10:46] <icinga-wm>	 PROBLEM - IPMI Sensor Status on kafka-jumbo1002 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[12:11:32] <wikibugs>	 10Operations, 10ops-eqiad: Move YHSM from auth1001 to auth1002 - https://phabricator.wikimedia.org/T233821 (10Cmjohnson) @MoritzMuehlenhoff It should be working now
[12:12:20] <icinga-wm>	 PROBLEM - Host ms-be1044.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[12:12:24] <icinga-wm>	 PROBLEM - Host ms-be1019.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[12:13:30] <icinga-wm>	 PROBLEM - Host cloudelastic1001.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[12:13:34] <icinga-wm>	 PROBLEM - Host db1075.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[12:14:12] <wikibugs>	 10Operations, 10ops-eqiad: Move YHSM from auth1001 to auth1002 - https://phabricator.wikimedia.org/T233821 (10MoritzMuehlenhoff) 05Open→03Resolved Confirmed, thanks.
[12:14:26] <icinga-wm>	 PROBLEM - Host ms-be1045.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[12:14:34] <icinga-wm>	 PROBLEM - Host db1074.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[12:15:06] <icinga-wm>	 PROBLEM - Host es1011.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[12:15:06] <icinga-wm>	 PROBLEM - Host kafka-jumbo1002.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[12:15:36] <icinga-wm>	 PROBLEM - Host an-worker1078.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[12:15:36] <icinga-wm>	 PROBLEM - Host tungsten.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[12:15:36] <icinga-wm>	 PROBLEM - Host an-presto1002.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[12:16:14] <icinga-wm>	 PROBLEM - IPMI Sensor Status on es1012 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[12:16:26] <icinga-wm>	 PROBLEM - Host db1082.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[12:18:02] <icinga-wm>	 RECOVERY - Host ms-be1044.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.16 ms
[12:18:06] <icinga-wm>	 RECOVERY - Host ms-be1019.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.00 ms
[12:18:32] <wikibugs>	 10Operations, 10DC-Ops, 10decommission: decommission auth1001 - https://phabricator.wikimedia.org/T234909 (10MoritzMuehlenhoff)
[12:18:36] <icinga-wm>	 RECOVERY - Host db1075.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.80 ms
[12:19:10] <icinga-wm>	 RECOVERY - Host cloudelastic1001.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.85 ms
[12:19:49] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "lgtm" [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/541528 (owner: 10Muehlenhoff)
[12:20:08] <icinga-wm>	 RECOVERY - Host ms-be1045.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.07 ms
[12:20:16] <icinga-wm>	 RECOVERY - Host db1074.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.76 ms
[12:21:06] <icinga-wm>	 RECOVERY - Host es1011.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.32 ms
[12:21:06] <icinga-wm>	 RECOVERY - Host an-presto1002.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.38 ms
[12:21:06] <icinga-wm>	 RECOVERY - Host kafka-jumbo1002.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.37 ms
[12:21:06] <icinga-wm>	 RECOVERY - Host an-worker1078.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.43 ms
[12:21:06] <icinga-wm>	 RECOVERY - Host tungsten.mgmt is UP: PING OK - Packet loss = 0%, RTA = 2.39 ms
[12:21:06] <icinga-wm>	 RECOVERY - Host db1082.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.90 ms
[12:23:06] <icinga-wm>	 RECOVERY - IPMI Sensor Status on kafka-jumbo1002 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[12:23:36] <icinga-wm>	 RECOVERY - IPMI Sensor Status on an-worker1079 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[12:24:01] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Depool es1012 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541530
[12:24:20] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+1] db-eqiad.php: Depool es1012 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541530 (owner: 10Marostegui)
[12:24:28] <marostegui>	 thanks jynus 
[12:25:12] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Depool es1012 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541530 (owner: 10Marostegui)
[12:26:01] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Depool es1012 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541530 (owner: 10Marostegui)
[12:27:11] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool es1012 T227138 (duration: 00m 51s)
[12:27:13] <marostegui>	 !log Stop MySQL on es1012 for onsite maintenance
[12:27:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:27:16] <stashbot>	 T227138: a2-eqiad pdu refresh (Tuesday 10/8 @11am UTC) - https://phabricator.wikimedia.org/T227138
[12:27:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:32:11] <librenms-wmf>	 04Critical Alert for device asw2-a-eqiad.mgmt.eqiad.wmnet - Juniper alarm active
[12:34:24] <wikibugs>	 (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool es1012" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541531
[12:35:56] <icinga-wm>	 RECOVERY - IPMI Sensor Status on es1012 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[12:36:53] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] Revert "db-eqiad.php: Depool es1012" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541531 (owner: 10Marostegui)
[12:37:21] <wikibugs>	 (03PS1) 10Matthias Mullie: Increase rate limits for newbie non-ip users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541532 (https://phabricator.wikimedia.org/T231463)
[12:37:37] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool es1012" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541531 (owner: 10Marostegui)
[12:37:47] <wikibugs>	 10Operations, 10DC-Ops, 10decommission: decommission auth1001 - https://phabricator.wikimedia.org/T234909 (10MoritzMuehlenhoff)
[12:37:49] <wikibugs>	 (03CR) 10Matthias Mullie: [C: 04-2] "TBD" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541532 (https://phabricator.wikimedia.org/T231463) (owner: 10Matthias Mullie)
[12:38:40] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool es1012 T227138 (duration: 00m 51s)
[12:38:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:38:44] <stashbot>	 T227138: a2-eqiad pdu refresh (Tuesday 10/8 @11am UTC) - https://phabricator.wikimedia.org/T227138
[12:39:26] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Slowly repool es1011 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541533
[12:40:23] <wikibugs>	 (03PS2) 10Marostegui: db-eqiad.php: Slowly repool es1011 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541533
[12:41:28] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Slowly repool es1011 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541533 (owner: 10Marostegui)
[12:42:15] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Slowly repool es1011 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541533 (owner: 10Marostegui)
[12:43:25] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Slowly repool es1011 (duration: 00m 51s)
[12:43:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:44:16] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Add a minimal setup.py and switch to dh-python [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/541528 (owner: 10Muehlenhoff)
[12:44:18] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1093 after schema change', diff saved to https://phabricator.wikimedia.org/P9261 and previous config saved to /var/cache/conftool/dbconfig/20191008-124417-marostegui.json
[12:44:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:47:22] <wikibugs>	 (03PS1) 10Elukey: role::analytics_test_cluster::coordinator: add druid load [puppet] - 10https://gerrit.wikimedia.org/r/541535
[12:48:23] <wikibugs>	 10Operations, 10MW-1.34-notes (1.34.0-wmf.24; 2019-09-24), 10Patch-For-Review, 10User-Ladsgroup, and 2 others: Create Wikisource Hindi - https://phabricator.wikimedia.org/T218155 (10jhsoby) I have imported all Hindi-specific pages now (everything that was in the Hindi category on mulwikisource). What remai...
[12:52:00] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] role::analytics_test_cluster::coordinator: add druid load [puppet] - 10https://gerrit.wikimedia.org/r/541535 (owner: 10Elukey)
[12:53:36] <wikibugs>	 (03PS1) 10Muehlenhoff: Bump changelog for new release [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/541536
[12:54:11] <elukey>	 akosiaris: merged your changes for labs private
[12:54:20] <akosiaris>	 elukey: ah, thanks!
[12:55:00] <icinga-wm>	 PROBLEM - Check status of defined EventLogging jobs on eventlog1002 is CRITICAL: CRITICAL: Stopped EventLogging jobs: eventlogging-consumer@mysql-m4-master-00 https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging
[12:57:35] <elukey>	 ah this is downtime expired --^
[12:57:38] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Fully repool es1011 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541538
[12:59:50] <icinga-wm>	 RECOVERY - Check status of defined EventLogging jobs on eventlog1002 is OK: OK: All defined EventLogging jobs are running. https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging
[13:02:39] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 04-1] "minor comment, rest LGTM" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/541522 (owner: 10Giuseppe Lavagetto)
[13:05:57] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Fully repool es1011 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541538 (owner: 10Marostegui)
[13:06:17] <wikibugs>	 (03CR) 10Jhedden: [C: 03+1] CloudVPS: use wikimediacloud.org domain for Neutron-related IP addresses [dns] - 10https://gerrit.wikimedia.org/r/541526 (https://phabricator.wikimedia.org/T234836) (owner: 10Arturo Borrero Gonzalez)
[13:06:42] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Fully repool es1011 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541538 (owner: 10Marostegui)
[13:07:56] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Fully repool es1011 (duration: 00m 51s)
[13:07:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:12:21] <wikibugs>	 (03CR) 10Elukey: [C: 03+1] "lgtm, but let's triple check with search as well!" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/539094 (https://phabricator.wikimedia.org/T204735) (owner: 10Mforns)
[13:15:07] <wikibugs>	 (03PS5) 10Ottomata: reportupdater: fix typo in config-file param [puppet] - 10https://gerrit.wikimedia.org/r/541324 (https://phabricator.wikimedia.org/T223414) (owner: 10Mforns)
[13:15:36] <wikibugs>	 (03CR) 10Ottomata: [V: 03+2 C: 03+2] reportupdater: fix typo in config-file param [puppet] - 10https://gerrit.wikimedia.org/r/541324 (https://phabricator.wikimedia.org/T223414) (owner: 10Mforns)
[13:17:53] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1082 db1081 db1080 db1079 db1075 db1074 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9262 and previous config saved to /var/cache/conftool/dbconfig/20191008-131752-marostegui.json
[13:17:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:20:13] <wikibugs>	 (03PS1) 10Elukey: profile::analytics::refinery::job::druid_load: add kerb support [puppet] - 10https://gerrit.wikimedia.org/r/541541 (https://phabricator.wikimedia.org/T226698)
[13:21:25] <wikibugs>	 (03PS2) 10Elukey: profile::analytics::refinery::job::druid_load: add kerb support [puppet] - 10https://gerrit.wikimedia.org/r/541541 (https://phabricator.wikimedia.org/T226698)
[13:26:19] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/18791/" [puppet] - 10https://gerrit.wikimedia.org/r/541541 (https://phabricator.wikimedia.org/T226698) (owner: 10Elukey)
[13:29:07] <logmsgbot>	 !log mholloway-shell@deploy1001 Started deploy [mobileapps/deploy@8490964]: Update mobileapps to abd3543
[13:29:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:32:09] <logmsgbot>	 !log marostegui@cumin2001 dbctl commit (dc=all): 'Slowly repool db1082 db1081 db1080 db1079 db1075 db1074 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9263 and previous config saved to /var/cache/conftool/dbconfig/20191008-133208-marostegui.json
[13:32:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:35:12] <logmsgbot>	 !log mholloway-shell@deploy1001 Finished deploy [mobileapps/deploy@8490964]: Update mobileapps to abd3543 (duration: 06m 04s)
[13:35:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:41:53] <logmsgbot>	 !log marostegui@cumin2001 dbctl commit (dc=all): 'More traffic for db1082 db1081 db1080 db1079 db1075 db1074 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9264 and previous config saved to /var/cache/conftool/dbconfig/20191008-134152-marostegui.json
[13:41:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:42:33] <wikibugs>	 (03PS1) 10Jbond: puppet::rsync: disable chroot on volatile and ssl rsync [puppet] - 10https://gerrit.wikimedia.org/r/541545
[13:43:12] <wikibugs>	 (03PS3) 10Mholloway: Update wikifeeds chart to 0.0.4 [deployment-charts] - 10https://gerrit.wikimedia.org/r/540967 (https://phabricator.wikimedia.org/T170455)
[13:44:13] <wikibugs>	 (03PS2) 10Jbond: puppet::rsync: disable chroot on volatile and ssl rsync [puppet] - 10https://gerrit.wikimedia.org/r/541545 (https://phabricator.wikimedia.org/T234315)
[13:44:35] <wikibugs>	 (03CR) 10Mholloway: [V: 03+2 C: 03+2] Update wikifeeds chart to 0.0.4 [deployment-charts] - 10https://gerrit.wikimedia.org/r/540967 (https://phabricator.wikimedia.org/T170455) (owner: 10Mholloway)
[13:46:57] <logmsgbot>	 !log @ helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
[13:47:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:49:12] <logmsgbot>	 !log @ helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
[13:49:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:50:34] <logmsgbot>	 !log marostegui@cumin2001 dbctl commit (dc=all): 'Fully repool db1082 db1081 db1080 db1079 db1075 db1074 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9265 and previous config saved to /var/cache/conftool/dbconfig/20191008-135033-marostegui.json
[13:50:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:50:59] <logmsgbot>	 !log marostegui@cumin2001 dbctl commit (dc=all): 'Depool db1103:3312 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9266 and previous config saved to /var/cache/conftool/dbconfig/20191008-135058-marostegui.json
[13:51:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:51:03] <stashbot>	 T233625: Change PK and remove partitions from the logging table - https://phabricator.wikimedia.org/T233625
[13:53:20] <wikibugs>	 (03PS1) 10Elukey: profile::analytics::refinery::job::test::druid_load: use analytics1041 [puppet] - 10https://gerrit.wikimedia.org/r/541547
[13:53:38] <logmsgbot>	 !log @ helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
[13:53:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:54:25] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists: disable WMFSF, keep archives - https://phabricator.wikimedia.org/T233883 (10herron) 05Open→03Resolved a:03herron Hello, the WMFSF list has been disabled and archives will remain in place.  I'll transition to resolved now.  Thanks!
[13:55:18] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/18792/analytics1030.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/541547 (owner: 10Elukey)
[14:00:15] <icinga-wm>	 RECOVERY - Check systemd state on stat1007 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:00:55] <icinga-wm>	 RECOVERY - Check systemd state on stat1006 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:01:46] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: lvs::monitor_service: partial refactoring (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/541522 (owner: 10Giuseppe Lavagetto)
[14:02:48] <wikibugs>	 (03PS5) 10Giuseppe Lavagetto: lvs::monitor_service: partial refactoring [puppet] - 10https://gerrit.wikimedia.org/r/541522
[14:05:30] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] lvs::monitor_service: partial refactoring [puppet] - 10https://gerrit.wikimedia.org/r/541522 (owner: 10Giuseppe Lavagetto)
[14:08:35] <wikibugs>	 10Operations, 10serviceops, 10HHVM, 10Performance-Team (Radar): Remove HHVM from production - https://phabricator.wikimedia.org/T229792 (10jijiki)
[14:09:18] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] "Looks good!" [debs/prometheus-swagger-exporter] - 10https://gerrit.wikimedia.org/r/536376 (owner: 10Cwhite)
[14:11:48] <wikibugs>	 (03PS1) 10Mholloway: Update charts/index.yaml to add wikifeeds v0.0.4 chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/541548 (https://phabricator.wikimedia.org/T170455)
[14:15:38] <wikibugs>	 (03CR) 10Muehlenhoff: "This sounds perfectly fine, the chroot protection is rather weak anyway (adding hardening to the systemd unit would gain more protection i" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/541545 (https://phabricator.wikimedia.org/T234315) (owner: 10Jbond)
[14:29:33] <wikibugs>	 (03PS3) 10Jbond: puppet::rsync: disable chroot on volatile and ssl rsync [puppet] - 10https://gerrit.wikimedia.org/r/541545 (https://phabricator.wikimedia.org/T234315)
[14:30:07] <wikibugs>	 (03CR) 10Jbond: "thanks" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/541545 (https://phabricator.wikimedia.org/T234315) (owner: 10Jbond)
[14:42:47] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/541545 (https://phabricator.wikimedia.org/T234315) (owner: 10Jbond)
[14:44:29] <wikibugs>	 10Operations, 10Core Platform Team, 10Editing-team, 10Parsing-Team, and 9 others: RFC: Serve Main Page of Wikimedia wikis from a consistent URL - https://phabricator.wikimedia.org/T120085 (10Pcoombe) We're in peak fundraising season now, and I'm worried this might affect links to https://donate.wikimedia.o...
[14:48:38] <wikibugs>	 (03CR) 10Mobrovac: [C: 03+1] Update charts/index.yaml to add wikifeeds v0.0.4 chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/541548 (https://phabricator.wikimedia.org/T170455) (owner: 10Mholloway)
[14:50:34] <wikibugs>	 (03CR) 10Thcipriani: [C: 03+1] Gerrit: Switch replication url for replica to gerrit-replica [puppet] - 10https://gerrit.wikimedia.org/r/541386 (owner: 10Paladox)
[14:53:50] <icinga-wm>	 PROBLEM - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is CRITICAL: CRITICAL: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[14:56:15] <wikibugs>	 10Operations, 10Core Platform Team, 10Editing-team, 10Fundraising-Backlog, and 10 others: RFC: Serve Main Page of Wikimedia wikis from a consistent URL - https://phabricator.wikimedia.org/T120085 (10DStrine)
[15:08:06] <wikibugs>	 (03PS1) 10Elukey: profile::analytics::cluster::users: ensure user druid [puppet] - 10https://gerrit.wikimedia.org/r/541554
[15:12:09] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+1] CloudVPS: use wikimediacloud.org domain for Neutron-related IP addresses [dns] - 10https://gerrit.wikimedia.org/r/541526 (https://phabricator.wikimedia.org/T234836) (owner: 10Arturo Borrero Gonzalez)
[15:13:49] <XioNoX>	 !log renumber BGP session to AS4761 on cr1-eqsin
[15:13:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:14:51] <wikibugs>	 10Operations, 10Wikimedia-Logstash: Upgrade ELK Stack - https://phabricator.wikimedia.org/T234854 (10herron)
[15:15:04] <icinga-wm>	 RECOVERY - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is OK: OK: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[15:18:14] <XioNoX>	 !log add BGP sessions to AS2635 on cr2-eqiad
[15:18:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:18:58] <wikibugs>	 10Operations, 10Mail, 10Wikimedia-Mailing-lists: mass Yahoo / AOL bounces mailman - https://phabricator.wikimedia.org/T232417 (10Effeietsanders) p:05Normal→03High Now the subscriptions were not just disabled, but some 30+ were actually unsubscribed. We're doing a huge disservice to community members that...
[15:20:59] <XioNoX>	 !log add BGP sessions to AS199524 on cr2-eqdfw
[15:21:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:24:56] <wikibugs>	 10Operations, 10Core Platform Team, 10Editing-team, 10Fundraising-Backlog, and 10 others: RFC: Serve Main Page of Wikimedia wikis from a consistent URL - https://phabricator.wikimedia.org/T120085 (10Krinkle) @Pcoombe I don't think this will go live before January, but if it helps, let's just exclude any an...
[15:30:47] <XioNoX>	 !log remove 2 more sessions to AS12871 on cr2-esams - T232617
[15:30:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:30:52] <stashbot>	 T232617: BGP sessions down on cr2-esams - https://phabricator.wikimedia.org/T232617
[15:31:20] <wikibugs>	 10Operations, 10netops: BGP sessions down on cr2-esams - https://phabricator.wikimedia.org/T232617 (10ayounsi) 05Open→03Resolved Seems like they had 4 sessions in total.
[15:35:19] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Bump changelog for new release [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/541536 (owner: 10Muehlenhoff)
[15:50:40] <wikibugs>	 10Operations, 10ops-eqiad: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet - https://phabricator.wikimedia.org/T232367 (10Cmjohnson)
[15:51:01] <wikibugs>	 10Operations, 10ops-eqiad: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet - https://phabricator.wikimedia.org/T232367 (10Cmjohnson) @fgiunchedi all the on-site work has been completed...they need production DNS
[15:52:17] <wikibugs>	 (03PS1) 10Muehlenhoff: debdeploy: Fix update_type type [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/541558
[15:55:37] <wikibugs>	 (03CR) 10Mholloway: [V: 03+2 C: 03+2] Update charts/index.yaml to add wikifeeds v0.0.4 chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/541548 (https://phabricator.wikimedia.org/T170455) (owner: 10Mholloway)
[15:56:28] <wikibugs>	 (03PS2) 10Elukey: profile::analytics::cluster::users: ensure user druid [puppet] - 10https://gerrit.wikimedia.org/r/541554
[15:57:37] <logmsgbot>	 !log @ helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
[15:57:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:58:58] <logmsgbot>	 !log @ helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
[15:59:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:00:04] <jouncebot>	 godog and _joe_: #bothumor My software never has bugs. It just develops random features. Rise for Puppet SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191008T1600).
[16:00:04] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[16:00:10] <logmsgbot>	 !log @ helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
[16:00:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:00:30] <icinga-wm>	 RECOVERY - wikifeeds eqiad on wikifeeds.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Wikifeeds
[16:01:30] <icinga-wm>	 RECOVERY - wikifeeds codfw on wikifeeds.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Wikifeeds
[16:02:41] <wikibugs>	 (03PS3) 10Elukey: profile::analytics::cluster::users: ensure user druid [puppet] - 10https://gerrit.wikimedia.org/r/541554
[16:04:34] <wikibugs>	 10Operations, 10ops-eqiad, 10User-Elukey: (Need By: August 31) rack/setup/install (3) new zookeeper nodes - https://phabricator.wikimedia.org/T227025 (10elukey) ping again on this :)
[16:04:43] <wikibugs>	 10Operations, 10ops-eqiad: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet - https://phabricator.wikimedia.org/T232367 (10wiki_willy) a:05Jclark-ctr→03RobH @RobH - can you take care of DNS for this to get things completed from the dc-ops side for this install?  This one's super urgent, so if you...
[16:10:02] <wikibugs>	 (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler1002/18795/" [puppet] - 10https://gerrit.wikimedia.org/r/541554 (owner: 10Elukey)
[16:15:25] <wikibugs>	 (03PS3) 10Herron: logstash: output mediawiki type to logstash-medaiwiki ES index [puppet] - 10https://gerrit.wikimedia.org/r/540486
[16:20:14] <wikibugs>	 (03CR) 10Herron: [C: 03+2] logstash: output mediawiki type to logstash-medaiwiki ES index [puppet] - 10https://gerrit.wikimedia.org/r/540486 (owner: 10Herron)
[16:29:10] <wikibugs>	 (03CR) 10Daimona Eaytoy: [C: 03+1] [cirrus] drop support for HHVM connection pooling [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541425 (owner: 10DCausse)
[16:35:56] <wikibugs>	 (03PS1) 10RobH: ms-be105[1-6] production dns [dns] - 10https://gerrit.wikimedia.org/r/541567 (https://phabricator.wikimedia.org/T232367)
[16:49:29] <wikibugs>	 (03CR) 10RobH: [C: 03+2] ms-be105[1-6] production dns [dns] - 10https://gerrit.wikimedia.org/r/541567 (https://phabricator.wikimedia.org/T232367) (owner: 10RobH)
[16:50:50] <robh>	 godog: heyas did you wanna handle the puppet and os install on new ms-be hosts or should i?
[16:50:59] <robh>	 production dns is now in place
[16:51:16] <robh>	 (either answer is fine i just dunno if you reimage them from role spare or not so asking here)
[16:53:23] <godog>	 robh: hey, reimaging / puppet run into their swift::storage role is fine, thanks!
[16:53:36] <godog>	 as in, they won't enter service even with the role applied
[16:54:02] <robh>	 ahh, ok so install and apply normal role immediately, no spare
[16:54:14] <robh>	 want me to install?
[16:54:41] <godog>	 robh: yes please, thanks!
[16:54:54] <robh>	 ok, i'll do it now and you should have the task assigned to you when you start tomorrow =]
[16:55:02] <godog>	 should work on first attempt, let me know if it doesn't
[16:55:16] <godog>	 robh: working PDT hours this week (in Vancouver) so I'm around !
[16:55:33] <robh>	 oh, cool
[16:55:38] <robh>	 then you'll have shortly
[16:57:39] <godog>	 awesome, thanks for your help
[16:58:19] <robh>	 welcome
[17:00:04] <jouncebot>	 cscott, arlolra, subbu, halfak, and accraze: Dear deployers, time to do the Services – Graphoid / Parsoid / Citoid / ORES deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191008T1700).
[17:00:37] <subbu>	 no parsoid deploy today
[17:02:15] <wikibugs>	 10Operations, 10ops-eqiad, 10Patch-For-Review: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet - https://phabricator.wikimedia.org/T232367 (10RobH)
[17:11:26] <wikibugs>	 10Operations, 10ops-eqiad: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet - https://phabricator.wikimedia.org/T232367 (10RobH)
[17:15:03] <wikibugs>	 (03PS1) 10RobH: setting most new ms-be hosts mac addresses [puppet] - 10https://gerrit.wikimedia.org/r/541578 (https://phabricator.wikimedia.org/T232367)
[17:16:21] <wikibugs>	 (03CR) 10RobH: [C: 03+2] setting most new ms-be hosts mac addresses [puppet] - 10https://gerrit.wikimedia.org/r/541578 (https://phabricator.wikimedia.org/T232367) (owner: 10RobH)
[17:16:48] <wikibugs>	 (03PS3) 10Krinkle: Set "allow_tcp_nagle_delay" to false in mc.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521967 (owner: 10Aaron Schulz)
[17:16:54] <wikibugs>	 (03PS4) 10Krinkle: Set "allow_tcp_nagle_delay" to false in mc.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521967 (owner: 10Aaron Schulz)
[17:19:52] <wikibugs>	 10Operations, 10ops-eqiad, 10User-Elukey: (Need By: August 31) rack/setup/install (3) new zookeeper nodes - https://phabricator.wikimedia.org/T227025 (10Cmjohnson) I don't know what you need me to do...the servers were setup correctly.
[17:22:37] <wikibugs>	 (03CR) 10Brennen Bearnes: "> Patch Set 2: Code-Review+1" [deployment-charts] - 10https://gerrit.wikimedia.org/r/541371 (https://phabricator.wikimedia.org/T234578) (owner: 10Jeena Huneidi)
[17:23:56] <icinga-wm>	 PROBLEM - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is CRITICAL: CRITICAL: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[17:28:11] <wikibugs>	 10Operations, 10ops-eqiad: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet - https://phabricator.wikimedia.org/T232367 (10RobH)
[17:30:16] <XioNoX>	 chaomodus: ^
[17:30:28] <chaomodus>	 yep
[17:32:16] <marxarelli>	 !log cutting wmf/1.35.0-wmf.1
[17:32:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:45:08] <icinga-wm>	 RECOVERY - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is OK: OK: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[17:45:45] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: a2-eqiad pdu refresh (Tuesday 10/8 @11am UTC) - https://phabricator.wikimedia.org/T227138 (10Cmjohnson)
[17:46:57] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: a2-eqiad pdu refresh (Tuesday 10/8 @11am UTC) - https://phabricator.wikimedia.org/T227138 (10Cmjohnson) the pdu swap is over, we did lose an-worker1079 due to the PSUs not failing over.  Everything is cabled and they're linked together. still needs updating.
[17:50:09] <wikibugs>	 (03PS1) 10RobH: fixing mac entries for new ms-be systems [puppet] - 10https://gerrit.wikimedia.org/r/541579 (https://phabricator.wikimedia.org/T232367)
[17:52:08] <wikibugs>	 (03CR) 10RobH: [C: 03+2] fixing mac entries for new ms-be systems [puppet] - 10https://gerrit.wikimedia.org/r/541579 (https://phabricator.wikimedia.org/T232367) (owner: 10RobH)
[17:54:04] <wikibugs>	 (03PS8) 10Cwhite: initial commit [debs/prometheus-swagger-exporter] - 10https://gerrit.wikimedia.org/r/536376
[17:56:24] <wikibugs>	 (03CR) 10Ottomata: "I think if you just include the druid user in profile::hadoop::master::hadoop_user_groups, the hdfs home dir will be auto-create." [puppet] - 10https://gerrit.wikimedia.org/r/541554 (owner: 10Elukey)
[18:00:04] <jouncebot>	 Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191008T1800)
[18:12:53] <wikibugs>	 (03CR) 10Brennen Bearnes: [C: 03+2] Use new dev image for parsoid [deployment-charts] - 10https://gerrit.wikimedia.org/r/541371 (https://phabricator.wikimedia.org/T234578) (owner: 10Jeena Huneidi)
[18:13:00] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Use new dev image for parsoid [deployment-charts] - 10https://gerrit.wikimedia.org/r/541371 (https://phabricator.wikimedia.org/T234578) (owner: 10Jeena Huneidi)
[18:15:08] <wikibugs>	 (03PS1) 10Jforrester: [Beta Cluster] Disable wgLegacyJavaScriptGlobals [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541581
[18:16:33] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: a2-eqiad pdu refresh (Tuesday 10/8 @11am UTC) - https://phabricator.wikimedia.org/T227138 (10wiki_willy) a:05Cmjohnson→03RobH Re-assigning to @RobH to complete install/updating of new PDU.  Thanks, Willy
[18:19:09] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] Fix phatality deployment script [puppet] - 10https://gerrit.wikimedia.org/r/540117 (owner: 1020after4)
[18:19:16] <wikibugs>	 (03PS2) 10Filippo Giunchedi: Fix phatality deployment script [puppet] - 10https://gerrit.wikimedia.org/r/540117 (owner: 1020after4)
[18:20:06] <wikibugs>	 (03PS2) 10Dzahn: parsoid/conftool: add wtp servers as apache appservers [puppet] - 10https://gerrit.wikimedia.org/r/541377 (https://phabricator.wikimedia.org/T233654)
[18:20:46] <wikibugs>	 (03PS9) 10Cwhite: initial commit [debs/prometheus-swagger-exporter] - 10https://gerrit.wikimedia.org/r/536376
[18:21:43] <wikibugs>	 (03PS3) 10Jeena Huneidi: Use new dev image for parsoid [deployment-charts] - 10https://gerrit.wikimedia.org/r/541371 (https://phabricator.wikimedia.org/T234578)
[18:22:32] <wikibugs>	 (03CR) 1020after4: [C: 03+1] site/phabricator: apply phab role on phab1001 [puppet] - 10https://gerrit.wikimedia.org/r/536712 (https://phabricator.wikimedia.org/T190568) (owner: 10Dzahn)
[18:26:12] <wikibugs>	 (03PS1) 10Dduvall: Group0 to 1.35.0-wmf.1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541582
[18:35:35] <wikibugs>	 (03CR) 10Brennen Bearnes: [C: 03+2] Use new dev image for parsoid [deployment-charts] - 10https://gerrit.wikimedia.org/r/541371 (https://phabricator.wikimedia.org/T234578) (owner: 10Jeena Huneidi)
[18:43:08] <icinga-wm>	 PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 53.33% of data above the critical threshold [140.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[18:43:12] <wikibugs>	 (03PS2) 10Jforrester: [Beta Cluster] Disable wgLegacyJavaScriptGlobals [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541581 (https://phabricator.wikimedia.org/T72470)
[18:44:29] <wikibugs>	 (03Merged) 10jenkins-bot: Use new dev image for parsoid [deployment-charts] - 10https://gerrit.wikimedia.org/r/541371 (https://phabricator.wikimedia.org/T234578) (owner: 10Jeena Huneidi)
[18:45:23] <logmsgbot>	 !log dduvall@deploy1001 Pruned MediaWiki: 1.34.0-wmf.24 (duration: 08m 24s)
[18:45:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:53:15] <godog>	 !log codfw-prod: more weight to ms-be205[1-6] - T233638
[18:53:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:53:20] <stashbot>	 T233638: rack/setup/install ms-be205[1-6].codfw.wmnet - https://phabricator.wikimedia.org/T233638
[18:54:21] <logmsgbot>	 !log dduvall@deploy1001 Started scap: testwiki to php-1.35.0-wmf.1 and rebuild l10n cache
[18:54:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:55:03] <wikibugs>	 (03PS10) 10Cwhite: initial commit [debs/prometheus-swagger-exporter] - 10https://gerrit.wikimedia.org/r/536376
[18:57:20] <Reedy>	 "Currently active MediaWiki versions:" is broken on noc/conf :(
[18:58:45] <Reedy>	 looks to be a cached issue
[18:59:01] <Reedy>	 works locally on deploy1001
[18:59:02] <Reedy>	 <p>Currently active MediaWiki versions: 1.34.0-wmf.25, 1.35.0-wmf.1</p>
[18:59:45] <wikibugs>	 10Operations, 10media-storage, 10User-fgiunchedi: ms-be1020 - host went down - https://phabricator.wikimedia.org/T234698 (10fgiunchedi) Indeed we'd need to upgrade its firmware as per {T141756}, holding off once we have new swift hw in place in eqiad to not "jinx it" if we possibly can
[19:00:04] <jouncebot>	 marxarelli: #bothumor My software never has bugs. It just develops random features. Rise for MediaWiki train - American version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191008T1900).
[19:00:49] <wikibugs>	 (03CR) 10Cwhite: [C: 03+2] "Fixed deb package build." [debs/prometheus-swagger-exporter] - 10https://gerrit.wikimedia.org/r/536376 (owner: 10Cwhite)
[19:02:22] <wikibugs>	 (03PS1) 10CRusnov: Add netbox geodns entries. [dns] - 10https://gerrit.wikimedia.org/r/541602
[19:12:08] <icinga-wm>	 PROBLEM - ElasticSearch shard size check - 9243 on search.svc.eqiad.wmnet is CRITICAL: CRITICAL - commonswiki_content_1556235298(77gb) https://wikitech.wikimedia.org/wiki/Search%23If_it_has_been_indexed
[19:13:16] <wikibugs>	 (03CR) 10Dzahn: "let's use topic branch "gerrit-migration-day" on all the patches we want to merge on the day of" [dns] - 10https://gerrit.wikimedia.org/r/541393 (owner: 10Paladox)
[19:13:42] <logmsgbot>	 !log dduvall@deploy1001 Finished scap: testwiki to php-1.35.0-wmf.1 and rebuild l10n cache (duration: 19m 21s)
[19:13:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:14:25] <wikibugs>	 (03PS3) 10Dzahn: Gerrit: Switch replication url for replica to gerrit-replica [puppet] - 10https://gerrit.wikimedia.org/r/541386 (owner: 10Paladox)
[19:14:53] <paladox>	 mutante remember gerrit requires a restart after merging that :) (just making sure that you're aware).
[19:15:11] <paladox>	 https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/541115/
[19:15:16] <paladox>	 should stop us having to restart.
[19:15:27] <mutante>	 paladox: yes, i am aware. that is great news though 
[19:15:33] <paladox>	 ok :)
[19:15:55] <mutante>	 hmm. yea. the autoReload thing
[19:16:27] <mutante>	 setting that to _false_ makes it .. eh.. reload config ??
[19:16:33] <mutante>	 reads that again
[19:17:06] <mutante>	 i am not sure we actually prefer that over having control over both merge and restart separately
[19:17:54] <mutante>	 first let me merge the part that is already reviewed and we all agree
[19:18:26] <wikibugs>	 (03PS1) 10Jforrester: CommonSettings-labs: array_merge on NULL returns NULL, not [], what fun [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541603
[19:18:28] <wikibugs>	 (03PS1) 10Jforrester: CommonSettings: Split out the CSP configuration s it can be more easily over-ridden [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541604
[19:18:34] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] Gerrit: Switch replication url for replica to gerrit-replica [puppet] - 10https://gerrit.wikimedia.org/r/541386 (owner: 10Paladox)
[19:19:00] <mutante>	 paladox: let's use new topic branch name "gerrit-migration-day" and slap it on the patches for the "day of"
[19:19:10] <paladox>	 ok, yup!
[19:19:12] <paladox>	 thanks!
[19:29:16] <shdubsh>	 !log adding swagger exporter to apt repo
[19:29:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:38:25] <logmsgbot>	 !log dduvall@deploy1001 Synchronized php-1.35.0-wmf.1/skins/MinervaNeue/: sync T233521 backport prior to group0 (duration: 00m 59s)
[19:38:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:38:30] <stashbot>	 T233521: Language selector and edit button icon not displayed on first load in iOS 13 - https://phabricator.wikimedia.org/T233521
[19:39:02] <icinga-wm>	 ACKNOWLEDGEMENT - ElasticSearch shard size check - 9243 on search.svc.eqiad.wmnet is CRITICAL: CRITICAL - commonswiki_content_1556235298(77gb) Mathew.onipe Ill silence this for. Will keep an eye to see if it recovers. If it doesnt, then reindex is imminent. - The acknowledgement expires at: 2019-10-09 19:36:56. https://wikitech.wikimedia.org/wiki/Search%23If_it_has_been_indexed
[19:39:06] <wikibugs>	 (03CR) 10EBernhardson: [C: 03+1] [cirrus] drop support for HHVM connection pooling [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541425 (owner: 10DCausse)
[19:40:13] <wikibugs>	 (03PS1) 10Phamhi: tools-webservice: Disable access.log feature by default [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/541609 (https://phabricator.wikimedia.org/T233347)
[19:40:36] <wikibugs>	 (03CR) 10Dduvall: [C: 03+2] Group0 to 1.35.0-wmf.1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541582 (owner: 10Dduvall)
[19:41:39] <wikibugs>	 (03Merged) 10jenkins-bot: Group0 to 1.35.0-wmf.1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541582 (owner: 10Dduvall)
[19:43:40] <logmsgbot>	 !log dduvall@deploy1001 rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.1
[19:43:43] <wikibugs>	 (03CR) 10Phamhi: tools-webservice: Disable access.log feature by default (031 comment) [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/541609 (https://phabricator.wikimedia.org/T233347) (owner: 10Phamhi)
[19:43:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:51:47] <marxarelli>	 !log 1.35.0-wmf.1 promoted to group0, cc: T233849. no rise in error rates. no new relevant errors
[19:51:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:51:50] <stashbot>	 T233849: 1.35.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T233849
[19:52:44] <icinga-wm>	 PROBLEM - Disk space on elastic1025 is CRITICAL: DISK CRITICAL - free space: /srv 27976 MB (5% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=elastic1025&var-datasource=eqiad+prometheus/ops
[19:54:30] <wikibugs>	 (03PS6) 10Brennen Bearnes: mediawiki-dev: use wikimedia/mediawiki-core:dev [deployment-charts] - 10https://gerrit.wikimedia.org/r/535342 (https://phabricator.wikimedia.org/T234391)
[19:54:32] <wikibugs>	 (03PS1) 10Dzahn: add wtp1025/wtp2001 to list of servers using Parsoid/PHP [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541611 (https://phabricator.wikimedia.org/T233654)
[19:57:28] <mutante>	 jouncebot: now
[19:57:28] <jouncebot>	 For the next 0 hour(s) and 2 minute(s): MediaWiki train - American version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191008T1900)
[19:59:32] <wikibugs>	 (03CR) 10SBassett: [C: 03+1] CommonSettings-labs: array_merge on NULL returns NULL, not [], what fun [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541603 (owner: 10Jforrester)
[20:00:33] <wikibugs>	 (03CR) 10SBassett: [C: 03+1] CommonSettings: Split out the CSP configuration s it can be more easily over-ridden [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541604 (owner: 10Jforrester)
[20:09:23] <wikibugs>	 10Operations, 10Cloud-Services: login on wikitech wiki fails - https://phabricator.wikimedia.org/T234996 (10Dzahn)
[20:11:36] <wikibugs>	 10Operations, 10Traffic: Make Netbox Active/Active - https://phabricator.wikimedia.org/T234997 (10crusnov)
[20:12:00] <icinga-wm>	 RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[20:12:12] <wikibugs>	 (03PS2) 10CRusnov: Add netbox geodns entries. [dns] - 10https://gerrit.wikimedia.org/r/541602 (https://phabricator.wikimedia.org/T234997)
[20:13:06] <wikibugs>	 (03CR) 10Subramanya Sastry: add wtp1025/wtp2001 to list of servers using Parsoid/PHP (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541611 (https://phabricator.wikimedia.org/T233654) (owner: 10Dzahn)
[20:15:18] <icinga-wm>	 RECOVERY - Disk space on elastic1025 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=elastic1025&var-datasource=eqiad+prometheus/ops
[20:24:30] <mutante>	 !log labweb1001 - edit /srv/mediawiki/wmf-config/wikitech.php and change "false" to "true" on line 52 to enable LDAP debug logging for T234996
[20:24:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:24:34] <stashbot>	 T234996: login on wikitech wiki fails - https://phabricator.wikimedia.org/T234996
[20:25:44] <wikibugs>	 (03PS1) 10Cwhite: profile, prometheus: install swagger exporter on icinga [puppet] - 10https://gerrit.wikimedia.org/r/541619 (https://phabricator.wikimedia.org/T205870)
[20:28:09] <wikibugs>	 10Operations, 10Cloud-Services: login on wikitech wiki fails - https://phabricator.wikimedia.org/T234996 (10Dzahn) Enabled the debug log as suggested by Krenair.  Debug log shows a restCall to cloudcontrol1003 to get a token:  OpenStackNovaController::restCall fullurl: http://cloudcontrol1003.wikimedia.org  fo...
[20:29:00] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists: Create wikimedia sustainability mailing list - https://phabricator.wikimedia.org/T234999 (10mepps)
[20:30:10] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists: Create wikimedia sustainability mailing list - https://phabricator.wikimedia.org/T234999 (10mepps)
[20:38:51] <mutante>	 !log labweb1001 - disabled 2fa for myself on Wikitech using disableOATHAuthForUser.php --wiki=labswiki to debug T234996
[20:38:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:38:55] <stashbot>	 T234996: login on wikitech wiki fails - https://phabricator.wikimedia.org/T234996
[20:43:05] <wikibugs>	 (03CR) 10Eevans: [C: 03+1] "LGTM, but we ought to have @BPirkle take a look (since he did the same for sessionstore)." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/540731 (https://phabricator.wikimedia.org/T222851) (owner: 10Catrope)
[21:00:01] <wikibugs>	 (03CR) 10Subramanya Sastry: add wtp1025/wtp2001 to list of servers using Parsoid/PHP (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541611 (https://phabricator.wikimedia.org/T233654) (owner: 10Dzahn)
[21:01:06] <icinga-wm>	 PROBLEM - configured eth on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[21:01:22] <icinga-wm>	 PROBLEM - MD RAID on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[21:01:32] <icinga-wm>	 PROBLEM - Disk space on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=notebook1003&var-datasource=eqiad+prometheus/ops
[21:01:42] <icinga-wm>	 PROBLEM - Check systemd state on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:02:14] <icinga-wm>	 PROBLEM - dhclient process on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[21:02:26] <icinga-wm>	 PROBLEM - Check size of conntrack table on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[21:02:34] <icinga-wm>	 PROBLEM - DPKG on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[21:02:36] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[21:03:10] <icinga-wm>	 RECOVERY - Disk space on notebook1003 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=notebook1003&var-datasource=eqiad+prometheus/ops
[21:03:20] <icinga-wm>	 RECOVERY - Check systemd state on notebook1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:03:28] <mutante>	 notebook1003 - echo "please don't use all the RAM" | wall
[21:03:37] <chaomodus>	 ;)
[21:03:40] <James_F>	 :-D
[21:03:48] <chaomodus>	 maybe we could run nagios-nrpe-server in a slice that can't get oomed
[21:03:52] <icinga-wm>	 PROBLEM - Disk space on elastic1025 is CRITICAL: DISK CRITICAL - free space: /srv 23073 MB (4% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=elastic1025&var-datasource=eqiad+prometheus/ops
[21:04:10] <mutante>	 chaomodus: that would be nice indeed
[21:07:11] <chaomodus>	 i guess we can exempt it from oom killer (well reduce its score enough to exempt it)
[21:08:00] <icinga-wm>	 PROBLEM - Disk space on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=notebook1003&var-datasource=eqiad+prometheus/ops
[21:08:10] <icinga-wm>	 PROBLEM - Check systemd state on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:08:12] <chaomodus>	 it died again ha
[21:08:40] <icinga-wm>	 RECOVERY - dhclient process on notebook1003 is OK: PROCS OK: 0 processes with command name dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[21:08:52] <icinga-wm>	 RECOVERY - Check size of conntrack table on notebook1003 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[21:09:00] <icinga-wm>	 RECOVERY - DPKG on notebook1003 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[21:09:02] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on notebook1003 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[21:09:10] <icinga-wm>	 RECOVERY - configured eth on notebook1003 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[21:09:26] <icinga-wm>	 RECOVERY - MD RAID on notebook1003 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[21:09:29] <mutante>	 or we exclude this host from icinga alerts and declare it a test server
[21:09:36] <icinga-wm>	 RECOVERY - Disk space on notebook1003 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=notebook1003&var-datasource=eqiad+prometheus/ops
[21:09:46] <icinga-wm>	 RECOVERY - Check systemd state on notebook1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:09:46] <chaomodus>	 !log restarted nagios-nrpe-server on notebook1003
[21:09:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:09:52] <chaomodus>	 yah i suppose, or give it more ram
[21:10:59] <chaomodus>	 oom killer is super killy on that box
[21:11:19] <mutante>	 https://phabricator.wikimedia.org/T212824
[21:11:42] <mutante>	 see " Introduce profile::analytics::cluster::limits::statistics"  etc
[21:12:18] <chaomodus>	 oic
[21:12:18] <mutante>	 https://phabricator.wikimedia.org/T212824#4967798
[21:18:51] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists: disable WMFSF, keep archives - https://phabricator.wikimedia.org/T233883 (10Varnent) @herron - just want to verify that the old list will forward to the new one in case people use the old SF address. New list is: sf-foundation-local@wikimedia.org
[21:23:10] <icinga-wm>	 RECOVERY - Disk space on elastic1025 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=elastic1025&var-datasource=eqiad+prometheus/ops
[21:23:36] <wikibugs>	 10Operations, 10wikitech.wikimedia.org, 10cloud-services-team (Kanban): login on wikitech wiki fails - https://phabricator.wikimedia.org/T234996 (10bd808)
[21:24:12] <wikibugs>	 (03PS2) 10Subramanya Sastry: Add wtp1025/wtp2001 to the list of servers using Parsoid/PHP [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541611 (https://phabricator.wikimedia.org/T233654) (owner: 10Dzahn)
[21:25:15] <wikibugs>	 (03CR) 10Subramanya Sastry: "mutante: Uploaded a PS2 with changes I was recommending." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541611 (https://phabricator.wikimedia.org/T233654) (owner: 10Dzahn)
[21:26:12] <wikibugs>	 (03CR) 10Subramanya Sastry: "Added CPT members + Gergo in case they have opinions about how this is configured in production." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541611 (https://phabricator.wikimedia.org/T233654) (owner: 10Dzahn)
[21:26:22] <wikibugs>	 10Operations, 10wikitech.wikimedia.org, 10cloud-services-team (Kanban): login on wikitech wiki fails - https://phabricator.wikimedia.org/T234996 (10bd808) Possibly a duplicate of {T234686} reported on 2019-10-04
[21:29:15] <wikibugs>	 (03PS1) 10Dzahn: logstash: add wtp1025/wtp2001 to filter-mediawiki with parsoid-php channel [puppet] - 10https://gerrit.wikimedia.org/r/541645 (https://phabricator.wikimedia.org/T233654)
[21:30:50] <wikibugs>	 (03CR) 10Subramanya Sastry: [C: 03+1] logstash: add wtp1025/wtp2001 to filter-mediawiki with parsoid-php channel [puppet] - 10https://gerrit.wikimedia.org/r/541645 (https://phabricator.wikimedia.org/T233654) (owner: 10Dzahn)
[21:31:32] <wikibugs>	 (03CR) 10Subramanya Sastry: [C: 03+1] "Added parsing team members as reviewers as an FYI." [puppet] - 10https://gerrit.wikimedia.org/r/541645 (https://phabricator.wikimedia.org/T233654) (owner: 10Dzahn)
[21:42:28] <wikibugs>	 (03PS7) 10Brennen Bearnes: mediawiki-dev: use wikimedia/mediawiki-core:dev [deployment-charts] - 10https://gerrit.wikimedia.org/r/535342 (https://phabricator.wikimedia.org/T234391)
[21:49:43] <wikibugs>	 (03CR) 10Krinkle: [C: 03+1] [Beta Cluster] Disable wgLegacyJavaScriptGlobals [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541581 (https://phabricator.wikimedia.org/T72470) (owner: 10Jforrester)
[21:53:44] <James_F>	 Prod clear? Going to deploy some fixes.
[21:54:02] <wikibugs>	 (03CR) 10Jforrester: [C: 03+2] [Beta Cluster] Disable wgLegacyJavaScriptGlobals [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541581 (https://phabricator.wikimedia.org/T72470) (owner: 10Jforrester)
[21:54:07] <wikibugs>	 (03CR) 10Jforrester: [C: 03+2] CommonSettings-labs: array_merge on NULL returns NULL, not [], what fun [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541603 (owner: 10Jforrester)
[21:54:09] <wikibugs>	 (03CR) 10Jforrester: [C: 03+2] CommonSettings: Split out the CSP configuration s it can be more easily over-ridden [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541604 (owner: 10Jforrester)
[21:55:10] <wikibugs>	 (03Merged) 10jenkins-bot: [Beta Cluster] Disable wgLegacyJavaScriptGlobals [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541581 (https://phabricator.wikimedia.org/T72470) (owner: 10Jforrester)
[21:55:37] <wikibugs>	 (03Merged) 10jenkins-bot: CommonSettings-labs: array_merge on NULL returns NULL, not [], what fun [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541603 (owner: 10Jforrester)
[21:55:46] <wikibugs>	 (03Merged) 10jenkins-bot: CommonSettings: Split out the CSP configuration s it can be more easily over-ridden [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541604 (owner: 10Jforrester)
[21:58:25] <logmsgbot>	 !log jforrester@deploy1001 Synchronized wmf-config/CommonSettings.php: Split out the CSP configuration s it can be more easily over-ridden (duration: 00m 59s)
[21:58:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:05:17] <wikibugs>	 (03PS9) 10Filippo Giunchedi: WIP: turn on swiftrepl on swift frontends [puppet] - 10https://gerrit.wikimedia.org/r/537613
[22:06:28] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] WIP: turn on swiftrepl on swift frontends [puppet] - 10https://gerrit.wikimedia.org/r/537613 (owner: 10Filippo Giunchedi)
[22:07:46] <wikibugs>	 (03PS1) 10RobH: ms-be105[15] dhcp info [puppet] - 10https://gerrit.wikimedia.org/r/541652 (https://phabricator.wikimedia.org/T232367)
[22:08:15] <wikibugs>	 (03PS10) 10Filippo Giunchedi: WIP: turn on swiftrepl on swift frontends [puppet] - 10https://gerrit.wikimedia.org/r/537613
[22:08:17] <wikibugs>	 (03CR) 10RobH: [C: 03+2] ms-be105[15] dhcp info [puppet] - 10https://gerrit.wikimedia.org/r/541652 (https://phabricator.wikimedia.org/T232367) (owner: 10RobH)
[22:09:42] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 39 probes of 463 (alerts on 35) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[22:10:50] <wikibugs>	 (03PS11) 10Filippo Giunchedi: site: turn on swiftrepl on swift frontends [puppet] - 10https://gerrit.wikimedia.org/r/537613 (https://phabricator.wikimedia.org/T162123)
[22:11:31] <wikibugs>	 (03PS1) 10EBernhardson: yarn: Add sequential scheduler queue for heavy jobs [puppet] - 10https://gerrit.wikimedia.org/r/541654
[22:15:20] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 25 probes of 463 (alerts on 35) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[22:21:30] <wikibugs>	 (03PS1) 10Jforrester: [Beta Cluster] Enable wmgUseCSPReportOnly for all [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541655 (https://phabricator.wikimedia.org/T211539)
[22:21:48] <wikibugs>	 (03CR) 10Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/compiler1002/18799/" [puppet] - 10https://gerrit.wikimedia.org/r/537613 (https://phabricator.wikimedia.org/T162123) (owner: 10Filippo Giunchedi)
[22:25:24] <wikibugs>	 (03CR) 10SBassett: [C: 03+1] [Beta Cluster] Enable wmgUseCSPReportOnly for all [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541655 (https://phabricator.wikimedia.org/T211539) (owner: 10Jforrester)
[22:26:04] <wikibugs>	 (03PS2) 10Dzahn: site/phabricator: apply phab role on phab1001 [puppet] - 10https://gerrit.wikimedia.org/r/536712 (https://phabricator.wikimedia.org/T190568)
[22:27:32] <wikibugs>	 (03PS3) 10Dzahn: site/phabricator: apply phab role on phab1001 [puppet] - 10https://gerrit.wikimedia.org/r/536712 (https://phabricator.wikimedia.org/T190568)
[22:28:18] <wikibugs>	 (03CR) 10Krinkle: "Sounds good to me." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/525147 (owner: 10Aaron Schulz)
[22:44:27] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] site/phabricator: apply phab role on phab1001 [puppet] - 10https://gerrit.wikimedia.org/r/536712 (https://phabricator.wikimedia.org/T190568) (owner: 10Dzahn)
[22:44:41] <wikibugs>	 (03PS4) 10Dzahn: site/phabricator: apply phab role on phab1001 [puppet] - 10https://gerrit.wikimedia.org/r/536712 (https://phabricator.wikimedia.org/T190568)
[22:51:07] <icinga-wm>	 ACKNOWLEDGEMENT - Host ps1-a2-eqiad is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn check with dcops for status of PDU work
[22:54:48] <icinga-wm>	 PROBLEM - Widespread puppet agent failures on icinga1001 is CRITICAL: 0.02772 ge 0.01 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[23:00:04] <jouncebot>	 MaxSem, RoanKattouw, Niharika, and Urbanecm: I, the Bot under the Fountain, allow thee, The Deployer, to do Evening SWAT (Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191008T2300).
[23:00:04] <jouncebot>	 ebernhardson: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[23:01:15] <ebernhardson>	 i can ship it
[23:01:52] <wikibugs>	 (03CR) 10EBernhardson: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541425 (owner: 10DCausse)
[23:02:39] <wikibugs>	 (03Merged) 10jenkins-bot: [cirrus] drop support for HHVM connection pooling [mediawiki-config] - 10https://gerrit.wikimedia.org/r/541425 (owner: 10DCausse)
[23:03:07] <icinga-wm>	 ACKNOWLEDGEMENT - Widespread puppet agent failures on icinga1001 is CRITICAL: 0.05351 ge 0.01 daniel_zahn thats just 4 hosts and the trigger was 3 - 4 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[23:05:18] <logmsgbot>	 !log ebernhardson@deploy1001 Synchronized wmf-config/: [cirrus] drop support for HHVM connection pooling (duration: 00m 59s)
[23:05:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:06:39] <ebernhardson>	 SWAT complete
[23:18:41] <wikibugs>	 (03PS1) 10Dzahn: phabricator: support buster with PHP 7.3 packages [puppet] - 10https://gerrit.wikimedia.org/r/541666 (https://phabricator.wikimedia.org/T190568)
[23:20:35] <wikibugs>	 (03PS2) 10Dzahn: phabricator: support buster with PHP 7.3 packages [puppet] - 10https://gerrit.wikimedia.org/r/541666 (https://phabricator.wikimedia.org/T190568)
[23:22:48] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] phabricator: support buster with PHP 7.3 packages [puppet] - 10https://gerrit.wikimedia.org/r/541666 (https://phabricator.wikimedia.org/T190568) (owner: 10Dzahn)
[23:24:46] <wikibugs>	 (03CR) 10Dzahn: "needs https://gerrit.wikimedia.org/r/c/operations/puppet/+/541666 and more follow-ups, a couple things not working yet" [puppet] - 10https://gerrit.wikimedia.org/r/536712 (https://phabricator.wikimedia.org/T190568) (owner: 10Dzahn)
[23:26:16] <wikibugs>	 (03CR) 10Paladox: phabricator: support buster with PHP 7.3 packages (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/541666 (https://phabricator.wikimedia.org/T190568) (owner: 10Dzahn)
[23:28:53] <mutante>	 !log phab1001 - replacing tin.eqiad.wmnet with deploy1001.eqiad.wmnet in phabricator/deployment-cache/.config:git_server - wondering if we can ever get rid of tin (T190568)
[23:29:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:29:01] <stashbot>	 T190568: Reimage both phab1001 and phab2001 to stretch / buster - https://phabricator.wikimedia.org/T190568
[23:31:57] <wikibugs>	 (03CR) 10Paladox: phabricator: support buster with PHP 7.3 packages (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/541666 (https://phabricator.wikimedia.org/T190568) (owner: 10Dzahn)
[23:33:56] <wikibugs>	 (03PS3) 10Dzahn: phabricator: support buster with PHP 7.3 packages [puppet] - 10https://gerrit.wikimedia.org/r/541666 (https://phabricator.wikimedia.org/T190568)
[23:36:51] <wikibugs>	 (03CR) 10Dzahn: "also changes on phab1003 ? https://puppet-compiler.wmflabs.org/compiler1001/18803/" [puppet] - 10https://gerrit.wikimedia.org/r/541666 (https://phabricator.wikimedia.org/T190568) (owner: 10Dzahn)
[23:42:12] <wikibugs>	 10Operations, 10Research, 10SRE-Access-Requests: Requesting access to analytics cluster for Djellel Difallah - https://phabricator.wikimedia.org/T234473 (10leila) @Nuria is your approval needed on this task?
[23:46:00] <wikibugs>	 10Operations, 10wikitech.wikimedia.org, 10Patch-For-Review, 10cloud-services-team (Kanban): login on wikitech wiki fails - https://phabricator.wikimedia.org/T234996 (10bd808) I set a custom message at https://wikitech.wikimedia.org/wiki/MediaWiki:Loginprompt that will show up on the login screen.{F30597255}
[23:54:25] <wikibugs>	 10Operations, 10ops-eqiad: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet - https://phabricator.wikimedia.org/T232367 (10RobH)
[23:56:21] <wikibugs>	 (03PS4) 10Dzahn: phabricator: support buster with PHP 7.3 packages [puppet] - 10https://gerrit.wikimedia.org/r/541666 (https://phabricator.wikimedia.org/T190568)
[23:56:29] <wikibugs>	 10Operations, 10ops-eqiad: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet - https://phabricator.wikimedia.org/T232367 (10RobH)
[23:57:16] <wikibugs>	 10Operations, 10wikitech.wikimedia.org, 10Patch-For-Review, 10cloud-services-team (Kanban): Login on wikitech wiki fails after OpenStack upgrade removed v2 identity API - https://phabricator.wikimedia.org/T234996 (10bd808)
[23:57:19] <wikibugs>	 (03CR) 10Dzahn: [C: 04-1] "there are more "php72" require lines.. need to repeat a bunch of code or do it nicer some way" [puppet] - 10https://gerrit.wikimedia.org/r/541666 (https://phabricator.wikimedia.org/T190568) (owner: 10Dzahn)
[23:58:16] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] phabricator: support buster with PHP 7.3 packages [puppet] - 10https://gerrit.wikimedia.org/r/541666 (https://phabricator.wikimedia.org/T190568) (owner: 10Dzahn)