[00:01:12] <wikibugs>	 (PS1) Dzahn: hieradata/labs: set cluster_search and deployment server for staging instance [puppet] - https://gerrit.wikimedia.org/r/561929
[00:02:31] <wikibugs>	 (CR) Paladox: [C: +1] hieradata/labs: set cluster_search and deployment server for staging instance [puppet] - https://gerrit.wikimedia.org/r/561929 (owner: Dzahn)
[00:07:17] <wikibugs>	 (PS2) Dzahn: hieradata/labs: set cluster_search hosts and puppetmaster for devtools phab [puppet] - https://gerrit.wikimedia.org/r/561929
[00:09:26] <wikibugs>	 (PS2) BryanDavis: support tools: Add script to rebuild all images [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/561730
[00:11:32] <wikibugs>	 (CR) Dzahn: [C: +2] hieradata/labs: set cluster_search hosts and puppetmaster for devtools phab [puppet] - https://gerrit.wikimedia.org/r/561929 (owner: Dzahn)
[00:29:59] <wikibugs>	 (CR) BryanDavis: support tools: Add script to rebuild all images (2 comments) [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/561730 (owner: BryanDavis)
[00:31:13] <wikibugs>	 (CR) BryanDavis: "> Patch Set 2: Code-Review+2" [puppet] - https://gerrit.wikimedia.org/r/561437 (https://phabricator.wikimedia.org/T210993) (owner: BryanDavis)
[01:19:07] <wikibugs>	 (PS1) Dzahn: mediawiki::php: allow setting PHP version to 7.3 for buster [puppet] - https://gerrit.wikimedia.org/r/561931
[01:20:55] <wikibugs>	 (CR) Dzahn: "We tried to set up the first deployment_server on buster in labs and ran into this issue not being able to set the PHP version for extensio" [puppet] - https://gerrit.wikimedia.org/r/561931 (owner: Dzahn)
[01:27:23] <wikibugs>	 (PS1) BryanDavis: Bump python version to 3.7 [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/561932
[01:27:25] <wikibugs>	 (PS1) BryanDavis: Update deployment for new Kubernetes cluster [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/561933
[01:27:59] <wikibugs>	 (CR) BryanDavis: [C: +2] Bump python version to 3.7 [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/561932 (owner: BryanDavis)
[01:28:22] <wikibugs>	 (Merged) jenkins-bot: Bump python version to 3.7 [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/561932 (owner: BryanDavis)
[01:45:51] <wikibugs>	 (PS2) BryanDavis: Update deployment and control script for new Kubernetes cluster [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/561933
[01:59:36] <wikibugs>	 (CR) BryanDavis: [C: +2] Update deployment and control script for new Kubernetes cluster [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/561933 (owner: BryanDavis)
[01:59:55] <wikibugs>	 (Merged) jenkins-bot: Update deployment and control script for new Kubernetes cluster [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/561933 (owner: BryanDavis)
[02:03:35] <wikibugs>	 (PS1) BryanDavis: Add missing selector to Deployment [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/561935
[02:03:51] <wikibugs>	 (CR) BryanDavis: [C: +2] Add missing selector to Deployment [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/561935 (owner: BryanDavis)
[02:04:13] <wikibugs>	 (Merged) jenkins-bot: Add missing selector to Deployment [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/561935 (owner: BryanDavis)
[02:05:46] <wikibugs>	 (PS1) BryanDavis: Add missing label to Deployment [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/561936
[02:05:59] <wikibugs>	 (CR) BryanDavis: [C: +2] Add missing label to Deployment [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/561936 (owner: BryanDavis)
[02:06:21] <wikibugs>	 (Merged) jenkins-bot: Add missing label to Deployment [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/561936 (owner: BryanDavis)
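
Editor's note: the two follow-up patches above ("Add missing selector", then "Add missing label") fit a common pattern with `apps/v1` Deployments, where `spec.selector` is required and its `matchLabels` must match the labels on the pod template. The log does not show the actual jouncebot manifest, so the sketch below uses a hypothetical manifest (the `name: jouncebot` label and the image string are assumptions) and a hypothetical helper, `selector_matches_template`, to illustrate the check.

```python
def selector_matches_template(deployment: dict) -> bool:
    """Return True if spec.selector.matchLabels is non-empty and every
    key/value pair in it also appears in the pod template's labels."""
    spec = deployment.get("spec", {})
    match_labels = spec.get("selector", {}).get("matchLabels", {})
    template_labels = (
        spec.get("template", {}).get("metadata", {}).get("labels", {})
    )
    if not match_labels:
        # apps/v1 requires a selector; an empty one is treated as invalid here.
        return False
    return all(template_labels.get(k) == v for k, v in match_labels.items())


# Hypothetical manifest, not the real jouncebot one.
manifest = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "jouncebot"},
    "spec": {
        "replicas": 1,
        "selector": {"matchLabels": {"name": "jouncebot"}},
        "template": {
            "metadata": {"labels": {"name": "jouncebot"}},
            "spec": {"containers": [{"name": "bot", "image": "example/image"}]},
        },
    },
}

print(selector_matches_template(manifest))
```

Dropping either the `selector` block or the template `labels` block from this manifest makes the helper return False.
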
[02:09:00] <wikibugs>	 (PS1) BryanDavis: bin/jouncebot.sh: use /usr/bin/kubectl consistently [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/561937
[02:09:26] <bd808>	 jouncebot: refresh
[02:09:26] <jouncebot>	 I refreshed my knowledge about deployments.
[02:10:05] <bd808>	 jouncebot: next
[02:10:06] <jouncebot>	 In 57 hour(s) and 19 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200106T1130)
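
Editor's note: jouncebot's "57 hour(s) and 19 minute(s)" can be checked by hand against the `20200106T1130` stamp in the link. The sketch below assumes the log date is 2020-01-04 and that both times are UTC (the date is inferred from the dmesg line quoted later in the log, not stated in it). `time_until` is a hypothetical helper, not jouncebot's own code.

```python
from datetime import datetime


def time_until(now: datetime, event: datetime) -> tuple:
    """Return (hours, minutes) remaining until event, rounded down to the minute."""
    remaining = event - now
    total_minutes = int(remaining.total_seconds() // 60)
    return divmod(total_minutes, 60)


# Assumed log date 2020-01-04; time of day taken from the log line.
now = datetime(2020, 1, 4, 2, 10, 6)
# Event time taken from the deploycal-item-20200106T1130 anchor.
event = datetime(2020, 1, 6, 11, 30)

hours, minutes = time_until(now, event)
print("In %d hour(s) and %d minute(s)" % (hours, minutes))
```

The remaining interval is 2 days, 9 hours, 19 minutes and 54 seconds, which rounds down to the 57 hours and 19 minutes reported.
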
[02:15:05] <wikibugs>	 (CR) BryanDavis: [C: +2] bin/jouncebot.sh: use /usr/bin/kubectl consistently [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/561937 (owner: BryanDavis)
[02:15:27] <wikibugs>	 (Merged) jenkins-bot: bin/jouncebot.sh: use /usr/bin/kubectl consistently [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/561937 (owner: BryanDavis)
[03:30:05] <wikibugs>	 (PS1) BryanDavis: Update for mwclient v0.10.0 Site constructor change [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/561942
[03:30:07] <wikibugs>	 (PS1) BryanDavis: Add Black and flake8 add-on lint checks [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/561943
[03:30:42] <wikibugs>	 (CR) BryanDavis: [C: +2] Update for mwclient v0.10.0 Site constructor change [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/561942 (owner: BryanDavis)
[03:31:05] <wikibugs>	 (Merged) jenkins-bot: Update for mwclient v0.10.0 Site constructor change [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/561942 (owner: BryanDavis)
[03:31:58] <wikibugs>	 (CR) BryanDavis: [C: +2] Add Black and flake8 add-on lint checks [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/561943 (owner: BryanDavis)
[03:32:25] <wikibugs>	 (Merged) jenkins-bot: Add Black and flake8 add-on lint checks [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/561943 (owner: BryanDavis)
[03:33:38] <wikibugs>	 (PS1) BryanDavis: Bump mwclient minimum version [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/561944
[03:33:52] <wikibugs>	 (CR) BryanDavis: [C: +2] Bump mwclient minimum version [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/561944 (owner: BryanDavis)
[03:34:18] <wikibugs>	 (Merged) jenkins-bot: Bump mwclient minimum version [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/561944 (owner: BryanDavis)
[07:14:24] <wikibugs>	 Operations, ops-eqiad: Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T241873 (ops-monitoring-bot)
[07:26:38] <wikibugs>	 Operations, ops-eqiad, cloud-services-team (Kanban): Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T241873 (Peachey88)
[08:24:54] <wikibugs>	 Operations, ops-eqiad: Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T228853 (Andrew)
[08:24:55] <wikibugs>	 Operations, ops-eqiad, Cloud-VPS, cloud-services-team (Kanban): rack/setup/install cloudvirt102[34] - https://phabricator.wikimedia.org/T199125 (Andrew)
[08:25:25] <wikibugs>	 Operations, ops-eqiad, cloud-services-team (Kanban): Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T215892 (Andrew)
[08:25:29] <wikibugs>	 Operations, ops-eqiad, Cloud-VPS, cloud-services-team (Kanban): rack/setup/install cloudvirt102[34] - https://phabricator.wikimedia.org/T199125 (Andrew)
[08:25:34] <wikibugs>	 Operations, ops-eqiad, cloud-services-team (Kanban): Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T241873 (Andrew)
[08:25:37] <wikibugs>	 Operations, ops-eqiad, Cloud-VPS, cloud-services-team (Kanban): rack/setup/install cloudvirt102[34] - https://phabricator.wikimedia.org/T199125 (Andrew)
[09:29:38] <wikibugs>	 Operations, Research, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users and researchers for Aroraakhil - https://phabricator.wikimedia.org/T241096 (Nuria) Approved on my end
[11:30:30] <wikibugs>	 (CR) Phamhi: [C: +1] "LGTM" [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/561730 (owner: BryanDavis)
[11:31:56] <wikibugs>	 (CR) Phamhi: [C: +2] toolforge: replace diamond redis monitoring with prometheus [puppet] - https://gerrit.wikimedia.org/r/561437 (https://phabricator.wikimedia.org/T210993) (owner: BryanDavis)
[11:33:25] <wikibugs>	 (CR) Phamhi: "> Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/561437 (https://phabricator.wikimedia.org/T210993) (owner: BryanDavis)
[11:33:51] <wikibugs>	 (CR) Phamhi: [C: +2] support tools: Add script to rebuild all images [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/561730 (owner: BryanDavis)
[11:34:23] <wikibugs>	 (Merged) jenkins-bot: support tools: Add script to rebuild all images [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/561730 (owner: BryanDavis)
[13:06:31] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 22330088 and 1 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[13:08:15] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 38453368 and 8 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[13:08:19] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 193592 and 42 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[13:10:03] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 38568 and 7 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
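
Editor's note: the four icinga-wm lines above share a fixed shape, so the lag figures can be pulled out mechanically. The sketch below is a minimal parser under two assumptions: that the first number is the replay lag in bytes and the second the time lag in seconds (the log itself does not label them), and that the surrounding wording stays exactly as shown. `LagAlert` and `parse_lag_alert` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class LagAlert(NamedTuple):
    kind: str         # "PROBLEM" or "RECOVERY"
    host: str         # e.g. "maps2003"
    state: str        # e.g. "CRITICAL" or "OK"
    lag_bytes: int    # assumed meaning of the first number
    lag_seconds: int  # assumed meaning of the second number


ALERT_RE = re.compile(
    r"(?P<kind>PROBLEM|RECOVERY) - Postgres Replication Lag on (?P<host>\S+) "
    r"is (?P<state>\w+): POSTGRES_HOT_STANDBY_DELAY \w+: "
    r"DB \S+ \(host:\S+\) (?P<bytes>\d+) and (?P<seconds>\d+) seconds"
)


def parse_lag_alert(line: str) -> Optional[LagAlert]:
    """Parse one icinga-wm replication-lag line; return None if it does not match."""
    m = ALERT_RE.search(line)
    if m is None:
        return None
    return LagAlert(
        kind=m.group("kind"),
        host=m.group("host"),
        state=m.group("state"),
        lag_bytes=int(m.group("bytes")),
        lag_seconds=int(m.group("seconds")),
    )


sample = (
    "[13:06:31] <icinga-wm>\t PROBLEM - Postgres Replication Lag on maps2003 "
    "is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 "
    "(host:localhost) 22330088 and 1 seconds "
    "https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring"
)
print(parse_lag_alert(sample))
```
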
[13:53:42] <wikibugs>	 Operations, ops-eqiad: Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T241881 (ops-monitoring-bot)
[14:26:01] <wikibugs>	 Operations, ops-eqiad: Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T241884 (ops-monitoring-bot)
[14:56:15] <wikibugs>	 Operations, ops-eqiad: Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T241884 (Bstorm) I see   ` [Sat Jan  4 08:56:39 2020] megaraid_sas 0000:18:00.0: 155794 (631458161s/0x0004/CRIT) - Enclosure PD 20(c None/p1) phy bad for slot 4 ` in dmesg.  Checking some other things quick because...
[14:57:44] <wikibugs>	 (PS1) Andrew Bogott: nova: depool cloudvirt1016 [puppet] - https://gerrit.wikimedia.org/r/561985 (https://phabricator.wikimedia.org/T241882)
[14:58:36] <wikibugs>	 (PS2) Andrew Bogott: nova: depool cloudvirt1016 [puppet] - https://gerrit.wikimedia.org/r/561985 (https://phabricator.wikimedia.org/T241882)
[15:00:51] <wikibugs>	 Operations, ops-eqiad: Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T241884 (Bstorm) This is the behavior that led to T216218 and then T230289.  In fact, this is pretty much exactly the same as {T230289}.  Checking that the filesystem is still mounted ok.
[15:02:05] <wikibugs>	 (CR) Andrew Bogott: [C: +2] nova: depool cloudvirt1016 [puppet] - https://gerrit.wikimedia.org/r/561985 (https://phabricator.wikimedia.org/T241882) (owner: Andrew Bogott)
[15:03:23] <wikibugs>	 Operations, ops-eqiad: Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T241884 (Bstorm) It seems like the filesystem is ok, but there are no hot spares at this point, so if it kicks 2 more disks out, it'll cause problems.  So far so good on that.
[15:06:42] <wikibugs>	 Operations, ops-eqiad: Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T241884 (Bstorm) To be clear, I am relating this to T230289 because it thinks the disks are //removed//, not failed.
[15:06:56] <wikibugs>	 Operations, ops-eqiad: Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T241884 (Bstorm)
[15:06:59] <wikibugs>	 Operations, ops-eqiad, Cloud-VPS, cloud-services-team (Kanban): rack/setup/install cloudvirt102[34] - https://phabricator.wikimedia.org/T199125 (Bstorm)
[15:07:50] <wikibugs>	 (PS1) Andrew Bogott: Depool cloudvirt1024, raid controller issues [puppet] - https://gerrit.wikimedia.org/r/561987 (https://phabricator.wikimedia.org/T241884)
[15:08:13] <wikibugs>	 Operations, ops-eqiad, cloud-services-team (Kanban): Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T241873 (Bstorm)
[15:08:16] <wikibugs>	 Operations, ops-eqiad, Patch-For-Review: Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T241884 (Bstorm)
[15:09:39] <wikibugs>	 (CR) Andrew Bogott: [C: +2] Depool cloudvirt1024, raid controller issues [puppet] - https://gerrit.wikimedia.org/r/561987 (https://phabricator.wikimedia.org/T241884) (owner: Andrew Bogott)
[15:12:02] <wikibugs>	 Operations, ops-eqiad, Patch-For-Review: Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T241884 (Bstorm) According to the "lifecycle" logs in iDRAC, it had trouble communicating with the disks and then marked them removed.  Basically the same as before and again, 2 disks on the sa...
[15:18:17] <wikibugs>	 Operations, ops-eqiad, Patch-For-Review, cloud-services-team (Kanban): Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T241884 (Bstorm)
[15:23:37] <wikibugs>	 (CR) Bstorm: "There's a mistake in this patch.  I think the buster images were all "toolforge" in webservice, not just the sssd ones (https://github.com" [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/561730 (owner: BryanDavis)
[15:32:41] <wikibugs>	 Operations, observability, Patch-For-Review, User-fgiunchedi, cloud-services-team (Kanban): Deprecate Diamond collectors in Cloud VPS - https://phabricator.wikimedia.org/T210993 (bd808)
[15:41:50] <wikibugs>	 Operations, ops-eqiad, Cloud-Services, Patch-For-Review, cloud-services-team (Kanban): rack/setup/install labvirt101[5-8] - https://phabricator.wikimedia.org/T165531 (aborrero)
[15:42:07] <wikibugs>	 Operations, ops-eqiad: Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T241886 (ops-monitoring-bot)
[15:44:20] <wikibugs>	 Operations, ops-eqiad, Patch-For-Review: rack/setup/install/deploy labvirt1012 labvirt1013 labvirt1014 nodes (cloudvirt1012 cloudvirt1013 cloudvirt1014) - https://phabricator.wikimedia.org/T138509 (aborrero)
[15:44:36] <wikibugs>	 Operations, ops-eqiad, Patch-For-Review: rack/setup/install/deploy labvirt1012 labvirt1013 labvirt1014 nodes (cloudvirt1012 cloudvirt1013 cloudvirt1014) - https://phabricator.wikimedia.org/T138509 (aborrero)
[16:18:57] <wikibugs>	 Operations, ops-eqiad, Patch-For-Review, cloud-services-team (Kanban): Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T241884 (bd808) p:Triage→High
[16:28:25] <wikibugs>	 Operations, ops-eqiad: Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T241886 (Bstorm)
[16:28:25] <wikibugs>	 Operations, ops-eqiad, Patch-For-Review, cloud-services-team (Kanban): Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T241884 (Bstorm)
[16:34:22] <logmsgbot>	 !log aborrero@cumin1001 START - Cookbook sre.hosts.downtime
[16:34:22] <logmsgbot>	 !log aborrero@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[16:34:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:34:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:34:26] <logmsgbot>	 !log aborrero@cumin1001 START - Cookbook sre.hosts.downtime
[16:34:26] <logmsgbot>	 !log aborrero@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[16:34:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:34:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:31:35] <wikibugs>	 (PS1) BryanDavis: toolschecker: update k8s config reading [puppet] - https://gerrit.wikimedia.org/r/561996 (https://phabricator.wikimedia.org/T240923)
[18:33:18] <wikibugs>	 (CR) jerkins-bot: [V: -1] toolschecker: update k8s config reading [puppet] - https://gerrit.wikimedia.org/r/561996 (https://phabricator.wikimedia.org/T240923) (owner: BryanDavis)
[18:38:08] <wikibugs>	 (PS2) BryanDavis: toolschecker: update k8s config reading [puppet] - https://gerrit.wikimedia.org/r/561996 (https://phabricator.wikimedia.org/T240923)
[18:48:32] <wikibugs>	 (CR) Bstorm: [C: +2] toolschecker: update k8s config reading [puppet] - https://gerrit.wikimedia.org/r/561996 (https://phabricator.wikimedia.org/T240923) (owner: BryanDavis)
[19:08:58] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 47330016 and 2 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[19:10:44] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 97624 and 96 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[19:11:01] <wikibugs>	 Operations, ops-eqiad, Patch-For-Review, cloud-services-team (Kanban): Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T241884 (Bstorm) @aborrero When migrating cyberbot-db-01, the script died with  ` total size is 304,445,262,469  speedup is 1.00 wmcs-cold-migrate: INFO: cyber...
[19:11:07] <wikibugs>	 (PS1) BryanDavis: toolschecker: check node ready status on new k8s cluster [puppet] - https://gerrit.wikimedia.org/r/562000
[20:00:49] <wikibugs>	 (CR) Kosta Harlan: [C: +1] GrowthExperiments: use local search in production [mediawiki-config] - https://gerrit.wikimedia.org/r/561927 (https://phabricator.wikimedia.org/T235717) (owner: Gergő Tisza)
[20:04:48] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps2002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 24548336 and 1 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[20:06:36] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps2002 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 19256 and 21 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[20:32:53] <wikibugs>	 Puppet, Cloud-VPS: role::simplelamp fails to start mysql due to apparmor - https://phabricator.wikimedia.org/T128642 (bd808)
[21:27:14] <wikibugs>	 Puppet, Cloud-VPS: role::simplelamp takes ownership of all content in /etc/apache2/sites-enabled - https://phabricator.wikimedia.org/T169368 (bd808)
[23:55:51] <wikibugs>	 Operations, MediaWiki-Authentication-and-authorization, Security-Team, Traffic, Security: Investigate usefulness of SameSite cookies for logged-in accounts - https://phabricator.wikimedia.org/T158604 (Tgr) Support is decent nowadays, with only some mobile browsers not recognizing it. (Related...