[00:00:04] <jouncebot>	 RoanKattouw, Niharika, and Urbanecm: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Evening SWAT(Max 6 patches) . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200114T0000).
[00:00:04] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[00:03:58] <wikibugs>	 (03PS2) 10Dzahn: releases: replace hiera() with lookup() [puppet] - 10https://gerrit.wikimedia.org/r/563548
[00:04:06] <wikibugs>	 (03PS4) 10Holger Knust: Migrate changeprop & cpjobqueue to kubernetes [deployment-charts] - 10https://gerrit.wikimedia.org/r/554576
[00:05:28] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/20331/releases1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/563548 (owner: 10Dzahn)
[00:08:21] <wikibugs>	 (03PS1) 10Nray: Temporarily turn off AmcOutreach until T242491 regression is resolved [mediawiki-config] - 10https://gerrit.wikimedia.org/r/564157 (https://phabricator.wikimedia.org/T242491)
[00:09:37] <wikibugs>	 (03PS2) 10Dzahn: microsites: replace hiera() with lookup() [puppet] - 10https://gerrit.wikimedia.org/r/563552
[00:11:31] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/20332/bromine.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/563552 (owner: 10Dzahn)
[00:14:46] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] codesearch: Install docker-ce from thirdparty/kubeadm-k8s component [puppet] - 10https://gerrit.wikimedia.org/r/563633 (owner: 10Legoktm)
[00:21:58] <wikibugs>	 (03PS1) 10Dzahn: codesearch: fix parameters of apt::package_from:component [puppet] - 10https://gerrit.wikimedia.org/r/564167
[00:27:23] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] codesearch: fix parameters of apt::package_from:component [puppet] - 10https://gerrit.wikimedia.org/r/564167 (owner: 10Dzahn)
[00:34:34] <wikibugs>	 (03Abandoned) 10Nray: Temporarily turn off AmcOutreach until T242491 regression is resolved [mediawiki-config] - 10https://gerrit.wikimedia.org/r/564157 (https://phabricator.wikimedia.org/T242491) (owner: 10Nray)
[00:38:21] <wikibugs>	 10Operations, 10Design-Research, 10Domains, 10Traffic: Register wikipersonas.org and redirect URL - https://phabricator.wikimedia.org/T241944 (10Dzahn)
[00:43:34] <wikibugs>	 (03PS1) 10Dzahn: admin: upgrade Hugh Nowlan to root shell user (ops) [puppet] - 10https://gerrit.wikimedia.org/r/564171 (https://phabricator.wikimedia.org/T242309)
[00:45:25] <wikibugs>	 (03CR) 10Dzahn: "follow-up: https://gerrit.wikimedia.org/r/c/operations/puppet/+/564167" [puppet] - 10https://gerrit.wikimedia.org/r/563633 (owner: 10Legoktm)
[00:46:14] <wikibugs>	 (03CR) 10Dzahn: "E: Failed to fetch http://apt.wikimedia.org/wikimedia/dists/stretch-wikimedia/InRelease  Unable to find expected entry 'thirdparty/kubeadm" [puppet] - 10https://gerrit.wikimedia.org/r/563633 (owner: 10Legoktm)
[00:47:11] <wikibugs>	 (03CR) 10Bstorm: "Bryan noticed that individual jobs do not have the rerun bit set (per defaults).  This is correct." [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/564095 (https://phabricator.wikimedia.org/T242397) (owner: 10Bstorm)
[00:47:34] <wikibugs>	 (03Abandoned) 10Bstorm: gridengine: Make webservices "not rerunable" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/564095 (https://phabricator.wikimedia.org/T242397) (owner: 10Bstorm)
[00:59:07] <wikibugs>	 (03CR) 10Dzahn: "works on buster without puppet errors now! you can go ahead and create a new buster instance in your cloud VPS project, then just apply "r" [puppet] - 10https://gerrit.wikimedia.org/r/563633 (owner: 10Legoktm)
[01:00:20] <mutante>	 legoktm: ready to create the new buster instance in "codesearch" project
[01:02:19] <mutante>	 then just click puppet config, apply "role::codesearch" as class and run puppet agent -tv  and it should have no errors
[01:03:11] <mutante>	 if it says something about not finding base_dir then put "profile::codesearch::base_dir: '/srv'" in the Hiera form. But it shouldn't because we already added it in the repo too.. (hmm)
[01:03:34] <logmsgbot>	 !log catrope@deploy1001 Synchronized php-1.35.0-wmf.14/extensions/GrowthExperiments/: Various topic search-related cherry-picks (duration: 00m 57s)
[01:03:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:12:07] <wikibugs>	 (03PS1) 10Bstorm: gridengine: set the webgrid queues to not rerunable [puppet] - 10https://gerrit.wikimedia.org/r/564174 (https://phabricator.wikimedia.org/T242397)
[01:14:58] <Krinkle>	 mutante: the hieradata/labs patch is applied in beta, but puppet still fails the same way
[01:17:20] <wikibugs>	 10Operations, 10Puppet, 10Beta-Cluster-Infrastructure: Puppet agent unable to run in Beta Cluster (Evaluation Error: Error while evaluating a Resource Statement) - https://phabricator.wikimedia.org/T242658 (10Krinkle) The above patch is live on the beta cluster puppet master:  `name=deployment-puppetmaster03...
[01:17:36] <mutante>	 Krinkle: ack, it's almost like ./project/common.yaml does not get applied 
[01:17:54] <wikibugs>	 (03PS1) 10Dzahn: define 2 API appservers per row in codfw as canary API appservers [puppet] - 10https://gerrit.wikimedia.org/r/564175 (https://phabricator.wikimedia.org/T242606)
[01:18:36] <Krinkle>	 mutante: I'm not familiar with that file. afaik deployment-prep.yaml is the highest level file relevant to beta
[01:19:40] <Krinkle>	 ah yeah, that's the one
[01:19:40] <mutante>	 it's 
[01:19:41] <mutante>	 hieradata/labs/deployment-prep/common.yaml
[01:19:45] <Krinkle>	 deployment-prep/common.yaml
[01:19:49] <Krinkle>	 yeah that should be fine
[01:20:04] <mutante>	 yea, and the value it is missing is in there
[01:20:20] <mutante>	 so right now i have no idea why it's still missing it
[01:20:59] <mutante>	 it is also the same as in prod hieradata/common.yaml
[01:21:25] <mutante>	 on another project i also noticed something wasn't applied that was in $projectname/common.yaml 
[01:21:43] <mutante>	 but if that was the case we'd have other issues too
[01:22:50] <mutante>	 gotta stare at it again tomorrow, bbl
[01:23:41] <Krinkle>	 mutante: found it
[01:23:43] <Krinkle>	 https://horizon.wikimedia.org/project/puppet/
[01:23:50] <Krinkle>	 etcd_client_srv_domain: ''
[01:23:56] <wikibugs>	 (03PS5) 10EBernhardson: Perform weekly dumps of all public media urls [puppet] - 10https://gerrit.wikimedia.org/r/561356 (https://phabricator.wikimedia.org/T240520)
[01:24:08] <Krinkle>	 I guess someone put it there to fix an issue they may have seen with it being undefined
[01:24:11] <Krinkle>	 which obscured the issue
[01:24:40] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Perform weekly dumps of all public media urls [puppet] - 10https://gerrit.wikimedia.org/r/561356 (https://phabricator.wikimedia.org/T240520) (owner: 10EBernhardson)
[01:25:20] * Krinkle removes it
[01:25:50] <Krinkle>	 https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/HEAD/deployment-prep/_.yaml
[01:29:27] <wikibugs>	 10Operations, 10Puppet, 10Beta-Cluster-Infrastructure: Puppet agent unable to run in Beta Cluster (Evaluation Error: Error while evaluating a Resource Statement) - https://phabricator.wikimedia.org/T242658 (10Krinkle) It didn't work because at the Horizon layer there was a project-level override for this Hie...
[01:29:40] <wikibugs>	 10Operations, 10Puppet, 10Beta-Cluster-Infrastructure: Puppet agent unable to run in Beta Cluster (Evaluation Error: Error while evaluating a Resource Statement) - https://phabricator.wikimedia.org/T242658 (10Krinkle) 05Open→03Resolved a:03Krinkle Puppet agent now runs cleanly in Beta.
[01:29:45] <wikibugs>	 10Operations, 10Puppet, 10Beta-Cluster-Infrastructure: Puppet agent unable to run in Beta Cluster (Evaluation Error: Error while evaluating a Resource Statement) - https://phabricator.wikimedia.org/T242658 (10Krinkle) a:05Krinkle→03Dzahn
[01:33:51] <wikibugs>	 (03CR) 10BryanDavis: [C: 03+1] gridengine: set the webgrid queues to not rerunable [puppet] - 10https://gerrit.wikimedia.org/r/564174 (https://phabricator.wikimedia.org/T242397) (owner: 10Bstorm)
[01:37:41] <wikibugs>	 (03PS6) 10EBernhardson: Perform weekly dumps of all public media urls [puppet] - 10https://gerrit.wikimedia.org/r/561356 (https://phabricator.wikimedia.org/T240520)
[01:39:14] <legoktm>	 mutante: sweet, will try in a few :D
[01:39:51] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db2098 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1080.66 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[01:44:51] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=atlas_exporter site={codfw,eqiad} https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[01:46:41] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[02:00:02] <wikibugs>	 (03PS1) 10Papaul: DNS: Add mgmt and production DNS for logstash202[6-9] [dns] - 10https://gerrit.wikimedia.org/r/564181
[02:04:13] <wikibugs>	 (03CR) 10BryanDavis: k8s: Don't restart all k8s machinery to reboot a basic webservice (032 comments) [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/563624 (https://phabricator.wikimedia.org/T228499) (owner: 10Bstorm)
[02:11:05] <wikibugs>	 10Operations, 10ops-codfw, 10DBA: (Needed By 31st January) codfw: rack/setup/install  es202[0-5].codfw.wmnet - https://phabricator.wikimedia.org/T241336 (10Papaul)
[02:11:26] <wikibugs>	 (03PS1) 10Catrope: GrowthExperiments: Enable topic search, behind a hidden preference [mediawiki-config] - 10https://gerrit.wikimedia.org/r/564183 (https://phabricator.wikimedia.org/T242698)
[02:23:19] <wikibugs>	 (03CR) 10BryanDavis: [C: 03+2] "> Note that it does get through quite a few images before this" [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/563610 (owner: 10Bstorm)
[03:12:37] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db2098 is OK: OK slave_sql_lag Replication lag: 0.42 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[03:44:58] <wikibugs>	 (03PS1) 10Andrew Bogott: Upgrade cloudservices nodes (designate) to OpenStack Pike [puppet] - 10https://gerrit.wikimedia.org/r/564280 (https://phabricator.wikimedia.org/T241348)
[03:48:25] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[03:55:39] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[04:08:21] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[04:17:27] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[04:26:31] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[04:27:09] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Upgrade cloudservices nodes (designate) to OpenStack Pike [puppet] - 10https://gerrit.wikimedia.org/r/564280 (https://phabricator.wikimedia.org/T241348) (owner: 10Andrew Bogott)
[04:30:53] <wikibugs>	 (03PS1) 10Andrew Bogott: Fix a VERY OBVIOUS typo setting the designate version [puppet] - 10https://gerrit.wikimedia.org/r/564331 (https://phabricator.wikimedia.org/T241348)
[04:31:57] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Fix a VERY OBVIOUS typo setting the designate version [puppet] - 10https://gerrit.wikimedia.org/r/564331 (https://phabricator.wikimedia.org/T241348) (owner: 10Andrew Bogott)
[04:39:17] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[04:42:55] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[04:56:15] <wikibugs>	 (03PS1) 10Andrew Bogott: designate: include designate-mdns package [puppet] - 10https://gerrit.wikimedia.org/r/564362
[04:56:56] <wikibugs>	 10Operations, 10cloud-services-team (Kanban): rack/setup/install cloudcontrol2001-dev & cloudvirt200[123]-dev - https://phabricator.wikimedia.org/T214448 (10Andrew) 05Resolved→03Open I think these are resolved now (I just reinstalled some packages; not sure what went wrong originally.)
[04:57:28] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] designate: include designate-mdns package [puppet] - 10https://gerrit.wikimedia.org/r/564362 (owner: 10Andrew Bogott)
[05:04:53] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[05:08:18] <wikibugs>	 (03PS1) 10Andrew Bogott: designate monitoring: allow for different python versions in service monitoring [puppet] - 10https://gerrit.wikimedia.org/r/564366 (https://phabricator.wikimedia.org/T241348)
[05:09:33] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] designate monitoring: allow for different python versions in service monitoring [puppet] - 10https://gerrit.wikimedia.org/r/564366 (https://phabricator.wikimedia.org/T241348) (owner: 10Andrew Bogott)
[05:10:19] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[05:12:35] <wikibugs>	 (03PS1) 10Andrew Bogott: nova monitoring: allow for different python versions in service [puppet] - 10https://gerrit.wikimedia.org/r/564373 (https://phabricator.wikimedia.org/T241347)
[05:29:05] <andrewbogott>	 !log rebooting cloudservices1004 to make sure all my upgrades are sustainable
[05:29:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:46:43] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[05:48:39] <wikibugs>	 10Operations: Sort out plan for install* servers in edge sites - https://phabricator.wikimedia.org/T242602 (10faidon) Splitting the internal apt repository from the install roles/servers sounds good -- it's more of a historical artifact than anything else. You probably know this already but do note that the inst...
[05:50:23] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[05:59:25] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[06:00:04] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1105:3312 after removing partitions from revision table', diff saved to https://phabricator.wikimedia.org/P10140 and previous config saved to /var/cache/conftool/dbconfig/20200114-060003-marostegui.json
[06:00:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:01:17] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1103:3312 - T239453', diff saved to https://phabricator.wikimedia.org/P10141 and previous config saved to /var/cache/conftool/dbconfig/20200114-060116-marostegui.json
[06:01:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:01:25] <stashbot>	 T239453: Remove partitions from revision table - https://phabricator.wikimedia.org/T239453
[06:01:33] <marostegui>	 !log Remove partitions from revision table on db1103:3312
[06:01:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:03:05] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[06:15:49] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[06:19:31] <marostegui>	 Looks like a spike of https://phabricator.wikimedia.org/T242437
[06:20:03] <marostegui>	 !log Deploy schema change on s3 master for officewiki and techconductwiki T242688
[06:20:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:20:10] <stashbot>	 T242688: Extend flow_ext_ref.ref_src_wiki - https://phabricator.wikimedia.org/T242688
[06:23:22] <marostegui>	 !log Deploy schema change on labswiki (wikitech) T242688
[06:23:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:25:48] <marostegui>	 !log Deploy schema change on flowdb (x1) directly on the master T242688
[06:25:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:25:51] <stashbot>	 T242688: Extend flow_ext_ref.ref_src_wiki - https://phabricator.wikimedia.org/T242688
[06:26:51] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[06:28:49] <icinga-wm>	 PROBLEM - Check size of conntrack table on ores2002 is CRITICAL: connect to address 10.192.0.18 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:28:51] <icinga-wm>	 PROBLEM - puppet last run on ores2003 is CRITICAL: connect to address 10.192.16.63 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:28:55] <icinga-wm>	 PROBLEM - ores uWSGI web app on ores2003 is CRITICAL: connect to address 10.192.16.63 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Services/ores
[06:28:57] <icinga-wm>	 PROBLEM - configured eth on ores2006 is CRITICAL: connect to address 10.192.32.174 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:28:59] <icinga-wm>	 PROBLEM - ores uWSGI web app on ores2001 is CRITICAL: connect to address 10.192.0.12 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Services/ores
[06:28:59] <icinga-wm>	 PROBLEM - Disk space on ores2009 is CRITICAL: connect to address 10.192.48.90 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores2009&var-datasource=codfw+prometheus/ops
[06:29:01] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on ores2005 is CRITICAL: connect to address 10.192.32.173 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:29:01] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on ores2002 is CRITICAL: connect to address 10.192.0.18 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:29:01] <icinga-wm>	 PROBLEM - Check systemd state on ores2001 is CRITICAL: connect to address 10.192.0.12 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:29:01] <icinga-wm>	 PROBLEM - dhclient process on ores2005 is CRITICAL: connect to address 10.192.32.173 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:29:05] <icinga-wm>	 PROBLEM - Check systemd state on ores2006 is CRITICAL: connect to address 10.192.32.174 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:29:07] <icinga-wm>	 PROBLEM - Check size of conntrack table on ores2001 is CRITICAL: connect to address 10.192.0.12 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:29:07] <icinga-wm>	 PROBLEM - Check size of conntrack table on ores2006 is CRITICAL: connect to address 10.192.32.174 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:29:09] <icinga-wm>	 PROBLEM - DPKG on ores2001 is CRITICAL: connect to address 10.192.0.12 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:29:13] <icinga-wm>	 PROBLEM - Check systemd state on ores2004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:29:17] <icinga-wm>	 PROBLEM - Check systemd state on ores2003 is CRITICAL: connect to address 10.192.16.63 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:29:17] <icinga-wm>	 PROBLEM - Check systemd state on ores2005 is CRITICAL: connect to address 10.192.32.173 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:29:19] <icinga-wm>	 PROBLEM - dhclient process on ores2009 is CRITICAL: connect to address 10.192.48.90 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:29:21] <icinga-wm>	 PROBLEM - configured eth on ores2008 is CRITICAL: connect to address 10.192.48.89 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:29:23] <icinga-wm>	 PROBLEM - MD RAID on ores2008 is CRITICAL: connect to address 10.192.48.89 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:29:25] <icinga-wm>	 PROBLEM - Disk space on ores2005 is CRITICAL: connect to address 10.192.32.173 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores2005&var-datasource=codfw+prometheus/ops
[06:29:27] <icinga-wm>	 PROBLEM - puppet last run on ores2001 is CRITICAL: connect to address 10.192.0.12 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:29:29] <icinga-wm>	 PROBLEM - Disk space on ores2006 is CRITICAL: connect to address 10.192.32.174 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores2006&var-datasource=codfw+prometheus/ops
[06:29:31] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on ores2003 is CRITICAL: connect to address 10.192.16.63 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:29:31] <icinga-wm>	 PROBLEM - Check size of conntrack table on ores2005 is CRITICAL: connect to address 10.192.32.173 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:29:33] <icinga-wm>	 PROBLEM - ores uWSGI web app on ores2008 is CRITICAL: connect to address 10.192.48.89 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Services/ores
[06:29:35] <icinga-wm>	 PROBLEM - configured eth on ores2005 is CRITICAL: connect to address 10.192.32.173 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:29:39] <icinga-wm>	 PROBLEM - MD RAID on ores2002 is CRITICAL: connect to address 10.192.0.18 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:29:39] <icinga-wm>	 PROBLEM - MD RAID on ores2009 is CRITICAL: connect to address 10.192.48.90 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:29:41] <icinga-wm>	 PROBLEM - Check systemd state on ores2008 is CRITICAL: connect to address 10.192.48.89 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:29:45] <icinga-wm>	 PROBLEM - MD RAID on ores2005 is CRITICAL: connect to address 10.192.32.173 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:29:45] <icinga-wm>	 PROBLEM - configured eth on ores2009 is CRITICAL: connect to address 10.192.48.90 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:29:53] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on ores2009 is CRITICAL: connect to address 10.192.48.90 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:29:55] <icinga-wm>	 PROBLEM - configured eth on ores2003 is CRITICAL: connect to address 10.192.16.63 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:29:57] <icinga-wm>	 PROBLEM - configured eth on ores2002 is CRITICAL: connect to address 10.192.0.18 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:29:57] <icinga-wm>	 PROBLEM - Check systemd state on ores2009 is CRITICAL: connect to address 10.192.48.90 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:29:57] <icinga-wm>	 PROBLEM - Check size of conntrack table on ores2009 is CRITICAL: connect to address 10.192.48.90 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:30:03] <icinga-wm>	 PROBLEM - ores uWSGI web app on ores2002 is CRITICAL: connect to address 10.192.0.18 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Services/ores
[06:30:05] <icinga-wm>	 PROBLEM - Check systemd state on ores2002 is CRITICAL: connect to address 10.192.0.18 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:30:07] <icinga-wm>	 PROBLEM - MD RAID on ores2003 is CRITICAL: connect to address 10.192.16.63 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:30:07] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on ores2008 is CRITICAL: connect to address 10.192.48.89 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:30:07] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on ores2001 is CRITICAL: connect to address 10.192.0.12 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:30:11] <icinga-wm>	 PROBLEM - Disk space on ores2008 is CRITICAL: connect to address 10.192.48.89 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores2008&var-datasource=codfw+prometheus/ops
[06:30:11] <icinga-wm>	 PROBLEM - dhclient process on ores2001 is CRITICAL: connect to address 10.192.0.12 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:30:13] <icinga-wm>	 PROBLEM - Disk space on ores2002 is CRITICAL: connect to address 10.192.0.18 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores2002&var-datasource=codfw+prometheus/ops
[06:30:13] <icinga-wm>	 PROBLEM - DPKG on ores2009 is CRITICAL: connect to address 10.192.48.90 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:30:15] <icinga-wm>	 PROBLEM - ores uWSGI web app on ores2009 is CRITICAL: connect to address 10.192.48.90 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Services/ores
[06:30:17] <icinga-wm>	 PROBLEM - DPKG on ores2002 is CRITICAL: connect to address 10.192.0.18 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:30:21] <icinga-wm>	 PROBLEM - dhclient process on ores2008 is CRITICAL: connect to address 10.192.48.89 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:30:23] <icinga-wm>	 PROBLEM - ores uWSGI web app on ores2005 is CRITICAL: connect to address 10.192.32.173 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Services/ores
[06:30:23] <icinga-wm>	 PROBLEM - MD RAID on ores2001 is CRITICAL: connect to address 10.192.0.12 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:30:29] <icinga-wm>	 PROBLEM - configured eth on ores2001 is CRITICAL: connect to address 10.192.0.12 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:30:31] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[06:30:33] <icinga-wm>	 PROBLEM - dhclient process on ores2003 is CRITICAL: connect to address 10.192.16.63 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:30:33] <icinga-wm>	 PROBLEM - Disk space on ores2003 is CRITICAL: connect to address 10.192.16.63 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores2003&var-datasource=codfw+prometheus/ops
[06:30:33] <icinga-wm>	 PROBLEM - Disk space on ores2001 is CRITICAL: connect to address 10.192.0.12 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores2001&var-datasource=codfw+prometheus/ops
[06:30:35] <icinga-wm>	 PROBLEM - dhclient process on ores2002 is CRITICAL: connect to address 10.192.0.18 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:30:37] <marostegui>	 errrr?
[06:30:37] <icinga-wm>	 PROBLEM - Check size of conntrack table on ores2008 is CRITICAL: connect to address 10.192.48.89 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:30:37] <icinga-wm>	 PROBLEM - Check size of conntrack table on ores2003 is CRITICAL: connect to address 10.192.16.63 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:30:37] <icinga-wm>	 PROBLEM - DPKG on ores2005 is CRITICAL: connect to address 10.192.32.173 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:30:39] <marostegui>	 what's going on?
[06:30:43] <icinga-wm>	 PROBLEM - DPKG on ores2008 is CRITICAL: connect to address 10.192.48.89 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:30:43] <icinga-wm>	 PROBLEM - DPKG on ores2003 is CRITICAL: connect to address 10.192.16.63 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:30:49] <icinga-wm>	 RECOVERY - configured eth on ores2006 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:30:57] <icinga-wm>	 RECOVERY - Check systemd state on ores2006 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:30:57] <icinga-wm>	 RECOVERY - Check size of conntrack table on ores2006 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:31:03] <icinga-wm>	 PROBLEM - puppet last run on ores2009 is CRITICAL: connect to address 10.192.48.90 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:31:19] <icinga-wm>	 RECOVERY - Disk space on ores2006 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores2006&var-datasource=codfw+prometheus/ops
[06:31:47] <icinga-wm>	 PROBLEM - Check the NTP synchronisation status of timesyncd on ores2005 is CRITICAL: connect to address 10.192.32.173 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/NTP
[06:31:55] <icinga-wm>	 RECOVERY - Check systemd state on ores2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:32:01] <icinga-wm>	 RECOVERY - Disk space on ores2002 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores2002&var-datasource=codfw+prometheus/ops
[06:32:05] <icinga-wm>	 PROBLEM - puppet last run on ores2005 is CRITICAL: connect to address 10.192.32.173 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:32:05] <icinga-wm>	 RECOVERY - DPKG on ores2002 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:32:09] <icinga-wm>	 PROBLEM - puppet last run on ores2008 is CRITICAL: connect to address 10.192.48.89 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:32:21] <wikibugs>	 10Operations, 10Traffic, 10Wikimedia-General-or-Unknown, 10User-DannyS712: Pages whose title ends with semicolon (;) are intermittently inaccessible - https://phabricator.wikimedia.org/T238285 (10DannyS712)
[06:32:23] <icinga-wm>	 RECOVERY - dhclient process on ores2002 is OK: PROCS OK: 0 processes with command name dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:32:27] <icinga-wm>	 RECOVERY - Check size of conntrack table on ores2002 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:32:39] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on ores2002 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:33:19] <icinga-wm>	 RECOVERY - MD RAID on ores2002 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:33:33] <icinga-wm>	 RECOVERY - configured eth on ores2002 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:33:45] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on ores2008 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:33:47] <icinga-wm>	 RECOVERY - Disk space on ores2008 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores2008&var-datasource=codfw+prometheus/ops
[06:33:57] <icinga-wm>	 RECOVERY - dhclient process on ores2008 is OK: PROCS OK: 0 processes with command name dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:34:15] <icinga-wm>	 RECOVERY - Check size of conntrack table on ores2008 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:34:19] <icinga-wm>	 RECOVERY - DPKG on ores2008 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:34:25] <icinga-wm>	 RECOVERY - Disk space on ores2009 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores2009&var-datasource=codfw+prometheus/ops
[06:34:47] <icinga-wm>	 RECOVERY - dhclient process on ores2009 is OK: PROCS OK: 0 processes with command name dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:34:49] <icinga-wm>	 RECOVERY - configured eth on ores2008 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:34:49] <icinga-wm>	 RECOVERY - MD RAID on ores2008 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:35:07] <icinga-wm>	 RECOVERY - MD RAID on ores2009 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:35:09] <icinga-wm>	 RECOVERY - Check systemd state on ores2008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:35:13] <icinga-wm>	 RECOVERY - configured eth on ores2009 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:35:19] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on ores2009 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:35:23] <icinga-wm>	 RECOVERY - Check systemd state on ores2009 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:35:25] <icinga-wm>	 RECOVERY - Check size of conntrack table on ores2009 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:35:41] <icinga-wm>	 RECOVERY - DPKG on ores2009 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:36:51] <icinga-wm>	 RECOVERY - puppet last run on ores2009 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:37:59] <icinga-wm>	 RECOVERY - puppet last run on ores2008 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:39:00] <icinga-wm>	 RECOVERY - configured eth on ores2003 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:39:11] <icinga-wm>	 RECOVERY - MD RAID on ores2003 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:39:35] <icinga-wm>	 RECOVERY - dhclient process on ores2003 is OK: PROCS OK: 0 processes with command name dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:39:37] <icinga-wm>	 RECOVERY - Disk space on ores2003 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores2003&var-datasource=codfw+prometheus/ops
[06:39:41] <icinga-wm>	 RECOVERY - Check size of conntrack table on ores2003 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:39:47] <icinga-wm>	 RECOVERY - DPKG on ores2003 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:39:55] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on ores2005 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:39:55] <icinga-wm>	 RECOVERY - dhclient process on ores2005 is OK: PROCS OK: 0 processes with command name dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:40:11] <icinga-wm>	 RECOVERY - Check systemd state on ores2003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:40:11] <icinga-wm>	 RECOVERY - Check systemd state on ores2005 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:40:17] <icinga-wm>	 RECOVERY - Disk space on ores2005 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores2005&var-datasource=codfw+prometheus/ops
[06:40:25] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on ores2003 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:40:25] <icinga-wm>	 RECOVERY - Check size of conntrack table on ores2005 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:40:29] <icinga-wm>	 RECOVERY - puppet last run on ores2003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:40:29] <icinga-wm>	 RECOVERY - configured eth on ores2005 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:40:39] <icinga-wm>	 RECOVERY - MD RAID on ores2005 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:41:17] <icinga-wm>	 RECOVERY - MD RAID on ores2001 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:41:21] <icinga-wm>	 RECOVERY - configured eth on ores2001 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:41:27] <icinga-wm>	 RECOVERY - Disk space on ores2001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores2001&var-datasource=codfw+prometheus/ops
[06:41:29] <icinga-wm>	 RECOVERY - DPKG on ores2005 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:41:43] <icinga-wm>	 RECOVERY - Check systemd state on ores2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:41:49] <icinga-wm>	 RECOVERY - Check size of conntrack table on ores2001 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:41:51] <icinga-wm>	 RECOVERY - DPKG on ores2001 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:42:11] <icinga-wm>	 PROBLEM - ores_workers_running on ores2001 is CRITICAL: PROCS CRITICAL: 66 processes with command name celery https://wikitech.wikimedia.org/wiki/ORES
[06:42:49] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on ores2001 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:42:53] <icinga-wm>	 PROBLEM - ores_workers_running on ores2004 is CRITICAL: PROCS CRITICAL: 0 processes with command name celery https://wikitech.wikimedia.org/wiki/ORES
[06:42:53] <icinga-wm>	 RECOVERY - dhclient process on ores2001 is OK: PROCS OK: 0 processes with command name dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:43:43] <icinga-wm>	 RECOVERY - puppet last run on ores2005 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:44:01] <icinga-wm>	 RECOVERY - ores_workers_running on ores2001 is OK: PROCS OK: 91 processes with command name celery https://wikitech.wikimedia.org/wiki/ORES
[06:46:53] <icinga-wm>	 RECOVERY - puppet last run on ores2001 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:48:34] <wikibugs>	 (03PS1) 10Marostegui: install_server: Allow reimage of db1107 [puppet] - 10https://gerrit.wikimedia.org/r/564445
[06:48:41] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[06:49:54] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] install_server: Allow reimage of db1107 [puppet] - 10https://gerrit.wikimedia.org/r/564445 (owner: 10Marostegui)
[06:51:13] <wikibugs>	 10Operations, 10ops-codfw, 10DBA: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (10Marostegui) p:05Triage→03High
[06:51:40] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA: (Needed by 31st January) eqiad: rack/setup/install  es102[0-5].eqiad.wmnet - https://phabricator.wikimedia.org/T241359 (10Marostegui) p:05Triage→03Normal
[06:52:19] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[06:54:43] <icinga-wm>	 RECOVERY - Check systemd state on ores2004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:55:37] <icinga-wm>	 RECOVERY - ores_workers_running on ores2004 is OK: PROCS OK: 91 processes with command name celery https://wikitech.wikimedia.org/wiki/ORES
[07:02:33] <icinga-wm>	 RECOVERY - Check the NTP synchronisation status of timesyncd on ores2005 is OK: OK: synced at Tue 2020-01-14 07:02:32 UTC. https://wikitech.wikimedia.org/wiki/NTP
[07:06:45] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[07:08:13] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 240, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:08:25] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 54, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:15:27] <elukey>	 this is Telia's transport -^
[07:15:47] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[07:16:02] <elukey>	 there is a planned outage about it, all good
[07:19:07] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 242, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:19:19] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 56, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:21:15] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[07:25:39] <wikibugs>	 (03PS1) 10Legoktm: codesearch: Ensure /srv/hound is writable by codesearch user [puppet] - 10https://gerrit.wikimedia.org/r/564466 (https://phabricator.wikimedia.org/T242319)
[07:26:23] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 240, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:26:35] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 54, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:31:14] <wikibugs>	 10Operations, 10ORES, 10Scoring-platform-team: Ores celery OOM event in codfw - https://phabricator.wikimedia.org/T242705 (10elukey) p:05Triage→03High
[07:31:27] <elukey>	 marostegui: --^ if you want to add more info
[07:33:02] <XioNoX>	 !log add peering to AS26744 in eqiad, eqord and eqdfw
[07:33:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:40:59] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 242, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:41:11] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 56, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:43:03] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[07:46:02] <wikibugs>	 10Puppet, 10VPS-project-codesearch, 10Patch-For-Review: Puppetize codesearch - https://phabricator.wikimedia.org/T242319 (10Legoktm) The codesearch5 instance is now running Debian Buster plus the `role::codesearch` puppet role with lots of help from @Dzahn   Remaining todos: * Make /srv/hound writable by cod...
[07:55:53] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[07:59:29] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[08:02:12] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] DNS: Add mgmt and production DNS for logstash202[6-9] [dns] - 10https://gerrit.wikimedia.org/r/564181 (owner: 10Papaul)
[08:05:16] <wikibugs>	 (03CR) 10Filippo Giunchedi: "LGTM, although please add tests using existing examples in modules/mtail" [puppet] - 10https://gerrit.wikimedia.org/r/564129 (https://phabricator.wikimedia.org/T236505) (owner: 10Cwhite)
[08:13:05] <wikibugs>	 (03PS6) 10Joal: Add mediawiki-history-dumps rsync to labstore [puppet] - 10https://gerrit.wikimedia.org/r/564066
[08:27:56] <wikibugs>	 (03PS1) 10Muehlenhoff: Add dpifke to exception list, uses Yubikey backed key [puppet] - 10https://gerrit.wikimedia.org/r/564524
[08:29:22] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Add dpifke to exception list, uses Yubikey backed key [puppet] - 10https://gerrit.wikimedia.org/r/564524 (owner: 10Muehlenhoff)
[08:29:50] <wikibugs>	 10Operations: Anycast for webproxies - https://phabricator.wikimedia.org/T242715 (10ayounsi) p:05Triage→03Normal
[08:31:01] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Remove debug proxy roles/classes [puppet] - 10https://gerrit.wikimedia.org/r/564044 (https://phabricator.wikimedia.org/T224567) (owner: 10Muehlenhoff)
[08:32:23] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[08:38:08] <wikibugs>	 10Operations: Anycast for webproxies - https://phabricator.wikimedia.org/T242715 (10Joe) One problem I see with this is - proxy IPs regularly get banned by third-party services by accident. So having multiple *external* IPs, and being able to switch between them, is a plus.  I think you're right that the proxies...
[08:38:11] <Amir1>	 bblack: very valid question, the documentation says it can go at any time before the wiki creation. 
[08:38:30] <Amir1>	 but I can't say 100% it would break anything if you merge the patch and I hit ng.wikimedia.org
[08:39:15] <_joe_>	 Amir1: if you still haven't configured apache and/or mediawiki, that would cause nothing
[08:39:29] <wikibugs>	 (03CR) 10Elukey: "Now it renders as:" [puppet] - 10https://gerrit.wikimedia.org/r/564066 (owner: 10Joal)
[08:40:25] <Amir1>	 _joe_: oh thanks
[08:41:13] <_joe_>	 I think ng.wikimedia.org matches the virtualhost for *.wikimedia.org
[08:41:48] <_joe_>	 so it goes to the www portal
[08:41:51] <Amir1>	 The DNS record exists already https://phabricator.wikimedia.org/diffusion/ODNS/browse/master/templates/wikimedia.org
[08:41:56] <_joe_>	 https://ng.wikimedia.org/ <- it works
[08:42:00] <Amir1>	 https://phabricator.wikimedia.org/diffusion/ODNS/browse/master/templates/wikimedia.org$764
[08:42:19] <_joe_>	 ok so the only problem is that now the caches have something about that site memorized
[08:42:35] <_joe_>	 we might need to purge them once the setup is done
[08:50:32] <wikibugs>	 10Operations, 10Traffic, 10netops, 10Patch-For-Review: Offload pings to dedicated server - https://phabricator.wikimedia.org/T190090 (10ayounsi) a:05ayounsi→03BBlack That sounds like a good idea to me, @BBlack for a final opinion, and I can take care of it this Q if good to go.
[09:03:08] <wikibugs>	 (03PS1) 10Muehlenhoff: Switch conf/codfw and notebook* servers to standard Partman recipe [puppet] - 10https://gerrit.wikimedia.org/r/564550 (https://phabricator.wikimedia.org/T156955)
[09:04:24] <wikibugs>	 (03CR) 10Urbanecm: "Is this necessary? I've already enabled partial blocks at enwiki per T242569, and given also commons asked for partial blocks itself, they" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/564121 (https://phabricator.wikimedia.org/T240300) (owner: 10Tchanders)
[09:06:43] <wikibugs>	 10Operations, 10netops: Stale LibreNMS ports - https://phabricator.wikimedia.org/T242318 (10ayounsi) 05Open→03Resolved `root@cumin1001:~# for i in `mysql.py -hdb1135 -e "select table_name from information_schema.columns where column_name like 'device_id'" -BN`; do echo $i; mysql.py -hdb1135 librenms -e "de...
[09:10:45] <wikibugs>	 10Operations, 10Release Pipeline, 10Release-Engineering-Team, 10serviceops, 10Patch-For-Review: Remove obsoleted docker images - https://phabricator.wikimedia.org/T242604 (10MoritzMuehlenhoff) p:05Triage→03Normal
[09:12:15] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[09:14:17] <Amir1>	 quickly going to deploy this: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/564555
[09:15:51] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[09:35:28] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] Switch conf/codfw and notebook* servers to standard Partman recipe (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/564550 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff)
[09:38:08] <wikibugs>	 (03PS1) 10Elukey: Increase Spark's crypto settings in Hadoop test [puppet] - 10https://gerrit.wikimedia.org/r/564562 (https://phabricator.wikimedia.org/T240934)
[09:40:15] <wikibugs>	 (03CR) 10Elukey: [C: 03+1] Switch conf/codfw and notebook* servers to standard Partman recipe (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/564550 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff)
[09:40:25] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] Increase Spark's crypto settings in Hadoop test [puppet] - 10https://gerrit.wikimedia.org/r/564562 (https://phabricator.wikimedia.org/T240934) (owner: 10Elukey)
[09:43:10] <wikibugs>	 10Operations, 10ops-eqiad, 10serviceops: (No Need By Date Provided) rack/setup/install kubernetes10[07-14].eqiad.wmnet - https://phabricator.wikimedia.org/T241850 (10jijiki) @Jclark-ctr Can you provide a date that is convenient for you for racking these? Thank you!
[09:44:04] <wikibugs>	 (03PS2) 10Filippo Giunchedi: ores: ship to logstash via the kafka logging pipeline [puppet] - 10https://gerrit.wikimedia.org/r/502527 (https://phabricator.wikimedia.org/T213899) (owner: 10Herron)
[09:44:19] <logmsgbot>	 !log ladsgroup@deploy1001 Synchronized php-1.35.0-wmf.14/extensions/Wikibase/lib/includes/Store/Sql/Terms: [[gerrit:564555|wbterms: Add Statsd metrics in critical parts of the new term store]] (duration: 00m 57s)
[09:44:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:46:07] <wikibugs>	 (03CR) 10Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/compiler1003/20335/ores1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/502527 (https://phabricator.wikimedia.org/T213899) (owner: 10Herron)
[09:46:50] <wikibugs>	 (03PS2) 10Muehlenhoff: Switch conf/codfw and notebook* servers to standard Partman recipe [puppet] - 10https://gerrit.wikimedia.org/r/564550 (https://phabricator.wikimedia.org/T156955)
[09:47:09] <wikibugs>	 (03CR) 10Muehlenhoff: Switch conf/codfw and notebook* servers to standard Partman recipe (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/564550 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff)
[09:50:17] <wikibugs>	 (03PS1) 10Ayounsi: esams/knams: advertise 185.15.58.0/23 instead of 185.15.56.0/22 [homer/public] - 10https://gerrit.wikimedia.org/r/564564 (https://phabricator.wikimedia.org/T207753)
[09:50:45] <godog>	 akosiaris: I think the ores logging change is good to go, re: deployment I was thinking puppet-merge then https://wikitech.wikimedia.org/wiki/ORES/Deployment#Puppet-managed_config_changes
[09:51:40] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] Switch conf/codfw and notebook* servers to standard Partman recipe [puppet] - 10https://gerrit.wikimedia.org/r/564550 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff)
[09:53:11] <wikibugs>	 (03CR) 10Ayounsi: "Diff for 3 devices: ['cr2-esams.wikimedia.org', 'cr3-esams.wikimedia.org', 'cr3-knams.wikimedia.org']" [homer/public] - 10https://gerrit.wikimedia.org/r/564564 (https://phabricator.wikimedia.org/T207753) (owner: 10Ayounsi)
[09:54:32] <wikibugs>	 (03Abandoned) 10Ayounsi: Depool esams for esams/knams work [dns] - 10https://gerrit.wikimedia.org/r/552792 (owner: 10Ayounsi)
[10:06:29] <Amir1>	 awight: Hey, I see lots of Cite-related fatals and errors in logs, is it on the radar? https://logstash.wikimedia.org/app/kibana#/dashboard/mediawiki-errors https://logstash.wikimedia.org/app/kibana#/dashboard/Fatal-Monitor
[10:07:09] <Amir1>	 oh we have this: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Cite/+/564002
[10:07:23] <moritzm>	 !log installing remaining cyrus-sasl security updates
[10:07:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:09:38] <wikibugs>	 (03PS1) 10Vgutierrez: Release 8.0.5-1wm12 [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/564584 (https://phabricator.wikimedia.org/T242620)
[10:09:48] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Release 8.0.5-1wm12 [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/564584 (https://phabricator.wikimedia.org/T242620) (owner: 10Vgutierrez)
[10:09:54] <vgutierrez>	 that was fast :/
[10:10:12] <_joe_>	 yeah we added a condition to jerkins-bot
[10:10:28] <_joe_>	 if ZUUL_SUBMITTER == "vgutierrez" fail
[10:10:37] <vgutierrez>	 oh cool
[10:10:44] <vgutierrez>	 /nick _joe_
[10:12:37] <_joe_>	 lol
[10:13:42] <vgutierrez>	 O:)
[10:21:47] <wikibugs>	 (03CR) 10Faidon Liambotis: [C: 04-1] esams/knams: advertise 185.15.58.0/23 instead of 185.15.56.0/22 (031 comment) [homer/public] - 10https://gerrit.wikimedia.org/r/564564 (https://phabricator.wikimedia.org/T207753) (owner: 10Ayounsi)
[10:24:56] <wikibugs>	 (03PS2) 10Ayounsi: esams/knams: advertise 185.15.58.0/23 instead of 185.15.56.0/22 [homer/public] - 10https://gerrit.wikimedia.org/r/564564 (https://phabricator.wikimedia.org/T207753)
[10:25:15] <wikibugs>	 (03CR) 10Ayounsi: esams/knams: advertise 185.15.58.0/23 instead of 185.15.56.0/22 (031 comment) [homer/public] - 10https://gerrit.wikimedia.org/r/564564 (https://phabricator.wikimedia.org/T207753) (owner: 10Ayounsi)
[10:27:41] <wikibugs>	 10Operations, 10Wikimedia-Logstash, 10observability, 10User-fgiunchedi: Move thumbor to the logging pipeline - https://phabricator.wikimedia.org/T242609 (10MoritzMuehlenhoff) p:05Triage→03Normal
[10:27:57] <wikibugs>	 10Operations, 10serviceops, 10Patch-For-Review: No mw canary servers in codfw - https://phabricator.wikimedia.org/T242606 (10MoritzMuehlenhoff) p:05Triage→03Normal
[10:40:08] <vgutierrez>	 !log upgrade ats to 8.0.5-1wm12 in cp4026 and cp4032 - T242620
[10:40:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:40:11] <stashbot>	 T242620: ats-tls is having issues when varnish-fe goes away - https://phabricator.wikimedia.org/T242620
[10:51:04] <wikibugs>	 (PS1) Muehlenhoff: Switch url-downloader.codfw to urldownloader2001 [dns] - https://gerrit.wikimedia.org/r/564588 (https://phabricator.wikimedia.org/T224551)
[10:52:50] <wikibugs>	 Operations, Mail: MediaWiki mail system for watchlist on it.wikipedia is delivering very slowly - https://phabricator.wikimedia.org/T240800 (Mholloway)
[10:52:52] <wikibugs>	 Operations, MassMessage, User-DannyS712: MassMessage not delivering - https://phabricator.wikimedia.org/T240777 (Mholloway)
[10:52:55] <wikibugs>	 Operations, Machine vision, Product-Infrastructure-Team-Backlog, Structured-Data-Backlog, and 5 others: Some jobs are not being processed / are processed slowly - https://phabricator.wikimedia.org/T240518 (Mholloway) Open→Resolved a:Mholloway Incident report is in review.
[10:54:33] <wikibugs>	 (CR) Vgutierrez: "pcc seems happy: https://puppet-compiler.wmflabs.org/compiler1002/20325/" [puppet] - https://gerrit.wikimedia.org/r/564046 (https://phabricator.wikimedia.org/T238900) (owner: Vgutierrez)
[10:54:57] <wikibugs>	 (CR) Vgutierrez: [C: +2] redirects.dat: Funnel fixcopyright.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/564020 (https://phabricator.wikimedia.org/T239141) (owner: Vgutierrez)
[11:00:46] <wikibugs>	 (PS1) Vgutierrez: lvs: Set realserver_ips on ncredir ulsfo instances [puppet] - https://gerrit.wikimedia.org/r/564598 (https://phabricator.wikimedia.org/T242321)
[11:04:41] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks great!" [software/debmonitor] - https://gerrit.wikimedia.org/r/564138 (owner: Volans)
[11:05:57] <wikibugs>	 (PS1) Vgutierrez: lvs: Add ulsfo ncredir configuration [puppet] - https://gerrit.wikimedia.org/r/564603 (https://phabricator.wikimedia.org/T242321)
[11:09:01] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps2001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 106385384 and 12 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[11:09:11] <wikibugs>	 (CR) Volans: [C: +2] binary packages: optimize queries (part 2) [software/debmonitor] - https://gerrit.wikimedia.org/r/564138 (owner: Volans)
[11:09:13] <wikibugs>	 (CR) Ema: [C: +1] Add ncredir-lb.ulsfo.wikimedia.org DNS records [dns] - https://gerrit.wikimedia.org/r/564051 (https://phabricator.wikimedia.org/T242321) (owner: Vgutierrez)
[11:09:50] <wikibugs>	 (CR) Ema: "pcc output would be good!" [puppet] - https://gerrit.wikimedia.org/r/564598 (https://phabricator.wikimedia.org/T242321) (owner: Vgutierrez)
[11:10:41] <wikibugs>	 (CR) Ema: "pcc would be great here too" [puppet] - https://gerrit.wikimedia.org/r/564603 (https://phabricator.wikimedia.org/T242321) (owner: Vgutierrez)
[11:10:51] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps2001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 2312 and 97 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[11:11:18] <wikibugs>	 Operations, Traffic, Patch-For-Review: ats-tls is having issues when varnish-fe goes away - https://phabricator.wikimedia.org/T242620 (ema) p:Triage→High
[11:11:22] <wikibugs>	 (Merged) jenkins-bot: binary packages: optimize queries (part 2) [software/debmonitor] - https://gerrit.wikimedia.org/r/564138 (owner: Volans)
[11:13:41] <wikibugs>	 (CR) Tchanders: "Urbanecm - the banner is now a post-deployment banner (see the final bullet points in the description of T240300)" [mediawiki-config] - https://gerrit.wikimedia.org/r/564121 (https://phabricator.wikimedia.org/T240300) (owner: Tchanders)
[11:15:20] <vgutierrez>	 !log Updating puppet-compiler facts
[11:15:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:16:27] <wikibugs>	 (PS1) Volans: host packages: optimize table [software/debmonitor] - https://gerrit.wikimedia.org/r/564614
[11:17:56] <wikibugs>	 (CR) Volans: "I've tested this on the codfw slave of m2 and has the expected effect." [software/debmonitor] - https://gerrit.wikimedia.org/r/564614 (owner: Volans)
[11:24:04] <wikibugs>	 (PS2) Matthias Mullie: Add 3d-patents page to wgForceUIMsgAsContentMsg [mediawiki-config] - https://gerrit.wikimedia.org/r/416730
[11:25:17] <wikibugs>	 (Abandoned) Matthias Mullie: [WIP] Add 3d2png scap targets [puppet] - https://gerrit.wikimedia.org/r/406997 (owner: Matthias Mullie)
[11:28:36] <wikibugs>	 (CR) Vgutierrez: "pcc seems happy: https://puppet-compiler.wmflabs.org/compiler1002/20338/" [puppet] - https://gerrit.wikimedia.org/r/564598 (https://phabricator.wikimedia.org/T242321) (owner: Vgutierrez)
[11:30:24] <wikibugs>	 (CR) Vgutierrez: "pcc is happy here as well: https://puppet-compiler.wmflabs.org/compiler1001/20339/" [puppet] - https://gerrit.wikimedia.org/r/564603 (https://phabricator.wikimedia.org/T242321) (owner: Vgutierrez)
[11:31:29] <wikibugs>	 Operations, ops-eqiad, serviceops: (Need By: Jan 10) rack/setup/install mc-gp100[123].eqiad.wmnet - https://phabricator.wikimedia.org/T241795 (elukey) Hi @Jclark-ctr, any timeline for these hosts to be racked?
[11:34:41] <wikibugs>	 (PS2) Volans: hosts/images packages: optimize tables [software/debmonitor] - https://gerrit.wikimedia.org/r/564614
[11:37:12] <wikibugs>	 (PS2) Vgutierrez: Add ncredir-lb.ulsfo.wikimedia.org DNS records [dns] - https://gerrit.wikimedia.org/r/564051 (https://phabricator.wikimedia.org/T242321)
[11:37:47] <wikibugs>	 (PS1) Ema: cache: raise vm.max_map_count sysctl [puppet] - https://gerrit.wikimedia.org/r/564616 (https://phabricator.wikimedia.org/T242417)
[11:39:06] <wikibugs>	 (CR) Vgutierrez: [C: +2] Add ncredir-lb.ulsfo.wikimedia.org DNS records [dns] - https://gerrit.wikimedia.org/r/564051 (https://phabricator.wikimedia.org/T242321) (owner: Vgutierrez)
[11:40:03] <wikibugs>	 (CR) Ema: [C: +1] lvs: Set realserver_ips on ncredir ulsfo instances [puppet] - https://gerrit.wikimedia.org/r/564598 (https://phabricator.wikimedia.org/T242321) (owner: Vgutierrez)
[11:40:26] <wikibugs>	 (CR) Ema: [C: +1] lvs: Add ulsfo ncredir configuration [puppet] - https://gerrit.wikimedia.org/r/564603 (https://phabricator.wikimedia.org/T242321) (owner: Vgutierrez)
[11:41:01] <wikibugs>	 (CR) Volans: "Once applied to the test instance in cloud the relevant executed queries were:" [software/debmonitor] - https://gerrit.wikimedia.org/r/564614 (owner: Volans)
[11:41:39] <wikibugs>	 (CR) Vgutierrez: [C: +2] lvs: Set realserver_ips on ncredir ulsfo instances [puppet] - https://gerrit.wikimedia.org/r/564598 (https://phabricator.wikimedia.org/T242321) (owner: Vgutierrez)
[11:42:23] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good to me" [software/debmonitor] - https://gerrit.wikimedia.org/r/564614 (owner: Volans)
[11:42:53] <wikibugs>	 (CR) Ema: "pcc here: https://puppet-compiler.wmflabs.org/compiler1002/20340/" [puppet] - https://gerrit.wikimedia.org/r/564616 (https://phabricator.wikimedia.org/T242417) (owner: Ema)
[11:46:52] <wikibugs>	 (CR) Vgutierrez: [C: +2] lvs: Add ulsfo ncredir configuration [puppet] - https://gerrit.wikimedia.org/r/564603 (https://phabricator.wikimedia.org/T242321) (owner: Vgutierrez)
[11:47:58] <logmsgbot>	 !log vgutierrez@puppetmaster1001 conftool action : set/pooled=yes; selector: service=nginx,name=ncredir4001.ulsfo.wmnet
[11:47:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:48:04] <logmsgbot>	 !log vgutierrez@puppetmaster1001 conftool action : set/pooled=yes; selector: service=nginx,name=ncredir4002.ulsfo.wmnet
[11:48:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:49:22] <vgutierrez>	 !log restarting pybal on lvs4007 (secondary LVS) - T242321
[11:49:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:49:25] <stashbot>	 T242321: Provide non-canonical-redirect service from every datacenter - https://phabricator.wikimedia.org/T242321
[11:49:51] <wikibugs>	 (PS1) Jbond: puppet-compiler: fix double owner definition [puppet] - https://gerrit.wikimedia.org/r/564617
[11:51:52] <vgutierrez>	 !log restarting pybal on lvs4005 (high-traffic1 LVS) - T242321
[11:51:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:53:28] <vgutierrez>	 willikins:~ vgutierrez$ curl --resolve en.wikipedia.com:443:$(dig +short ncredir-lb.ulsfo.wikimedia.org) https://en.wikipedia.com -o /dev/null -v 2>&1 |grep location:
[11:53:28] <vgutierrez>	 < location: https://en.wikipedia.org/
[11:53:29] <vgutierrez>	 :D
[11:53:53] <wikibugs>	 (CR) Volans: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/564617 (owner: Jbond)
[11:54:36] <wikibugs>	 (CR) Jbond: [C: +2] puppet-compiler: fix double owner definition [puppet] - https://gerrit.wikimedia.org/r/564617 (owner: Jbond)
[11:57:55] <wikibugs>	 (PS1) Vgutierrez: Pool ulsfo for ncredir service [dns] - https://gerrit.wikimedia.org/r/564627 (https://phabricator.wikimedia.org/T242321)
[12:00:04] <jouncebot>	 Amir1, Lucas_WMDE, awight, and Urbanecm: It is that lovely time of the day again! You are hereby commanded to deploy European Mid-day SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200114T1200).
[12:00:04] <jouncebot>	 awight: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[12:00:39] <wikibugs>	 Operations, Traffic, fixcopyright.wikimedia.org: Redirect all traffic for fixcopyright.wikimedia.org to https://policy.wikimedia.org/policy-landing/copyright/ - https://phabricator.wikimedia.org/T239141 (Vgutierrez) Open→Resolved a:Vgutierrez ` vgutierrez@mw1321:~$ curl --resolve fixcopyr...
[12:00:54] <wikibugs>	 Operations, Cleanup, Traffic, fixcopyright.wikimedia.org, and 4 others: Retire fixcopyright.wikimedia.org - https://phabricator.wikimedia.org/T238803 (Vgutierrez)
[12:01:28] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "That seems sensible. Elasticsearch is also bumping this sysctl via it's init script and Cassandra hosts also raise the default via the cas" [puppet] - https://gerrit.wikimedia.org/r/564616 (https://phabricator.wikimedia.org/T242417) (owner: Ema)
[12:01:32] * Urbanecm around, but leaving awight to do his own SWAT
[12:01:32] <Lucas_WMDE>	 o/
[12:01:53] <logmsgbot>	 !log aborrero@cumin1001 START - Cookbook sre.hosts.downtime
[12:01:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:02:15] <logmsgbot>	 !log aborrero@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[12:02:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:02:23] <logmsgbot>	 !log aborrero@cumin1001 START - Cookbook sre.hosts.downtime
[12:02:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:02:29] <logmsgbot>	 !log aborrero@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[12:02:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:03:01] <wikibugs>	 (PS1) Jbond: add default ops for puppetdb [puppet] - https://gerrit.wikimedia.org/r/564636
[12:04:10] <wikibugs>	 (CR) Jbond: [C: +2] add default ops for puppetdb [puppet] - https://gerrit.wikimedia.org/r/564636 (owner: Jbond)
[12:10:27] <wikibugs>	 (PS1) Jbond: profile::puppetdb fix jvm_opts in labs.yaml [puppet] - https://gerrit.wikimedia.org/r/564647
[12:11:20] <wikibugs>	 (CR) Jbond: [C: +2] profile::puppetdb fix jvm_opts in labs.yaml [puppet] - https://gerrit.wikimedia.org/r/564647 (owner: Jbond)
[12:11:46] <wikibugs>	 (CR) Vgutierrez: [C: +1] cache: raise vm.max_map_count sysctl [puppet] - https://gerrit.wikimedia.org/r/564616 (https://phabricator.wikimedia.org/T242417) (owner: Ema)
[12:13:03] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] nova monitoring: allow for different python versions in service [puppet] - https://gerrit.wikimedia.org/r/564373 (https://phabricator.wikimedia.org/T241347) (owner: Andrew Bogott)
[12:14:51] <logmsgbot>	 !log vgutierrez@puppetmaster1001 conftool action : set/weight=1; selector: service=nginx,name=ncredir4002.ulsfo.wmnet
[12:14:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:14:58] <logmsgbot>	 !log vgutierrez@puppetmaster1001 conftool action : set/weight=1; selector: service=nginx,name=ncredir4001.ulsfo.wmnet
[12:14:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:16:25] <logmsgbot>	 !log vgutierrez@puppetmaster1001 conftool action : set/weight=1; selector: service=nginx,name=ncredir3001.esams.wmnet
[12:16:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:16:30] <logmsgbot>	 !log vgutierrez@puppetmaster1001 conftool action : set/weight=1; selector: service=nginx,name=ncredir3002.esams.wmnet
[12:16:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:20:28] <wikibugs>	 (PS1) Jbond: profile::puppetdb add defaults to labs.yaml [puppet] - https://gerrit.wikimedia.org/r/564652
[12:21:11] <wikibugs>	 (PS1) Vgutierrez: Add ncredir500[12] DNS records [dns] - https://gerrit.wikimedia.org/r/564655 (https://phabricator.wikimedia.org/T242321)
[12:21:34] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +1] haproxy for neutron:  As of pike, the healthcheck url returns 405. [puppet] - https://gerrit.wikimedia.org/r/561806 (https://phabricator.wikimedia.org/T241347) (owner: Andrew Bogott)
[12:22:03] <wikibugs>	 (PS1) Andrew Bogott: Horizon: put in maintenance mode for the ocata=>pike upgrade [puppet] - https://gerrit.wikimedia.org/r/564656 (https://phabricator.wikimedia.org/T241347)
[12:22:05] <wikibugs>	 (CR) Jbond: [C: +2] "PCC no-op:" [puppet] - https://gerrit.wikimedia.org/r/564652 (owner: Jbond)
[12:22:07] <wikibugs>	 (PS1) Andrew Bogott: Openstack: move eqiad1 to version 'pike' [puppet] - https://gerrit.wikimedia.org/r/564657 (https://phabricator.wikimedia.org/T241347)
[12:22:09] <wikibugs>	 (PS1) Andrew Bogott: Revert "Horizon: put in maintenance mode for the ocata=>pike upgrade" [puppet] - https://gerrit.wikimedia.org/r/564658 (https://phabricator.wikimedia.org/T241347)
[12:24:46] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +1] Horizon: put in maintenance mode for the ocata=>pike upgrade [puppet] - https://gerrit.wikimedia.org/r/564656 (https://phabricator.wikimedia.org/T241347) (owner: Andrew Bogott)
[12:24:53] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +1] Openstack: move eqiad1 to version 'pike' [puppet] - https://gerrit.wikimedia.org/r/564657 (https://phabricator.wikimedia.org/T241347) (owner: Andrew Bogott)
[12:25:02] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] Horizon: put in maintenance mode for the ocata=>pike upgrade [puppet] - https://gerrit.wikimedia.org/r/564656 (https://phabricator.wikimedia.org/T241347) (owner: Andrew Bogott)
[12:28:12] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] Openstack: move eqiad1 to version 'pike' [puppet] - https://gerrit.wikimedia.org/r/564657 (https://phabricator.wikimedia.org/T241347) (owner: Andrew Bogott)
[12:31:45] <wikibugs>	 (PS1) Andrew Bogott: Remove hieradata/common/openstack.yaml [puppet] - https://gerrit.wikimedia.org/r/564662
[12:31:50] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] haproxy for neutron:  As of pike, the healthcheck url returns 405. [puppet] - https://gerrit.wikimedia.org/r/561806 (https://phabricator.wikimedia.org/T241347) (owner: Andrew Bogott)
[12:36:38] <wikibugs>	 (PS1) Jbond: profile::puppetdb::microsite add default for labs [puppet] - https://gerrit.wikimedia.org/r/564666
[12:37:35] <wikibugs>	 (CR) Jbond: [C: +2] profile::puppetdb::microsite add default for labs [puppet] - https://gerrit.wikimedia.org/r/564666 (owner: Jbond)
[12:43:55] <wikibugs>	 (PS1) Jbond: profile::puppet_compiler fix call to conftool [puppet] - https://gerrit.wikimedia.org/r/564669
[12:44:34] <wikibugs>	 (CR) ArielGlenn: "There will need to be an entry added to the cleanup manifests too, so that these don't accumulate forever. See https://gerrit.wikimedia.or" (4 comments) [puppet] - https://gerrit.wikimedia.org/r/561356 (https://phabricator.wikimedia.org/T240520) (owner: EBernhardson)
[12:46:03] <wikibugs>	 (CR) Jbond: [C: +2] profile::puppet_compiler fix call to conftool [puppet] - https://gerrit.wikimedia.org/r/564669 (owner: Jbond)
[13:35:07] <wikibugs>	 (PS2) Muehlenhoff: Switch url-downloader.codfw to urldownloader2001 [dns] - https://gerrit.wikimedia.org/r/564588 (https://phabricator.wikimedia.org/T224551)
[13:37:55] <wikibugs>	 Operations, Traffic, netops, Patch-For-Review: Offload pings to dedicated server - https://phabricator.wikimedia.org/T190090 (BBlack) +1 from me, this was one of the many things we made the ganeti clusters for :)
[13:41:22] <wikibugs>	 Operations, ops-codfw, DBA: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (Marostegui) @Papaul can you double check (maybe even with the vendor) if there is a way to disable the 10G port for now?
[13:44:07] <wikibugs>	 (CR) Urbanecm: "> Patch Set 1:" [mediawiki-config] - https://gerrit.wikimedia.org/r/564121 (https://phabricator.wikimedia.org/T240300) (owner: Tchanders)
[13:44:09] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] Revert "Horizon: put in maintenance mode for the ocata=>pike upgrade" [puppet] - https://gerrit.wikimedia.org/r/564658 (https://phabricator.wikimedia.org/T241347) (owner: Andrew Bogott)
[13:48:23] <wikibugs>	 (CR) Ema: [C: +2] cache: raise vm.max_map_count sysctl [puppet] - https://gerrit.wikimedia.org/r/564616 (https://phabricator.wikimedia.org/T242417) (owner: Ema)
[13:49:52] <wikibugs>	 Operations: Sort out plan for install* servers in edge sites - https://phabricator.wikimedia.org/T242602 (MoritzMuehlenhoff) >>! In T242602#5800549, @faidon wrote: > Splitting the internal apt repository from the install roles/servers sounds good -- it's more of a historical artifact than anything else. You...
[13:52:40] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1080 for upgrade', diff saved to https://phabricator.wikimedia.org/P10142 and previous config saved to /var/cache/conftool/dbconfig/20200114-135238-marostegui.json
[13:52:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:52:49] <wikibugs>	 (CR) Tchanders: "> Patch Set 1:" [mediawiki-config] - https://gerrit.wikimedia.org/r/564121 (https://phabricator.wikimedia.org/T240300) (owner: Tchanders)
[13:54:25] <marostegui>	 !log Upgrade db1080
[13:54:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:58:42] <wikibugs>	 (PS1) Andrew Bogott: nova-fullstack: update to track changes in novaclient bindings [puppet] - https://gerrit.wikimedia.org/r/564677 (https://phabricator.wikimedia.org/T241347)
[13:58:55] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: openstack: neutron l3 agent: pike: disable ravd [puppet] - https://gerrit.wikimedia.org/r/564678 (https://phabricator.wikimedia.org/T241347)
[14:00:04] <jouncebot>	 liw and brennen: It is that lovely time of the day again! You are hereby commanded to deploy Mediawiki train - European+American Version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200114T1400).
[14:00:26] <wikibugs>	 Operations, Puppet, DBA, User-jbond: DB: perform rolling restart of mariadb deamons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (Marostegui)
[14:00:49] <wikibugs>	 (CR) Andrew Bogott: [C: +2] nova-fullstack: update to track changes in novaclient bindings [puppet] - https://gerrit.wikimedia.org/r/564677 (https://phabricator.wikimedia.org/T241347) (owner: Andrew Bogott)
[14:01:10] <wikibugs>	 (PS1) Filippo Giunchedi: prometheus: bump 'global' retention to 2.25 years [puppet] - https://gerrit.wikimedia.org/r/564679
[14:01:12] <wikibugs>	 (PS1) Filippo Giunchedi: prometheus: bump 'ops' retention to 4.5 months [puppet] - https://gerrit.wikimedia.org/r/564680
[14:02:10] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:02:16] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1020 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:02:25] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1030 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:02:30] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1016 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:02:33] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1026 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:02:35] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1007 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:02:43] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1026 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:02:49] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1009 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:02:53] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1019 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:02:54] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1017 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:02:55] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1012 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:02:56] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1014 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:02:56] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1025 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:02:57] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1008 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:02:59] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1028 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:03:00] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1012 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:03:00] <logmsgbot>	 !log aborrero@cumin1001 START - Cookbook sre.hosts.downtime
[14:03:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:03:01] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:03:03] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1022 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:03:08] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1025 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:03:08] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1028 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:03:08] <wikibugs>	 (CR) jerkins-bot: [V: -1] prometheus: bump 'global' retention to 2.25 years [puppet] - https://gerrit.wikimedia.org/r/564679 (owner: Filippo Giunchedi)
[14:03:11] <logmsgbot>	 !log aborrero@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[14:03:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:03:25] <liw>	 train: I'm running late, starting with branch cut
[14:06:53] <wikibugs>	 (CR) Andrew Bogott: [C: +1] openstack: neutron l3 agent: pike: disable ravd [puppet] - https://gerrit.wikimedia.org/r/564678 (https://phabricator.wikimedia.org/T241347) (owner: Arturo Borrero Gonzalez)
[14:07:13] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] openstack: neutron l3 agent: pike: disable ravd [puppet] - https://gerrit.wikimedia.org/r/564678 (https://phabricator.wikimedia.org/T241347) (owner: Arturo Borrero Gonzalez)
[14:09:48] <vgutierrez>	 !log upgrade ats to 8.0.5-1wm12 in cp5006 and cp5012 - T242620
[14:09:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:09:57] <stashbot>	 T242620: ats-tls is having issues when varnish-fe goes away - https://phabricator.wikimedia.org/T242620
[14:12:34] <liw>	 !log branch cut for 1.35.0-wmf.15
[14:12:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:12:40] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: openstack: neutron l3 agent: pike: fix radvd mask [puppet] - https://gerrit.wikimedia.org/r/564682 (https://phabricator.wikimedia.org/T241347)
[14:13:14] <wikibugs>	 (PS1) Andrew Bogott: nova-compute monitoring: support multiple python versions [puppet] - https://gerrit.wikimedia.org/r/564683 (https://phabricator.wikimedia.org/T241347)
[14:13:35] <wikibugs>	 (PS1) BBlack: Set up transparency-archive microsite [puppet] - https://gerrit.wikimedia.org/r/564684 (https://phabricator.wikimedia.org/T230638)
[14:13:41] <wikibugs>	 (PS1) BBlack: Add transparency-archive.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/564685 (https://phabricator.wikimedia.org/T230638)
[14:14:24] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] openstack: neutron l3 agent: pike: fix radvd mask [puppet] - https://gerrit.wikimedia.org/r/564682 (https://phabricator.wikimedia.org/T241347) (owner: Arturo Borrero Gonzalez)
[14:14:31] <wikibugs>	 (CR) BBlack: [C: +2] Add transparency-archive.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/564685 (https://phabricator.wikimedia.org/T230638) (owner: BBlack)
[14:15:37] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +1] nova-compute monitoring: support multiple python versions [puppet] - https://gerrit.wikimedia.org/r/564683 (https://phabricator.wikimedia.org/T241347) (owner: Andrew Bogott)
[14:15:39] <wikibugs>	 (CR) Andrew Bogott: [C: +2] nova-compute monitoring: support multiple python versions [puppet] - https://gerrit.wikimedia.org/r/564683 (https://phabricator.wikimedia.org/T241347) (owner: Andrew Bogott)
[14:15:58] <XioNoX>	 !log push firewall policies to pfw3-codfw - T242681
[14:15:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:16:37] <wikibugs>	 Operations, ops-eqiad, DBA: Degraded RAID on db1100 - https://phabricator.wikimedia.org/T241506 (Jclark-ctr) Replaced Disk #0
[14:17:02] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1030 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:17:08] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1016 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:17:11] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1026 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:17:22] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1026 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:17:30] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1019 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:17:31] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1017 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:17:33] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1014 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:17:34] <wikibugs>	 (PS3) Muehlenhoff: Switch url-downloader.codfw to urldownloader2001 [dns] - https://gerrit.wikimedia.org/r/564588 (https://phabricator.wikimedia.org/T224551)
[14:17:34] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1025 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:17:34] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1012 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:17:35] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1008 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:17:37] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1028 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:17:38] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1012 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:17:39] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:17:41] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1022 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:17:45] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1028 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:17:46] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1025 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:18:13] <wikibugs>	 Operations, Traffic, netops, Patch-For-Review: Offload pings to dedicated server - https://phabricator.wikimedia.org/T190090 (ayounsi) a:BBlack→ayounsi
[14:18:45] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:18:51] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1020 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:19:10] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1007 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:21:03] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Switch url-downloader.codfw to urldownloader2001 [dns] - https://gerrit.wikimedia.org/r/564588 (https://phabricator.wikimedia.org/T224551) (owner: Muehlenhoff)
[14:21:37] <XioNoX>	 !log push firewall policies to pfw3-eqiad - T242681
[14:21:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:21:47] <wikibugs>	 (CR) BBlack: [C: +2] Set up transparency-archive microsite [puppet] - https://gerrit.wikimedia.org/r/564684 (https://phabricator.wikimedia.org/T230638) (owner: BBlack)
[14:22:22] <wikibugs>	 Operations, ops-eqiad, DBA: Degraded RAID on db1100 - https://phabricator.wikimedia.org/T241506 (Marostegui) Thanks - it is now rebuilding. I will close the task once it is finished ` PD: 0 Information Enclosure Device ID: 32 Slot Number: 0 Drive's position: DiskGroup: 0, Span: 0, Arm: 0 Enclosure po...
[14:22:28] <wikibugs>	 (CR) Vgutierrez: [C: +1] ATS: Deploy acme-chief version of unified certificate on text [puppet] - https://gerrit.wikimedia.org/r/561883 (https://phabricator.wikimedia.org/T234803) (owner: Ema)
[14:23:05] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1009 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:24:42] <marostegui>	 !log Stop db1080 and db1107 replication in sync
[14:24:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:26:02] <marostegui>	 !log Move db1114 under db1080
[14:26:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:26:07] <wikibugs>	 (PS1) BBlack: transparency-archive: correct template name [puppet] - https://gerrit.wikimedia.org/r/564689 (https://phabricator.wikimedia.org/T230638)
[14:26:18] <wikibugs>	 (CR) BBlack: [V: +2 C: +2] transparency-archive: correct template name [puppet] - https://gerrit.wikimedia.org/r/564689 (https://phabricator.wikimedia.org/T230638) (owner: BBlack)
[14:26:50] <wikibugs>	 (PS10) Giuseppe Lavagetto: wmflib: Introduce a more usable data structure to describe services. [puppet] - https://gerrit.wikimedia.org/r/558620
[14:26:52] <wikibugs>	 (PS1) Giuseppe Lavagetto: lvs::monitor: fix most technical debt [puppet] - https://gerrit.wikimedia.org/r/564690
[14:27:18] <wikibugs>	 (CR) jerkins-bot: [V: -1] wmflib: Introduce a more usable data structure to describe services. [puppet] - https://gerrit.wikimedia.org/r/558620 (owner: Giuseppe Lavagetto)
[14:27:33] <wikibugs>	 (CR) jerkins-bot: [V: -1] lvs::monitor: fix most technical debt [puppet] - https://gerrit.wikimedia.org/r/564690 (owner: Giuseppe Lavagetto)
[14:37:40] <wikibugs>	 (PS11) Giuseppe Lavagetto: wmflib: Introduce a more usable data structure to describe services. [puppet] - https://gerrit.wikimedia.org/r/558620
[14:37:42] <wikibugs>	 (PS2) Giuseppe Lavagetto: lvs::monitor: fix most technical debt [puppet] - https://gerrit.wikimedia.org/r/564690
[14:39:42] <wikibugs>	 (CR) jerkins-bot: [V: -1] lvs::monitor: fix most technical debt [puppet] - https://gerrit.wikimedia.org/r/564690 (owner: Giuseppe Lavagetto)
[14:40:14] <wikibugs>	 (CR) jerkins-bot: [V: -1] wmflib: Introduce a more usable data structure to describe services. [puppet] - https://gerrit.wikimedia.org/r/558620 (owner: Giuseppe Lavagetto)
[14:41:22] <wikibugs>	 Operations, ops-codfw, fundraising-tech-ops: rack/setup/install frlog2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T242265 (Jgreen)
[14:42:16] <wikibugs>	 Operations, ops-codfw, DBA: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (Marostegui) >>! In T242481#5801763, @MoritzMuehlenhoff wrote: > So, I digged into this a little: Interface auto setup not working if on...
[14:43:42] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1080', diff saved to https://phabricator.wikimedia.org/P10143 and previous config saved to /var/cache/conftool/dbconfig/20200114-144341-marostegui.json
[14:43:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:44:07] <icinga-wm>	 RECOVERY - MegaRAID on db1100 is OK: OK: optimal, 1 logical, 10 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[14:45:14] <wikibugs>	 (PS1) Lars Wirzenius: Group0 to 1.35.0-wmf.15 [mediawiki-config] - https://gerrit.wikimedia.org/r/564694
[14:47:02] <wikibugs>	 (PS1) Marostegui: mariadb: Place db1107 in s1 [puppet] - https://gerrit.wikimedia.org/r/564695 (https://phabricator.wikimedia.org/T242702)
[14:47:30] <logmsgbot>	 !log liw@deploy1001 Started scap: testwiki to php-1.35.0-wmf.15 and rebuild l10n cache
[14:47:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:47:32] <wikibugs>	 (PS1) Ema: prometheus: collect varnishd_mmap_count for varnish-frontend [puppet] - https://gerrit.wikimedia.org/r/564696 (https://phabricator.wikimedia.org/T242417)
[14:48:50] <wikibugs>	 (CR) CDanis: [C: +1] prometheus: bump 'global' retention to 2.25 years [puppet] - https://gerrit.wikimedia.org/r/564679 (owner: Filippo Giunchedi)
[14:48:53] <wikibugs>	 (CR) CDanis: [C: +1] prometheus: bump 'ops' retention to 4.5 months [puppet] - https://gerrit.wikimedia.org/r/564680 (owner: Filippo Giunchedi)
[14:51:25] <logmsgbot>	 !log liw@deploy1001 scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_44869219" --threads=30 --lang en  --quiet' returned non-zero exit status 1 (duration: 03m 55s)
[14:51:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:51:31] <wikibugs>	 (PS2) Ema: prometheus: collect varnishd_mmap_count for varnish-frontend [puppet] - https://gerrit.wikimedia.org/r/564696 (https://phabricator.wikimedia.org/T242417)
[14:59:01] <wikibugs>	 (PS1) BBlack: Update webserver-misc-static cert [puppet] - https://gerrit.wikimedia.org/r/564697
[14:59:28] <wikibugs>	 (PS2) BBlack: Update webserver-misc-static cert [puppet] - https://gerrit.wikimedia.org/r/564697 (https://phabricator.wikimedia.org/T230638)
[15:00:17] <wikibugs>	 (CR) BBlack: [C: +2] Update webserver-misc-static cert [puppet] - https://gerrit.wikimedia.org/r/564697 (https://phabricator.wikimedia.org/T230638) (owner: BBlack)
[15:00:40] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime
[15:00:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:02:24] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1080 for tranfer', diff saved to https://phabricator.wikimedia.org/P10144 and previous config saved to /var/cache/conftool/dbconfig/20200114-150223-marostegui.json
[15:02:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:02:40] <marostegui>	 !log Copy data from db1080 to db1107 T242702
[15:02:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:02:43] <stashbot>	 T242702: Test MariaDB 10.4 in production - https://phabricator.wikimedia.org/T242702
[15:02:53] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[15:02:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:06:27] <wikibugs>	 (PS12) Giuseppe Lavagetto: wmflib: Introduce a more usable data structure to describe services. [puppet] - https://gerrit.wikimedia.org/r/558620
[15:06:29] <wikibugs>	 (PS3) Giuseppe Lavagetto: lvs::monitor: fix most technical debt [puppet] - https://gerrit.wikimedia.org/r/564690
[15:06:53] <wikibugs>	 (PS1) Ema: cache: enable geoiplookup in labs [puppet] - https://gerrit.wikimedia.org/r/564700 (https://phabricator.wikimedia.org/T241239)
[15:09:08] <wikibugs>	 (PS2) Bstorm: gridengine: set the webgrid queues to not rerunable [puppet] - https://gerrit.wikimedia.org/r/564174 (https://phabricator.wikimedia.org/T242397)
[15:10:26] <wikibugs>	 (CR) Ema: [C: +2] cache: enable geoiplookup in labs [puppet] - https://gerrit.wikimedia.org/r/564700 (https://phabricator.wikimedia.org/T241239) (owner: Ema)
[15:11:03] <wikibugs>	 Operations, ops-eqiad, DBA: Degraded RAID on db1100 - https://phabricator.wikimedia.org/T241506 (Marostegui) Open→Resolved All good - thank you! ` Number of Virtual Disks: 1 Virtual Drive: 0 (Target Id: 0) Name                : RAID Level          : Primary-1, Secondary-0, RAID Level Qualifie...
[15:13:49] <wikibugs>	 (PS4) Giuseppe Lavagetto: lvs::monitor: fix most technical debt [puppet] - https://gerrit.wikimedia.org/r/564690
[15:14:45] <wikibugs>	 (PS1) BBlack: Support missing /historical on transparency sites [puppet] - https://gerrit.wikimedia.org/r/564704 (https://phabricator.wikimedia.org/T230638)
[15:14:50] <wikibugs>	 (PS1) BBlack: Redirect transparency.wm.o -> foundation site [puppet] - https://gerrit.wikimedia.org/r/564705 (https://phabricator.wikimedia.org/T230638)
[15:17:03] <wikibugs>	 Operations, ops-codfw, DBA: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (Marostegui) So adding the `01-` did the trick and es2020 is installing: ` append initrd=debian-installer/amd64/initrd.gz vga=normal aut...
[15:18:01] <wikibugs>	 Operations, ops-codfw, DBA: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (Marostegui) Forgot to thank @MoritzMuehlenhoff for all the help and time with the troubleshooting
[15:19:38] <wikibugs>	 (CR) BBlack: [C: +2] Support missing /historical on transparency sites [puppet] - https://gerrit.wikimedia.org/r/564704 (https://phabricator.wikimedia.org/T230638) (owner: BBlack)
[15:19:58] <Zoranzoki21>	 Hi, what's happening with Jenkins? https://integration.wikimedia.org/ci/job/mwext-php72-phan-docker/30013/console
[15:20:02] <wikibugs>	 (PS13) Giuseppe Lavagetto: wmflib: Introduce a more usable data structure to describe services. [puppet] - https://gerrit.wikimedia.org/r/558620
[15:20:04] <wikibugs>	 (PS5) Giuseppe Lavagetto: lvs::monitor: fix most technical debt [puppet] - https://gerrit.wikimedia.org/r/564690
[15:21:35] <wikibugs>	 Operations, Traffic, Wikimedia-Logstash, observability, and 2 others: Port varnishlog consumers to log to syslog / logging infra - https://phabricator.wikimedia.org/T227108 (ema)
[15:27:27] <wikibugs>	 (CR) Marostegui: [C: +1] "The queries look good to me!" [software/debmonitor] - https://gerrit.wikimedia.org/r/564614 (owner: Volans)
[15:28:48] <wikibugs>	 Operations, ops-eqiad, netops: (Need By: Sept 30) upgrade msw1-eqiad from EX4200 to EX4300 - https://phabricator.wikimedia.org/T225121 (Jclark-ctr) a:Jclark-ctr→Cmjohnson
[15:28:52] <James_F>	 Zoranzoki21: Not sure. Investigating.
[15:30:39] <wikibugs>	 Operations, ops-codfw, DBA: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (Papaul) On the phone with Dell support.
[15:32:06] <wikibugs>	 (PS2) BBlack: Redirect transparency.wm.o -> foundation site [puppet] - https://gerrit.wikimedia.org/r/564705 (https://phabricator.wikimedia.org/T230638)
[15:32:08] <wikibugs>	 (PS1) BBlack: Fixup for historical redirect [puppet] - https://gerrit.wikimedia.org/r/564706 (https://phabricator.wikimedia.org/T230638)
[15:33:25] <Zoranzoki21>	 James_F: I saw it on few patches, and always is same agent-docker
[15:33:40] <James_F>	 Yeah, I'm depooling integration-agent-docker-1003.
[15:34:48] <wikibugs>	 (PS3) Bstorm: gridengine: set the webgrid queues to not rerunable [puppet] - https://gerrit.wikimedia.org/r/564174 (https://phabricator.wikimedia.org/T242397)
[15:37:00] <wikibugs>	 (CR) BBlack: [C: +2] Fixup for historical redirect [puppet] - https://gerrit.wikimedia.org/r/564706 (https://phabricator.wikimedia.org/T230638) (owner: BBlack)
[15:40:04] <wikibugs>	 (PS1) Vgutierrez: ATS: Set connect timeout and TTFB timeouts to different values [puppet] - https://gerrit.wikimedia.org/r/564708 (https://phabricator.wikimedia.org/T242620)
[15:41:22] <wikibugs>	 (CR) Ema: varnish: format log consumer stdout as cee+json (2 comments) [puppet] - https://gerrit.wikimedia.org/r/563430 (https://phabricator.wikimedia.org/T227108) (owner: Filippo Giunchedi)
[15:42:29] <James_F>	 Zoranzoki21: If you see it again, please shout.
[15:45:33] <wikibugs>	 (PS1) Lars Wirzenius: Group0 to 1.35.0-wmf.15 [mediawiki-config] - https://gerrit.wikimedia.org/r/564709
[15:47:43] <logmsgbot>	 !log liw@deploy1001 Started scap: testwiki to php-1.34.0-wmf.15 and rebuild l10n cache (try 2)
[15:47:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:48:36] <wikibugs>	 (PS1) Herron: mx: increase exim queue check monitoring threshold [puppet] - https://gerrit.wikimedia.org/r/564710
[15:48:40] <wikibugs>	 (PS14) Giuseppe Lavagetto: wmflib: Introduce a more usable data structure to describe services. [puppet] - https://gerrit.wikimedia.org/r/558620
[15:48:42] <wikibugs>	 (PS6) Giuseppe Lavagetto: lvs::monitor: fix most technical debt [puppet] - https://gerrit.wikimedia.org/r/564690
[15:48:50] <wikibugs>	 (CR) Volans: [C: +2] hosts/images packages: optimize tables [software/debmonitor] - https://gerrit.wikimedia.org/r/564614 (owner: Volans)
[15:48:52] <halfak>	 o/ elukey.  Could you chmod those files on ores2001 so I can read them? :)  
[15:48:58] <halfak>	  /home/elukey/14012020_celery_oom/
[15:49:05] <elukey>	 halfak: sure sorry!
[15:49:16] <elukey>	 I thought they were other-readable
[15:49:22] <wikibugs>	 (CR) BBlack: [C: +2] Redirect transparency.wm.o -> foundation site [puppet] - https://gerrit.wikimedia.org/r/564705 (https://phabricator.wikimedia.org/T230638) (owner: BBlack)
[15:49:24] <halfak>	 No worries ^_^ 
[15:49:38] <halfak>	 It's a persistent problem that our main.log is not readable :\ 
[15:50:11] <wikibugs>	 (PS2) Vgutierrez: ATS: Set connect timeout and TTFB timeouts to different values [puppet] - https://gerrit.wikimedia.org/r/564708 (https://phabricator.wikimedia.org/T242620)
[15:50:12] <elukey>	 ah because it is owned by www-data and 660
[15:50:13] <wikibugs>	 (PS1) Vgutierrez: ATS cp40[26|32], cp50[06|12]: Set connect timeout and TTFB timeouts to different values [puppet] - https://gerrit.wikimedia.org/r/564711 (https://phabricator.wikimedia.org/T242620)
[15:50:19] <elukey>	 640 actually
[15:51:08] <wikibugs>	 (CR) jerkins-bot: [V: -1] ATS cp40[26|32], cp50[06|12]: Set connect timeout and TTFB timeouts to different values [puppet] - https://gerrit.wikimedia.org/r/564711 (https://phabricator.wikimedia.org/T242620) (owner: Vgutierrez)
[15:51:26] <wikibugs>	 (Merged) jenkins-bot: hosts/images packages: optimize tables [software/debmonitor] - https://gerrit.wikimedia.org/r/564614 (owner: Volans)
[15:51:27] <elukey>	 halfak: try nw
[15:51:30] <halfak>	 thanks
[15:51:47] <halfak>	 Now I can't even ls the directory :P 
[15:51:52] <halfak>	 elukey, ^ 
[15:52:10] <halfak>	 I also can't read any files in it. 
[15:52:53] <elukey>	 I just sudoed as you and I can see
[15:53:04] <wikibugs>	 (PS7) Giuseppe Lavagetto: lvs::monitor: fix most technical debt [puppet] - https://gerrit.wikimedia.org/r/564690
[15:53:24] <elukey>	 ah no wait I can ls but not read, lemme fix
[15:53:43] <wikibugs>	 (PS2) Vgutierrez: ATS: Set connect timeout and TTFB timeouts to different values (test hosts only) [puppet] - https://gerrit.wikimedia.org/r/564711 (https://phabricator.wikimedia.org/T242620)
[15:53:47] <wikibugs>	 (PS3) Vgutierrez: ATS: Set connect timeout and TTFB timeouts to different values [puppet] - https://gerrit.wikimedia.org/r/564708 (https://phabricator.wikimedia.org/T242620)
[15:55:07] <wikibugs>	 Operations, ops-codfw, DBA: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (Papaul) Dell said that it is not possible to disable the 10Gb interface.
[15:55:13] <elukey>	 halfak: I just apt-get installed basic-file-permission in my brain, hope it works now
[15:55:31] <wikibugs>	 (PS8) Giuseppe Lavagetto: lvs::monitor: fix most technical debt [puppet] - https://gerrit.wikimedia.org/r/564690
[15:55:41] <halfak>	 Works now!  Thank you
[15:55:48] <elukey>	 sorry I need a coffe :D
[15:56:07] <elukey>	 I was giving rw instead of r-x
[15:56:11] <Zoranzoki21>	 Coffee is fuel :)
[15:56:29] <vgutierrez>	 we are more in beer than coffee tiem
[15:56:31] <vgutierrez>	 *time
[15:56:52] <vgutierrez>	 https://xkcd.com/323/
[15:57:00] <Zoranzoki21>	 Yes, and beer is good, you're right
[15:57:10] <wikibugs>	 (CR) Urbanecm: [C: +1] "My comments are not in any way meant to block this - I'm just asking. I see now the two new wikis are in English anyway, so translations d" [mediawiki-config] - https://gerrit.wikimedia.org/r/564121 (https://phabricator.wikimedia.org/T240300) (owner: Tchanders)
[15:57:40] <wikibugs>	 Operations, ops-codfw, DBA: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (ayounsi) >>! In T242481#5798551, @Marostegui wrote: > - Enable to 10G even though it will go to a 1G switch port? Is that even possible...
[15:58:18] <wikibugs>	 (CR) Vgutierrez: "pcc shows the expected changes on the right hosts: https://puppet-compiler.wmflabs.org/compiler1001/20351/" [puppet] - https://gerrit.wikimedia.org/r/564711 (https://phabricator.wikimedia.org/T242620) (owner: Vgutierrez)
[15:59:21] <wikibugs>	 Operations, ops-codfw, DBA: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (MoritzMuehlenhoff) >>! In T242481#5802185, @ayounsi wrote: >>>! In T242481#5798551, @Marostegui wrote: >> - Enable to 10G even though i...
[16:01:19] <wikibugs>	 (PS6) CDanis: puppet-merge.py: SHA1 or explicit FETCH_HEAD is mandatory [puppet] - https://gerrit.wikimedia.org/r/559944 (https://phabricator.wikimedia.org/T241277)
[16:01:34] <wikibugs>	 (PS1) Volans: Release v0.2.4 [software/debmonitor/deploy] - https://gerrit.wikimedia.org/r/564717
[16:02:10] <wikibugs>	 Operations, ops-codfw, DBA: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (ayounsi) >>! In T242481#5802186, @MoritzMuehlenhoff wrote: > Is that because of different cables/connectors? Indeed, 1G switch ports ar...
[16:02:17] <wikibugs>	 (CR) Volans: [V: +2 C: +2] "releasing the last 2 changes" [software/debmonitor/deploy] - https://gerrit.wikimedia.org/r/564717 (owner: Volans)
[16:02:51] <wikibugs>	 (CR) Bstorm: [C: +2] gridengine: set the webgrid queues to not rerunable [puppet] - https://gerrit.wikimedia.org/r/564174 (https://phabricator.wikimedia.org/T242397) (owner: Bstorm)
[16:03:35] <wikibugs>	 (CR) Ayounsi: [C: +1] "I don't think it's false positive, but it's a non-actionable alert anyway, so +1" [puppet] - https://gerrit.wikimedia.org/r/564710 (owner: Herron)
[16:04:36] <wikibugs>	 (CR) CDanis: [C: +2] puppet-merge.py: SHA1 or explicit FETCH_HEAD is mandatory [puppet] - https://gerrit.wikimedia.org/r/559944 (https://phabricator.wikimedia.org/T241277) (owner: CDanis)
[16:04:43] <wikibugs>	 (PS1) Ayounsi: Add conditional for vcp-snmp-statistics [homer/public] - https://gerrit.wikimedia.org/r/564718
[16:04:45] <wikibugs>	 (PS1) Ayounsi: Add tenant support for vlans [homer/public] - https://gerrit.wikimedia.org/r/564719
[16:05:12] <wikibugs>	 Operations, ops-codfw, DBA: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (Papaul) So I asked Dell if it was possible to replace the NIC card we have now with 2 separate NiC cards ( 1 x10Gb NiC and 1 x1GB NIC)....
[16:05:54] <cdanis>	 bstorm_: okay to merge your gridengine: set the webgrid queues to not rerunable (19c44fa2c1) ?
[16:06:04] <bstorm_>	 Please do, I was just about to :)
[16:06:37] <wikibugs>	 (PS15) Giuseppe Lavagetto: wmflib: Introduce a more usable data structure to describe services. [puppet] - https://gerrit.wikimedia.org/r/558620
[16:06:38] <wikibugs>	 (PS9) Giuseppe Lavagetto: lvs::monitor: fix most technical debt [puppet] - https://gerrit.wikimedia.org/r/564690
[16:06:47] <cdanis>	 thanks
[16:07:04] <logmsgbot>	 !log volans@deploy1001 Started deploy [debmonitor/deploy@e72911c]: Release v0.2.4
[16:07:04] * cdanis just made some changes to puppet-merge; expected no-op aside from cleanups but please lmk if you have issues
[16:07:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:07:29] <Zoranzoki21>	 James_F: See this https://integration.wikimedia.org/ci/job/quibble-composer-mysql-php72-docker/9616/console
[16:07:44] <wikibugs>	 (CR) Herron: [C: +2] mx: increase exim queue check monitoring threshold [puppet] - https://gerrit.wikimedia.org/r/564710 (owner: Herron)
[16:08:13] <logmsgbot>	 !log volans@deploy1001 Finished deploy [debmonitor/deploy@e72911c]: Release v0.2.4 (duration: 01m 09s)
[16:08:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:08:30] <James_F>	 Zoranzoki21: Is that from an LDAP extension that's expecting configuration?
[16:08:37] <wikibugs>	 (CR) Marostegui: [C: +2] mariadb: Place db1107 in s1 [puppet] - https://gerrit.wikimedia.org/r/564695 (https://phabricator.wikimedia.org/T242702) (owner: Marostegui)
[16:09:04] <marostegui>	 herron: ok to merge your change?
[16:09:16] <herron>	 marostegui: sure thanks!
[16:09:22] <marostegui>	 merging
[16:09:51] <Zoranzoki21>	 James_F: Looks so
[16:10:22] <James_F>	 Zoranzoki21: There's at least one (maybe more?) extensions in gerrit that fundamentally don't pass in master. :-(
[16:10:54] <wikibugs>	 (PS1) Marostegui: mariadb: es20[0-5] [puppet] - https://gerrit.wikimedia.org/r/564723 (https://phabricator.wikimedia.org/T241336)
[16:11:14] <wikibugs>	 Operations, Puppet, Patch-For-Review: puppet-merge can't accept an explicit SHA1 for an --ops merge - https://phabricator.wikimedia.org/T241277 (CDanis) Open→Resolved a:CDanis
[16:11:43] <wikibugs>	 (CR) Marostegui: [C: +2] mariadb: es20[0-5] [puppet] - https://gerrit.wikimedia.org/r/564723 (https://phabricator.wikimedia.org/T241336) (owner: Marostegui)
[16:15:02] <wikibugs>	 (PS1) Marostegui: install_server: Changing es2023 MAC to the 10G one [puppet] - https://gerrit.wikimedia.org/r/564724 (https://phabricator.wikimedia.org/T242481)
[16:16:31] <wikibugs>	 (CR) Papaul: [C: +2] install_server: Changing es2023 MAC to the 10G one [puppet] - https://gerrit.wikimedia.org/r/564724 (https://phabricator.wikimedia.org/T242481) (owner: Marostegui)
[16:17:11] <wikibugs>	 (CR) Marostegui: [C: +2] install_server: Changing es2023 MAC to the 10G one [puppet] - https://gerrit.wikimedia.org/r/564724 (https://phabricator.wikimedia.org/T242481) (owner: Marostegui)
[16:18:06] <wikibugs>	 Operations, ops-codfw, DBA, Patch-For-Review: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (MoritzMuehlenhoff) pxelinux has a generic option to pass the MAC in place when receiving the boot image as BOOTIF...
[16:21:01] <Zoranzoki21>	 James_F: Ok, ty
[16:24:34] <wikibugs>	 (PS2) Filippo Giunchedi: varnish: use syslog for varnishlog consumers [puppet] - https://gerrit.wikimedia.org/r/563977 (https://phabricator.wikimedia.org/T227108)
[16:24:40] <wikibugs>	 (CR) Filippo Giunchedi: varnish: format log consumer stdout as cee+json (2 comments) [puppet] - https://gerrit.wikimedia.org/r/563430 (https://phabricator.wikimedia.org/T227108) (owner: Filippo Giunchedi)
[16:25:42] <wikibugs>	 (PS1) Ema: cache: add CAP_KILL to varnish-frontend capabilities [puppet] - https://gerrit.wikimedia.org/r/564726 (https://phabricator.wikimedia.org/T242411)
[16:26:17] <marostegui>	 !log Disable temporarily puppet on install1002 and install2002 - T242481
[16:26:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:26:32] <stashbot>	 T242481: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481
[16:27:59] <wikibugs>	 Operations, ops-codfw, DBA, Patch-For-Review: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` es20...
[16:28:20] <wikibugs>	 (CR) Volans: [C: +1] "LGTM" [homer/public] - https://gerrit.wikimedia.org/r/564719 (owner: Ayounsi)
[16:28:49] <wikibugs>	 (CR) Volans: [C: +1] "LGTM" [homer/public] - https://gerrit.wikimedia.org/r/564718 (owner: Ayounsi)
[16:29:52] <wikibugs>	 (CR) Ayounsi: [V: +2 C: +2] Add tenant support for vlans [homer/public] - https://gerrit.wikimedia.org/r/564719 (owner: Ayounsi)
[16:30:57] <wikibugs>	 (CR) Ayounsi: [V: +2 C: +2] Add conditional for vcp-snmp-statistics [homer/public] - https://gerrit.wikimedia.org/r/564718 (owner: Ayounsi)
[16:31:12] <logmsgbot>	 !log liw@deploy1001 Finished scap: testwiki to php-1.34.0-wmf.15 and rebuild l10n cache (try 2) (duration: 43m 29s)
[16:31:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:31:16] <wikibugs>	 (PS10) Giuseppe Lavagetto: lvs::monitor: fix most technical debt [puppet] - https://gerrit.wikimedia.org/r/564690
[16:31:39] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:34:03] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:34:14] <wikibugs>	 (PS1) Marostegui: Revert "install_server: Changing es2023 MAC to the 10G one" [puppet] - https://gerrit.wikimedia.org/r/564727
[16:34:46] <wikibugs>	 Operations, ops-codfw, DBA, Patch-For-Review: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (Marostegui) >>! In T242481#5802260, @MoritzMuehlenhoff wrote: > pxelinux has a generic option to pass the MAC in...
[16:35:12] <wikibugs>	 (CR) Filippo Giunchedi: "LGTM overall" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/564696 (https://phabricator.wikimedia.org/T242417) (owner: Ema)
[16:35:56] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "install_server: Changing es2023 MAC to the 10G one" [puppet] - https://gerrit.wikimedia.org/r/564727 (owner: Marostegui)
[16:37:47] <wikibugs>	 (CR) Ema: [C: +1] ATS: Set connect timeout and TTFB timeouts to different values (test hosts only) [puppet] - https://gerrit.wikimedia.org/r/564711 (https://phabricator.wikimedia.org/T242620) (owner: Vgutierrez)
[16:39:33] <wikibugs>	 (PS3) Filippo Giunchedi: varnish: format log consumer stdout as cee+json [puppet] - https://gerrit.wikimedia.org/r/563430 (https://phabricator.wikimedia.org/T227108)
[16:39:35] <wikibugs>	 (PS3) Filippo Giunchedi: varnish: use syslog for varnishlog consumers [puppet] - https://gerrit.wikimedia.org/r/563977 (https://phabricator.wikimedia.org/T227108)
[16:40:48] <wikibugs>	 (PS3) Ema: prometheus: collect varnishd_mmap_count for varnish-frontend [puppet] - https://gerrit.wikimedia.org/r/564696 (https://phabricator.wikimedia.org/T242417)
[16:41:24] <marostegui>	 !log Enable puppet back on install1002 and install2002 - T242481
[16:41:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:41:27] <stashbot>	 T242481: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481
[16:41:36] <wikibugs>	 (CR) Ema: [C: +1] varnish: format log consumer stdout as cee+json [puppet] - https://gerrit.wikimedia.org/r/563430 (https://phabricator.wikimedia.org/T227108) (owner: Filippo Giunchedi)
[16:42:13] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] prometheus: collect varnishd_mmap_count for varnish-frontend [puppet] - https://gerrit.wikimedia.org/r/564696 (https://phabricator.wikimedia.org/T242417) (owner: Ema)
[16:42:24] <wikibugs>	 (CR) Ema: prometheus: collect varnishd_mmap_count for varnish-frontend (1 comment) [puppet] - https://gerrit.wikimedia.org/r/564696 (https://phabricator.wikimedia.org/T242417) (owner: Ema)
[16:42:28] <wikibugs>	 (CR) Vgutierrez: [C: +2] ATS: Set connect timeout and TTFB timeouts to different values (test hosts only) [puppet] - https://gerrit.wikimedia.org/r/564711 (https://phabricator.wikimedia.org/T242620) (owner: Vgutierrez)
[16:42:35] <wikibugs>	 Operations, Release Pipeline, Release-Engineering-Team, serviceops, Patch-For-Review: Remove obsoleted docker images - https://phabricator.wikimedia.org/T242604 (thcipriani)
[16:44:02] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime
[16:44:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:44:31] <liw>	 !log branch is cut for 1.35.0-wmf.15; train window is closed, but I'll continue the train since the next time slot seems to be empty
[16:44:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:45:52] <wikibugs>	 (PS1) Muehlenhoff: Pass down MAC address of to installing system via BOOTIF [puppet] - https://gerrit.wikimedia.org/r/564729 (https://phabricator.wikimedia.org/T242481)
[16:46:10] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[16:46:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:46:47] <wikibugs>	 (Abandoned) Lars Wirzenius: Group0 to 1.35.0-wmf.15 [mediawiki-config] - https://gerrit.wikimedia.org/r/564694 (owner: Lars Wirzenius)
[16:46:49] <wikibugs>	 (CR) Herron: [C: +1] prometheus: bump 'global' retention to 2.25 years (1 comment) [puppet] - https://gerrit.wikimedia.org/r/564679 (owner: Filippo Giunchedi)
[16:47:22] <wikibugs>	 (CR) Lars Wirzenius: [C: +2] Group0 to 1.35.0-wmf.15 [mediawiki-config] - https://gerrit.wikimedia.org/r/564709 (owner: Lars Wirzenius)
[16:47:41] <wikibugs>	 (CR) Herron: [C: +1] prometheus: bump 'ops' retention to 4.5 months [puppet] - https://gerrit.wikimedia.org/r/564680 (owner: Filippo Giunchedi)
[16:48:21] <wikibugs>	 (Merged) jenkins-bot: Group0 to 1.35.0-wmf.15 [mediawiki-config] - https://gerrit.wikimedia.org/r/564709 (owner: Lars Wirzenius)
[16:48:23] <wikibugs>	 (CR) Marostegui: [C: +1] "es2020 went fine indeed" [puppet] - https://gerrit.wikimedia.org/r/564729 (https://phabricator.wikimedia.org/T242481) (owner: Muehlenhoff)
[16:53:03] <wikibugs>	 Operations, ops-codfw, DBA, Patch-For-Review: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['es2020.codfw.wmnet'] `  and were **ALL** successful.
[16:53:45] <logmsgbot>	 !log liw@deploy1001 rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.15
[16:53:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:54:21] <wikibugs>	 Operations, ops-codfw, DBA, Patch-For-Review: (Needed By 31st January) codfw: rack/setup/install  es202[0-5].codfw.wmnet - https://phabricator.wikimedia.org/T241336 (Marostegui)
[16:57:38] <wikibugs>	 Operations, ops-codfw, Core Platform Team: (No Need By Date Provided) rack/setup/install restbase202[123] - https://phabricator.wikimedia.org/T241790 (Papaul)
[16:58:35] <wikibugs>	 (CR) Tchanders: "Thanks for checking Urbanecm" [mediawiki-config] - https://gerrit.wikimedia.org/r/564121 (https://phabricator.wikimedia.org/T240300) (owner: Tchanders)
[17:00:04] <jouncebot>	 godog and _joe_: Your horoscope predicts another unfortunate Puppet SWAT(Max 6 patches) deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200114T1700).
[17:00:04] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[17:01:07] <Urbanecm>	 https://gerrit.wikimedia.org/r/c/559073 would be nice to be deployed if anyone can do so 🙂
[17:01:47] <_joe_>	 rlazarus: ^^ :P
[17:03:43] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] Add a registryctl command-line utility [docker-images/docker-report] - https://gerrit.wikimedia.org/r/563482 (https://phabricator.wikimedia.org/T242604) (owner: Giuseppe Lavagetto)
[17:04:47] <wikibugs>	 Operations, Wikimedia-Logstash, Patch-For-Review, User-fgiunchedi: Ingest production logs with ELK7 - https://phabricator.wikimedia.org/T235891 (fgiunchedi) Something else I realized today: with ELK7 we dropped our custom logstash template in favor of logstash's default, although we'll need to bu...
[17:05:05] <wikibugs>	 (Merged) jenkins-bot: Add a registryctl command-line utility [docker-images/docker-report] - https://gerrit.wikimedia.org/r/563482 (https://phabricator.wikimedia.org/T242604) (owner: Giuseppe Lavagetto)
[17:09:01] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 94693184 and 12 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[17:09:16] <wikibugs>	 Operations, cloud-services-team (Kanban): rack/setup/install cloudcontrol2001-dev & cloudvirt200[123]-dev - https://phabricator.wikimedia.org/T214448 (aborrero) Open→Resolved a:aborrero→None
[17:09:34] <wikibugs>	 (CR) Krinkle: [C: -1] "Interesting failure upon running puppet agent in Beta:" [puppet] - https://gerrit.wikimedia.org/r/559262 (https://phabricator.wikimedia.org/T241097) (owner: Krinkle)
[17:10:41] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1002 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 0 and 76 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[17:10:42] <wikibugs>	 (PS1) Giuseppe Lavagetto: New debian version [docker-images/docker-report] - https://gerrit.wikimedia.org/r/564732
[17:11:28] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] New debian version [docker-images/docker-report] - https://gerrit.wikimedia.org/r/564732 (owner: Giuseppe Lavagetto)
[17:12:43] <wikibugs>	 (Merged) jenkins-bot: New debian version [docker-images/docker-report] - https://gerrit.wikimedia.org/r/564732 (owner: Giuseppe Lavagetto)
[17:13:44] <wikibugs>	 Operations, MediaWiki-General, serviceops-radar, Performance-Team (Radar), and 2 others: Use a multi-dc aware store for ObjectCache's MainStash if needed. - https://phabricator.wikimedia.org/T212129 (Catrope)
[17:14:41] <wikibugs>	 (PS2) Krinkle: mediawiki: Capture shutdown/destruct backtrace in php7-fatal-error.php [puppet] - https://gerrit.wikimedia.org/r/559262 (https://phabricator.wikimedia.org/T241097)
[17:18:29] <wikibugs>	 Puppet, cloud-services-team (Kanban): Reduce the effects of puppet breakage on VPS - https://phabricator.wikimedia.org/T226270 (Andrew) p:Triage→Normal This is still important but lacking a good way to move forward.
[17:21:36] <_joe_>	 !log upload docker-report 0.0.2 to {buster,stretch}-wikimedia T242604
[17:21:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:21:40] <stashbot>	 T242604: Remove obsoleted docker images - https://phabricator.wikimedia.org/T242604
[17:22:01] <wikibugs>	 Operations, Puppet, Beta-Cluster-Infrastructure: Puppet agent unable to run in Beta Cluster (Evaluation Error: Error while evaluating a Resource Statement) - https://phabricator.wikimedia.org/T242658 (Krinkle)
[17:24:06] <wikibugs>	 (CR) Krinkle: [C: -1] "I tried to reproduce that error in Puppet compiler, but to my surprise, it not only has no error, it says this change is a no-op for MW ap" [puppet] - https://gerrit.wikimedia.org/r/559262 (https://phabricator.wikimedia.org/T241097) (owner: Krinkle)
[17:24:50] <Krinkle>	 effie: ^ Hm, can you think of a reason why the puppet compiler does not see the edit to php7-fatal-error.php as a real change for mwdebug*/mw* servers?
[17:39:20] <vgutierrez>	 !log depooling cp4027 for some ats-tls parent balancing tests
[17:39:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:40:11] <rlazarus>	 Urbanecm, _joe_: sure, will do
[17:41:08] <Urbanecm>	 thanks rlazarus 
[17:41:40] <wikibugs>	 (CR) RLazarus: [C: +2] Add ng.wikimedia.org as chapter site [puppet] - https://gerrit.wikimedia.org/r/559073 (https://phabricator.wikimedia.org/T240771) (owner: IAmNetx)
[17:42:34] <wikibugs>	 (CR) Dzahn: [C: +2] "fyi, technically you don't need to use 755 on directories because puppet always adds the exec bit automatically, so 644 would be the same." [puppet] - https://gerrit.wikimedia.org/r/564466 (https://phabricator.wikimedia.org/T242319) (owner: Legoktm)
[17:43:33] <vgutierrez>	 !log repooling cp4027
[17:43:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:44:38] <rlazarus>	 Urbanecm: merged, tested at mwdebug1001, will be deployed everywhere within 30m
[17:46:21] <wikibugs>	 (PS1) saper: Wikistats v2 need no symbolic link [puppet] - https://gerrit.wikimedia.org/r/564739 (https://phabricator.wikimedia.org/T237752)
[17:47:38] <wikibugs>	 Operations, Release-Engineering-Team, serviceops: Hundreds of tags for `wikimedia/mediawiki-core` image - https://phabricator.wikimedia.org/T242775 (Joe) p:Triage→High
[17:50:01] <wikibugs>	 Puppet, VPS-project-codesearch, Patch-For-Review: Puppetize codesearch - https://phabricator.wikimedia.org/T242319 (Dzahn) Yea, it should be possible. You can run any command with the puppet [[ https://puppet.com/docs/puppet/latest/types/exec.html | exec resource type ]] and one way to do stuff only...
[17:50:38] <wikibugs>	 Operations, ops-codfw, Core Platform Team: (No Need By Date Provided) rack/setup/install restbase202[123] - https://phabricator.wikimedia.org/T241790 (Papaul)
[17:51:23] <wikibugs>	 (CR) saper: "Hello - since I have tested it the Apache config on my personal server, I thought - why not propose the change here?" [puppet] - https://gerrit.wikimedia.org/r/564739 (https://phabricator.wikimedia.org/T237752) (owner: saper)
[17:51:50] <wikibugs>	 (CR) Dzahn: "or should i do one server per row as canary ? do you agree with the ticket we should have them in codfw?" [puppet] - https://gerrit.wikimedia.org/r/564175 (https://phabricator.wikimedia.org/T242606) (owner: Dzahn)
[17:54:58] <wikibugs>	 (CR) Dzahn: Wikistats v2 need no symbolic link (1 comment) [puppet] - https://gerrit.wikimedia.org/r/564739 (https://phabricator.wikimedia.org/T237752) (owner: saper)
[17:59:00] <wikibugs>	 Operations, Puppet, Beta-Cluster-Infrastructure: Puppet agent unable to run in Beta Cluster (Evaluation Error: Error while evaluating a Resource Statement) - https://phabricator.wikimedia.org/T242658 (Dzahn) @Krinkle cool, thanks for finding the additional override :)
[18:00:04] <jouncebot>	 cscott, arlolra, subbu, halfak, and accraze: Time to snap out of that daydream and deploy Services – Graphoid / Parsoid / Citoid / ORES. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200114T1800).
[18:00:17] <wikibugs>	 (PS2) saper: Wikistats v2 need no symbolic link [puppet] - https://gerrit.wikimedia.org/r/564739 (https://phabricator.wikimedia.org/T237752)
[18:00:19] <wikibugs>	 (PS1) saper: Wikistats v2 go live [puppet] - https://gerrit.wikimedia.org/r/564745 (https://phabricator.wikimedia.org/T237752)
[18:03:25] <Urbanecm>	 thanks a lot rlazarus
[18:06:50] <vgutierrez>	 !log depool cp5012 for some ats parent select debugging
[18:06:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:07:23] <wikibugs>	 (PS3) saper: Wikistats v2 need no symbolic link [puppet] - https://gerrit.wikimedia.org/r/564739 (https://phabricator.wikimedia.org/T237752)
[18:07:30] <wikibugs>	 Operations, WMF-Legal, serviceops, Patch-For-Review: Move old transparency report pages to historical URLs and setup redirect - https://phabricator.wikimedia.org/T230638 (Jdforrester-WMF) Open→Resolved a:BBlack This looks fully done. Thank you!
[18:09:26] <wikibugs>	 (CR) saper: "Small change suggested by dzahn@" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/564739 (https://phabricator.wikimedia.org/T237752) (owner: saper)
[18:10:28] <wikibugs>	 (PS2) saper: Wikistats v2 go live [puppet] - https://gerrit.wikimedia.org/r/564745 (https://phabricator.wikimedia.org/T237752)
[18:11:09] <wikibugs>	 Operations, ORES, Scoring-platform-team: Ores celery OOM event in codfw - https://phabricator.wikimedia.org/T242705 (Halfak) I had a look at the request log on ores2001 and I can't find any requests that look concerning.    Hypotheses: 1. celery got into a weird state and went crazy.  It may not happ...
[18:11:13] <vgutierrez>	 !log repooling cp5012
[18:11:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:12:18] <wikibugs>	 (PS1) Papaul: DNS: Add mgmt and production DNS for puppetmaster2003 [dns] - https://gerrit.wikimedia.org/r/564746
[18:14:42] <wikibugs>	 Operations, Research, SRE-Access-Requests, Patch-For-Review: Tiziano Piccardi shell request + analytics-privatedata-users - https://phabricator.wikimedia.org/T151969 (tizianopiccardi) Resolved→Open Hi all,  I have a problem to login via ssh. I did not change anything recently, but I'm abl...
[18:16:01] <wikibugs>	 (CR) saper: "Hello, this is the change that would bring wikistats v2 onto the stats.wikimedia.org homepage." [puppet] - https://gerrit.wikimedia.org/r/564745 (https://phabricator.wikimedia.org/T237752) (owner: saper)
[18:16:48] <wikibugs>	 Operations, cloud-services-team (Kanban): rack/setup/install cloudcontrol2001-dev & cloudvirt200[123]-dev - https://phabricator.wikimedia.org/T214448 (Dzahn) All notifications for these hosts are (permanently?) disabled. Wondering if that is desired or maybe they should just not be in monitoring in the f...
[18:17:00] <wikibugs>	 (PS1) Elukey: admin: update user piccardi's ssh public key [puppet] - https://gerrit.wikimedia.org/r/564747 (https://phabricator.wikimedia.org/T151969)
[18:18:56] <elukey>	 mutante: o/ if you have time can you check --^ ? I didn't find a better way to solve Tiziano's issue, but I may have missed something obvious :(
[18:21:10] <wikibugs>	 (PS3) Krinkle: mediawiki: Capture shutdown/destruct backtrace in php7-fatal-error.php [puppet] - https://gerrit.wikimedia.org/r/559262 (https://phabricator.wikimedia.org/T241097)
[18:22:13] <wikibugs>	 (CR) Krinkle: Wikistats v2 go live (1 comment) [puppet] - https://gerrit.wikimedia.org/r/564745 (https://phabricator.wikimedia.org/T237752) (owner: saper)
[18:22:20] <mutante>	 elukey: looks like he is using the wrong username
[18:22:27] <mutante>	 correct would be: piccardi
[18:22:45] <mutante>	 oh wait.. no..checking 
[18:23:33] <effie>	 Krinkle: because we tell puppet to take the file as is and copy it on the server
[18:24:03] <wikibugs>	 (CR) Krinkle: "Directing at /v2/ as Saper's alternative patch might be better long-term as it means we hand out canonical/permalinks. This means that whe" [puppet] - https://gerrit.wikimedia.org/r/563508 (https://phabricator.wikimedia.org/T237752) (owner: Elukey)
[18:24:37] <effie>	 Krinkle: if it were a template, we would see the changes, so this is normal
[18:25:24] <Krinkle>	 effie: should it not at least say that File[/etc/php7-fatal-error.php] has different content and/or a different md5 hash?
[18:25:39] <Krinkle>	 Or does that only work for templates? 
[18:26:28] <Krinkle>	 Interesting, OK. I thought I did something wrong, but I guess all files I previously edited happen to be templates then. Thanks :)
[18:32:48] <wikibugs>	 Operations, Traffic: ATS strict round robin parent select policy doesn't work as expected - https://phabricator.wikimedia.org/T242778 (Vgutierrez)
[18:35:09] <wikibugs>	 Operations, Research, SRE-Access-Requests, Patch-For-Review: Tiziano Piccardi shell request + analytics-privatedata-users - https://phabricator.wikimedia.org/T151969 (Dzahn) hi @tizianopiccardi i can confirm your user exists on bast1002 and notebook1003 and your key has not been revoked. It is:...
[18:39:01] <wikibugs>	 (CR) saper: Wikistats v2 go live (1 comment) [puppet] - https://gerrit.wikimedia.org/r/564745 (https://phabricator.wikimedia.org/T237752) (owner: saper)
[18:43:33] <wikibugs>	 (CR) Krinkle: Wikistats v2 go live (1 comment) [puppet] - https://gerrit.wikimedia.org/r/564745 (https://phabricator.wikimedia.org/T237752) (owner: saper)
[18:45:08] <wikibugs>	 Operations, Research, SRE-Access-Requests, Patch-For-Review: Tiziano Piccardi shell request + analytics-privatedata-users - https://phabricator.wikimedia.org/T151969 (Dzahn) I see the SHA256 of the public key you are attempting to use is:   ` SHA256:wNTUKNPfq5Wyubriy6VGxmqrPq3m9l6GSiyF0SV/ywE `...
[18:50:13] <wikibugs>	 Operations, Research, SRE-Access-Requests, Patch-For-Review: Tiziano Piccardi shell request + analytics-privatedata-users - https://phabricator.wikimedia.org/T151969 (Dzahn) a:RobH→Muehlenhoff
[18:50:15] <wikibugs>	 Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to EventLogging data for knissen - https://phabricator.wikimedia.org/T241838 (Dzahn) a:Dzahn→None
[18:50:17] <wikibugs>	 Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to EventLogging data for knissen - https://phabricator.wikimedia.org/T241838 (Dzahn) a:Muehlenhoff
[18:51:27] <wikibugs>	 (CR) saper: "Sure, have a look at how I have tested it https://phabricator.wikimedia.org/T237752#5802478 - there is a tarball with all I needed." [puppet] - https://gerrit.wikimedia.org/r/564745 (https://phabricator.wikimedia.org/T237752) (owner: saper)
[18:53:14] <effie>	 Krinkle: we will see that when we run puppet, but not on the compiler
[18:54:54] <wikibugs>	 Operations, ops-codfw, fundraising-tech-ops: rack/setup/install frlog2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T242265 (Papaul)
[18:55:26] <wikibugs>	 Operations, Research, SRE-Access-Requests, Patch-For-Review: Tiziano Piccardi shell request + analytics-privatedata-users - https://phabricator.wikimedia.org/T151969 (tizianopiccardi) Hi Daniel,  yes, I'm quite sure I didn't change anything (unless OSX updated and changed files somewhere). Here a...
[18:57:26] <wikibugs>	 Operations, ops-eqiad, SRE-swift-storage: Degraded RAID on ms-be1039 - https://phabricator.wikimedia.org/T242511 (Cmjohnson) I am working on this now, a ticket has been created with HPE   Case ID: 5344411330 Case title: Degraded RAID Severity 3-Normal
[18:57:43] <wikibugs>	 Operations, ops-eqiad, SRE-swift-storage: Degraded RAID on ms-be1039 - https://phabricator.wikimedia.org/T242511 (Cmjohnson) a:Jclark-ctr→Cmjohnson
[18:58:05] <Zoranzoki21>	 What's happening with mwext-phpunit-coverage-docker-publish?? https://integration.wikimedia.org/ci/job/mwext-phpunit-coverage-docker-publish/16751/console
[18:59:54] <wikibugs>	 Operations, ops-eqiad: frqueue1001 system battery needs replacement - https://phabricator.wikimedia.org/T237582 (Cmjohnson) @Jgreen I have the batteries...when can we schedule to do this?
[19:00:04] <jouncebot>	 RoanKattouw, Niharika, and Urbanecm: My dear minions, it's time we take the moon! Just kidding. Time for Morning SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200114T1900).
[19:00:05] <jouncebot>	 Ammarpad: A patch you scheduled for Morning SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[19:00:28] <wikibugs>	 Operations, ops-eqiad: frqueue1001 system battery needs replacement - https://phabricator.wikimedia.org/T237582 (Cmjohnson) a:Jclark-ctr→Cmjohnson
[19:05:31] <wikibugs>	 Operations, ops-eqiad, Analytics, User-Elukey: replace onboard NIC in kafka-jumbo100[1-6] - https://phabricator.wikimedia.org/T236327 (Cmjohnson) @Jclark-ctr where are you with these?
[19:08:36] <wikibugs>	 Operations, ops-eqiad, Analytics, User-Elukey: replace onboard NIC in kafka-jumbo100[1-6] - https://phabricator.wikimedia.org/T236327 (Jclark-ctr) @Cmjohnson  no nic installed or host moved yet.  @RobH  had helped with 10g interfaces
[19:13:19] <hauskatze>	 jouncebot: next
[19:13:20] <jouncebot>	 In 0 hour(s) and 46 minute(s): Mediawiki train - European+American Version (secondary timeslot) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200114T2000)
[19:13:25] <hauskatze>	 jouncebot: now
[19:13:25] <jouncebot>	 For the next 0 hour(s) and 46 minute(s): Morning SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200114T1900)
[19:13:36] <hauskatze>	 are we still deploying?
[19:14:14] <Urbanecm>	 hauskatze: Ammarpad isn't here.
[19:14:26] <hauskatze>	 Urbanecm: mind if I add a patch?
[19:14:34] <Urbanecm>	 not at all :)
[19:14:35] <hauskatze>	 https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/564053/
[19:14:42] <hauskatze>	 it's for wikitech
[19:14:50] <hauskatze>	 you can deploy there right?
[19:16:23] <Urbanecm>	 yeah :)
[19:17:09] <wikibugs>	 (PS3) Urbanecm: [wikitech] Restore contentadmin ability to manage abuse filters [mediawiki-config] - https://gerrit.wikimedia.org/r/564053 (https://phabricator.wikimedia.org/T242593) (owner: MarcoAurelio)
[19:17:33] <hauskatze>	 added to the calendar
[19:17:42] <Urbanecm>	 thanks
[19:17:53] <wikibugs>	 (CR) Urbanecm: [C: +2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/564053 (https://phabricator.wikimedia.org/T242593) (owner: MarcoAurelio)
[19:18:52] <hauskatze>	 I'd babysit Ammarpad's patches but I don't know about Minerva/MobileFrontend
[19:18:59] <wikibugs>	 (Merged) jenkins-bot: [wikitech] Restore contentadmin ability to manage abuse filters [mediawiki-config] - https://gerrit.wikimedia.org/r/564053 (https://phabricator.wikimedia.org/T242593) (owner: MarcoAurelio)
[19:20:05] <Urbanecm>	 i see
[19:21:09] <Urbanecm>	 hauskatze: syncing
[19:22:10] <logmsgbot>	 !log urbanecm@deploy1001 Synchronized wmf-config/CommonSettings.php: SWAT: e400916: [wikitech] Restore contentadmin ability to manage abuse filters (duration: 01m 05s)
[19:22:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:22:37] <Urbanecm>	 hauskatze: done! Lmk if you need anything else
[19:23:11] <hauskatze>	 I'll check it works
[19:23:18] <hauskatze>	 wikitech is so weird... ;)
[19:23:42] <Urbanecm>	 a wiki on its own
[19:23:53] <hauskatze>	 yup, fixed
[19:23:56] <hauskatze>	 thanky :)
[19:23:58] <Urbanecm>	 nice!
[19:32:47] <wikibugs>	 (PS4) Dzahn: installserver: Convert tftp/dhcp ferm rules to ferm services [puppet] - https://gerrit.wikimedia.org/r/564010 (owner: Muehlenhoff)
[19:38:27] <Zoranzoki21>	 Hello Urbanecm
[19:38:35] <Zoranzoki21>	 I can help with Ammarpad's patches if you want?
[19:38:52] <wikibugs>	 (CR) Dzahn: [C: +2] installserver: Convert tftp/dhcp ferm rules to ferm services [puppet] - https://gerrit.wikimedia.org/r/564010 (owner: Muehlenhoff)
[19:45:38] <wikibugs>	 (CR) Dzahn: "ferm config changed on install1002/2002 and iptables -L output is unchanged (as expected)" [puppet] - https://gerrit.wikimedia.org/r/564010 (owner: Muehlenhoff)
[19:47:18] <wikibugs>	 (CR) Dzahn: "should not be needed anymore. we talked on IRC and the issue is somewhere in the local ssh config. Tiziano could confirm it worked with th" [puppet] - https://gerrit.wikimedia.org/r/564747 (https://phabricator.wikimedia.org/T151969) (owner: Elukey)
[19:54:10] <wikibugs>	 (CR) Dzahn: Wikistats v2 go live (1 comment) [puppet] - https://gerrit.wikimedia.org/r/564745 (https://phabricator.wikimedia.org/T237752) (owner: saper)
[19:56:50] <wikibugs>	 Operations, Research, SRE-Access-Requests, Patch-For-Review: Tiziano Piccardi shell request + analytics-privatedata-users - https://phabricator.wikimedia.org/T151969 (Dzahn) We talked on IRC debugged a bit more and Tiziano could confirm logging in works with the existing key after moving the ssh...
[20:00:05] <jouncebot>	 liw and brennen: #bothumor My software never has bugs. It just develops random features. Rise for Mediawiki train - European+American Version (secondary timeslot). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200114T2000).
[20:02:23] <brennen>	 (european window deployment seems stable; nothing to be done at present.)
[20:07:00] <logmsgbot>	 !log milimetric@deploy1001 Started deploy [analytics/aqs/deploy@1cf0530]: Increment service-runner to latest version
[20:07:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:10:15] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 187360016 and 14 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[20:11:49] <logmsgbot>	 !log milimetric@deploy1001 Finished deploy [analytics/aqs/deploy@1cf0530]: Increment service-runner to latest version (duration: 04m 48s)
[20:11:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:12:03] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 72208 and 107 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[20:18:53] <wikibugs>	 Operations, Research, SRE-Access-Requests, Patch-For-Review: Tiziano Piccardi shell request + analytics-privatedata-users - https://phabricator.wikimedia.org/T151969 (tizianopiccardi) Open→Resolved The problem was in the config file.  `ForwardAgent no IdentitiesOnly yes IdentityFile ~/.ss...
[20:21:30] <wikibugs>	 Operations, ops-eqiad: rack/setup/instal (4) CI ganeti nodes - https://phabricator.wikimedia.org/T228926 (Cmjohnson) The mgmt passwords have been updated.
[20:22:25] <wikibugs>	 Operations, ops-eqiad, vm-requests: rack/setup/install ganeti10([09]|1[0-8]).eqiad.wmnet - https://phabricator.wikimedia.org/T228924 (Cmjohnson) a:Jclark-ctr→Cmjohnson The mgmt passwords have been updated.  Expect these to be ready this week
[20:25:59] <wikibugs>	 (CR) Cwhite: [C: +1] "LGTM, assuming you've verified that is Hugh's key" [puppet] - https://gerrit.wikimedia.org/r/564171 (https://phabricator.wikimedia.org/T242309) (owner: Dzahn)
[20:27:18] <wikibugs>	 Operations, DC-Ops, hardware-requests, Continuous-Integration-Infrastructure (phase-out-jessie): Replacement hardware for buster/stretch upgrade of contint1001 and contint2001 - https://phabricator.wikimedia.org/T239880 (RobH) So this is a LOT of hardware churn that is non-desired by #dc-ops, at...
[20:27:25] <wikibugs>	 Operations, DC-Ops, hardware-requests, Continuous-Integration-Infrastructure (phase-out-jessie): Replacement hardware for buster/stretch upgrade of contint1001 and contint2001 - https://phabricator.wikimedia.org/T239880 (RobH) a:RobH→thcipriani
[20:29:50] <wikibugs>	 Operations, DC-Ops, hardware-requests, Continuous-Integration-Infrastructure (phase-out-jessie): Replacement hardware for buster/stretch upgrade of contint1001 and contint2001 - https://phabricator.wikimedia.org/T239880 (MoritzMuehlenhoff) @robh: You're completely right, see https://phabricator.w...
[20:31:40] <wikibugs>	 (CR) Krinkle: [C: +1] "Done testing :)" [puppet] - https://gerrit.wikimedia.org/r/559262 (https://phabricator.wikimedia.org/T241097) (owner: Krinkle)
[20:31:47] <Krinkle>	 effie: good to go whenever :)
[20:33:17] <icinga-wm>	 PROBLEM - Host lvs1015 is DOWN: PING CRITICAL - Packet loss = 100%
[20:33:33] <wikibugs>	 Operations, DC-Ops, hardware-requests, Continuous-Integration-Infrastructure (phase-out-jessie): Replacement hardware for buster/stretch upgrade of contint1001 and contint2001 - https://phabricator.wikimedia.org/T239880 (RobH) @thcipriani: Please comment with additional reasoning on why we need t...
[20:34:13] <wikibugs>	 Operations, DC-Ops, hardware-requests, Continuous-Integration-Infrastructure (phase-out-jessie): Replacement hardware for buster/stretch upgrade of contint1001 and contint2001 - https://phabricator.wikimedia.org/T239880 (RobH) >>! In T239880#5803332, @MoritzMuehlenhoff wrote: > @robh: You're comp...
[20:37:10] <wikibugs>	 Operations, ops-codfw, Wikimedia-Logstash: (No Need By Date Provided) rack/setup/install logstash202[6-9].codfw.wmnet - https://phabricator.wikimedia.org/T240882 (Papaul)
[20:39:23] <wikibugs>	 Operations, ops-codfw, Wikimedia-Logstash: (No Need By Date Provided) rack/setup/install logstash202[6-9].codfw.wmnet - https://phabricator.wikimedia.org/T240882 (Papaul) ` papaul@asw-c-codfw> show interfaces descriptions | match logstash2028  xe-7/0/11       up    up   logstash2028
[20:41:50] <wikibugs>	 Operations, ops-eqiad, serviceops: (Need By: Jan 10) rack/setup/install mc-gp100[123].eqiad.wmnet - https://phabricator.wikimedia.org/T241795 (Cmjohnson)
[20:48:43] <wikibugs>	 Operations, ops-eqiad, DC-Ops, cloud-services-team: Relabel labmon1001.eqiad.wmnet to cloudmetrics1001eqiad.wmnet and labmon1002.eqiad.wmnet to cloudmetrics1002eqiad.wmnet - https://phabricator.wikimedia.org/T241155 (Cmjohnson) Open→Resolved updated physical label and switch label
[20:53:56] <wikibugs>	 (PS1) Cmjohnson: Updating hostname to reflect requested change [dns] - https://gerrit.wikimedia.org/r/564786 (https://phabricator.wikimedia.org/T239250)
[20:55:23] <wikibugs>	 (PS2) Cmjohnson: Updating hostname to reflect requested change [dns] - https://gerrit.wikimedia.org/r/564786 (https://phabricator.wikimedia.org/T239250)
[20:55:50] <sukhe>	 thanks cmjohnson1!
[20:55:55] <wikibugs>	 (CR) Cmjohnson: [C: +2] Updating hostname to reflect requested change [dns] - https://gerrit.wikimedia.org/r/564786 (https://phabricator.wikimedia.org/T239250) (owner: Cmjohnson)
[20:57:40] <wikibugs>	 (PS1) Papaul: DHCP: Add MAC address entires for logstash202[6-9] [puppet] - https://gerrit.wikimedia.org/r/564788 (https://phabricator.wikimedia.org/T240882)
[20:58:07] <wikibugs>	 (CR) Papaul: [C: +2] DNS: Add mgmt and production DNS for puppetmaster2003 [dns] - https://gerrit.wikimedia.org/r/564746 (owner: Papaul)
[20:58:25] <wikibugs>	 (PS2) Papaul: DNS: Add mgmt and production DNS for puppetmaster2003 [dns] - https://gerrit.wikimedia.org/r/564746
[20:59:21] <icinga-wm>	 PROBLEM - Persistent high iowait on labstore1006 is CRITICAL: 16.23 ge 10 https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Labstore https://grafana.wikimedia.org/dashboard/db/labs-monitoring
[20:59:51] <wikibugs>	 (CR) Papaul: [C: +2] DNS: Add mgmt and production DNS for puppetmaster2003 [dns] - https://gerrit.wikimedia.org/r/564746 (owner: Papaul)
[21:00:29] <wikibugs>	 Operations, ops-eqiad, Patch-For-Review: setup/install censorship1001.eqiad.wmnet - https://phabricator.wikimedia.org/T239250 (Cmjohnson) updated  hostname to cescout1001  [x] physical label [x] netbox [x] network switch [x] mgmt and production DNS updated
[21:00:57] <wikibugs>	 Operations, ops-eqiad, Patch-For-Review: setup/install censcout1001.eqiad.wmnet - https://phabricator.wikimedia.org/T239250 (Cmjohnson)
[21:01:17] <wikibugs>	 Operations, ops-eqiad, Patch-For-Review: setup/install cescout1001.eqiad.wmnet - https://phabricator.wikimedia.org/T239250 (Cmjohnson)
[21:05:51] <wikibugs>	 Operations, ops-codfw, Patch-For-Review: (No Need By Date Provided) codfw: rack/setup/install puppetmaster2003.codfw.wmnet - https://phabricator.wikimedia.org/T239732 (Papaul)
[21:08:24] <icinga-wm>	 RECOVERY - Persistent high iowait on labstore1006 is OK: (C)10 ge (W)5 ge 4.64 https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Labstore https://grafana.wikimedia.org/dashboard/db/labs-monitoring
[21:09:46] <wikibugs>	 Operations, ops-codfw, Patch-For-Review: (No Need By Date Provided) codfw: rack/setup/install puppetmaster2003.codfw.wmnet - https://phabricator.wikimedia.org/T239732 (Papaul) ` papaul@asw-b-codfw# show | compare           [edit interfaces interface-range vlan-private1-b-codfw]      member ge-8/0/3 {...
[21:09:59] <wikibugs>	 Operations, ops-codfw, Patch-For-Review: (No Need By Date Provided) codfw: rack/setup/install puppetmaster2003.codfw.wmnet - https://phabricator.wikimedia.org/T239732 (Papaul)
[21:13:15] <wikibugs>	 Operations, ops-codfw, fundraising-tech-ops: rack/setup/install frlog2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T242265 (Papaul)
[21:16:28] <ori>	 thcipriani: yo, did new scap end up rolling out?
[21:18:04] <thcipriani>	 ori: no, not yet, I still need to do the packaging :(
[21:18:38] <ori>	 ok np, i'm just excited to see what the impact will be. there's no rush
[21:19:25] <thcipriani>	 I will be happy to see that change go out
[21:41:01] <wikibugs>	 Operations, ops-codfw, fundraising-tech-ops: rack/setup/install frlog2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T242265 (Papaul) ` papaul@fasw-c-codfw# show | compare  [edit interfaces interface-range disabled] -    member ge-0/0/21; -    member ge-1/0/21; [edit interfaces interface-r...
[21:41:12] <wikibugs>	 Operations, ops-codfw, fundraising-tech-ops: rack/setup/install frlog2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T242265 (Papaul)
[22:35:48] <wikibugs>	 (CR) Dzahn: [C: +2] "actually checked the MACs on DRAC" [puppet] - https://gerrit.wikimedia.org/r/564788 (https://phabricator.wikimedia.org/T240882) (owner: Papaul)
[22:38:00] <wikibugs>	 Operations, ops-codfw, Wikimedia-Logstash, Patch-For-Review: (No Need By Date Provided) rack/setup/install logstash202[6-9].codfw.wmnet - https://phabricator.wikimedia.org/T240882 (Dzahn) ready for OS install
[22:38:01] <wikibugs>	 (CR) Dzahn: "@papaul you can start installing. also ran puppet on install2002 already" [puppet] - https://gerrit.wikimedia.org/r/564788 (https://phabricator.wikimedia.org/T240882) (owner: Papaul)
[22:39:07] <wikibugs>	 Operations, Phabricator, Product-Analytics, WMF-NDA-Requests: Access to view #WMF-NDA tasks on Phabricator for jwang - https://phabricator.wikimedia.org/T242805 (Dzahn)
[22:53:45] <wikibugs>	 Operations, SRE-Access-Requests: Requesting access to analytics-privatedata-users, researchers & wmf for jennifer wang (jwang) - https://phabricator.wikimedia.org/T242807 (jwang)
[23:04:38] <wikibugs>	 Operations, ops-eqiad, Patch-For-Review: setup/install cescout1001.eqiad.wmnet - https://phabricator.wikimedia.org/T239250 (RobH) a:RobH→Cmjohnson
[23:17:53] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[23:19:41] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[23:41:48] <wikibugs>	 Operations, Cleanup, Traffic, fixcopyright.wikimedia.org, and 4 others: Retire fixcopyright.wikimedia.org - https://phabricator.wikimedia.org/T238803 (Jdforrester-WMF) Stalled→Open
[23:43:41] <wikibugs>	 (PS1) Arlolra: Bump Parsoid/PHP cluster memory_limit again [mediawiki-config] - https://gerrit.wikimedia.org/r/564805 (https://phabricator.wikimedia.org/T239806)
[23:45:15] <wikibugs>	 (PS2) Arlolra: Bump Parsoid/PHP cluster memory_limit again [mediawiki-config] - https://gerrit.wikimedia.org/r/564805 (https://phabricator.wikimedia.org/T239806)
[23:49:11] <wikibugs>	 (PS2) Cwhite: mtail: track new subscription requests in prometheus [puppet] - https://gerrit.wikimedia.org/r/564129 (https://phabricator.wikimedia.org/T236505)
[23:49:41] <wikibugs>	 (CR) Reedy: "How high can we go!?" [mediawiki-config] - https://gerrit.wikimedia.org/r/564805 (https://phabricator.wikimedia.org/T239806) (owner: Arlolra)
[23:50:57] <icinga-wm>	 PROBLEM - Check systemd state on labweb1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:52:47] <icinga-wm>	 RECOVERY - Check systemd state on labweb1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:53:27] <icinga-wm>	 PROBLEM - Check systemd state on labweb1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:57:03] <icinga-wm>	 RECOVERY - Check systemd state on labweb1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state