[00:21:52] <icinga-wm>	 PROBLEM - Device not healthy -SMART- on db2052 is CRITICAL: cluster=mysql device=cciss,1 instance=db2052:9100 job=node site=codfw https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db2052&var-datasource=codfw%2520prometheus%252Fops
[00:32:37] <wikibugs>	 (03PS27) 10EBernhardson: [WIP] Allow multiple elasticsearch instances per host [puppet] - 10https://gerrit.wikimedia.org/r/440049
[00:33:20] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] [WIP] Allow multiple elasticsearch instances per host [puppet] - 10https://gerrit.wikimedia.org/r/440049 (owner: 10EBernhardson)
[00:37:23] <wikibugs>	 (03CR) 10EBernhardson: "this patch is already getting too big, I'm going to split a few parts out that can be applied on their own ahead of time to simplify thing" [puppet] - 10https://gerrit.wikimedia.org/r/440049 (owner: 10EBernhardson)
[01:10:23] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db2094 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 417.39 seconds
[01:12:13] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db1124 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 418.71 seconds
[01:15:13] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db2074 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 300.98 seconds
[01:15:52] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db2057 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 306.96 seconds
[01:16:03] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db2050 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 310.19 seconds
[01:16:12] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db2043 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 310.53 seconds
[01:16:13] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db2036 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 311.49 seconds
[01:21:02] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db1124 is OK: OK slave_sql_lag Replication lag: 0.13 seconds
[01:32:06] <wikibugs>	 (03PS28) 10EBernhardson: [WIP] Allow multiple elasticsearch instances per host [puppet] - 10https://gerrit.wikimedia.org/r/440049
[01:32:08] <wikibugs>	 (03PS1) 10EBernhardson: Prep work for multi-instance elasticsearch refactor [puppet] - 10https://gerrit.wikimedia.org/r/440498
[01:33:07] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] [WIP] Allow multiple elasticsearch instances per host [puppet] - 10https://gerrit.wikimedia.org/r/440049 (owner: 10EBernhardson)
[01:33:13] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Prep work for multi-instance elasticsearch refactor [puppet] - 10https://gerrit.wikimedia.org/r/440498 (owner: 10EBernhardson)
[02:12:02] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db1124 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 383.44 seconds
[02:18:32] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db1124 is OK: OK slave_sql_lag Replication lag: 20.93 seconds
[03:20:23] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db2050 is OK: OK slave_sql_lag Replication lag: 0.32 seconds
[03:20:32] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db2043 is OK: OK slave_sql_lag Replication lag: 0.20 seconds
[03:20:42] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db2036 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[03:20:43] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db2074 is OK: OK slave_sql_lag Replication lag: 0.45 seconds
[03:21:22] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db2057 is OK: OK slave_sql_lag Replication lag: 0.45 seconds
[03:21:22] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db2094 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[03:25:12] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 710.49 seconds
[03:28:23] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 266.74 seconds
[05:02:16] <wikibugs>	 (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1067" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440500
[05:03:53] <wikibugs>	 (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1067" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440500 (owner: 10Marostegui)
[05:05:22] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1067" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440500 (owner: 10Marostegui)
[05:06:38] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Depool db1119 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440501 (https://phabricator.wikimedia.org/T191316)
[05:06:44] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1067 after alter table (duration: 01m 07s)
[05:06:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:08:19] <wikibugs>	 (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1067" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440500 (owner: 10Marostegui)
[05:09:55] <wikibugs>	 10Operations, 10ops-codfw: Disk predictive failure on db2052 - https://phabricator.wikimedia.org/T197146#4284534 (10Marostegui) 05Resolved>03Open Another predictive failure for this host, the same disk, as it is an used disk, it is too surprising:  ```       physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS,...
[05:10:36] <wikibugs>	 (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1119 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440501 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui)
[05:12:10] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1119 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440501 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui)
[05:12:52] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Depool db1119 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440501 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui)
[05:13:24] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1119 for alter table (duration: 00m 57s)
[05:13:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:13:28] <marostegui>	 !log Deploy schema change on db1119 T191316 T192926 T89737 T195193
[05:13:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:13:34] <stashbot>	 T89737: Make several mediawiki table fields unsigned ints on wmf databases - https://phabricator.wikimedia.org/T89737
[05:13:34] <stashbot>	 T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926
[05:13:35] <stashbot>	 T195193: Schema change for ct_tag_id field to change_tag - https://phabricator.wikimedia.org/T195193
[05:13:35] <stashbot>	 T191316: Schema change to make archive.ar_rev_id NOT NULL - https://phabricator.wikimedia.org/T191316
[05:34:37] <moritzm>	 !log installing gnupg security updates on trusty (Debian already fixed)
[05:34:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:48:12] <wikibugs>	 (03CR) 10Dzahn: [C: 032] DHCP: Change backup2001 MAC address from 1G MAC to 10G MAC [puppet] - 10https://gerrit.wikimedia.org/r/440485 (https://phabricator.wikimedia.org/T196477) (owner: 10Papaul)
[05:50:06] <moritzm>	 !log slow rollout of debmonitor
[05:50:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:51:23] <wikibugs>	 (03PS4) 10Dzahn: DNS: Add production DNS entries for bast2002 [dns] - 10https://gerrit.wikimedia.org/r/439965 (https://phabricator.wikimedia.org/T196665) (owner: 10Papaul)
[05:52:03] <wikibugs>	 (03CR) 10Dzahn: [C: 032] DNS: Add production DNS entries for bast2002 [dns] - 10https://gerrit.wikimedia.org/r/439965 (https://phabricator.wikimedia.org/T196665) (owner: 10Papaul)
[05:55:41] <wikibugs>	 (03CR) 10Dzahn: [C: 04-1] "this would result in 5 digits in the name. like bast20010" [puppet] - 10https://gerrit.wikimedia.org/r/440360 (https://phabricator.wikimedia.org/T196560) (owner: 10Papaul)
[05:59:05] <wikibugs>	 (03PS4) 10Dzahn: DHCP: Add MAC address and netboot entries for lvs2009 and lvs2010 [puppet] - 10https://gerrit.wikimedia.org/r/440360 (https://phabricator.wikimedia.org/T196560) (owner: 10Papaul)
[06:01:00] <wikibugs>	 (03CR) 10Dzahn: [C: 032] DHCP: Add MAC address and netboot entries for lvs2009 and lvs2010 [puppet] - 10https://gerrit.wikimedia.org/r/440360 (https://phabricator.wikimedia.org/T196560) (owner: 10Papaul)
[06:01:34] <wikibugs>	 (03CR) 10Dzahn: [C: 032] "fixed it. actually adding "2009 and 2010". what recipe are 2007 and 2008 using?" [puppet] - 10https://gerrit.wikimedia.org/r/440360 (https://phabricator.wikimedia.org/T196560) (owner: 10Papaul)
[06:01:40] <wikibugs>	 (03PS5) 10Dzahn: DHCP: Add MAC address and netboot entries for lvs2009 and lvs2010 [puppet] - 10https://gerrit.wikimedia.org/r/440360 (https://phabricator.wikimedia.org/T196560) (owner: 10Papaul)
[06:12:53] <icinga-wm>	 PROBLEM - Unmerged changes on repository puppet on puppetmaster1001 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production).
[06:13:04] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 031] mc-labs: Sync with prod or document differences [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437876 (owner: 10Krinkle)
[06:16:30] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: monitoring: Remove unused 'graphite_anomaly' command [puppet] - 10https://gerrit.wikimedia.org/r/437365 (owner: 10Krinkle)
[06:17:14] <wikibugs>	 (03PS2) 10Muehlenhoff: Enable base::service_auto_restart for PDNS recursor Prometheus exporters [puppet] - 10https://gerrit.wikimedia.org/r/437949 (https://phabricator.wikimedia.org/T135991)
[06:17:32] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 032] monitoring: Remove unused 'graphite_anomaly' command [puppet] - 10https://gerrit.wikimedia.org/r/437365 (owner: 10Krinkle)
[06:21:43] <icinga-wm>	 RECOVERY - Unmerged changes on repository puppet on puppetmaster1001 is OK: No changes to merge.
[06:27:17] <wikibugs>	 (03PS5) 10Giuseppe Lavagetto: Swap mediawiki.org to use standard docroot naming scheme [puppet] - 10https://gerrit.wikimedia.org/r/421949 (owner: 10Chad)
[06:28:03] <icinga-wm>	 PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/var/lib/apt/keys/ubuntucloud.gpg]
[06:58:33] <icinga-wm>	 RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:59:57] <wikibugs>	 (03CR) 10Elukey: [C: 032] Swap mediawiki.org to use standard docroot naming scheme [puppet] - 10https://gerrit.wikimedia.org/r/421949 (owner: 10Chad)
[07:24:33] <wikibugs>	 (03PS4) 10Ema: reload-vcl: add --separate-vcls [puppet] - 10https://gerrit.wikimedia.org/r/440342 (https://phabricator.wikimedia.org/T164609)
[07:26:02] <wikibugs>	 (03CR) 10Ema: [C: 032] reload-vcl: add --separate-vcls [puppet] - 10https://gerrit.wikimedia.org/r/440342 (https://phabricator.wikimedia.org/T164609) (owner: 10Ema)
[07:31:40] <wikibugs>	 10Operations, 10Traffic, 10Patch-For-Review: Merge cache_misc into cache_text functionally - https://phabricator.wikimedia.org/T164609#4284629 (10ema)
[07:34:18] <wikibugs>	 (03PS3) 10Volans: admin: Port matrix.py to Python 3 [puppet] - 10https://gerrit.wikimedia.org/r/438116 (owner: 10Legoktm)
[07:34:53] <legoktm>	 volans: :D
[07:35:00] <volans>	 :-(
[07:35:03] <volans>	 sorry
[07:35:10] <wikibugs>	 (03CR) 10Volans: [C: 032] admin: Port matrix.py to Python 3 [puppet] - 10https://gerrit.wikimedia.org/r/438116 (owner: 10Legoktm)
[07:35:20] <volans>	 I completely forgot about that :-) done
[07:42:51] <_joe_>	 lol
[07:47:10] <paravoid>	 did we fix the python3 ops/puppet ci yet btw?
[07:47:32] <_joe_>	 I don't know, I'll be honest
[07:48:29] <paravoid>	 https://phabricator.wikimedia.org/T184435 is the task, seems still open
[07:49:34] <_joe_>	 I am aware of the ticket, but I did nothing about it :(
[07:49:54] <volans>	 paravoid: not that I know of
[07:50:47] <wikibugs>	 (03CR) 10Gehel: [C: 04-1] "Looks good, minor comment inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/440498 (owner: 10EBernhardson)
[07:51:10] <addshore>	 How can one see how full our memcached cluster is?
[07:52:00] <volans>	 addshore: https://grafana.wikimedia.org/dashboard/db/prometheus-memcached-dc-stats?orgId=1 has some info IIRC
[07:52:31] <addshore>	 I guess Current items is kind of the thing to look at, but that doesn't really say how full
[07:52:33] <_joe_>	 addshore: 100% full
[07:52:46] <_joe_>	 addshore: look at "evictions"
[07:52:48] <addshore>	 49 million is a fair few entries :P
[07:53:23] <_joe_>	 if you have more than 0 evictions, your cluster is somewhat full
[07:53:47] <addshore>	 gotcha
[07:54:03] <addshore>	 what determines level of fullness / when keys start being evicted? number of keys or disk?
[07:54:14] <_joe_>	 disk is untouched
[07:54:21] <_joe_>	 so the answer is somewhat complex
[07:54:35] <addshore>	 as always :)
[07:54:40] <_joe_>	 memcached divides its available memory into slabs for objects of the same size
[07:54:40] <volans>	 for the evictions also per slab is better, given that you might be full on one particular slab and have most evictions for those
[07:54:58] <_joe_>	 volans: nowadays slabs are dynamically allocated
[07:55:16] <_joe_>	 so whenever a slab is full, you start evicting there
[07:55:32] <addshore>	 Looking at https://wikitech.wikimedia.org/wiki/Memcached I see there is a mcc.php maint script
[07:55:33] <_joe_>	 so as I was saying, slabs get enlarged and shrunk upon need
[07:55:41] <addshore>	 Is there a way to see how many keys exist for a given prefix somehow?
[07:55:42] <_joe_>	 but that works up to a point
[07:55:51] <_joe_>	 addshore: not that I know of
[07:56:00] <_joe_>	 memcached has no querying capabilities
[07:56:07] <addshore>	 ack
[07:56:09] <_joe_>	 you can dump all the keys on all 18 servers
[07:56:16] <addshore>	 that sounds large
[07:56:28] <addshore>	 Is that a bad idea or something that is fine to do?
[07:57:23] <_joe_>	 it is definitely a bad idea
[07:57:41] <addshore>	 I won't be doing that then :)
[07:57:53] <_joe_>	 yeah not on a friday, even
[07:57:58] <_joe_>	 ok, I'm out, bbl
[07:58:03] <addshore>	 o/
[07:58:11] <_joe_>	 elukey can answer more questions in my absence though :)
[07:58:20] <addshore>	 :D
[07:59:13] <addshore>	 elukey: the question is essentially, would doubling (roughly) the number of wikibase cache entries in memcached be a feasible idea. Relates to https://phabricator.wikimedia.org/T197252#4284642
[07:59:33] <wikibugs>	 10Operations, 10Operations-Software-Development, 10Continuous-Integration-Config: Puppet tox: properly lint both Py2 and Py3 files - https://phabricator.wikimedia.org/T184435#3882789 (10Legoktm) In your tox.ini, you can do something like ``` [testenv:flake8] commands = flake8 deps = flake8 basepython = pytho...
[08:02:38] <bawolff>	 We need a phab admin to block https://phabricator.wikimedia.org/p/238482n375/
[08:05:38] <gehel>	 !log rolling restart of elasticsearch eqiad for plugin upgrade - T194245
[08:05:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:05:43] <stashbot>	 T194245: Implement searching of 'depicts' on commons with the 'quantity' qualifier - https://phabricator.wikimedia.org/T194245
[08:05:48] <elukey>	 addshore: my knowledge about memcached is a bit rusty, I would need to get some knowledge in my brain from swap first :D As first thought I'd check the size of the keys and their distribution, to get an idea about what slabs will be affected. The number of keys itself is a bit generic IIRC to get a precise statement.. it would also be good to know the number of those keys, just to have an idea
[08:05:54] <elukey>	 but it might be tricky
[08:05:57] <elukey>	  about the size
[08:06:09] <elukey>	 if you have patience I'll try to check this afternoon, but no promises :)
[08:06:10] <wikibugs>	 10Operations, 10Puppet, 10AbuseFilter, 10Analytics-Kanban, and 13 others: Upgrade Puppet Master Infrastructure to Debian Stretch - https://phabricator.wikimedia.org/T184562#4285292 (10238482n375) p:05Normal>03Lowest a:05fgiunchedi>03None SG9tZVBoYWJyaWNhdG9yCk5vIG1lc3NhZ2VzLiBObyBub3RpZmljYXRpb25zL...
[08:06:24] <addshore>	 elukey: I'm in no particular rush :)
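A minimal sketch of the check suggested in the memcached discussion above (more than 0 evictions means the cluster is somewhat full, and looking per slab is more telling). It assumes the text comes from memcached's `stats items` command, which reports an `evicted` counter per slab class. The `SAMPLE` numbers and the helper names `parse_items_stats` and `slabs_with_evictions` are invented for illustration and are not taken from the production cluster or from mcc.php.

```python
import re
from collections import defaultdict

# Hypothetical sample of `stats items` output; the values are made up.
SAMPLE = """\
STAT items:1:number 1200
STAT items:1:evicted 0
STAT items:5:number 48000
STAT items:5:evicted 317
STAT items:9:number 90
STAT items:9:evicted 4
END
"""

# Matches lines such as "STAT items:5:evicted 317".
STAT_RE = re.compile(r"^STAT items:(\d+):(\w+) (\d+)$")


def parse_items_stats(text):
    """Return {slab_class: {stat_name: value}} from `stats items` text."""
    stats = defaultdict(dict)
    for line in text.splitlines():
        match = STAT_RE.match(line.strip())
        if match:
            slab, name, value = match.groups()
            stats[int(slab)][name] = int(value)
    return dict(stats)


def slabs_with_evictions(stats):
    """Return {slab_class: evicted} for slab classes that evicted anything."""
    return {
        slab: values["evicted"]
        for slab, values in stats.items()
        if values.get("evicted", 0) > 0
    }


if __name__ == "__main__":
    per_slab = parse_items_stats(SAMPLE)
    print(slabs_with_evictions(per_slab))
```

Slab classes missing from the result had no evictions; a non-empty result points at the object sizes under memory pressure, which is the per-slab view mentioned in the discussion.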
[08:08:22] <wikibugs>	 10Operations, 10AbuseFilter, 10Analytics-Kanban, 10Commons, and 14 others: Install Noto fonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T184664#4285719 (10238482n375) p:05Triage>03Lowest a:05kaldari>03None SG9tZVBoYWJyaWNhdG9yCk5vIG1lc3NhZ2VzLiBObyBub3RpZmljYXRpb25zLg...
[08:08:26] <addshore>	 bawolff: yes, someone really needs to block that person
[08:08:42] <addshore>	 bawolff: why aren't you just a phab admin?
[08:08:43] <bawolff>	 blocked now
[08:08:47] <addshore>	 :D
[08:08:57] <bawolff>	 volans did the honours
[08:09:21] <bawolff>	 I was thinking I should ask Andre about that when it's normal hours ;)
[08:16:03] <addshore>	 bawolff: I'm still getting notifications about that user doing stuff, but I guess that is just phab lag?
[08:16:15] <bawolff>	 I guess so
[08:16:21] * bawolff doesn't know much about phab
[08:16:34] <paladox>	 Yes
[08:26:29] <wikibugs>	 (03PS1) 10Elukey: profile::hadoop::spark2: explicitly require hive client's config [puppet] - 10https://gerrit.wikimedia.org/r/440507
[08:27:17] <wikibugs>	 (03PS2) 10Elukey: profile::hadoop::spark2: explicitly require hive client's config [puppet] - 10https://gerrit.wikimedia.org/r/440507
[08:29:09] <wikibugs>	 10Operations, 10monitoring, 10Patch-For-Review, 10User-Elukey: Configure puppetdb to export metrics via Prometheus JMX Agent - https://phabricator.wikimedia.org/T184796#4287892 (10Volans) p:05Lowest>03Normal a:03elukey
[08:33:28] <wikibugs>	 (03PS1) 10Ema: varnish::instance: pass -s argument to reload-vcl [puppet] - 10https://gerrit.wikimedia.org/r/440508 (https://phabricator.wikimedia.org/T164609)
[08:35:17] <Nikerabbit>	 now spammer: https://phabricator.wikimedia.org/T197406#4289005 ?
[08:35:26] <Nikerabbit>	 new*
[08:35:55] <wikibugs>	 10Operations, 10Operations-Software-Development, 10Continuous-Integration-Config, 10Security: Puppet tox: properly lint both Py2 and Py3 files - https://phabricator.wikimedia.org/T184435#4289349 (10Volans)
[08:35:58] <wikibugs>	 10Operations, 10Operations-Software-Development, 10Continuous-Integration-Config, 10Security: Puppet tox: properly lint both Py2 and Py3 files - https://phabricator.wikimedia.org/T184435#4289353 (10Volans)
[08:36:00] <wikibugs>	 (03PS2) 10Ema: varnish::instance: pass -s argument to reload-vcl [puppet] - 10https://gerrit.wikimedia.org/r/440508 (https://phabricator.wikimedia.org/T164609)
[08:36:08] <wikibugs>	 10Operations, 10Puppet, 10Patch-For-Review, 10Security: Puppet hosts with their cert revoked can still run puppet - https://phabricator.wikimedia.org/T184444#4289357 (10Volans)
[08:36:13] <wikibugs>	 10Operations, 10Puppet, 10Patch-For-Review, 10Security: Puppet hosts with their cert revoked can still run puppet - https://phabricator.wikimedia.org/T184444#4289362 (10Volans) p:05Lowest>03High a:03herron
[08:36:14] <Nikerabbit>	 ah, that was blocked too already
[08:39:56] <wikibugs>	 (03CR) 10Dzahn: [C: 032] Update group photo on people.wm.org [puppet] - 10https://gerrit.wikimedia.org/r/440467 (https://phabricator.wikimedia.org/T197268) (owner: 10Framawiki)
[08:40:18] <wikibugs>	 (03PS2) 10Dzahn: Update group photo on people.wm.org [puppet] - 10https://gerrit.wikimedia.org/r/440467 (https://phabricator.wikimedia.org/T197268) (owner: 10Framawiki)
[08:44:11] <wikibugs>	 10Operations, 10Patch-For-Review: Update people.wikimedia.org with the 2018 Wikimedia hackathon group photo - https://phabricator.wikimedia.org/T197268#4289419 (10Dzahn) 05Open>03Resolved Thanks! The photo has been updated.
[08:45:20] <wikibugs>	 10Operations, 10Puppet, 10Goal, 10Security: Modernize Puppet Configuration Management (2017-18 Q3 Goal) - https://phabricator.wikimedia.org/T184561#4289427 (10Volans)
[08:46:11] <wikibugs>	 10Operations, 10Puppet, 10Security: Plan Puppet 5 upgrade - https://phabricator.wikimedia.org/T184564#4289439 (10Volans)
[08:49:08] <wikibugs>	 10Operations, 10hardware-requests: hardware request for tin replacement - https://phabricator.wikimedia.org/T184481#4289521 (10Dzahn)
[08:49:31] <wikibugs>	 10Operations, 10hardware-requests: hardware request for tin replacement - https://phabricator.wikimedia.org/T184481#3884434 (10Dzahn)
[08:52:25] <wikibugs>	 (03PS3) 10Muehlenhoff: Enable base::service_auto_restart for PDNS recursor Prometheus exporters [puppet] - 10https://gerrit.wikimedia.org/r/437949 (https://phabricator.wikimedia.org/T135991)
[08:53:59] <wikibugs>	 10Operations, 10Puppet, 10Patch-For-Review, 10Security: Investigate landscape of PuppetDB Frontends and Provision One - https://phabricator.wikimedia.org/T184563#4289952 (10Volans) p:05Lowest>03Normal a:03Volans
[08:54:01] <wikibugs>	 10Operations, 10hardware-requests: hardware request for bast1001 replacement - https://phabricator.wikimedia.org/T184480#4289958 (10Dzahn) p:05Lowest>03Normal a:03RobH
[08:54:03] <wikibugs>	 10Operations, 10hardware-requests: hardware request for bast1001 replacement - https://phabricator.wikimedia.org/T184480#4289964 (10Dzahn)
[08:54:52] <wikibugs>	 10Operations, 10Security: Update people.wikimedia.org with the 2017 Wikimedia hackathon group photo - https://phabricator.wikimedia.org/T184338#4289970 (10Dzahn) p:05Lowest>03Normal
[08:55:16] <wikibugs>	 10Operations: Update people.wikimedia.org with the 2017 Wikimedia hackathon group photo - https://phabricator.wikimedia.org/T184338#4289977 (10Dzahn)
[08:58:02] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 032] Enable base::service_auto_restart for PDNS recursor Prometheus exporters [puppet] - 10https://gerrit.wikimedia.org/r/437949 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[08:58:18] <wikibugs>	 10Operations, 10Puppet, 10DBA: Move mariadb_maintenance away from terbium/wasat (mediawiki_maintenance) - https://phabricator.wikimedia.org/T184797#4289982 (10Dzahn) p:05Lowest>03Normal a:03jcrespo
[09:00:25] <wikibugs>	 10Operations, 10Commons, 10Wikimedia-SVG-rendering, 10media-storage, 10Patch-For-Review: Install Noto fonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T184664#4289989 (10Dzahn) p:05Lowest>03Normal a:03kaldari
[09:02:15] <wikibugs>	 (03PS1) 10Muehlenhoff: Restrict prometheus-pdns-rec-exporter auto restart to jessie and later [puppet] - 10https://gerrit.wikimedia.org/r/440509
[09:03:06] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Restrict prometheus-pdns-rec-exporter auto restart to jessie and later [puppet] - 10https://gerrit.wikimedia.org/r/440509 (owner: 10Muehlenhoff)
[09:03:56] <bawolff>	 !log deploy patch T197279
[09:03:57] <wikibugs>	 (03PS2) 10Muehlenhoff: Restrict prometheus-pdns-rec-exporter auto restart to jessie and later [puppet] - 10https://gerrit.wikimedia.org/r/440509
[09:04:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:04:45] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Restrict prometheus-pdns-rec-exporter auto restart to jessie and later [puppet] - 10https://gerrit.wikimedia.org/r/440509 (owner: 10Muehlenhoff)
[09:05:22] <wikibugs>	 (03PS3) 10Muehlenhoff: Restrict prometheus-pdns-rec-exporter auto restart to jessie and later [puppet] - 10https://gerrit.wikimedia.org/r/440509
[09:05:32] <icinga-wm>	 PROBLEM - puppet last run on labservices1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:05:40] <wikibugs>	 (03PS1) 10Aklapper: Phabricator: Block vandalism IP addresses [puppet] - 10https://gerrit.wikimedia.org/r/440510
[09:06:19] <wikibugs>	 10Operations, 10Puppet, 10Patch-For-Review, 10Security, 10User-fgiunchedi: Upgrade Puppet Master Infrastructure to Debian Stretch - https://phabricator.wikimedia.org/T184562#4290003 (10Volans) a:03fgiunchedi
[09:07:18] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 032] Restrict prometheus-pdns-rec-exporter auto restart to jessie and later [puppet] - 10https://gerrit.wikimedia.org/r/440509 (owner: 10Muehlenhoff)
[09:07:32] <wikibugs>	 10Operations, 10Puppet, 10Patch-For-Review, 10User-fgiunchedi: Upgrade Puppet Master Infrastructure to Debian Stretch - https://phabricator.wikimedia.org/T184562#4290015 (10Volans)
[09:08:17] <wikibugs>	 10Operations, 10Mail, 10Security: All IP addresses used for sending emails by Wikimedia's services - https://phabricator.wikimedia.org/T184555#4290021 (10Dzahn)
[09:09:19] <wikibugs>	 10Operations, 10Mail: All IP addresses used for sending emails by Wikimedia's services - https://phabricator.wikimedia.org/T184555#4290035 (10Dzahn)
[09:09:34] <wikibugs>	 (03CR) 10Dnvjdvsj: [C: 04-1] "This is not a good solution, there's a proxies every where and i'll come back very soon" [puppet] - 10https://gerrit.wikimedia.org/r/440510 (owner: 10Aklapper)
[09:10:12] <icinga-wm>	 PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Respond file not found for a nonexistent title) timed out before a response was received
[09:10:33] <icinga-wm>	 RECOVERY - puppet last run on labservices1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:11:13] <icinga-wm>	 RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy
[09:15:33] <wikibugs>	 (03PS2) 10Dnvjdvsj: Phabricator: Block vandalism IP addresses [puppet] - 10https://gerrit.wikimedia.org/r/440510 (owner: 10Aklapper)
[09:16:03] <icinga-wm>	 RECOVERY - Disk space on furud is OK: DISK OK
[09:20:46] <godog>	 !log fully remove ms-be1036 from swift due to hw failure - T196873
[09:20:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:20:51] <stashbot>	 T196873: ms-be1036 in power off status, not responsive to power on commands - https://phabricator.wikimedia.org/T196873
[09:26:25] <wikibugs>	 10Operations, 10Gadgets: test.wp shows the gadgets from test2.wp - https://phabricator.wikimedia.org/T197450#4290095 (10TheDJ)
[09:29:28] <wikibugs>	 10Operations, 10ops-eqiad, 10Traffic, 10Patch-For-Review, 10Security: rack/setup/install lvs101[3-6] - https://phabricator.wikimedia.org/T184293#4290110 (10Vgutierrez) p:05Lowest>03Normal a:03Cmjohnson
[09:30:31] <wikibugs>	 10Operations, 10Analytics, 10EventBus, 10MediaWiki-JobQueue, and 3 others: Clean up cpjobqueue metrics - https://phabricator.wikimedia.org/T196067#4290118 (10fgiunchedi) List of metrics at https://phabricator.wikimedia.org/P7262, I'll remove those if the list looks good.
[09:32:45] <wikibugs>	 10Operations, 10Analytics, 10EventBus, 10MediaWiki-JobQueue, and 3 others: Clean up cpjobqueue metrics - https://phabricator.wikimedia.org/T196067#4290128 (10Pchelolo) The list is insanely long, I've poked around and didn't find anything that should remain. LGTM.
[09:35:11] <wikibugs>	 10Operations, 10ops-eqiad, 10Traffic, 10Patch-For-Review: rack/setup/install lvs101[3-6] - https://phabricator.wikimedia.org/T184293#4290145 (10Vgutierrez)
[09:43:52] <jynus>	 !log reducing temp. db2040 consistency to speed up slave lag catch up
[09:43:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:45:17] <wikibugs>	 (03PS3) 10Ema: varnish::instance: pass -s argument to reload-vcl [puppet] - 10https://gerrit.wikimedia.org/r/440508 (https://phabricator.wikimedia.org/T164609)
[09:45:41] <jynus>	 !log reenabling db2048 consistency after slaves caught up
[09:45:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:45:57] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] varnish::instance: pass -s argument to reload-vcl [puppet] - 10https://gerrit.wikimedia.org/r/440508 (https://phabricator.wikimedia.org/T164609) (owner: 10Ema)
[09:46:52] <wikibugs>	 10Operations, 10ops-esams, 10Security: To purchase for next esams visit - https://phabricator.wikimedia.org/T184522#4290193 (10Dzahn)
[09:47:12] <wikibugs>	 10Operations, 10ops-esams: To purchase for next esams visit - https://phabricator.wikimedia.org/T184522#4290197 (10Dzahn)
[09:47:29] <wikibugs>	 (03PS4) 10Ema: varnish::instance: pass -s argument to reload-vcl [puppet] - 10https://gerrit.wikimedia.org/r/440508 (https://phabricator.wikimedia.org/T164609)
[09:48:50] <godog>	 !log delete cpjobqueue metrics older than 10d - T196067
[09:48:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:48:55] <stashbot>	 T196067: Clean up cpjobqueue metrics - https://phabricator.wikimedia.org/T196067
[09:49:15] <wikibugs>	 10Operations, 10Analytics, 10EventBus, 10MediaWiki-JobQueue, and 3 others: Clean up cpjobqueue metrics - https://phabricator.wikimedia.org/T196067#4290234 (10fgiunchedi) 05Open>03Resolved
[09:49:22] <wikibugs>	 10Operations, 10Prod-Kubernetes, 10Kubernetes, 10Security: Serve one production service via Kubernetes - https://phabricator.wikimedia.org/T184462#4290237 (10akosiaris)
[09:50:49] <wikibugs>	 10Operations, 10Prod-Kubernetes, 10Kubernetes: Serve one production service via Kubernetes - https://phabricator.wikimedia.org/T184462#4290243 (10akosiaris)
[09:51:00] <wikibugs>	 (03PS3) 10Paladox: Phabricator: Block vandalism IP addresses [puppet] - 10https://gerrit.wikimedia.org/r/440510 (owner: 10Aklapper)
[09:51:12] <wikibugs>	 (03CR) 10Paladox: "Reverting spam" [puppet] - 10https://gerrit.wikimedia.org/r/440510 (owner: 10Aklapper)
[09:51:45] <wikibugs>	 (03PS4) 10Dnvjdvsj: Phabricator: Block vandalism IP addresses [puppet] - 10https://gerrit.wikimedia.org/r/440510 (owner: 10Aklapper)
[09:54:32] <wikibugs>	 (03PS2) 10Ema: cache::text: ship cache_misc VCL [puppet] - 10https://gerrit.wikimedia.org/r/440157 (https://phabricator.wikimedia.org/T164609)
[09:55:05] <paladox>	 Can somebody block that spammer on gerrit ^^
[09:55:22] <paladox>	 bawolff: ^^
[09:55:30] <paladox>	 Spammer is on https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/440510/
[09:55:54] <bawolff>	 done
[09:56:18] <paladox>	 Thanks
[09:58:03] <wikibugs>	 (03PS5) 10Paladox: Phabricator: Block vandalism IP addresses [puppet] - 10https://gerrit.wikimedia.org/r/440510 (owner: 10Aklapper)
[09:58:07] <wikibugs>	 (03PS6) 10Aklapper: Phabricator: Block vandalism IP addresses [puppet] - 10https://gerrit.wikimedia.org/r/440510
[10:03:07] <wikibugs>	 (03PS4) 10Addshore: Load WikibaseLexeme on all of group0 (again) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/438006 (https://phabricator.wikimedia.org/T197454)
[10:03:20] <wikibugs>	 (03PS4) 10Addshore: Load WikibaseLexeme on testwiki (again again) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/438005 (https://phabricator.wikimedia.org/T197454)
[10:03:32] <wikibugs>	 (03PS4) 10Addshore: Load WikibaseLexeme on all wikidata clients [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436499 (https://phabricator.wikimedia.org/T195615)
[10:03:47] <wikibugs>	 (03PS4) 10Addshore: Load WikibaseLexeme on group1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436498 (https://phabricator.wikimedia.org/T195615)
[10:04:09] <wikibugs>	 (03PS5) 10Addshore: Load WikibaseLexeme on testwiki (again again) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/438005 (https://phabricator.wikimedia.org/T197454)
[10:04:15] <wikibugs>	 (03PS5) 10Addshore: Load WikibaseLexeme on all of group0 (again) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/438006 (https://phabricator.wikimedia.org/T197454)
[10:04:21] <wikibugs>	 (03PS5) 10Addshore: Load WikibaseLexeme on group1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436498 (https://phabricator.wikimedia.org/T195615)
[10:04:26] <wikibugs>	 (03PS5) 10Addshore: Load WikibaseLexeme on all wikidata clients [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436499 (https://phabricator.wikimedia.org/T195615)
[10:06:55] <wikibugs>	 (03PS5) 10Ema: varnish::instance: separate VCLs support [puppet] - 10https://gerrit.wikimedia.org/r/440508 (https://phabricator.wikimedia.org/T164609)
[10:08:41] <wikibugs>	 (03CR) 10Ema: "https://puppet-compiler.wmflabs.org/compiler02/11530/" [puppet] - 10https://gerrit.wikimedia.org/r/440508 (https://phabricator.wikimedia.org/T164609) (owner: 10Ema)
[10:09:43] <wikibugs>	 10Operations, 10Packaging, 10Patch-For-Review, 10Release: SCAP: Upload debian package version 3.7.5-1 - https://phabricator.wikimedia.org/T184774#4290385 (10akosiaris) p:05Lowest>03High a:03fgiunchedi
[10:15:09] <wikibugs>	 10Operations, 10fundraising-tech-ops, 10netops: switch network port 2/0/3 (frdb1003) back to administration-vlan - https://phabricator.wikimedia.org/T184723#4290426 (10akosiaris) p:05Lowest>03Triage a:03ayounsi
[10:17:48] <wikibugs>	 10Operations, 10ops-esams, 10Security: Degraded RAID on lvs3001 - https://phabricator.wikimedia.org/T184528#4290472 (10Vgutierrez)
[10:18:04] <wikibugs>	 10Operations, 10ops-esams: Degraded RAID on lvs3001 - https://phabricator.wikimedia.org/T184528#4290475 (10Vgutierrez)
[10:18:43] <icinga-wm>	 RECOVERY - Check systemd state on kubernetes2003 is OK: OK - running: The system is fully operational
[10:18:47] <wikibugs>	 10Operations, 10ops-eqiad, 10Security: Degraded RAID on ms-be1033 - https://phabricator.wikimedia.org/T184514#4290480 (10Vgutierrez)
[10:18:55] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on ms-be1033 - https://phabricator.wikimedia.org/T184514#4290483 (10Vgutierrez)
[10:19:54] <wikibugs>	 10Operations, 10ops-esams, 10Security: Degraded RAID on lvs3001 - https://phabricator.wikimedia.org/T184533#4290486 (10Vgutierrez)
[10:20:06] <wikibugs>	 10Operations, 10ops-esams, 10Security: Degraded RAID on lvs3001 - https://phabricator.wikimedia.org/T184530#4290490 (10Vgutierrez)
[10:20:14] <wikibugs>	 10Operations, 10ops-esams: Degraded RAID on lvs3001 - https://phabricator.wikimedia.org/T184533#4290493 (10Vgutierrez)
[10:20:30] <wikibugs>	 10Operations, 10ops-esams: Degraded RAID on lvs3001 - https://phabricator.wikimedia.org/T184530#4290495 (10Vgutierrez)
[10:21:52] <icinga-wm>	 PROBLEM - SSH on ms-be1034 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:22:03] <icinga-wm>	 PROBLEM - Check systemd state on kubernetes2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:22:33] <wikibugs>	 10Operations, 10Analytics, 10hardware-requests: eqiad: (1) new stat box to offload users from stat1005 - https://phabricator.wikimedia.org/T196345#4290498 (10elukey) I had a chat with Ottomata and I think that the spare could work for the moment. The warranty will expire soonish so in case we'll see that a m...
[10:22:43] <icinga-wm>	 PROBLEM - HP RAID on ms-be1034 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:24:06] <wikibugs>	 10Operations, 10Discovery, 10Wikidata, 10Wikidata-Query-Service, and 2 others: prometheus-blazegraph-exporter failing to start after reboot - https://phabricator.wikimedia.org/T184434#4290503 (10Vgutierrez) p:05Lowest>03Normal a:03fgiunchedi
[10:24:16] <wikibugs>	 10Operations, 10Discovery, 10Wikidata, 10Wikidata-Query-Service, and 2 others: prometheus-blazegraph-exporter failing to start after reboot - https://phabricator.wikimedia.org/T184434#4290510 (10Vgutierrez)
[10:25:15] <wikibugs>	 (03PS7) 10ArielGlenn: allow writeuptopageid to write multiple output files [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/436511 (https://phabricator.wikimedia.org/T196063)
[10:27:52] <wikibugs>	 (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1119" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440515
[10:28:47] <wikibugs>	 (03PS8) 10ArielGlenn: allow writeuptopageid to write multiple output files [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/436511 (https://phabricator.wikimedia.org/T196063)
[10:28:49] <addshore>	 mutante: get into the office? or?
[10:29:22] <icinga-wm>	 RECOVERY - SSH on ms-be1034 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u2 (protocol 2.0)
[10:29:28] <wikibugs>	 (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1119" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440515 (owner: 10Marostegui)
[10:29:35] <mutante>	 addshore: i got into 1st floor and then had to leave again for an errand and then want to come back
[10:30:09] <addshore>	 okay! everyone is in a meeting currently, raz_WMDE wanted to look for you but didn't find you :D
[10:30:46] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1119" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440515 (owner: 10Marostegui)
[10:30:51] <mutante>	 i am currently not there, trying to get my passport. but i will come back
[10:30:54] <mutante>	 thanks addshore
[10:31:13] <wikibugs>	 (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1119" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440515 (owner: 10Marostegui)
[10:32:00] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1119 after alter table (duration: 00m 58s)
[10:32:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:32:12] <icinga-wm>	 PROBLEM - Disk space on elastic1020 is CRITICAL: DISK CRITICAL - free space: /srv 61509 MB (12% inode=99%)
[10:35:12] <wikibugs>	 10Operations, 10Page-Previews, 10RESTBase, 10Traffic, and 2 others: Cached page previews not shown when refreshed - https://phabricator.wikimedia.org/T184534#4290582 (10Vgutierrez)
[10:35:21] <wikibugs>	 10Operations, 10Page-Previews, 10RESTBase, 10Traffic, and 2 others: Cached page previews not shown when refreshed - https://phabricator.wikimedia.org/T184534#4290585 (10Vgutierrez)
[10:37:53] <icinga-wm>	 PROBLEM - IPMI Sensor Status on ms-be1034 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:38:02] <wikibugs>	 (03PS7) 10Aklapper: Phabricator: Block vandalism IP addresses [puppet] - 10https://gerrit.wikimedia.org/r/440510
[10:38:19] <volans>	 godog: ms-be1034 ^^^
[10:38:34] <wikibugs>	 (03CR) 10Aklapper: [C: 04-1] "Garrrr, why the heck is ".gitreview" included here though I didn't use "--all"?" [puppet] - 10https://gerrit.wikimedia.org/r/440510 (owner: 10Aklapper)
[10:39:16] <godog>	 volans: thanks! I'll take a look
[10:42:10] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Request access to analytics cluster for bawolff - https://phabricator.wikimedia.org/T184582#4290705 (10Legoktm) p:05Lowest>03Normal a:03RobH
[10:42:20] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Request access to analytics cluster for bawolff - https://phabricator.wikimedia.org/T184582#4290711 (10Legoktm)
[10:44:02] <icinga-wm>	 RECOVERY - HP RAID on ms-be1034 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4 - Controller: OK - Battery/Capacitor: OK
[10:44:34] <wikibugs>	 (03PS9) 10ArielGlenn: allow writeuptopageid to write multiple output files [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/436511 (https://phabricator.wikimedia.org/T196063)
[10:49:23] <icinga-wm>	 RECOVERY - Check systemd state on kubernetes2003 is OK: OK - running: The system is fully operational
[10:51:53] <icinga-wm>	 RECOVERY - Disk space on elastic1020 is OK: DISK OK
[10:52:43] <icinga-wm>	 PROBLEM - Check systemd state on kubernetes2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[11:00:53] <icinga-wm>	 PROBLEM - SSH on ms-be1035 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:01:22] <icinga-wm>	 PROBLEM - HP RAID on ms-be1035 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:01:52] <icinga-wm>	 RECOVERY - SSH on ms-be1035 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u2 (protocol 2.0)
[11:03:40] <wikibugs>	 10Operations, 10Analytics-Cluster, 10Analytics-Kanban, 10Traffic, and 2 others: TLS security review of the Kafka stack - https://phabricator.wikimedia.org/T182993#4290888 (10Vgutierrez) I've just tested a new build of librdkafka (0.11.3-1~bpo8+1+wikimedia2) on cp1008 that includes the new TLS configuration...
[11:05:53] <wikibugs>	 10Puppet, 10Beta-Cluster-Infrastructure, 10ORES, 10Scoring-platform-team (Current), and 2 others: Puppet broken on deployment-ores01 due to missing hieradata - https://phabricator.wikimedia.org/T184478#4290893 (10Ladsgroup) a:03Ladsgroup
[11:07:43] <icinga-wm>	 RECOVERY - IPMI Sensor Status on ms-be1034 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK
[11:08:22] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s7 on db2061 is OK: OK slave_sql_lag Replication lag: 0.46 seconds
[11:12:33] <icinga-wm>	 RECOVERY - HP RAID on ms-be1035 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4 - Controller: OK - Battery/Capacitor: OK
[11:18:17] <wikibugs>	 (03PS1) 10Vgutierrez: kafka: Ensure JVM AES intrinsics usage if AES ciphersuites are enabled [puppet] - 10https://gerrit.wikimedia.org/r/440520 (https://phabricator.wikimedia.org/T182993)
[11:23:23] <icinga-wm>	 PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received
[11:24:11] <wikibugs>	 (03PS2) 10Vgutierrez: kafka: Ensure JVM AES intrinsics usage if AES ciphersuites are enabled [puppet] - 10https://gerrit.wikimedia.org/r/440520 (https://phabricator.wikimedia.org/T182993)
[11:24:32] <icinga-wm>	 RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy
[11:26:39] <wikibugs>	 (03CR) 10Vgutierrez: "pcc looks happy: https://puppet-compiler.wmflabs.org/compiler02/11532/" [puppet] - 10https://gerrit.wikimedia.org/r/440520 (https://phabricator.wikimedia.org/T182993) (owner: 10Vgutierrez)
[11:29:12] <wikibugs>	 10Operations, 10Gadgets: test.wp shows the gadgets from test2.wp - https://phabricator.wikimedia.org/T197450#4290095 (10Legoktm) I'm randomly guessing that this could be related to the MCR stuff? https://lists.wikimedia.org/pipermail/wikitech-l/2018-June/090206.html  For this to happen somehow the test2 messag...
[11:36:42] <icinga-wm>	 PROBLEM - SSH on ms-be1035 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:38:34] <wikibugs>	 10Operations, 10MediaWiki-Cache: test.wp is using test2.wp's message cache - https://phabricator.wikimedia.org/T197450#4290095 (10Legoktm) I did a bit of debugging with eval.php, and this appears to affect all overridden system messages on test.wp - they're using the test2.wp version.
[11:39:13] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s7 on dbstore2001 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[11:39:52] <icinga-wm>	 RECOVERY - SSH on ms-be1035 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u2 (protocol 2.0)
[11:49:22] <wikibugs>	 10Operations, 10Developer-Relations, 10Discourse, 10Security: Discourse migration from wmflabs to production - https://phabricator.wikimedia.org/T184461#4291256 (10Ladsgroup)
[11:49:23] <icinga-wm>	 RECOVERY - Check systemd state on kubernetes2003 is OK: OK - running: The system is fully operational
[11:49:40] <wikibugs>	 10Operations, 10Developer-Relations, 10Discourse: Discourse migration from wmflabs to production - https://phabricator.wikimedia.org/T184461#4291260 (10Ladsgroup)
[11:51:17] <wikibugs>	 (03PS4) 10Rduran: [WIP] Add unit tests for transfer.py and CumminExecution [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/437503
[11:52:42] <icinga-wm>	 PROBLEM - Check systemd state on kubernetes2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[11:55:22] <icinga-wm>	 PROBLEM - proton endpoints health on proton2002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) timed out before a response was received
[11:56:22] <icinga-wm>	 RECOVERY - proton endpoints health on proton2002 is OK: All endpoints are healthy
[12:01:02] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 70.00% of data above the critical threshold [50.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[12:10:22] <godog>	 [{exception_id}] {exception_url} Wikimedia\Rdbms\DBQueryError from line 1443 of /srv/mediawiki/php-1.32.0-wmf.8/includes/libs/rdbms/database/Database.php: A database query error has occurred. Did you forget to run your application's database schema upda
[12:10:26] <godog>	 exception ^
[12:11:03] <ema>	 starting at 11:35ish https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen&from=1529061367667&to=1529064510211
[12:12:17] <godog>	 on commons looks like
[12:13:00] <godog>	 marostegui: could be db1119 repool?
[12:13:02] <papaul>	 !log OS install on bast2002
[12:13:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:19:43] <wikibugs>	 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install bast2002.wikimedia.org - https://phabricator.wikimedia.org/T196665#4291377 (10Papaul)
[12:19:52] <ema>	 the errors seem to be of the type "Error: 1205 Lock wait timeout exceeded; try restarting transaction"
[12:20:46] <ema>	 CC: jynus 
[12:22:16] <ema>	 (10.64.48.23, so db1068)
[12:23:58] <anomie>	 jynus: FYI, that long-running maintenance script is finished now. I don't have a timestamp though, so I don't know how close it came to my 16-hour estimate from yesterday. ;)
[12:24:01] <godog>	 ah, perhaps some background job
[12:25:06] <wikibugs>	 10Operations, 10DBA, 10MediaWiki-Database: Preserve InnoDB table auto_increment on restart - https://phabricator.wikimedia.org/T135851#4291395 (10jcrespo) This is now fixed on MySQL 8.0 https://dev.mysql.com/doc/refman/8.0/en/innodb-auto-increment-handling.html#innodb-auto-increment-initialization
[12:25:53] <jynus>	 anomie: I am disappointed you don't have micro-second precision
[12:26:13] <jynus>	 anomie: sadly, lag will continue for a while until everything catches up
[12:27:10] <jynus>	 I would also like you to discuss with performance at some point the impact on maintenance tasks when we go dual active-active dcs
[12:27:38] <jynus>	 I have some doubts about how that is going to scale, I still have to write a task/RFC on that
[12:28:18] <jynus>	 I forgot the ":-P" on my first line, hope the context was clear
[12:30:02] <jynus>	 !log reenabling db2040 consistency after slaves caught up
[12:30:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:30:59] <wikibugs>	 10Operations, 10Gadgets, 10MediaWiki-Cache, 10Performance-Team: test.wp is using test2.wp's message cache - https://phabricator.wikimedia.org/T197450#4291412 (10Krinkle)
[12:30:59] <ema>	 jynus: are the exceptions mentioned above somehow related to lag?
[12:31:56] <jynus>	 where?
[12:32:23] <ema>	 < ema> starting at 11:35ish https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen&from=1529061367667&to=1529064510211
[12:32:30] <ema>	 < ema> the errors seem to be of the type "Error: 1205 Lock wait timeout exceeded; try restarting transaction"
[12:32:37] <ema>	 < ema> (10.64.48.23, so db1068)
[12:32:44] <jynus>	 checking
[12:33:58] <jynus>	 it is the master not the replica
[12:34:10] <jynus>	 jobqueue
[12:34:28] <jynus>	 Lock wait timeout exceeded
[12:34:39] <jynus>	 Issues with HTMLCacheUpdateJob::invalidateTitles
[12:36:47] <jynus>	 it could be a cron job that is running now
[12:37:14] <_joe_>	 jynus:  uhm let me check
[12:38:22] <jynus>	 !log killing refresh counts on commonswiki
[12:38:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:39:23] <jynus>	 why not ping me earlier?
[12:39:28] <_joe_>	 Pchelolo: I don't see an abnormal number of jobs on htmlCacheUpdate right now
[12:39:56] <jynus>	 I think that was the consequence, not the cause
[12:40:16] <_joe_>	 jynus: cpjobqueue is processing 50 job/s across all wikis, that should be ok
[12:40:25] <_joe_>	 but of course, maybe it's not
[12:40:27] <jynus>	 refresh counts is creating a lock on the master while running a count(*)
[12:40:43] <_joe_>	 so the cronjob?
[12:40:56] <jynus>	 I am going to create a task
[12:41:46] <jynus>	 and I think it is anomie's fault, actually :-)
[12:41:58] <jynus>	 because it is not a cron :-)
[12:42:18] <jynus>	 (hope you don't take that seriously, anomie)
[12:42:34] <wikibugs>	 10Operations, 10Gadgets, 10MediaWiki-Cache, 10Performance-Team: test.wp is using test2.wp's message cache - https://phabricator.wikimedia.org/T197450#4291453 (10Legoktm) a:03Legoktm Scratch that. It seems more likely that this was caused by the mcrouter deployment...
[12:43:06] <ema>	 jynus: I've mentioned you at 12:20, maybe you didn't get the ping?
[12:43:30] <jynus>	 I didn't
[12:43:32] <jynus>	 sorry
[12:44:03] <wikibugs>	 10Operations, 10ops-codfw: ms-be2023 fails to (re)boot - https://phabricator.wikimedia.org/T184785#4291471 (10Aklapper) p:05Lowest>03Triage
[12:44:08] <jynus>	 another ping masked it
[12:44:15] <ema>	 ah
[12:44:27] <jynus>	 we are ok now (I think)
[12:44:33] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 70.00% above the threshold [25.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[12:44:35] <wikibugs>	 10Operations, 10ops-eqiad: Hardware check on mw1271 - https://phabricator.wikimedia.org/T184722#4291514 (10Aklapper) p:05Lowest>03Triage
[12:45:17] <ema>	 looks like :)
[12:45:20] <wikibugs>	 10Operations, 10procurement: Give access to S4 (procurement tasks) to Erika Bjune - https://phabricator.wikimedia.org/T184617#4291586 (10Aklapper) p:05Lowest>03Triage
[12:45:42] <wikibugs>	 (03PS1) 10Legoktm: Make sure that mcrouter BagOStuff goes through ObjectCache::newFromParams() [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440526 (https://phabricator.wikimedia.org/T197450)
[12:46:42] <anomie>	 jynus: Maybe an hour ago I manually ran Category::refreshCounts() on a bunch of categories to try to clean up after the deadlock bug. When I noticed my quick script to do that was pausing for too long on some big Commons categories, I killed it (although now that I think about it I don't know if that killed the DB queries, since PHP→MySQL can be dumb that way) and rewrote it to skip any categories that had too many rows.
[12:47:15] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Make sure that mcrouter BagOStuff goes through ObjectCache::newFromParams() [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440526 (https://phabricator.wikimedia.org/T197450) (owner: 10Legoktm)
[12:47:18] <wikibugs>	 10Operations, 10ops-codfw: ms-be2023 fails to (re)boot - https://phabricator.wikimedia.org/T184785#4291614 (10Aklapper) a:03fgiunchedi
[12:47:47] <wikibugs>	 10Operations, 10ops-eqiad: Hardware check on mw1271 - https://phabricator.wikimedia.org/T184722#4291624 (10Aklapper) a:03Cmjohnson
[12:47:58] <jynus>	 https://phabricator.wikimedia.org/T195397#4291625
[12:48:16] <jynus>	 anomie: please call me at that moment- it doesn't hurt to check
[12:48:18] <wikibugs>	 (03PS2) 10Legoktm: Make sure that mcrouter BagOStuff goes through ObjectCache::newFromParams() [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440526 (https://phabricator.wikimedia.org/T197450)
[12:48:47] <jynus>	 good news is that I believe most impacted requests were jobqueue
[12:49:01] <jynus>	 I am checking how many non-jobqueue requests were impacted, if any
[12:49:06] <jynus>	 *now
[12:49:15] <_joe_>	 jynus: <3
[12:49:30] <jynus>	 so jobqueue on the receiving end this time :-)
[12:49:48] <_joe_>	 and it's ok, it will retry
[12:50:10] <_joe_>	 https://grafana.wikimedia.org/dashboard/db/jobqueue-eventbus?orgId=1&from=now-3h&to=now&var-site=eqiad&var-type=htmlCacheUpdate
[12:50:17] <_joe_>	 see the retry numbes
[12:50:20] <wikibugs>	 (03CR) 10Legoktm: [C: 032] Make sure that mcrouter BagOStuff goes through ObjectCache::newFromParams() [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440526 (https://phabricator.wikimedia.org/T197450) (owner: 10Legoktm)
[12:50:21] <_joe_>	 *numbers
[12:50:52] <jynus>	 these are cleaned up error numbers https://logstash.wikimedia.org/goto/12c11ff863504b8def5d22df42088718
[12:51:04] <jynus>	 but that would include legitimate errors that are unrelated
[12:51:22] <jynus>	 5000 job queue
[12:51:27] <jynus>	 around 1000 others
[12:52:02] <wikibugs>	 (03Merged) 10jenkins-bot: Make sure that mcrouter BagOStuff goes through ObjectCache::newFromParams() [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440526 (https://phabricator.wikimedia.org/T197450) (owner: 10Legoktm)
[12:52:20] <wikibugs>	 (03CR) 10jenkins-bot: Make sure that mcrouter BagOStuff goes through ObjectCache::newFromParams() [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440526 (https://phabricator.wikimedia.org/T197450) (owner: 10Legoktm)
[12:52:37] <jynus>	 is there a way to see edits numbers per wiki?
[12:52:46] <jynus>	 I guess recentchanges is the most direct
[12:53:33] <icinga-wm>	 PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Respond file not found for a nonexistent title) timed out before a response was received
[12:55:39] <legoktm>	 jynus: https://wikipulse.herokuapp.com/
[12:55:42] <icinga-wm>	 RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy
[12:56:00] <jynus>	 legoktm: I wanted historics
[12:56:12] <legoktm>	 ah
[12:56:15] <legoktm>	 I read too fast
[12:56:18] <jynus>	 like https://grafana.wikimedia.org/dashboard/db/edit-count
[12:56:23] <logmsgbot>	 !log legoktm@deploy1001 Synchronized wmf-config/mc.php: Make sure that mcrouter BagOStuff goes through ObjectCache::newFromParams() - T197450 (duration: 00m 57s)
[12:56:23] <jynus>	 but per wiki or per section
[12:56:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:56:27] <stashbot>	 T197450: test.wp is using test2.wp's message cache - https://phabricator.wikimedia.org/T197450
[12:56:51] <jynus>	 I guess I should be the one implementing that
[12:58:07] <jynus>	 I don't see a huge amount of safe failures, but I think at least some slowdown happened
[13:01:32] <wikibugs>	 10Operations, 10AbuseFilter, 10Analytics-Kanban, 10DBA, and 12 others: Generate consistent logical database backups in CODFW - https://phabricator.wikimedia.org/T184699#4291756 (10Aklapper) a:03jcrespo
[13:01:38] <wikibugs>	 10Operations, 10ops-eqiad, 10AbuseFilter, 10Analytics-Kanban, and 14 others: Decommission db1011 - https://phabricator.wikimedia.org/T184703#4291755 (10Aklapper) a:03Cmjohnson
[13:02:16] <wikibugs>	 10Operations, 10ops-codfw: Degraded RAID on ms-be2023 - https://phabricator.wikimedia.org/T184787#4291773 (10Aklapper) p:05Lowest>03Normal
[13:02:33] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests, 10Patch-For-Review: Decommission db1011 - https://phabricator.wikimedia.org/T184703#4291794 (10Aklapper) p:05Lowest>03Normal
[13:02:35] <wikibugs>	 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install bast2002.wikimedia.org - https://phabricator.wikimedia.org/T196665#4265037 (10Papaul) a:05Papaul>03Dzahn @Dzahn all yours
[13:02:42] <wikibugs>	 10Operations, 10DBA, 10Goal: Generate consistent logical database backups in CODFW - https://phabricator.wikimedia.org/T184699#4291798 (10Aklapper) p:05Lowest>03Normal
[13:03:00] <wikibugs>	 10Operations, 10LDAP-Access-Requests, 10WMF-NDA-Requests: Request for Tonina to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T184620#4291826 (10Aklapper) p:05Lowest>03Normal
[13:03:49] <jynus>	 I am not seeing a huge impact on edits
[13:04:40] <jynus>	 https://phabricator.wikimedia.org/P7264
[13:06:53] <wikibugs>	 10Operations, 10AbuseFilter, 10Analytics-Kanban, 10Data-release, and 13 others: Alert instrumentation returning 500 errors - https://phabricator.wikimedia.org/T184721#4291838 (10Aklapper) a:03ema
[13:07:41] <wikibugs>	 10Operations, 10ops-codfw, 10Patch-For-Review: mw2140 unresponsive, mgmt not accessible - https://phabricator.wikimedia.org/T184788#4291848 (10Aklapper) a:03elukey
[13:07:49] <wikibugs>	 10Operations, 10ops-codfw, 10Patch-For-Review: mw2140 unresponsive, mgmt not accessible - https://phabricator.wikimedia.org/T184788#4291850 (10Aklapper) p:05Lowest>03High
[13:08:02] <wikibugs>	 10Operations, 10Pybal, 10Traffic, 10Patch-For-Review: pybal's "can-depool" logic only takes downServers into account - https://phabricator.wikimedia.org/T184715#4291867 (10Aklapper) p:05Lowest>03High
[13:08:08] <wikibugs>	 10Operations, 10Pybal, 10Traffic, 10Patch-For-Review: Alert instrumentation returning 500 errors - https://phabricator.wikimedia.org/T184721#4291865 (10Aklapper) p:05Lowest>03High
[13:09:00] <Cyberpower678>	 andre__: ping
[13:09:17] <wikibugs>	 10Operations, 10Gadgets, 10MediaWiki-Cache, 10Performance-Team, 10Patch-For-Review: test.wp is using test2.wp's message cache - https://phabricator.wikimedia.org/T197450#4291900 (10Legoktm) 05Open>03Resolved OK, so this is fixed, but some of the core messages are missing - if someone wants to do a fu...
[13:09:23] <andre__>	 Cyberpower678, pong?
[13:09:44] <Cyberpower678>	 andre__: any chance I could get the needed permissions to clean up some of the phab tickets?
[13:09:58] <andre__>	 Cyberpower678: Permissions for what?
[13:10:24] <Cyberpower678>	 andre__: unset the security policy, delete the spam comments, etc...
[13:10:42] <andre__>	 Cyberpower678: See https://www.mediawiki.org/wiki/Wikimedia_Security_Team/Policy/Access_To_Security_Issues
[13:10:54] <andre__>	 Deleting comments requires phab admin rights
[13:11:28] <wikibugs>	 (03PS8) 10Paladox: Phabricator: Block vandalism IP addresses [puppet] - 10https://gerrit.wikimedia.org/r/440510 (owner: 10Aklapper)
[13:12:08] <Cyberpower678>	 andre__: is it possible to gain temporary access to these rights?  A bunch of my tickets are a real mess now.
[13:12:21] <andre__>	 see the link I provided
[13:12:34] <andre__>	 It's not that I'm in the Security team to decide. We're cleaning up.
[13:14:48] <wikibugs>	 (03CR) 10Legoktm: "Followup: Change-Id: I44fbaf222e5082188ae3cd12574367abdb41e651" (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436252 (owner: 10Aaron Schulz)
[13:14:59] <Cyberpower678>	 andre__: I sent you a PM
[13:16:42] <andre__>	 Cyberpower678, yeah, so? :)
[13:16:57] <andre__>	 It's a task. Okay. Not sure what you're expecting from me.
[13:17:03] <Cyberpower678>	 Just wanted to share the ticket with you. :-)
[13:17:13] <Cyberpower678>	 Privately of course. :p
[13:17:48] <wikibugs>	 (03PS1) 10Elukey: Move the varnishkafka submodule to environments/production [puppet] - 10https://gerrit.wikimedia.org/r/440530
[13:18:35] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Move the varnishkafka submodule to environments/production [puppet] - 10https://gerrit.wikimedia.org/r/440530 (owner: 10Elukey)
[13:18:42] <elukey>	 yep yep yep
[13:23:05] <wikibugs>	 10Operations, 10netops: Rack/Setup new codfw QFX5100 10G switch - https://phabricator.wikimedia.org/T197147#4291942 (10ayounsi)
[13:24:19] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 031] "LGTM, but it's late friday, let's avoid merging on a Friday. Furthermore, next week is the SRE summit so let's schedule this upgrade for T" [puppet] - 10https://gerrit.wikimedia.org/r/438135 (https://phabricator.wikimedia.org/T194342) (owner: 10KartikMistry)
[13:25:13] <gehel>	 d
[13:26:30] <wikibugs>	 10Operations, 10netops: Rack/Setup new codfw QFX5100 10G switch - https://phabricator.wikimedia.org/T197147#4291949 (10ayounsi)
[13:28:17] <wikibugs>	 (03PS7) 10Alexandros Kosiaris: Prep to tighten PuppetDB access control - log client certificate details [puppet] - 10https://gerrit.wikimedia.org/r/437057 (https://phabricator.wikimedia.org/T194962) (owner: 10Alex Monk)
[13:28:21] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Prep to tighten PuppetDB access control - log client certificate details [puppet] - 10https://gerrit.wikimedia.org/r/437057 (https://phabricator.wikimedia.org/T194962) (owner: 10Alex Monk)
[13:29:14] <wikibugs>	 10Operations, 10netops: Rack/Setup new codfw QFX5100 10G switch - https://phabricator.wikimedia.org/T197147#4291955 (10ayounsi)
[13:29:49] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists: Description - https://phabricator.wikimedia.org/T184455#4291973 (10Aklapper)
[13:30:55] <_joe_>	 akosiaris: on a friday before leaving for a week?
[13:31:26] <_joe_>	 thanks, sir :P
[13:32:06] <akosiaris>	 yw
[13:33:16] <akosiaris>	 I see nothing but NONE in the logs whoever
[13:33:18] <akosiaris>	 however
[13:34:01] <akosiaris>	 10.64.48.45 - - - NONE - - [15/Jun/2018:13:33:39 +0000] blah blah
[13:34:20] <akosiaris>	 so ssl_client_s_dn is - ?
[13:35:28] <akosiaris>	 hm NONE means no certificate was present, makes sense
[13:35:43] <_joe_>	 so the masters don't sent the request with a client cert apparently
[13:35:47] <_joe_>	 *send
[13:35:56] <akosiaris>	 yup
[13:36:09] <wikibugs>	 (03CR) 10Elukey: "pcc looks good https://puppet-compiler.wmflabs.org/compiler02/11533/" [puppet] - 10https://gerrit.wikimedia.org/r/440530 (owner: 10Elukey)
[13:36:28] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] "Log line example" [puppet] - 10https://gerrit.wikimedia.org/r/437057 (https://phabricator.wikimedia.org/T194962) (owner: 10Alex Monk)
[13:38:22] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 031] scap: add service name to restart on deploy [software/debmonitor/deploy] - 10https://gerrit.wikimedia.org/r/440368 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans)
[13:45:49] <wikibugs>	 (03PS2) 10Elukey: Move the varnishkafka submodule to environments/production [puppet] - 10https://gerrit.wikimedia.org/r/440530
[13:46:04] <wikibugs>	 (03CR) 10Volans: [V: 032 C: 032] scap: add service name to restart on deploy [software/debmonitor/deploy] - 10https://gerrit.wikimedia.org/r/440368 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans)
[13:46:11] <wikibugs>	 (03CR) 10Elukey: [V: 032 C: 032] Move the varnishkafka submodule to environments/production [puppet] - 10https://gerrit.wikimedia.org/r/440530 (owner: 10Elukey)
[13:47:56] <_joe_>	 elukey: it worked on the puppetmasters
[13:48:11] <elukey>	 running puppet on cp1008
[13:48:11] <_joe_>	  /var/lib/git/operations/puppet/modules/varnishkafka is no more
[13:48:20] <elukey>	 \o/
[13:48:51] <elukey>	 vgutierrez: o/ - can i run puppet on pink unicorn
[13:48:52] <elukey>	 ?
[13:49:10] <elukey>	 "test librdkafka1_0.11.3-1~bpo8+1+wikimedia2_amd64.deb"
[13:49:45] <_joe_>	 elukey: I'm running on cp1045
[13:49:53] <_joe_>	 worst that can happen is a compilation error
[13:49:58] <_joe_>	 and no, it compiles
[13:50:02] <_joe_>	 \o/
[13:50:08] <elukey>	 niiiceeeeeeeeeee
[13:50:15] <_joe_>	 we found a way to get away from submodules without pain
[13:50:36] <elukey>	 s/we/you but ok :D
[13:51:14] <elukey>	 this will help a lot with merging the other submodules into ops/puppet
[13:51:28] <elukey>	 _joe_ shall we also try to move the module back into its place?
[13:51:43] <wikibugs>	 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2060 - https://phabricator.wikimedia.org/T184464#4292288 (10Aklapper)
[13:51:55] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists: Emails to the mailing list Global-renamers are send, but not received by Hotmail users - https://phabricator.wikimedia.org/T184344#4292295 (10Aklapper) p:05High>03Lowest
[13:52:33] <_joe_>	 elukey: sure
[13:52:38] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists: Emails to the mailing list Global-renamers are sent, but not received by Hotmail users - https://phabricator.wikimedia.org/T184344#4292317 (10Nemo_bis)
[13:52:57] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests, 10Security: Decommission db1030 - https://phabricator.wikimedia.org/T184397#4292319 (10Reedy)
[13:53:35] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests: Decommission db1030 - https://phabricator.wikimedia.org/T184397#4292326 (10Reedy)
[13:53:44] <wikibugs>	 (03PS2) 10Andrew Bogott: nova-api: allow access to port 8774 for api access [puppet] - 10https://gerrit.wikimedia.org/r/440478
[13:54:11] <wikibugs>	 (03CR) 10Ottomata: kafka: Ensure JVM AES intrinsics usage if AES ciphersuites are enabled (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/440520 (https://phabricator.wikimedia.org/T182993) (owner: 10Vgutierrez)
[13:55:07] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 032] nova-api: allow access to port 8774 for api access [puppet] - 10https://gerrit.wikimedia.org/r/440478 (owner: 10Andrew Bogott)
[13:56:08] <wikibugs>	 (03PS1) 10Elukey: Move the varnishkafka submodule back to its original place [puppet] - 10https://gerrit.wikimedia.org/r/440535
[13:57:48] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: Move the varnishkafka submodule back to its original place [puppet] - 10https://gerrit.wikimedia.org/r/440535 (owner: 10Elukey)
[13:58:53] <elukey>	 _joe_ I am getting failures with pcc
[14:00:30] <elukey>	 ahahah lol https://puppet-compiler.wmflabs.org/compiler02/11535/
[14:00:38] <elukey>	 [ 2018-06-15T13:58:47 ] INFO: Compilation failed for hostname cp1051.eqiad.wmnet  in environment prod.
[14:00:50] <elukey>	 and then it completes correctly with a no-op
[14:02:52] <_joe_>	 ?
[14:02:59] <_joe_>	 ok
[14:03:04] <_joe_>	 let's go
[14:03:06] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 04-2] "not ready yet, alas" [puppet] - 10https://gerrit.wikimedia.org/r/432703 (https://phabricator.wikimedia.org/T161553) (owner: 10Andrew Bogott)
[14:03:12] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 032] Move the varnishkafka submodule back to its original place [puppet] - 10https://gerrit.wikimedia.org/r/440535 (owner: 10Elukey)
[14:03:39] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: Fix a small comment typo [software/debmonitor] - 10https://gerrit.wikimedia.org/r/440536
[14:04:16] <_joe_>	 elukey: let's do all of them?
[14:05:01] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Fix a small comment typo [software/debmonitor] - 10https://gerrit.wikimedia.org/r/440536 (owner: 10Alexandros Kosiaris)
[14:05:04] <elukey>	 _joe_ sure
[14:05:10] <_joe_>	 elukey: let's first check the labs puppetmasters
[14:05:17] <_joe_>	 something tells me they might not be as ok
[14:05:30] <elukey>	 yes git_sync might not have liked the changes
[14:05:41] <_joe_>	 let's see about that
[14:05:52] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests: Decommission db1030 - https://phabricator.wikimedia.org/T184397#4292568 (10Aklapper) p:05Lowest>03Normal
[14:09:05] <elukey>	 re-enabled puppet where vk is used
[14:09:37] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests: Decommission db1030 - https://phabricator.wikimedia.org/T184397#4292685 (10Aklapper) a:03Cmjohnson
[14:10:49] <_joe_>	 elukey: git-sync-upstream seems to have gone ok
[14:10:59] <_joe_>	 it just had some untracked files in that directory
[14:11:29] <elukey>	 I checked labs-puppetmaster.wikimedia.org and modules/varnishkafka seems fine
[14:11:43] <elukey>	 nothing untracked
[14:12:00] <ema>	 tested a noop change to /etc/varnishkafka/webrequest.conf on cp3008, all good
[14:12:36] <papaul>	 !log OS install on backup2001
[14:12:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:13:56] <elukey>	 _joe_ prepping jmxtrans and kafkatee
[14:14:03] <elukey>	 last one will be nginx
[14:14:31] <elukey>	 (or better to do it after the offsite)
[14:16:07] <wikibugs>	 10Operations, 10Dumps-Generation: Reboot snapshot*, dumpsdata*, dataset1001, ms1001, francium - https://phabricator.wikimedia.org/T184443#4292765 (10Aklapper)
[14:16:12] <wikibugs>	 10Operations, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Upgrade cache_text to Varnish 5 - https://phabricator.wikimedia.org/T184448#4292761 (10Aklapper)
[14:16:49] <wikibugs>	 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2055 - https://phabricator.wikimedia.org/T184285#4292801 (10Aklapper)
[14:18:14] <elukey>	 _joe_ so a minor issue is that people pulling into their local repo will get some issues with untracked files
[14:18:51] <elukey>	 (Andrew just tried and reported the issue, but cleaning the dir works like a charm)
[14:19:18] <icinga-wm>	 RECOVERY - Check systemd state on kubernetes2003 is OK: OK - running: The system is fully operational
[14:19:44] <elukey>	 I can send an email later on to ops@ as heads up
[14:19:51] <_joe_>	 elukey: only if you pull after all is done and don't do a submodule update in the meanwhile
[14:19:59] <_joe_>	 let's do all the others!
[14:21:22] <elukey>	 all including nginx? 
[14:21:44] <_joe_>	 elukey: or you do it next week at the summit
[14:21:51] <_joe_>	 either thing is ok
[14:22:06] <elukey>	 we could do jmxtrans and kafkatee now
[14:22:12] <elukey>	 and then nginx monday after the offsite?
[14:22:38] <icinga-wm>	 PROBLEM - Check systemd state on kubernetes2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[14:25:37] <icinga-wm>	 PROBLEM - IPv4 ping to eqsin on ripe-atlas-eqsin is CRITICAL: Traceback (most recent call last)
[14:25:37] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: Traceback (most recent call last)
[14:26:18] <icinga-wm>	 PROBLEM - IPv4 ping to eqiad on ripe-atlas-eqiad is CRITICAL: Traceback (most recent call last)
[14:26:22] <wikibugs>	 (03PS1) 10Hashar: ci: add some gated extensions to git cache [puppet] - 10https://gerrit.wikimedia.org/r/440539 (https://phabricator.wikimedia.org/T197469)
[14:26:51] <wikibugs>	 (03PS1) 10Elukey: Move jmxtrans and kafkatee submodules to environments/production [puppet] - 10https://gerrit.wikimedia.org/r/440540
[14:27:17] <wikibugs>	 (03CR) 10Hashar: "I have not cherry picked that patch on the CI puppetmaster, I think the Docker slaves are tight on disk space so I dont want to have them " [puppet] - 10https://gerrit.wikimedia.org/r/440539 (https://phabricator.wikimedia.org/T197469) (owner: 10Hashar)
[14:27:27] <herron>	 hmm, those seem like broken checks.  looking
[14:27:28] <icinga-wm>	 PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: Traceback (most recent call last)
[14:27:31] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Move jmxtrans and kafkatee submodules to environments/production [puppet] - 10https://gerrit.wikimedia.org/r/440540 (owner: 10Elukey)
[14:28:19] <wikibugs>	 (03CR) 10Volans: [V: 032 C: 032] "LGTM, thanks for the fix!" [software/debmonitor] - 10https://gerrit.wikimedia.org/r/440536 (owner: 10Alexandros Kosiaris)
[14:28:45] <ema>	 the ripe checks are failing because of 500 errors from the ripe API endpoint 
[14:29:29] <herron>	 ah
[14:29:55] <ema>	 https://atlas.ripe.net/ itself returns 500 right now
[14:30:13] <elukey>	 _joe_ mind to check https://gerrit.wikimedia.org/r/440540 ?
[14:30:13] <volans>	 herron: if you have a sec have a look at https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/440365/ please ;)
[14:30:18] <volans>	 p.s. morning :)
[14:30:47] <herron>	 for sure and good afternoon to you!  
[14:30:54] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 04-1] Puppet agent: fix redirect to syslogs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/440365 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans)
[14:32:28] <icinga-wm>	 RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 7 probes of 303 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map
[14:32:54] <wikibugs>	 10Operations, 10Analytics, 10hardware-requests: EQIAD: (1) hardware request for eventlog1001 replacement - eventlog1002. - https://phabricator.wikimedia.org/T184551#4293105 (10Aklapper)
[14:33:36] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 031] Puppet agent: fix redirect to syslogs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/440365 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans)
[14:33:48] <icinga-wm>	 RECOVERY - IPv4 ping to eqsin on ripe-atlas-eqsin is OK: OK - failed 0 probes of 322 (alerts on 19) - https://atlas.ripe.net/measurements/11645085/#!map
[14:33:48] <icinga-wm>	 RECOVERY - IPv4 ping to eqiad on ripe-atlas-eqiad is OK: OK - failed 0 probes of 326 (alerts on 19) - https://atlas.ripe.net/measurements/1790945/#!map
[14:33:48] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 6 probes of 303 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[14:35:29] <wikibugs>	 (03CR) 10Volans: Puppet agent: fix redirect to syslogs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/440365 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans)
[14:36:12] <wikibugs>	 10Puppet, 10Analytics-Kanban, 10User-Elukey: analytics VPS project puppet errors - https://phabricator.wikimedia.org/T184482#4293128 (10Aklapper)
[14:36:53] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: Fix gemspec warnings [puppet-lint/wmf_styleguide-check] - 10https://gerrit.wikimedia.org/r/440293
[14:37:27] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 032] Fix gemspec warnings [puppet-lint/wmf_styleguide-check] - 10https://gerrit.wikimedia.org/r/440293 (owner: 10Giuseppe Lavagetto)
[14:39:32] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists: Create mailing list for Wikimedia Levant user group - https://phabricator.wikimedia.org/T184352#3880733 (10Aklapper)
[14:43:16] <elukey>	 !log restart varnishkafka-eventlogging on cp5012 as attempt to clear out the errors (not needed but logging it anyway)
[14:43:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:43:43] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists: Create new mailing list  BiblioWiki@ for Italian group of librarians - https://phabricator.wikimedia.org/T184438#3882910 (10Aklapper)
[14:45:16] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists: BilioWiki - https://phabricator.wikimedia.org/T184440#3882961 (10Aklapper)
[14:45:19] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists: BiblioWiki - https://phabricator.wikimedia.org/T184441#3882978 (10Aklapper)
[14:45:32] <_joe_>	 elasticsearch seems to be down for lvs1016
[14:45:36] <wikibugs>	 10Puppet, 10Beta-Cluster-Infrastructure, 10Services (done): Puppet disabled for a month on deployment-restbase0[12] instances - https://phabricator.wikimedia.org/T184477#3884325 (10Aklapper)
[14:45:53] <wikibugs>	 (03CR) 10Herron: [C: 031] "makes sense to me" [puppet] - 10https://gerrit.wikimedia.org/r/440365 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans)
[14:46:11] <_joe_>	 and it just recovered
[14:46:27] <wikibugs>	 10Operations, 10ops-codfw, 10media-storage: Degraded RAID on ms-be2037 - https://phabricator.wikimedia.org/T184390#4293222 (10Aklapper)
[14:46:49] <wikibugs>	 10Operations, 10Deployments: find a way to systematically update the deployment server name across all repos - https://phabricator.wikimedia.org/T197470#4293228 (10Dzahn)
[14:47:30] <wikibugs>	 10Operations, 10Scap: find a way to systematically update the deployment server name across all repos - https://phabricator.wikimedia.org/T197470#4293242 (10Dzahn)
[14:48:43] <wikibugs>	 10Operations, 10Scap: find a way to systematically update the deployment server name across all repos - https://phabricator.wikimedia.org/T197470#4293228 (10Dzahn)
[14:49:30] <wikibugs>	 10Operations, 10Release-Engineering-Team, 10Scap: find a way to systematically update the deployment server name across all repos - https://phabricator.wikimedia.org/T197470#4293258 (10Dzahn)
[14:49:42] <elukey>	 !log restart varnishkafka-eventlogging on cp4028, errors logged
[14:49:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:54:11] <wikibugs>	 (03PS3) 10Vgutierrez: kafka: Ensure JVM AES intrinsics usage if AES ciphersuites are enabled [puppet] - 10https://gerrit.wikimedia.org/r/440520 (https://phabricator.wikimedia.org/T182993)
[14:54:56] <wikibugs>	 (03CR) 10Vgutierrez: kafka: Ensure JVM AES intrinsics usage if AES ciphersuites are enabled (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/440520 (https://phabricator.wikimedia.org/T182993) (owner: 10Vgutierrez)
[14:59:10] <wikibugs>	 (03PS2) 10Volans: Puppet agent: fix redirect to syslog [puppet] - 10https://gerrit.wikimedia.org/r/440365 (https://phabricator.wikimedia.org/T191300)
[15:12:21] <wikibugs>	 (03CR) 10Ottomata: [C: 032] kafka: Ensure JVM AES intrinsics usage if AES ciphersuites are enabled [puppet] - 10https://gerrit.wikimedia.org/r/440520 (https://phabricator.wikimedia.org/T182993) (owner: 10Vgutierrez)
[15:12:27] <wikibugs>	 (03PS4) 10Ottomata: kafka: Ensure JVM AES intrinsics usage if AES ciphersuites are enabled [puppet] - 10https://gerrit.wikimedia.org/r/440520 (https://phabricator.wikimedia.org/T182993) (owner: 10Vgutierrez)
[15:12:30] <wikibugs>	 (03CR) 10Ottomata: [V: 032 C: 032] kafka: Ensure JVM AES intrinsics usage if AES ciphersuites are enabled [puppet] - 10https://gerrit.wikimedia.org/r/440520 (https://phabricator.wikimedia.org/T182993) (owner: 10Vgutierrez)
[15:13:48] <wikibugs>	 (03CR) 10Dzahn: [C: 04-1] "even though noc and dbtree share a puppet class, doesnt mean the cache backends need to be switched at the same time. just move noc and le" [puppet] - 10https://gerrit.wikimedia.org/r/430527 (https://phabricator.wikimedia.org/T192092) (owner: 10Dzahn)
[15:14:11] <wikibugs>	 10Operations, 10ops-codfw, 10Traffic, 10Patch-For-Review: rack/setup/install LVS200[7-10] - https://phabricator.wikimedia.org/T196560#4293332 (10Papaul) @ayounsi @BBlack  I am getting the network error message below during install on both lvs2009 and lvs2010.  Please advice. Thanks....
[15:16:18] <wikibugs>	 10Operations, 10ops-codfw, 10Traffic, 10Patch-For-Review: rack/setup/install LVS200[7-10] - https://phabricator.wikimedia.org/T196560#4293344 (10Papaul) log on install2002 for lvs2010    DHCPDISCOVER from 00:0a:f7:f0:02:40 via 10.192.48.2 Jun 15 15:06:56 install2002 dhcpd[18272]: DHCPOFFER on 10.192.49.7 t...
[15:18:20] <wikibugs>	 (03PS2) 10Dzahn: cache::misc: switch noc.wm backend to mwmaint1001 [puppet] - 10https://gerrit.wikimedia.org/r/430527 (https://phabricator.wikimedia.org/T192092)
[15:18:47] <wikibugs>	 (03PS4) 10Nehajha: Read rcfile if it exists and parse arguments from it using configparser [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/437164 (https://phabricator.wikimedia.org/T148872)
[15:19:08] <gehel>	 !log rolling restart of elasticsearch eqiad for plugin upgrade completed - T194245
[15:19:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:19:13] <stashbot>	 T194245: Implement searching of 'depicts' on commons with the 'quantity' qualifier - https://phabricator.wikimedia.org/T194245
[15:19:13] <ottomata>	 !log bouncing kafka broker on kafka-jumbo1001 to test https://gerrit.wikimedia.org/r/#/c/440520/
[15:19:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:25:39] <wikibugs>	 (03PS1) 10Dzahn: noc/dbtree: require libapache-mod-php [puppet] - 10https://gerrit.wikimedia.org/r/440542 (https://phabricator.wikimedia.org/T192092)
[15:26:18] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] noc/dbtree: require libapache-mod-php [puppet] - 10https://gerrit.wikimedia.org/r/440542 (https://phabricator.wikimedia.org/T192092) (owner: 10Dzahn)
[15:26:54] <wikibugs>	 (03CR) 10Dzahn: [C: 032] "for now still needs to support both terbium and mwmaint1001, hence the stretch and jessie support and php5... will be removed again once t" [puppet] - 10https://gerrit.wikimedia.org/r/440542 (https://phabricator.wikimedia.org/T192092) (owner: 10Dzahn)
[15:27:00] <wikibugs>	 (03PS1) 10Anomie: Move CLI overrides after InitialiseSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440543 (https://phabricator.wikimedia.org/T197475)
[15:27:09] <wikibugs>	 (03CR) 10Anomie: "This seems to have caused T197475" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/393355 (owner: 10Reedy)
[15:27:40] <wikibugs>	 (03PS2) 10Dzahn: noc/dbtree: require libapache-mod-php [puppet] - 10https://gerrit.wikimedia.org/r/440542 (https://phabricator.wikimedia.org/T192092)
[15:28:59] <wikibugs>	 (03CR) 10Dzahn: [C: 032] noc/dbtree: require libapache-mod-php [puppet] - 10https://gerrit.wikimedia.org/r/440542 (https://phabricator.wikimedia.org/T192092) (owner: 10Dzahn)
[15:31:11] <wikibugs>	 (03PS3) 10Dzahn: cache::misc: switch noc.wm backend to mwmaint1001 [puppet] - 10https://gerrit.wikimedia.org/r/430527 (https://phabricator.wikimedia.org/T192092)
[15:34:53] <anomie>	 jynus: FYI, I'll have to re-run that maintenance script for plwiki and ptwiki at some point, they errored out for some reason (unfortunately thanks to T197475 I don't know what that reason is, yet). Rough guess is about 8 hours to run both.
[15:34:54] <stashbot>	 T197475: Wikimedia: Command-line scripts are saying to set $wgShowExceptionDetails - https://phabricator.wikimedia.org/T197475
[15:35:28] <wikibugs>	 (03CR) 10Dzahn: [C: 032] cache::misc: switch noc.wm backend to mwmaint1001 [puppet] - 10https://gerrit.wikimedia.org/r/430527 (https://phabricator.wikimedia.org/T192092) (owner: 10Dzahn)
[15:36:10] <wikibugs>	 (03CR) 10Andrew Bogott: Read rcfile if it exists and parse arguments from it using configparser (031 comment) [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/437164 (https://phabricator.wikimedia.org/T148872) (owner: 10Nehajha)
[15:37:35] <mutante>	 !log switching noc.wikimedia.org site from terbium to mwmaint1001 backend, running puppet on all cache::misc cp servers (T192092)
[15:37:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:37:40] <stashbot>	 T192092: setup replacement for terbium (maintenance_server) on stretch - https://phabricator.wikimedia.org/T192092
[15:37:51] <jynus>	 anomie: maybe it is better to wait until tuesday
[15:41:54] <wikibugs>	 (03PS1) 10Vgutierrez: varnishkafka: Set TLS signature algorithms and curves lists [puppet] - 10https://gerrit.wikimedia.org/r/440544 (https://phabricator.wikimedia.org/T182993)
[15:45:18] <wikibugs>	 (03CR) 10Nehajha: ">" (031 comment) [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/437164 (https://phabricator.wikimedia.org/T148872) (owner: 10Nehajha)
[15:45:52] <wikibugs>	 (03CR) 10Andrew Bogott: "No trouble :)" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/437164 (https://phabricator.wikimedia.org/T148872) (owner: 10Nehajha)
[15:46:07] <wikibugs>	 (03CR) 10Vgutierrez: "pcc is pleased with this CR: https://puppet-compiler.wmflabs.org/compiler02/11536/" [puppet] - 10https://gerrit.wikimedia.org/r/440544 (https://phabricator.wikimedia.org/T182993) (owner: 10Vgutierrez)
[15:46:53] <wikibugs>	 (03CR) 10Vgutierrez: [C: 04-2] "Wait till librdkafka_0.11.3-1~bpo8+1+wikimedia2 is deployed" [puppet] - 10https://gerrit.wikimedia.org/r/440544 (https://phabricator.wikimedia.org/T182993) (owner: 10Vgutierrez)
[15:47:28] <wikibugs>	 (03CR) 10Andrew Bogott: "btw, it would be good to have an example/reference config file included in the docs someplace.  Probably not as part of this patch though." [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/437164 (https://phabricator.wikimedia.org/T148872) (owner: 10Nehajha)
[15:47:48] <wikibugs>	 10Operations, 10Puppet, 10Patch-For-Review: Investigate landscape of PuppetDB Frontends and Provision One - https://phabricator.wikimedia.org/T184563#4293457 (10Aklapper)
[15:49:09] <wikibugs>	 (03CR) 10Nehajha: "> btw, it would be good to have an example/reference config file" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/437164 (https://phabricator.wikimedia.org/T148872) (owner: 10Nehajha)
[15:49:48] <mutante>	 !log install2002 - disabling puppet temp, live hacking DHCP config for debugging backup2001 install issue
[15:49:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:53:25] <wikibugs>	 (03PS5) 10Nehajha: Read rcfile if it exists and parse arguments from it using configparser [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/437164 (https://phabricator.wikimedia.org/T148872)
[15:57:41] <wikibugs>	 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install backup2001 - https://phabricator.wikimedia.org/T196477#4293489 (10Papaul) @MoritzMuehlenhoff we are missing in our installer network drivers for the NIC card on this system (QLogic 10GE 2P QL41112HxCU-DE Adapter )...
[16:02:26] <wikibugs>	 (03CR) 10Alex Monk: "Interesting, it's working for me:" [puppet] - 10https://gerrit.wikimedia.org/r/437057 (https://phabricator.wikimedia.org/T194962) (owner: 10Alex Monk)
[16:02:55] <wikibugs>	 (03CR) 10Herron: [C: 032] "LGTM https://puppet-compiler.wmflabs.org/compiler02/11537/" [puppet] - 10https://gerrit.wikimedia.org/r/439451 (https://phabricator.wikimedia.org/T184244) (owner: 10Alex Monk)
[16:03:00] <wikibugs>	 (03PS3) 10Herron: Followup If545182a: Actually use cert_name now [puppet] - 10https://gerrit.wikimedia.org/r/439451 (https://phabricator.wikimedia.org/T184244) (owner: 10Alex Monk)
[16:09:54] <wikibugs>	 (03CR) 10Volans: [C: 032] Puppet agent: fix redirect to syslog (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/440365 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans)
[16:09:59] <wikibugs>	 (03PS3) 10Volans: Puppet agent: fix redirect to syslog [puppet] - 10https://gerrit.wikimedia.org/r/440365 (https://phabricator.wikimedia.org/T191300)
[16:14:19] <wikibugs>	 (03CR) 10Jcrespo: [C: 031] "Everything that is here is ok, I just don't know yet if everything that is to be done is here." [puppet] - 10https://gerrit.wikimedia.org/r/437720 (https://phabricator.wikimedia.org/T196376) (owner: 10Marostegui)
[16:26:19] <wikibugs>	 (03PS1) 10Dzahn: icinga/wikidata: add secondary dispatcher critical check [puppet] - 10https://gerrit.wikimedia.org/r/440549
[16:31:39] <wikibugs>	 (03PS1) 10Dzahn: icinga/wikidata: fix regex for wikidata dispatcher check [puppet] - 10https://gerrit.wikimedia.org/r/440550
[16:33:00] <wikibugs>	 (03PS2) 10Dzahn: icinga/wikidata: fix regex for wikidata dispatcher check [puppet] - 10https://gerrit.wikimedia.org/r/440550
[16:37:43] <wikibugs>	 (03CR) 10Ladsgroup: [C: 031] ":D" [puppet] - 10https://gerrit.wikimedia.org/r/440550 (owner: 10Dzahn)
[16:39:56] <wikibugs>	 (03CR) 10Dzahn: [C: 032] icinga/wikidata: fix regex for wikidata dispatcher check [puppet] - 10https://gerrit.wikimedia.org/r/440550 (owner: 10Dzahn)
[16:40:13] <Chrissymad>	 idk if this is the right place to ask but there's a strange problem with twinkle but only for 3rr? 
[16:45:52] <wikibugs>	 (03PS12) 10Krinkle: mc-labs: Sync with prod or document differences [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437876
[16:45:58] <wikibugs>	 (03CR) 10Krinkle: "Rebased, again." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437876 (owner: 10Krinkle)
[16:51:44] <wikibugs>	 (03PS2) 10Dzahn: icinga/wikidata: add secondary dispatcher critical check [puppet] - 10https://gerrit.wikimedia.org/r/440549
[16:54:59] <wikibugs>	 (03PS3) 10Dzahn: icinga/wikidata: add secondary dispatcher critical check [puppet] - 10https://gerrit.wikimedia.org/r/440549
[16:55:48] <wikibugs>	 (03CR) 10Dzahn: "the definition of critical in this context would be like "a server admin is expected to do something asap"" [puppet] - 10https://gerrit.wikimedia.org/r/440549 (owner: 10Dzahn)
[16:56:05] <wikibugs>	 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install backup2001 - https://phabricator.wikimedia.org/T196477#4293662 (10Papaul)
[17:01:09] <wikibugs>	 (03CR) 10Dzahn: [C: 032] icinga/wikidata: add secondary dispatcher critical check [puppet] - 10https://gerrit.wikimedia.org/r/440549 (owner: 10Dzahn)
[17:04:17] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 224, down: 1, dormant: 0, excluded: 0, unused: 0
[17:04:37] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0
[17:05:06] <herron>	 are those expected XioNoX?
[17:05:31] <mutante>	 schedules downtime for 'correctness of icinga config' adding a new check command..
[17:05:38] <mutante>	 but will be quick
[17:06:08] <mutante>	 meh, just needed 2 puppet runs. not even worth it. done
[17:07:05] <XioNoX>	 herron: checking, but even if not expected we have redundancy
[17:07:35] <herron>	 cool thx
[17:08:05] <wikibugs>	 (03CR) 10Krinkle: [C: 032] mc-labs: Sync with prod or document differences [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437876 (owner: 10Krinkle)
[17:08:36] * Krinkle stages on mwdebug1002
[17:09:12] <wikibugs>	 (03PS7) 10Krinkle: profiler-labs: Use FlameGraph-compatible format for xhprof sampler [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434522 (https://phabricator.wikimedia.org/T176916)
[17:09:29] <XioNoX>	 herron: btw, https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down
[17:09:51] <wikibugs>	 (03Merged) 10jenkins-bot: mc-labs: Sync with prod or document differences [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437876 (owner: 10Krinkle)
[17:09:56] <herron>	 nice! thanks reading through this now
[17:10:05] <wikibugs>	 (03CR) 10jenkins-bot: mc-labs: Sync with prod or document differences [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437876 (owner: 10Krinkle)
[17:11:04] <XioNoX>	 herron: I can't find any planned maintenance. I'm on my phone, let me know if anything degrades
[17:11:17] <herron>	 XioNoX: will do thanks
[17:15:50] <logmsgbot>	 !log krinkle@deploy1001 Synchronized wmf-config/mc.php: I619a2ff5db611 (duration: 00m 58s)
[17:15:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:16:08] <wikibugs>	 10Operations, 10Analytics, 10SRE-Access-Requests: Requesting access for mbsantos - https://phabricator.wikimedia.org/T197237#4293707 (10herron) Hello @MSantos!  To provision access we need to list down the specific group memberships that are requested.  Could you please coordinate the gathering of this infor...
[17:17:13] <wikibugs>	 (03PS1) 10Dzahn: icinga/wikidata: add new check_command for dispatcher [puppet] - 10https://gerrit.wikimedia.org/r/440552
[17:17:48] <wikibugs>	 (03PS2) 10Dzahn: icinga/wikidata: add new check_command for dispatcher [puppet] - 10https://gerrit.wikimedia.org/r/440552
[17:18:52] <wikibugs>	 10Operations, 10Analytics, 10SRE-Access-Requests: Requesting access for mbsantos - https://phabricator.wikimedia.org/T197237#4293712 (10herron) p:05Triage>03Normal
[17:18:58] <wikibugs>	 (03CR) 10Dzahn: [C: 032] icinga/wikidata: add new check_command for dispatcher [puppet] - 10https://gerrit.wikimedia.org/r/440552 (owner: 10Dzahn)
[17:20:49] <wikibugs>	 (03PS1) 10Krinkle: logging: Raise minimum level for 'preferences' to INFO [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440553
[17:32:38] * Krinkle done on mwlog1002
[17:32:45] * Krinkle testing stuff on mwlog1001
[17:57:54] <wikibugs>	 10Operations, 10DBA, 10Traffic, 10Patch-For-Review: dbtree broken (for some users?) - https://phabricator.wikimedia.org/T162976#4293777 (10jcrespo)
[17:57:58] <wikibugs>	 10Operations, 10DBA, 10Traffic: dbtree: make wasat a working backend and become active-active - https://phabricator.wikimedia.org/T163141#4293775 (10jcrespo) 05Open>03stalled This is stalled because tendril cannot work with multiple db backends. We would need to setup a different backend to support it- w...
[17:58:01] <wikibugs>	 10Operations, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External): Provide cross-dc redundancy (active-active or active-passive) to all important misc services - https://phabricator.wikimedia.org/T156937#4293778 (10jcrespo)
[18:03:26] <icinga-wm>	 ACKNOWLEDGEMENT - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0: Herron Telia Carrier Reference: 00862426. We regret to inform you that we are currently experiencing a major outage in New York. The issue is suspected to be caused by a fiber cut.
[18:03:26] <icinga-wm>	 ACKNOWLEDGEMENT - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 224, down: 1, dormant: 0, excluded: 0, unused: 0: Herron Telia Carrier Reference: 00862426. We regret to inform you that we are currently experiencing a major outage in New York. The issue is suspected to be caused by a fiber cut.
[18:10:29] <Trey314159>	 !log reindexing Bosnian wikis on elastic@codfw (T196658)
[18:10:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:10:32] <stashbot>	 T196658: Re-index Croatian, Serbo-Croatian, and Bosnian Wikis - https://phabricator.wikimedia.org/T196658
[18:19:22] <icinga-wm>	 RECOVERY - Check systemd state on kubernetes2003 is OK: OK - running: The system is fully operational
[18:20:45] <wikibugs>	 (03PS1) 10Bmansurov: Enable logging for Schema:CitationUsage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440557
[18:21:03] <wikibugs>	 (03PS2) 10Bmansurov: Enable logging for Schema:CitationUsage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440557 (https://phabricator.wikimedia.org/T191086)
[18:21:34] <wikibugs>	 (03CR) 10Bmansurov: [C: 04-1] "To be deployed on 6/21." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440557 (https://phabricator.wikimedia.org/T191086) (owner: 10Bmansurov)
[18:22:41] <icinga-wm>	 PROBLEM - Check systemd state on kubernetes2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[18:23:44] <wikibugs>	 (03CR) 10EBernhardson: Prep work for multi-instance elasticsearch refactor (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/440498 (owner: 10EBernhardson)
[18:26:28] <wikibugs>	 (03PS2) 10EBernhardson: Prep work for multi-instance elasticsearch refactor [puppet] - 10https://gerrit.wikimedia.org/r/440498
[18:26:30] <wikibugs>	 (03PS29) 10EBernhardson: [WIP] Allow multiple elasticsearch instances per host [puppet] - 10https://gerrit.wikimedia.org/r/440049
[18:27:45] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Prep work for multi-instance elasticsearch refactor [puppet] - 10https://gerrit.wikimedia.org/r/440498 (owner: 10EBernhardson)
[18:27:59] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] [WIP] Allow multiple elasticsearch instances per host [puppet] - 10https://gerrit.wikimedia.org/r/440049 (owner: 10EBernhardson)
[18:45:12] <icinga-wm>	 RECOVERY - Memory correctable errors -EDAC- on scb1002 is OK: (C)4 ge (W)2 ge 1 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=scb1002&var-datasource=eqiad%2520prometheus%252Fops
[18:46:31] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 032] Read rcfile if it exists and parse arguments from it using configparser [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/437164 (https://phabricator.wikimedia.org/T148872) (owner: 10Nehajha)
[18:47:21] <wikibugs>	 (03Merged) 10jenkins-bot: Read rcfile if it exists and parse arguments from it using configparser [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/437164 (https://phabricator.wikimedia.org/T148872) (owner: 10Nehajha)
[18:48:52] <Trey314159>	 !log reindexing Bosnian wikis on elastic@eqiad (T196658)
[18:48:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:48:54] <stashbot>	 T196658: Re-index Croatian, Serbo-Croatian, and Bosnian Wikis - https://phabricator.wikimedia.org/T196658
[18:50:52] <wikibugs>	 10Operations, 10DBA, 10Traffic: dbtree: make wasat a working backend and become active-active - https://phabricator.wikimedia.org/T163141#4293838 (10Krinkle)
[18:51:08] <wikibugs>	 10Operations, 10DBA, 10Traffic, 10Availability: dbtree: make wasat a working backend and become active-active - https://phabricator.wikimedia.org/T163141#3187493 (10Krinkle)
[19:05:58] <wikibugs>	 (03PS2) 10Aaron Schulz: Make mediawiki.org write to both nutcracker and mcrouter [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440469
[19:39:21] <wikibugs>	 (03PS1) 10Alex Monk: deployment-prep: Fix shinken check for Citoid [puppet] - 10https://gerrit.wikimedia.org/r/440561
[19:39:37] <Krenair>	 alex@alex-laptop:~/Development/Wikimedia/Operations-Puppet (shinken-beta-citoid)$ git review
[19:39:37] <Krenair>	 Problem running 'git remote update gerrit'
[19:39:38] <Krenair>	 Fetching gerrit
[19:39:39] <Krenair>	 fatal: internal server error
[19:41:11] <wikibugs>	 (03PS1) 10Alex Monk: shinkengen: Ignore instances that are turned off in Nova [puppet] - 10https://gerrit.wikimedia.org/r/440562
[19:46:32] <wikibugs>	 (03CR) 10Alex Monk: "krenair@shinken-01:~$ /usr/lib/nagios/plugins/check_http -H deployment-sca02 -p 1970 -u /_info" [puppet] - 10https://gerrit.wikimedia.org/r/440561 (owner: 10Alex Monk)
[19:48:50] <wikibugs>	 (03CR) 10Alex Monk: "(think it's actually -I instead of -H but it still appears to work)" [puppet] - 10https://gerrit.wikimedia.org/r/440562 (owner: 10Alex Monk)
[20:19:12] <icinga-wm>	 RECOVERY - Check systemd state on kubernetes2003 is OK: OK - running: The system is fully operational
[20:22:32] <icinga-wm>	 PROBLEM - Check systemd state on kubernetes2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[20:23:02] <icinga-wm>	 PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) timed out before a response was received
[20:26:21] <icinga-wm>	 RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy
[21:04:56] <wikibugs>	 (03CR) 10Krinkle: Move CLI overrides after InitialiseSettings.php (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440543 (https://phabricator.wikimedia.org/T197475) (owner: 10Anomie)
[21:19:12] <icinga-wm>	 RECOVERY - Check systemd state on kubernetes2003 is OK: OK - running: The system is fully operational
[21:22:32] <icinga-wm>	 PROBLEM - Check systemd state on kubernetes2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[21:45:39] <wikibugs>	 (03PS1) 10Ladsgroup: snapshot: fix css used to show report cards [puppet] - 10https://gerrit.wikimedia.org/r/440613
[22:24:22] <wikibugs>	 (03CR) 10Krinkle: [C: 04-1] Greatly simplify svn.wikimedia.org redirects [puppet] - 10https://gerrit.wikimedia.org/r/429449 (owner: 10Chad)
[22:37:40] <wikibugs>	 (03PS1) 10Mooeypoo: Enable RCFilters by default on Watchlist in Beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440629 (https://phabricator.wikimedia.org/T181193)
[22:40:19] <wikibugs>	 (03CR) 10Catrope: [C: 032] Enable RCFilters by default on Watchlist in Beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440629 (https://phabricator.wikimedia.org/T181193) (owner: 10Mooeypoo)
[22:41:54] <wikibugs>	 (03Merged) 10jenkins-bot: Enable RCFilters by default on Watchlist in Beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440629 (https://phabricator.wikimedia.org/T181193) (owner: 10Mooeypoo)
[22:42:11] <wikibugs>	 (03CR) 10jenkins-bot: Enable RCFilters by default on Watchlist in Beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440629 (https://phabricator.wikimedia.org/T181193) (owner: 10Mooeypoo)
[23:06:08] <wikibugs>	 10Operations, 10Discovery, 10Wikidata, 10Wikidata-Query-Service: Add HTTPS support to wdqs-internal service - https://phabricator.wikimedia.org/T193473#4294361 (10Smalyshev)
[23:18:02] <wikibugs>	 (03PS1) 10Mooeypoo: Rollout Watchlist Structured Filters to most wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440641 (https://phabricator.wikimedia.org/T181193)
[23:21:28] <wikibugs>	 (03PS1) 10Mooeypoo: Rollout Watchlist Structured Filters to all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440642 (https://phabricator.wikimedia.org/T181193)
[23:22:09] <wikibugs>	 (03CR) 10Catrope: [C: 04-2] "Not before June 25th" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440641 (https://phabricator.wikimedia.org/T181193) (owner: 10Mooeypoo)
[23:22:16] <wikibugs>	 (03Abandoned) 10Aaron Schulz: [DNM] Set "mcrouterAware" flag for "memcached-mcrouter" object cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440039 (owner: 10Aaron Schulz)
[23:22:31] <wikibugs>	 (03CR) 10Catrope: [C: 04-2] "Not before June 25th" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440642 (https://phabricator.wikimedia.org/T181193) (owner: 10Mooeypoo)
[23:25:51] <wikibugs>	 10Operations, 10Discovery, 10Wikidata, 10Wikidata-Query-Service, and 2 others: Ban clients of WDQS which don't follow throttling directives for some time - https://phabricator.wikimedia.org/T194653#4294384 (10Smalyshev) p:05Triage>03Normal
[23:26:06] <wikibugs>	 10Operations, 10Discovery, 10Wikidata, 10Wikidata-Query-Service, and 2 others: Ban clients of WDQS which don't follow throttling directives for some time - https://phabricator.wikimedia.org/T194653#4204417 (10Smalyshev) 05Open>03Resolved
[23:47:03] <wikibugs>	 (03PS1) 10Aaron Schulz: Make mc-labs.php settings more similar to mc.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440643
[23:50:50] <wikibugs>	 (03CR) 10Krinkle: [C: 032] "Confirmed that aside from the one message being changed, there are 0 hits for level-DEBUG in production channel 'preferences' over the pas" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440553 (owner: 10Krinkle)
[23:55:30] <wikibugs>	 (03PS2) 10Krinkle: logging: Raise minimum level for 'preferences' to INFO [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440553
[23:56:12] <icinga-wm>	 PROBLEM - proton endpoints health on proton2002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) is CRITICAL: Test Print the Foo page from en.wp.org in letter format returned the unexpected status 503 (expecting: 200)
[23:58:21] <icinga-wm>	 RECOVERY - proton endpoints health on proton2002 is OK: All endpoints are healthy
[23:58:36] <Krinkle>	 elukey: Should the old puppet/varnishkafka repo be emptied/archived?
[23:59:46] <wikibugs>	 (03PS1) 10Krinkle: Archive repository [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/440644