[00:05:58] <icinga-wm>	 PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1966 bytes in 0.096 second response time
[00:11:07] <icinga-wm>	 RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1948 bytes in 0.079 second response time
[01:48:57] <icinga-wm>	 RECOVERY - Check systemd state on kubernetes2003 is OK: OK - running: The system is fully operational
[01:52:17] <icinga-wm>	 PROBLEM - Check systemd state on kubernetes2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[02:32:26] <logmsgbot>	 !log l10nupdate@deploy1001 scap sync-l10n completed (1.32.0-wmf.7) (duration: 13m 06s)
[02:32:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:38:17] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1003 is CRITICAL: instance=kubernetes1003.eqiad.wmnet operation_type={create_container,image_status,podsandbox_status,remove_container,start_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[02:39:18] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1003 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[02:42:42] <logmsgbot>	 !log l10nupdate@deploy1001 ResourceLoader cache refresh completed at Mon Jun 11 02:42:42 UTC 2018 (duration 10m 16s)
[02:42:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:25:57] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 694.21 seconds
[03:43:54] <wikibugs>	 (03CR) 10Chelsyx: [C: 031] statistics::discovery: re-enable cron job [puppet] - 10https://gerrit.wikimedia.org/r/438125 (https://phabricator.wikimedia.org/T170494) (owner: 10Bearloga)
[04:09:48] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 294.64 seconds
[04:17:14] <wikibugs>	 (03CR) 10Dzahn: "ack, the one that has been deleted was actually "deployment-deploy1001" not deployment-tin. correcting that. That deployment-deploy-01 di" [puppet] - 10https://gerrit.wikimedia.org/r/438001 (https://phabricator.wikimedia.org/T192071) (owner: 10Dzahn)
[04:29:35] <wikibugs>	 (03CR) 10Dzahn: "mariadb grants have been merged. the cron job will appear once we switch phab1002 to the active server in Hiera" [puppet] - 10https://gerrit.wikimedia.org/r/437558 (https://phabricator.wikimedia.org/T196019) (owner: 10Dzahn)
[05:10:28] <wikibugs>	 (03PS6) 10Dzahn: analytics_cluster::webserver: apache -> httpd module [puppet] - 10https://gerrit.wikimedia.org/r/416742
[05:15:57] <marostegui>	 !log Deploy schema change on s6 primary master (db1061)  - T191316 T192926 T195193
[05:16:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:16:04] <stashbot>	 T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926
[05:16:04] <stashbot>	 T195193: Schema change for ct_tag_id field to change_tag - https://phabricator.wikimedia.org/T195193
[05:16:04] <stashbot>	 T191316: Schema change to make archive.ar_rev_id NOT NULL - https://phabricator.wikimedia.org/T191316
[05:16:50] <wikibugs>	 (03PS7) 10Dzahn: analytics_cluster::webserver: apache -> httpd module [puppet] - 10https://gerrit.wikimedia.org/r/416742
[05:17:33] <marostegui>	 !log Stop MySQL and reboot pc2005 for intel-microcode update and final HW check - T196339
[05:17:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:17:40] <stashbot>	 T196339: pc2005 down - https://phabricator.wikimedia.org/T196339
[05:19:10] <wikibugs>	 (03CR) 10Dzahn: [C: 031] "found the missing part. it was turnilo. now this compiles fine and should be noop on thorium: http://puppet-compiler.wmflabs.org/11435/th" [puppet] - 10https://gerrit.wikimedia.org/r/416742 (owner: 10Dzahn)
[05:19:48] <wikibugs>	 (03PS1) 10Marostegui: Revert "mariadb: Depool pc2005, hardware issues" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439523
[05:19:56] <wikibugs>	 (03PS2) 10Marostegui: Revert "mariadb: Depool pc2005, hardware issues" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439523
[05:22:11] <wikibugs>	 (03CR) 10Dzahn: [C: 031] "no parked domains are in the list of email domains at modules/role/files/exim/wikimedia_domains" [dns] - 10https://gerrit.wikimedia.org/r/429874 (https://phabricator.wikimedia.org/T193408) (owner: 10Dzahn)
[05:22:17] <wikibugs>	 (03PS1) 10Marostegui: realm.pp: Add id_internalwikimedia as private wiki [puppet] - 10https://gerrit.wikimedia.org/r/439524 (https://phabricator.wikimedia.org/T196748)
[05:22:37] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: m3 on db2078 is OK: OK slave_sql_lag Replication lag: 16.84 seconds
[05:22:48] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: m3 on db2042 is OK: OK slave_sql_lag Replication lag: 14.71 seconds
[05:23:49] <wikibugs>	 (03CR) 10Marostegui: [C: 032] realm.pp: Add id_internalwikimedia as private wiki [puppet] - 10https://gerrit.wikimedia.org/r/439524 (https://phabricator.wikimedia.org/T196748) (owner: 10Marostegui)
[05:25:12] <marostegui>	 !log Restart mysql on codfw sanitariums (db2094, db2095) to pick up new replication filters - T196748
[05:25:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:25:17] <stashbot>	 T196748: Prepare and check storage layer for id_internalwikimedia - https://phabricator.wikimedia.org/T196748
[05:25:37] <wikibugs>	 (03CR) 10Marostegui: [C: 032] Revert "mariadb: Depool pc2005, hardware issues" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439523 (owner: 10Marostegui)
[05:27:08] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "mariadb: Depool pc2005, hardware issues" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439523 (owner: 10Marostegui)
[05:28:22] <wikibugs>	 (03CR) 10jenkins-bot: Revert "mariadb: Depool pc2005, hardware issues" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439523 (owner: 10Marostegui)
[05:28:24] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Repool pc2005 - T196339 (duration: 00m 52s)
[05:28:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:28:29] <stashbot>	 T196339: pc2005 down - https://phabricator.wikimedia.org/T196339
[05:30:13] <wikibugs>	 (03PS5) 10Dzahn: Add id-internal.wikimedia.org for Wikimedia Indonesia [dns] - 10https://gerrit.wikimedia.org/r/438275 (https://phabricator.wikimedia.org/T196747) (owner: 10Urbanecm)
[05:30:51] <wikibugs>	 (03PS6) 10Dzahn: Add id-internal.wikimedia.org for Wikimedia Indonesia [dns] - 10https://gerrit.wikimedia.org/r/438275 (https://phabricator.wikimedia.org/T196747) (owner: 10Urbanecm)
[05:32:34] <marostegui>	 !log Restart mysql on eqiad sanitariums (db1095, db1102, db1124, db1125) to pick up new replication filters - T196748
[05:32:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:32:39] <stashbot>	 T196748: Prepare and check storage layer for id_internalwikimedia - https://phabricator.wikimedia.org/T196748
[05:33:08] <wikibugs>	 (03CR) 10Dzahn: [C: 032] Add id-internal.wikimedia.org for Wikimedia Indonesia [dns] - 10https://gerrit.wikimedia.org/r/438275 (https://phabricator.wikimedia.org/T196747) (owner: 10Urbanecm)
[05:35:10] <wikibugs>	 (03PS6) 10Dzahn: id_internalwikimedia: add Apache configuration [puppet] - 10https://gerrit.wikimedia.org/r/438276 (https://phabricator.wikimedia.org/T196747) (owner: 10Urbanecm)
[05:36:01] <wikibugs>	 (03PS7) 10Dzahn: mediawiki/apache: add id-internal.wikimedia.org server alias [puppet] - 10https://gerrit.wikimedia.org/r/438276 (https://phabricator.wikimedia.org/T196747) (owner: 10Urbanecm)
[05:40:58] <wikibugs>	 (03CR) 10Dzahn: [C: 032] mediawiki/apache: add id-internal.wikimedia.org server alias [puppet] - 10https://gerrit.wikimedia.org/r/438276 (https://phabricator.wikimedia.org/T196747) (owner: 10Urbanecm)
[05:45:50] <wikibugs>	 (03CR) 10Dzahn: "yep, this one works MUCH faster:" [puppet] - 10https://gerrit.wikimedia.org/r/435984 (https://phabricator.wikimedia.org/T195780) (owner: 10Aklapper)
[05:46:54] <wikibugs>	 (03PS5) 10Dzahn: phabricator: List new and recent assignees [puppet] - 10https://gerrit.wikimedia.org/r/435984 (https://phabricator.wikimedia.org/T195780) (owner: 10Aklapper)
[05:48:47] <wikibugs>	 (03CR) 10Dzahn: [C: 032] phabricator: List new and recent assignees [puppet] - 10https://gerrit.wikimedia.org/r/435984 (https://phabricator.wikimedia.org/T195780) (owner: 10Aklapper)
[05:51:24] <wikibugs>	 (03CR) 10Dzahn: [C: 04-2] "no more php5 on deployment hosts now" [puppet] - 10https://gerrit.wikimedia.org/r/391045 (owner: 10Hoo man)
[05:53:46] <wikibugs>	 10Operations, 10DBA, 10Goal, 10Patch-For-Review: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704#4270902 (10Marostegui) 05Open>03Resolved a:03Marostegui Everything has been fine for more than a week now (including the...
[05:55:14] <wikibugs>	 10Operations, 10DBA, 10Goal, 10Patch-For-Review: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704#4270909 (10Marostegui)
[05:56:03] <wikibugs>	 10Operations, 10DBA, 10Goal, 10Patch-For-Review: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704#4245083 (10Marostegui)
[05:58:37] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s6 on dbstore2001 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 309.62 seconds
[05:58:38] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s6 on db2046 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 311.37 seconds
[05:58:47] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s6 on db2087 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 313.78 seconds
[05:58:48] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s6 on db2067 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 315.42 seconds
[05:58:48] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s6 on db2039 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 315.76 seconds
[05:59:08] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s6 on db2076 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 323.51 seconds
[05:59:08] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s6 on db2089 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 323.57 seconds
[05:59:17] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s6 on db2053 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 327.09 seconds
[05:59:18] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s6 on db2060 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 328.91 seconds
[06:03:22] <marostegui>	 ^ that is anomi.e's script hitting s6
[06:10:40] <marostegui>	 !log Stop replication on db2095 to update triggers - T192926
[06:10:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:10:45] <stashbot>	 T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926
[06:14:53] <marostegui>	 !log Deploy schema change on s4 codfw master (db2051); this will generate lag on codfw - T191316 T192926 T89737 T195193
[06:14:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:15:00] <stashbot>	 T89737: Make several mediawiki table fields unsigned ints on wmf databases - https://phabricator.wikimedia.org/T89737
[06:15:00] <stashbot>	 T195193: Schema change for ct_tag_id field to change_tag - https://phabricator.wikimedia.org/T195193
[06:15:00] <stashbot>	 T191316: Schema change to make archive.ar_rev_id NOT NULL - https://phabricator.wikimedia.org/T191316
[06:19:04] <wikibugs>	 10Operations, 10Dumps-Generation, 10Wikimedia-log-errors: High rate of "Memcached error .. CONNECTION FAILURE" on snapshot hosts - https://phabricator.wikimedia.org/T196303#4270934 (10ArielGlenn) 05Open>03Resolved These messages have disappeared from logstash after the deployment of the BagOStuff fixes....
[06:29:57] <icinga-wm>	 PROBLEM - puppet last run on mw1289 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/profile.d/field.sh],File[/etc/ssl/localcerts/api.svc.eqiad.wmnet.crt]
[06:30:34] <wikibugs>	 (03CR) 10Elukey: analytics_cluster::webserver: apache -> httpd module (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/416742 (owner: 10Dzahn)
[06:37:17] <icinga-wm>	 PROBLEM - Host ms-be1036 is DOWN: PING CRITICAL - Packet loss = 100%
[06:38:06] * elukey waves to marostegui doing alter tables
[06:38:33] <marostegui>	 elukey o/ !!
[06:39:45] <elukey>	 !log restart pdfrender on scb1002
[06:39:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:40:57] <icinga-wm>	 RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.007 second response time
[06:41:24] <elukey>	 checking ms-be's console
[06:42:31] <elukey>	 console frozen, can't get a tty
[06:44:44] <elukey>	 ah nice: "The server is not powered on. The Virtual Serial Port is not available."
[06:45:19] <elukey>	 I checked SAL but didn't find anything, re-checking
[06:55:08] <elukey>	 didn't find anything, also no iLO system/console logs for this event
[06:55:18] <icinga-wm>	 RECOVERY - puppet last run on mw1289 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[06:58:11] <wikibugs>	 10Operations, 10ops-eqiad: mw1280: CPU error - https://phabricator.wikimedia.org/T195734#4270962 (10MoritzMuehlenhoff) 05Open>03Resolved No new errors have been logged in SEL and the server appears stable, closing the task.
[07:00:41] <elukey>	 so before powering it up I'll wait for another human being to check, monday morning and lack of enough caffeine might be a bad compromise :)
[07:02:20] <moritzm>	 elukey: having a second look at ms-be1036
[07:03:53] <elukey>	 thanks!
[07:07:18] <moritzm>	 nothing logged indeed, the last system event is from May
[07:08:25] <elukey>	 it looks like a brutal power off, somebody tripped over the rack's cabling? :P
[07:13:58] <wikibugs>	 (03CR) 10Ayounsi: [C: 031] "A brief restart of Netbox anytime is fine." [puppet] - 10https://gerrit.wikimedia.org/r/438219 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[07:22:40] <wikibugs>	 10Operations, 10fundraising-tech-ops, 10netops: adjust NAT mapping for frdata.wikimedia.org - https://phabricator.wikimedia.org/T196656#4270975 (10ayounsi) a:03ayounsi This needs to be pushed for the NAT change: ```lang=diff [edit security nat static rule-set static-nat rule public-reporting then static-na...
[07:26:09] <wikibugs>	 (03PS1) 10Addshore: Switch from 5 mins to 10 mins for wikidata dispatch check [puppet] - 10https://gerrit.wikimedia.org/r/439528 (https://phabricator.wikimedia.org/T194602)
[07:27:57] <wikibugs>	 (03PS1) 10Elukey: profile::geowiki: remove unused/old crons [puppet] - 10https://gerrit.wikimedia.org/r/439529
[07:28:27] <wikibugs>	 (03PS2) 10Addshore: Switch from 5 mins to 10 mins for wikidata dispatch check [puppet] - 10https://gerrit.wikimedia.org/r/439528 (https://phabricator.wikimedia.org/T194602)
[07:29:18] <moritzm>	 !log installing openjdk-7 security updates
[07:29:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:36:58] <marostegui>	 !log Deploy schema change on dbstore1002:s4 T191316 T192926 T89737 T195193
[07:37:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:37:06] <stashbot>	 T89737: Make several mediawiki table fields unsigned ints on wmf databases - https://phabricator.wikimedia.org/T89737
[07:37:06] <stashbot>	 T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926
[07:37:06] <stashbot>	 T195193: Schema change for ct_tag_id field to change_tag - https://phabricator.wikimedia.org/T195193
[07:37:06] <stashbot>	 T191316: Schema change to make archive.ar_rev_id NOT NULL - https://phabricator.wikimedia.org/T191316
[07:42:32] <wikibugs>	 (03PS1) 10Marostegui: mariadb: Promote db1066 to master [puppet] - 10https://gerrit.wikimedia.org/r/439530 (https://phabricator.wikimedia.org/T194870)
[07:52:21] <moritzm>	 !log installing gnupg security updates
[07:52:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:55:43] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Set s2 as read only [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439531 (https://phabricator.wikimedia.org/T194870)
[07:56:04] <wikibugs>	 (03CR) 10Marostegui: [C: 04-2] "Compiler looks good: https://puppet-compiler.wmflabs.org/compiler02/11436/" [puppet] - 10https://gerrit.wikimedia.org/r/439530 (https://phabricator.wikimedia.org/T194870) (owner: 10Marostegui)
[07:57:12] <wikibugs>	 (03CR) 10Marostegui: [C: 04-2] "Do not merge until the day of the failover" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439531 (https://phabricator.wikimedia.org/T194870) (owner: 10Marostegui)
[08:00:43] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Promote db1066 to master and remove read only [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439532 (https://phabricator.wikimedia.org/T194870)
[08:01:23] <wikibugs>	 (03CR) 10Marostegui: [C: 04-2] "Do not merge until the failover date" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439532 (https://phabricator.wikimedia.org/T194870) (owner: 10Marostegui)
[08:05:19] <wikibugs>	 (03PS1) 10Marostegui: wmnet: Update s2-master CNAME [dns] - 10https://gerrit.wikimedia.org/r/439533 (https://phabricator.wikimedia.org/T194870)
[08:06:27] <wikibugs>	 (03CR) 10Marostegui: [C: 04-2] "Do not submit till the day of the failover" [dns] - 10https://gerrit.wikimedia.org/r/439533 (https://phabricator.wikimedia.org/T194870) (owner: 10Marostegui)
[08:06:51] <wikibugs>	 (03CR) 10Dzahn: [C: 04-1] "@Paladox let's just add 2 crons in puppet, one to delete old files and one to gzip all files?" [puppet] - 10https://gerrit.wikimedia.org/r/434605 (owner: 10Paladox)
[08:07:49] <wikibugs>	 (03PS1) 10Marostegui: s2.hosts: db1066 is now s2 primary master [software] - 10https://gerrit.wikimedia.org/r/439534 (https://phabricator.wikimedia.org/T194870)
[08:08:01] <wikibugs>	 (03CR) 10Marostegui: [C: 04-2] "Do not submit till the failover day" [software] - 10https://gerrit.wikimedia.org/r/439534 (https://phabricator.wikimedia.org/T194870) (owner: 10Marostegui)
[08:09:06] <wikibugs>	 (03CR) 10Dzahn: "before Gerrit UI changes there should be an announcement on mailing lists with a reminder that this is coming and explanation how to previ" [puppet] - 10https://gerrit.wikimedia.org/r/439444 (https://phabricator.wikimedia.org/T196812) (owner: 10Paladox)
[08:14:17] <wikibugs>	 (03CR) 10Dzahn: [C: 04-1] "should not use base::service_unit anymore. nowadays we want systemd::service" [puppet] - 10https://gerrit.wikimedia.org/r/362455 (owner: 10Paladox)
[08:15:26] <wikibugs>	 (03CR) 10Dzahn: [C: 04-1] servermon: Add gunicorn.service systemd script (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/362455 (owner: 10Paladox)
[08:21:10] <wikibugs>	 (03CR) 10Dzahn: "re: paladox: on that other change i commented let's just add 2 cron jobs, one to gzip them, instead of the hack to rename files to .gz tha" [puppet] - 10https://gerrit.wikimedia.org/r/423794 (owner: 10Chad)
[08:22:00] <wikibugs>	 (03CR) 10Ema: "Those hostnames are there for testing purposes only. They don't need to reflect any actual machine name and thus there's no need to update" [puppet] - 10https://gerrit.wikimedia.org/r/437625 (https://phabricator.wikimedia.org/T196019) (owner: 10Dzahn)
[08:23:47] <wikibugs>	 (03CR) 10Dzahn: "Thank you Ema, that kind of confirmation was what i was after with this review :) will abandon and glad to have it checked" [puppet] - 10https://gerrit.wikimedia.org/r/437625 (https://phabricator.wikimedia.org/T196019) (owner: 10Dzahn)
[08:24:19] <wikibugs>	 (03Abandoned) 10Dzahn: mtail: replace phab1001 with phab1002? [puppet] - 10https://gerrit.wikimedia.org/r/437625 (https://phabricator.wikimedia.org/T196019) (owner: 10Dzahn)
[08:25:45] <moritzm>	 !log installing gnupg1 security updates
[08:25:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:27:32] <gehel>	 !log restart elastic1020 to enable G1 GC - T156137
[08:27:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:27:36] <stashbot>	 T156137: Reduce impact of GC pauses on elasticsearch response time - https://phabricator.wikimedia.org/T156137
[08:28:08] <wikibugs>	 (03CR) 10Dzahn: [C: 04-1] "my suggestion: abandon this change, merge change that moves log files to /var/log/, add new change that adds cron job that gzips uncompres" [puppet] - 10https://gerrit.wikimedia.org/r/434605 (owner: 10Paladox)
[08:32:01] <moritzm>	 !log installing gnupg2 security updates
[08:32:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:36:31] <wikibugs>	 (03PS1) 10Dzahn: site: include ::base::firewall -> ::profile::base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/439535
[08:38:03] <wikibugs>	 (03CR) 10Dzahn: [C: 032] "comments only" [puppet] - 10https://gerrit.wikimedia.org/r/439535 (owner: 10Dzahn)
[08:39:19] <wikibugs>	 (03PS1) 10Marostegui: db1102.yaml: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/439536 (https://phabricator.wikimedia.org/T196527)
[08:40:00] <wikibugs>	 (03PS2) 10Marostegui: db1102.yaml: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/439536 (https://phabricator.wikimedia.org/T196527)
[08:40:46] <wikibugs>	 (03CR) 10Marostegui: [C: 032] db1102.yaml: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/439536 (https://phabricator.wikimedia.org/T196527) (owner: 10Marostegui)
[08:42:23] <wikibugs>	 (03CR) 10Volans: [C: 032] "Got agreement on IRC" [puppet/nginx] - 10https://gerrit.wikimedia.org/r/437968 (owner: 10Volans)
[08:42:49] <wikibugs>	 (03Merged) 10jenkins-bot: Bump Gemfile dependencies [puppet/nginx] - 10https://gerrit.wikimedia.org/r/437968 (owner: 10Volans)
[08:43:22] <wikibugs>	 (03CR) 10Volans: [C: 032] "Got additional agreement on IRC" [puppet/nginx] - 10https://gerrit.wikimedia.org/r/437761 (owner: 10Volans)
[08:43:32] <wikibugs>	 (03PS1) 10Dzahn: maps-test: use ::profile::base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/439537
[08:43:37] <wikibugs>	 (03Merged) 10jenkins-bot: Add nginx::snippet define [puppet/nginx] - 10https://gerrit.wikimedia.org/r/437761 (owner: 10Volans)
[08:44:57] <wikibugs>	 (03PS2) 10Dzahn: maps-test: use ::profile::base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/439537
[08:46:40] <wikibugs>	 (03CR) 10Dzahn: [C: 032] "wmf-style: total violations delta -2" [puppet] - 10https://gerrit.wikimedia.org/r/439537 (owner: 10Dzahn)
[08:50:13] <wikibugs>	 10Operations, 10Traffic, 10Patch-For-Review: Merge cache_misc into cache_text functionally - https://phabricator.wikimedia.org/T164609#4271132 (10ema)
[08:51:48] <wikibugs>	 (03PS5) 10Volans: debmonitor: add basic HTTP Icinga checks [puppet] - 10https://gerrit.wikimedia.org/r/436509 (https://phabricator.wikimedia.org/T191299)
[08:51:50] <wikibugs>	 (03PS1) 10Volans: nginx: updated git submodule [puppet] - 10https://gerrit.wikimedia.org/r/439539 (https://phabricator.wikimedia.org/T191299)
[08:51:52] <wikibugs>	 (03PS1) 10Volans: debmonitor: use Nginx snippets [puppet] - 10https://gerrit.wikimedia.org/r/439540 (https://phabricator.wikimedia.org/T191299)
[08:51:54] <wikibugs>	 (03PS1) 10Volans: debmonitor: use new cache control setting [puppet] - 10https://gerrit.wikimedia.org/r/439541 (https://phabricator.wikimedia.org/T191299)
[08:52:04] <wikibugs>	 (03CR) 10Dzahn: [C: 032] "noop on maps-test2001/2" [puppet] - 10https://gerrit.wikimedia.org/r/439537 (owner: 10Dzahn)
[08:52:53] <wikibugs>	 (03PS1) 10Gehel: maps: remove "style" parameter [puppet] - 10https://gerrit.wikimedia.org/r/439543
[08:53:33] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] maps: remove "style" parameter [puppet] - 10https://gerrit.wikimedia.org/r/439543 (owner: 10Gehel)
[08:54:40] <wikibugs>	 (03CR) 10Vgutierrez: [C: 031] Fine tune security settings [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437954 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans)
[08:54:59] <wikibugs>	 (03PS2) 10Gehel: maps: remove "style" parameter [puppet] - 10https://gerrit.wikimedia.org/r/439543
[08:55:30] <wikibugs>	 (03CR) 10Volans: [C: 032] Fine tune security settings [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437954 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans)
[08:56:07] <wikibugs>	 (03CR) 10Volans: "Compiler result:" [puppet] - 10https://gerrit.wikimedia.org/r/439540 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans)
[08:56:43] <wikibugs>	 (03Merged) 10jenkins-bot: Fine tune security settings [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437954 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans)
[08:58:55] <wikibugs>	 (03PS1) 10Dzahn: deployment_server/package_builder: use ::profile::base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/439544
[09:02:02] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Depool db1097:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439545 (https://phabricator.wikimedia.org/T191316)
[09:03:53] <wikibugs>	 (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1097:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439545 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui)
[09:05:18] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1097:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439545 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui)
[09:06:31] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1097:3314 for alter table (duration: 00m 52s)
[09:06:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:06:55] <wikibugs>	 10Operations, 10ops-eqiad: ms-be1036 in power off status, not responsive to power on commands - https://phabricator.wikimedia.org/T196873#4271185 (10elukey) p:05Triage>03Normal
[09:07:21] <marostegui>	 !log Deploy schema change on db1097:3314 T191316 T192926 T89737 T195193
[09:07:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:07:28] <stashbot>	 T89737: Make several mediawiki table fields unsigned ints on wmf databases - https://phabricator.wikimedia.org/T89737
[09:07:28] <stashbot>	 T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926
[09:07:28] <stashbot>	 T195193: Schema change for ct_tag_id field to change_tag - https://phabricator.wikimedia.org/T195193
[09:07:28] <stashbot>	 T191316: Schema change to make archive.ar_rev_id NOT NULL - https://phabricator.wikimedia.org/T191316
[09:08:25] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Depool db1097:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439545 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui)
[09:09:11] <wikibugs>	 (03PS1) 10Volans: debmonitor: add explicit dependency on libldap [puppet] - 10https://gerrit.wikimedia.org/r/439546 (https://phabricator.wikimedia.org/T191299)
[09:10:03] <wikibugs>	 (03CR) 10Ema: [C: 031] nginx: updated git submodule [puppet] - 10https://gerrit.wikimedia.org/r/439539 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans)
[09:10:53] <icinga-wm>	 PROBLEM - IPMI Sensor Status on maps1002 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [Power Supply 2 = Critical, Power Supplies = Critical]
[09:15:52] <wikibugs>	 (03PS1) 10Muehlenhoff: Create component/hhvm324 [puppet] - 10https://gerrit.wikimedia.org/r/439548
[09:21:47] <wikibugs>	 (03PS1) 10Ema: reload-vcl: fix get_cmd_output error handling [puppet] - 10https://gerrit.wikimedia.org/r/439550
[09:24:40] <wikibugs>	 (03PS2) 10Vgutierrez: update-ocsp: Actually use --time-offset-end argument [puppet] - 10https://gerrit.wikimedia.org/r/436485 (https://phabricator.wikimedia.org/T163541)
[09:24:45] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 031] debmonitor: add explicit dependency on libldap [puppet] - 10https://gerrit.wikimedia.org/r/439546 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans)
[09:28:44] <wikibugs>	 (03PS1) 10Dzahn: network::monitor: use ::profile::base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/439554
[09:32:54] <wikibugs>	 (03PS1) 10Dzahn: kubernetes: use ::profile::base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/439556
[09:33:14] <wikibugs>	 (03PS1) 10Vgutierrez: update-ocsp: Fix cert_get_issuer_filename error handling [puppet] - 10https://gerrit.wikimedia.org/r/439557 (https://phabricator.wikimedia.org/T163541)
[09:33:16] <wikibugs>	 (03PS1) 10Vgutierrez: update-ocsp: Make pylint happy [puppet] - 10https://gerrit.wikimedia.org/r/439558 (https://phabricator.wikimedia.org/T163541)
[09:35:09] <wikibugs>	 (03CR) 10Ema: [C: 031] update-ocsp: Actually use --time-offset-end argument [puppet] - 10https://gerrit.wikimedia.org/r/436485 (https://phabricator.wikimedia.org/T163541) (owner: 10Vgutierrez)
[09:36:10] <wikibugs>	 10Operations, 10Analytics, 10hardware-requests: eqiad: (1) new stat box to offload users from stat1005 - https://phabricator.wikimedia.org/T196345#4271243 (10elukey) Hi Rob!  So the spare looks great but I am a bit afraid about having "only" 32G for that machine, in fact I was about to ask even more than wha...
[09:36:37] <wikibugs>	 (03CR) 10Ema: "Please mention what's wrong with the current code in the commit message." [puppet] - 10https://gerrit.wikimedia.org/r/439557 (https://phabricator.wikimedia.org/T163541) (owner: 10Vgutierrez)
[09:37:10] <wikibugs>	 (03CR) 10Ema: [C: 031] update-ocsp: Make pylint happy [puppet] - 10https://gerrit.wikimedia.org/r/439558 (https://phabricator.wikimedia.org/T163541) (owner: 10Vgutierrez)
[09:38:46] <wikibugs>	 (03CR) 10Dzahn: "part of Change-Id I4a30e491f5861aa00" [puppet] - 10https://gerrit.wikimedia.org/r/439556 (owner: 10Dzahn)
[09:39:44] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db2057 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 301.16 seconds
[09:39:53] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db2050 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 302.06 seconds
[09:39:54] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db2036 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 302.11 seconds
[09:39:54] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db2043 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 302.13 seconds
[09:40:14] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db2094 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 306.70 seconds
[09:40:24] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db2074 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 309.34 seconds
[09:40:45] <marostegui>	 anomi.e ^ script probably
[09:40:53] <wikibugs>	 (03PS2) 10Vgutierrez: update-ocsp: Fix cert_get_issuer_filename error handling [puppet] - 10https://gerrit.wikimedia.org/r/439557 (https://phabricator.wikimedia.org/T163541)
[09:41:17] <marostegui>	 yep, it is
[09:43:22] <wikibugs>	 (03CR) 10Ema: [C: 031] update-ocsp: Fix cert_get_issuer_filename error handling [puppet] - 10https://gerrit.wikimedia.org/r/439557 (https://phabricator.wikimedia.org/T163541) (owner: 10Vgutierrez)
[09:48:56] <wikibugs>	 (03CR) 10Vgutierrez: [C: 031] reload-vcl: fix get_cmd_output error handling [puppet] - 10https://gerrit.wikimedia.org/r/439550 (owner: 10Ema)
[09:49:43] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 031] Mark repository as read only [software/tessera] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/439469 (https://phabricator.wikimedia.org/T186096) (owner: 10MarcoAurelio)
[09:50:00] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 031] Archive the operations/software/tessera repository [software/tessera] - 10https://gerrit.wikimedia.org/r/439467 (https://phabricator.wikimedia.org/T186096) (owner: 10MarcoAurelio)
[09:51:16] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 032] Add .gitreview file [debs/python-logstash] - 10https://gerrit.wikimedia.org/r/430306 (owner: 10Gilles)
[09:51:57] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 031] Enable base::service_auto_restart for smartd [puppet] - 10https://gerrit.wikimedia.org/r/419769 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[09:55:02] <wikibugs>	 (03PS4) 10Muehlenhoff: Enable base::service_auto_restart for smartd [puppet] - 10https://gerrit.wikimedia.org/r/419769 (https://phabricator.wikimedia.org/T135991)
[09:56:29] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 032] Enable base::service_auto_restart for smartd [puppet] - 10https://gerrit.wikimedia.org/r/419769 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[09:56:40] <wikibugs>	 (03CR) 10Filippo Giunchedi: "Already cherry-picked in beta?" [puppet] - 10https://gerrit.wikimedia.org/r/402758 (https://phabricator.wikimedia.org/T184236) (owner: 10Alex Monk)
[09:58:34] <wikibugs>	 (03PS2) 10Volans: Client CLI: read configuration file. [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437957 (https://phabricator.wikimedia.org/T191300)
[09:58:37] <wikibugs>	 (03PS2) 10Volans: Client CLI: add CA bundle for server validation [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437958 (https://phabricator.wikimedia.org/T167504)
[09:58:39] <wikibugs>	 (03PS1) 10Volans: Client CLI: bump version to 1.2.0 [software/debmonitor] - 10https://gerrit.wikimedia.org/r/439560 (https://phabricator.wikimedia.org/T167504)
[09:58:41] <wikibugs>	 (03PS1) 10Volans: Allow to set AUTH_LDAP_GLOBAL_OPTIONS [software/debmonitor] - 10https://gerrit.wikimedia.org/r/439561 (https://phabricator.wikimedia.org/T191299)
[09:58:57] <wikibugs>	 (03PS1) 10Jdrewniak: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439562 (https://phabricator.wikimedia.org/T128546)
[09:59:49] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Client CLI: add CA bundle for server validation [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437958 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans)
[09:59:54] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Client CLI: bump version to 1.2.0 [software/debmonitor] - 10https://gerrit.wikimedia.org/r/439560 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans)
[09:59:57] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Allow to set AUTH_LDAP_GLOBAL_OPTIONS [software/debmonitor] - 10https://gerrit.wikimedia.org/r/439561 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans)
[10:00:06] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Client CLI: read configuration file. [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437957 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans)
[10:00:21] <volans>	 sorry for the spam, that's me, but because of T196628 ;)
[10:00:21] <stashbot>	 T196628: CI: upgrade tox, currently running 2.6.0 - https://phabricator.wikimedia.org/T196628
[10:01:02] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 031] monitoring: Remove unused 'graphite_anomaly' command [puppet] - 10https://gerrit.wikimedia.org/r/437365 (owner: 10Krinkle)
[10:01:35] <wikibugs>	 (03CR) 10Vgutierrez: [C: 031] Allow to set AUTH_LDAP_GLOBAL_OPTIONS [software/debmonitor] - 10https://gerrit.wikimedia.org/r/439561 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans)
[10:01:37] <wikibugs>	 (03PS2) 10Volans: nginx: updated git submodule [puppet] - 10https://gerrit.wikimedia.org/r/439539 (https://phabricator.wikimedia.org/T191299)
[10:01:43] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 031] Enable base::service_auto_restart for mcelog [puppet] - 10https://gerrit.wikimedia.org/r/438002 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[10:01:54] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 031] Enable base::service_auto_restart for PDNS recursor Prometheus exporters [puppet] - 10https://gerrit.wikimedia.org/r/437949 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[10:02:25] <wikibugs>	 (03CR) 10Volans: [C: 032] nginx: updated git submodule [puppet] - 10https://gerrit.wikimedia.org/r/439539 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans)
[10:04:44] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db2074 is OK: OK slave_sql_lag Replication lag: 3.07 seconds
[10:04:48] <wikibugs>	 (03CR) 10Jdrewniak: [C: 032] Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439562 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak)
[10:04:52] <wikibugs>	 (03CR) 10Filippo Giunchedi: Configuration for phabricator to use swift storage. (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/432528 (https://phabricator.wikimedia.org/T182085) (owner: 1020after4)
[10:05:14] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db2057 is OK: OK slave_sql_lag Replication lag: 0.15 seconds
[10:05:23] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db2050 is OK: OK slave_sql_lag Replication lag: 0.30 seconds
[10:05:24] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db2036 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[10:05:24] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db2043 is OK: OK slave_sql_lag Replication lag: 0.20 seconds
[10:05:53] <wikibugs>	 (03PS3) 10Muehlenhoff: Enable base::service_auto_restart for uwsgi-netbox [puppet] - 10https://gerrit.wikimedia.org/r/438219 (https://phabricator.wikimedia.org/T135991)
[10:06:13] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s7 on db2061 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 426.25 seconds
[10:06:27] <wikibugs>	 (03Merged) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439562 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak)
[10:07:41] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 031] Allow removing Diamond gradually [puppet] - 10https://gerrit.wikimedia.org/r/429389 (https://phabricator.wikimedia.org/T183454) (owner: 10Muehlenhoff)
[10:07:59] <wikibugs>	 (03PS2) 10Volans: debmonitor: use Nginx snippets [puppet] - 10https://gerrit.wikimedia.org/r/439540 (https://phabricator.wikimedia.org/T191299)
[10:08:40] <wikibugs>	 (03CR) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439562 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak)
[10:09:04] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db2094 is OK: OK slave_sql_lag Replication lag: 0.08 seconds
[10:10:07] <apergos>	 did any of those page? because I got none
[10:10:14] <logmsgbot>	 !log jdrewniak@deploy1001 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:439562|Bumping portals to master (T128546)]] (duration: 00m 51s)
[10:10:16] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 032] Enable base::service_auto_restart for uwsgi-netbox [puppet] - 10https://gerrit.wikimedia.org/r/438219 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[10:10:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:10:20] <stashbot>	 T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546
[10:10:21] <marostegui>	 apergos: they don't page
[10:10:35] <apergos>	 ok good (means phone is not in a weird state), thanks!
[10:10:41] <marostegui>	 :)
[10:10:55] <Hauskatze>	 godog: is it possible to merge https://gerrit.wikimedia.org/r/#/c/operations/software/tessera/+/439467/ then now that you're okay with the repo content to go?
[10:11:05] <logmsgbot>	 !log jdrewniak@deploy1001 Synchronized portals: Wikimedia Portals Update: [[gerrit:439562|Bumping portals to master (T128546)]] (duration: 00m 50s)
[10:11:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:11:31] <godog>	 Hauskatze: yup, good to merge, I can go ahead if you don't have permissions otherwise feel free
[10:12:16] <Hauskatze>	 godog: nope, I don't have privs, feel free to merge that and https://gerrit.wikimedia.org/r/#/c/operations/software/tessera/+/439469/ afterwards
[10:12:18] <Hauskatze>	 thank you
[10:13:02] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 032] Archive the operations/software/tessera repository [software/tessera] - 10https://gerrit.wikimedia.org/r/439467 (https://phabricator.wikimedia.org/T186096) (owner: 10MarcoAurelio)
[10:13:06] <wikibugs>	 (03CR) 10Filippo Giunchedi: [V: 032 C: 032] Archive the operations/software/tessera repository [software/tessera] - 10https://gerrit.wikimedia.org/r/439467 (https://phabricator.wikimedia.org/T186096) (owner: 10MarcoAurelio)
[10:13:15] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 032] Mark repository as read only [software/tessera] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/439469 (https://phabricator.wikimedia.org/T186096) (owner: 10MarcoAurelio)
[10:13:21] <wikibugs>	 (03CR) 10Filippo Giunchedi: [V: 032 C: 032] Mark repository as read only [software/tessera] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/439469 (https://phabricator.wikimedia.org/T186096) (owner: 10MarcoAurelio)
[10:13:29] <godog>	 Hauskatze: {{done}}
[10:13:41] <Hauskatze>	 {{thank you}} --~~~~
[10:13:50] <godog>	 np! thanks for taking care of that
[10:15:48] <wikibugs>	 (03Abandoned) 10Filippo Giunchedi: labs: use new redis servers for locks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/387570 (https://phabricator.wikimedia.org/T179371) (owner: 10Filippo Giunchedi)
[10:15:50] <Hauskatze>	 my pleasure
[10:16:40] <wikibugs>	 (03Abandoned) 10Filippo Giunchedi: hieradata: add redis stretch deployment-prep instances [puppet] - 10https://gerrit.wikimedia.org/r/386869 (https://phabricator.wikimedia.org/T179371) (owner: 10Filippo Giunchedi)
[10:17:09] <wikibugs>	 (03Abandoned) 10Filippo Giunchedi: hieradata: use deployment-redis05 for labs jobrunner [puppet] - 10https://gerrit.wikimedia.org/r/387579 (https://phabricator.wikimedia.org/T179371) (owner: 10Filippo Giunchedi)
[10:17:38] <icinga-wm>	 ACKNOWLEDGEMENT - puppet last run on phab1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 16 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[phabricator/deployment] daniel_zahn debugging - replacement server not in prod
[10:20:37] <wikibugs>	 (03PS1) 10Ema: reload-vcl: fix shell injection [puppet] - 10https://gerrit.wikimedia.org/r/439563
[10:20:58] <wikibugs>	 (03PS2) 10Muehlenhoff: Enable base::service_auto_restart for mcelog [puppet] - 10https://gerrit.wikimedia.org/r/438002 (https://phabricator.wikimedia.org/T135991)
[10:23:36] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 032] Enable base::service_auto_restart for mcelog [puppet] - 10https://gerrit.wikimedia.org/r/438002 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[10:25:31] <volans>	 hashar: by any chance do you have a rough ETA for T196628 ?
[10:25:32] <stashbot>	 T196628: CI: upgrade tox, currently running 2.6.0 - https://phabricator.wikimedia.org/T196628
[10:25:34] <wikibugs>	 (03PS1) 10Muehlenhoff: Revert "Enable base::service_auto_restart for mcelog" [puppet] - 10https://gerrit.wikimedia.org/r/439564
[10:25:46] * volans sends cookies to hashar :)
[10:25:58] <_joe_>	 uhm thumbor failing right now on lvs
[10:27:12] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 032] Revert "Enable base::service_auto_restart for mcelog" [puppet] - 10https://gerrit.wikimedia.org/r/439564 (owner: 10Muehlenhoff)
[10:27:21] <_joe_>	 it recovered, but this is not the first time it happens
[10:28:21] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: Replace disk on mw1230 - https://phabricator.wikimedia.org/T196881#4271440 (10Joe)
[10:29:06] <_joe_>	 !log depooling permanently mw1230 for disk replacement, T196881
[10:29:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:29:11] <stashbot>	 T196881: Replace disk on mw1230 - https://phabricator.wikimedia.org/T196881
[10:29:45] <wikibugs>	 (03PS1) 10Muehlenhoff: Enable base::service_auto_restart for mcelog [puppet] - 10https://gerrit.wikimedia.org/r/439565 (https://phabricator.wikimedia.org/T135991)
[10:30:50] <wikibugs>	 (03CR) 10Vgutierrez: [C: 031] debmonitor: use Nginx snippets [puppet] - 10https://gerrit.wikimedia.org/r/439540 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans)
[10:30:59] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Enable base::service_auto_restart for mcelog [puppet] - 10https://gerrit.wikimedia.org/r/439565 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[10:31:32] <wikibugs>	 (03PS2) 10Ema: reload-vcl: fix shell injection, add .py suffix [puppet] - 10https://gerrit.wikimedia.org/r/439563 (https://phabricator.wikimedia.org/T144169)
[10:31:44] <wikibugs>	 (03PS3) 10Volans: debmonitor: use Nginx snippets [puppet] - 10https://gerrit.wikimedia.org/r/439540 (https://phabricator.wikimedia.org/T191299)
[10:32:01] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] reload-vcl: fix shell injection, add .py suffix [puppet] - 10https://gerrit.wikimedia.org/r/439563 (https://phabricator.wikimedia.org/T144169) (owner: 10Ema)
[10:32:14] <wikibugs>	 (03CR) 10Awight: [C: 031] "Chiming in just to say that this upgrade will be useful to my team ASAP, thank you!" [puppet] - 10https://gerrit.wikimedia.org/r/438121 (https://phabricator.wikimedia.org/T196710) (owner: 10Thcipriani)
[10:32:32] <wikibugs>	 (03CR) 10Volans: [C: 032] debmonitor: use Nginx snippets [puppet] - 10https://gerrit.wikimedia.org/r/439540 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans)
[10:33:32] <volans>	 moritzm: there are 2 patches from you on puppet-merge, what should I do?
[10:33:41] <wikibugs>	 (03PS3) 10Ema: reload-vcl: fix shell injection, add .py suffix [puppet] - 10https://gerrit.wikimedia.org/r/439563 (https://phabricator.wikimedia.org/T144169)
[10:34:12] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] reload-vcl: fix shell injection, add .py suffix [puppet] - 10https://gerrit.wikimedia.org/r/439563 (https://phabricator.wikimedia.org/T144169) (owner: 10Ema)
[10:34:16] <moritzm>	 the second one is a revert of the first, when I tried puppet-merge before it quit because there was no diff
[10:34:25] <moritzm>	 so safe to merge if it's now showing the changes
[10:34:40] <volans>	 lol, yeah it shows only my diffs
[10:34:48] <volans>	 we could open a bug to fix it though
[10:35:14] <wikibugs>	 (03PS4) 10Ema: reload-vcl: fix shell injection, add .py suffix [puppet] - 10https://gerrit.wikimedia.org/r/439563 (https://phabricator.wikimedia.org/T144169)
[10:35:44] <volans>	 moritzm: done ;)
[10:35:46] <wikibugs>	 (03PS2) 10Muehlenhoff: Enable base::service_auto_restart for mcelog [puppet] - 10https://gerrit.wikimedia.org/r/439565 (https://phabricator.wikimedia.org/T135991)
[10:35:56] <moritzm>	 volans: don't bother, it's a cornercase which hits us maybe 1-2 times per year...
[10:36:12] <wikibugs>	 (03PS2) 10Volans: debmonitor: use new cache control setting [puppet] - 10https://gerrit.wikimedia.org/r/439541 (https://phabricator.wikimedia.org/T191299)
[10:36:16] <volans>	 ack
[10:37:03] <icinga-wm>	 RECOVERY - puppet last run on phab1002 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[10:37:33] <wikibugs>	 (03CR) 10Volans: [C: 032] debmonitor: use new cache control setting [puppet] - 10https://gerrit.wikimedia.org/r/439541 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans)
[10:37:54] <wikibugs>	 (03PS6) 10Volans: debmonitor: add basic HTTP Icinga checks [puppet] - 10https://gerrit.wikimedia.org/r/436509 (https://phabricator.wikimedia.org/T191299)
[10:38:06] <wikibugs>	 (03CR) 10Vgutierrez: [C: 031] reload-vcl: fix shell injection, add .py suffix [puppet] - 10https://gerrit.wikimedia.org/r/439563 (https://phabricator.wikimedia.org/T144169) (owner: 10Ema)
[10:39:03] <wikibugs>	 (03CR) 10Volans: [C: 032] debmonitor: add basic HTTP Icinga checks [puppet] - 10https://gerrit.wikimedia.org/r/436509 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans)
[10:39:13] <wikibugs>	 (03PS2) 10Volans: debmonitor: add explicit dependency on libldap [puppet] - 10https://gerrit.wikimedia.org/r/439546 (https://phabricator.wikimedia.org/T191299)
[10:40:02] <wikibugs>	 (03CR) 10Volans: [C: 032] debmonitor: add explicit dependency on libldap [puppet] - 10https://gerrit.wikimedia.org/r/439546 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans)
[10:42:44] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 031] Enable base::service_auto_restart for mcelog [puppet] - 10https://gerrit.wikimedia.org/r/439565 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[10:52:44] <wikibugs>	 (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1097:3314" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439567
[10:52:48] <mutante>	 !log phab1002 - editing cached scap config /srv/deployment/phabricator/deployment-cache/.config to replace tin.eqiad with deploy1001.eqiad deployment server, run puppet. other options: run scap with --refresh-config, delete cached .config file (T196019) (T175288)
[10:52:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:52:54] <stashbot>	 T196019: setup/install phab1002(WMF4727) - https://phabricator.wikimedia.org/T196019
[10:52:54] <stashbot>	 T175288: setup/install/deploy deploy1001 as deployment server - https://phabricator.wikimedia.org/T175288
[10:52:55] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 04-1] "Minor nitpick, otherwise LGTM" (031 comment) [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437957 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans)
[10:55:35] <wikibugs>	 10Operations, 10monitoring: SMART checks fail on wtp1043's sda - https://phabricator.wikimedia.org/T196886#4271553 (10Joe)
[10:56:28] <icinga-wm>	 ACKNOWLEDGEMENT - Device not healthy -SMART- on wtp1043 is CRITICAL: cluster=parsoid device=sda instance=wtp1043:9100 job=node site=eqiad Giuseppe Lavagetto T196886 https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=wtp1043&var-datasource=eqiad%2520prometheus%252Fops
[10:57:09] <wikibugs>	 (03PS5) 10Ema: reload-vcl: fix shell injection, add .py suffix [puppet] - 10https://gerrit.wikimedia.org/r/439563 (https://phabricator.wikimedia.org/T144169)
[10:57:33] <icinga-wm>	 ACKNOWLEDGEMENT - Device not healthy -SMART- on wtp1043 is CRITICAL: cluster=parsoid device=sda instance=wtp1043:9100 job=node site=eqiad Giuseppe Lavagetto T196881 https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=wtp1043&var-datasource=eqiad%2520prometheus%252Fops
[10:57:47] <wikibugs>	 10Operations, 10Gerrit, 10Release-Engineering-Team (Kanban): In Gerrit some users are reporting problems saving there preferences - https://phabricator.wikimedia.org/T196869#4271568 (10hashar) We need #operations to fix up permissions on cobalt.wikimedia.org   Files under `/srv/gerrit/git/All-Users.git/` bei...
[10:57:54] <wikibugs>	 (03CR) 10Ema: [C: 032] reload-vcl: fix shell injection, add .py suffix [puppet] - 10https://gerrit.wikimedia.org/r/439563 (https://phabricator.wikimedia.org/T144169) (owner: 10Ema)
[10:58:43] <wikibugs>	 10Operations, 10Gerrit, 10Release-Engineering-Team (Kanban): In Gerrit some users are reporting problems saving there preferences - https://phabricator.wikimedia.org/T196869#4271583 (10Paladox) sudo chrown gerrit2:gerrit2 /srv/gerrit/git
[11:00:06] <jouncebot>	 jan_drewniak: I, the Bot under the Fountain, allow thee, The Deployer, to do Wikimedia Portals Update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180611T1100).
[11:03:22] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 031] Bumped django-auth-ldap to v1.6.1 [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437956 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans)
[11:04:04] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: profile::mediawiki::jobrunner_tls: fix monitoring definitions [puppet] - 10https://gerrit.wikimedia.org/r/439569
[11:04:06] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 031] Client CLI: add CA bundle for server validation [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437958 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans)
[11:04:14] <icinga-wm>	 PROBLEM - puppet last run on cp3041 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/varnish/reload-vcl]
[11:04:19] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 031] Client CLI: bump version to 1.2.0 [software/debmonitor] - 10https://gerrit.wikimedia.org/r/439560 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans)
[11:04:24] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db2094 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 319.75 seconds
[11:04:43] <wikibugs>	 (03Abandoned) 10Ema: reload-vcl: fix get_cmd_output error handling [puppet] - 10https://gerrit.wikimedia.org/r/439550 (owner: 10Ema)
[11:05:14] <_joe_>	 volans: ^^
[11:05:31] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 031] Allow to set AUTH_LDAP_GLOBAL_OPTIONS [software/debmonitor] - 10https://gerrit.wikimedia.org/r/439561 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans)
[11:05:34] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 032] profile::mediawiki::jobrunner_tls: fix monitoring definitions [puppet] - 10https://gerrit.wikimedia.org/r/439569 (owner: 10Giuseppe Lavagetto)
[11:05:53] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw1230 is CRITICAL: Host mw1230 is not in mediawiki-installation dsh group
[11:06:13] <wikibugs>	 (03CR) 10Volans: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/439569 (owner: 10Giuseppe Lavagetto)
[11:06:35] <wikibugs>	 10Operations, 10DBA, 10Gerrit, 10Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4271595 (10Marostegui)
[11:06:38] <_joe_>	 mw1230 is expected, I'll ack that alert
[11:07:40] <wikibugs>	 10Operations, 10DBA, 10Gerrit, 10Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4271598 (10Paladox) This https://phabricator.wikimedia.org/D1067 will fix it so no more new notedb refs are cloned.
[11:08:12] <wikibugs>	 (03PS1) 10Volans: debmonitor: set TLS cipher suite for LDAP [puppet] - 10https://gerrit.wikimedia.org/r/439571 (https://phabricator.wikimedia.org/T191299)
[11:08:16] <wikibugs>	 (03PS3) 10Volans: Client CLI: read configuration file. [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437957 (https://phabricator.wikimedia.org/T191300)
[11:08:18] <wikibugs>	 (03PS3) 10Volans: Client CLI: add CA bundle for server validation [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437958 (https://phabricator.wikimedia.org/T167504)
[11:08:20] <wikibugs>	 (03PS2) 10Volans: Client CLI: bump version to 1.2.0 [software/debmonitor] - 10https://gerrit.wikimedia.org/r/439560 (https://phabricator.wikimedia.org/T167504)
[11:08:22] <wikibugs>	 (03PS2) 10Volans: Allow to set AUTH_LDAP_GLOBAL_OPTIONS [software/debmonitor] - 10https://gerrit.wikimedia.org/r/439561 (https://phabricator.wikimedia.org/T191299)
[11:08:47] <wikibugs>	 10Operations, 10ops-codfw, 10netops: Switch port configuration for backup2001 - https://phabricator.wikimedia.org/T196782#4268246 (10ayounsi) ```lang=diff [edit interfaces interface-range vlan-private1-d-codfw]      member ge-3/0/10 { ... } +    member xe-2/0/11; [edit interfaces] +   xe-2/0/11 { +       des...
[11:09:12] <wikibugs>	 10Operations, 10DBA, 10Gerrit, 10Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4271605 (10Marostegui) >>! In T196840#4271598, @Paladox wrote: > This https://phabricator.wikimedia.org/D1067 will fix it so no more new notedb refs are cloned.  When are you p...
[11:09:17] <wikibugs>	 10Operations, 10ops-codfw, 10netops: Switch port configuration for backup2001 - https://phabricator.wikimedia.org/T196782#4271606 (10ayounsi) 05Open>03Resolved a:05RobH>03ayounsi
[11:09:19] <wikibugs>	 (03CR) 10Volans: Client CLI: read configuration file. (031 comment) [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437957 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans)
[11:09:21] <wikibugs>	 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install backup2001 - https://phabricator.wikimedia.org/T196477#4271608 (10ayounsi)
[11:09:28] <wikibugs>	 10Operations, 10Gerrit, 10Release-Engineering-Team (Kanban): In Gerrit some users are reporting problems saving there preferences - https://phabricator.wikimedia.org/T196869#4271040 (10Dzahn) >>! In T196869#4271568, @hashar wrote: > We need #operations to fix up permissions on cobalt.wikimedia.org  >  > File...
[11:09:31] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Client CLI: read configuration file. [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437957 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans)
[11:09:35] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Allow to set AUTH_LDAP_GLOBAL_OPTIONS [software/debmonitor] - 10https://gerrit.wikimedia.org/r/439561 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans)
[11:09:37] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Client CLI: bump version to 1.2.0 [software/debmonitor] - 10https://gerrit.wikimedia.org/r/439560 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans)
[11:09:46] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Client CLI: add CA bundle for server validation [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437958 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans)
[11:09:51] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 031] Client CLI: read configuration file. [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437957 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans)
[11:09:55] <wikibugs>	 10Operations, 10DBA, 10Gerrit, 10Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4271614 (10Paladox) Need someone to approve it, merge it and then i think @mmodell would have to deploy it.
[11:10:28] <wikibugs>	 10Operations, 10DBA, 10Gerrit, 10Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4271615 (10Marostegui) Excellent - thanks! :)
[11:10:31] <wikibugs>	 10Operations, 10Gerrit, 10Release-Engineering-Team (Kanban): In Gerrit some users are reporting problems saving there preferences - https://phabricator.wikimedia.org/T196869#4271616 (10Paladox) @dzahn though the objects inside that folder could be owned by root.
[11:10:49] <wikibugs>	 (03PS1) 10ArielGlenn: ignore blank lines in xml dumps cleanup config files [puppet] - 10https://gerrit.wikimedia.org/r/439574
[11:11:36] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 032] kubernetes: use ::profile::base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/439556 (owner: 10Dzahn)
[11:12:03] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db1124 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 364.85 seconds
[11:12:08] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: kubernetes: use ::profile::base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/439556 (owner: 10Dzahn)
[11:12:41] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] kubernetes: use ::profile::base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/439556 (owner: 10Dzahn)
[11:12:54] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db2057 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 303.83 seconds
[11:13:03] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db2050 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 305.37 seconds
[11:13:03] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db2036 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 304.67 seconds
[11:13:03] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db2043 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 305.67 seconds
[11:13:17] <wikibugs>	 (03CR) 10ArielGlenn: [C: 032] ignore blank lines in xml dumps cleanup config files [puppet] - 10https://gerrit.wikimedia.org/r/439574 (owner: 10ArielGlenn)
[11:13:26] <wikibugs>	 (03PS2) 10ArielGlenn: ignore blank lines in xml dumps cleanup config files [puppet] - 10https://gerrit.wikimedia.org/r/439574
[11:13:48] <wikibugs>	 (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1097:3314" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439567 (owner: 10Marostegui)
[11:14:22] <wikibugs>	 (03PS2) 10Volans: debmonitor: set TLS cipher suite for LDAP [puppet] - 10https://gerrit.wikimedia.org/r/439571 (https://phabricator.wikimedia.org/T191299)
[11:15:14] <mutante>	 !Log gerrit (cobalt) - fixing root-owned files in gerrit All-Users.git objects (affects saved preferences of some users) (T196869)
[11:15:15] <stashbot>	 T196869: In Gerrit some users are reporting problems saving there preferences - https://phabricator.wikimedia.org/T196869
[11:15:15] <wikibugs>	 (03CR) 10Volans: [C: 032] debmonitor: set TLS cipher suite for LDAP [puppet] - 10https://gerrit.wikimedia.org/r/439571 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans)
[11:15:31] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1097:3314" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439567 (owner: 10Marostegui)
[11:16:31] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Depool db1103:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439577 (https://phabricator.wikimedia.org/T191316)
[11:16:44] <wikibugs>	 10Operations, 10Gerrit, 10Release-Engineering-Team (Kanban): In Gerrit some users are reporting problems saving there preferences - https://phabricator.wikimedia.org/T196869#4271646 (10Dzahn) >>! In T196869#4271568, @hashar wrote: > We need #operations to fix up permissions on cobalt.wikimedia.org   Fixed....
[11:17:02] <wikibugs>	 (03CR) 10Volans: [V: 032 C: 032] "Overriding CI due to T196628 (only py27 test is failing)" [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437955 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans)
[11:17:55] <wikibugs>	 (03CR) 10Volans: [V: 032 C: 032] "Overriding CI due to T196628 (only py27 test is failing)" [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437956 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans)
[11:18:06] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s6 on db2076 is OK: OK slave_sql_lag Replication lag: 57.79 seconds
[11:18:06] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s6 on db2089 is OK: OK slave_sql_lag Replication lag: 45.44 seconds
[11:18:10] <wikibugs>	 (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1097:3314" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439567 (owner: 10Marostegui)
[11:18:26] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s6 on db2053 is OK: OK slave_sql_lag Replication lag: 0.22 seconds
[11:18:26] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s6 on db2060 is OK: OK slave_sql_lag Replication lag: 0.37 seconds
[11:18:29] <wikibugs>	 (03CR) 10Volans: [V: 032 C: 032] "Overriding CI due to T196628 (only py27 test is failing)" [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437957 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans)
[11:18:45] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s6 on db2067 is OK: OK slave_sql_lag Replication lag: 0.07 seconds
[11:18:46] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s6 on db2039 is OK: OK slave_sql_lag Replication lag: 0.25 seconds
[11:19:06] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s6 on dbstore2001 is OK: OK slave_sql_lag Replication lag: 54.16 seconds
[11:19:18] <wikibugs>	 (03CR) 10Volans: [V: 032 C: 032] "Overriding CI due to T196628 (only py27 test is failing)" [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437958 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans)
[11:19:25] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db1124 is OK: OK slave_sql_lag Replication lag: 43.56 seconds
[11:19:59] <wikibugs>	 (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1103:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439577 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui)
[11:20:01] <wikibugs>	 (03CR) 10Volans: [V: 032 C: 032] "Overriding CI due to T196628 (only py27 test is failing)" [software/debmonitor] - 10https://gerrit.wikimedia.org/r/439560 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans)
[11:20:06] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s6 on db2087 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[11:20:39] <wikibugs>	 (03CR) 10Volans: [V: 032 C: 032] "Overriding CI due to T196628 (only py27 test is failing)" [software/debmonitor] - 10https://gerrit.wikimedia.org/r/439561 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans)
[11:20:41] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1097:3314 after alter table (duration: 00m 51s)
[11:20:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:20:49] <wikibugs>	 (03PS4) 10Muehlenhoff: Add initial Debianisation of debmonitor-client [software/debmonitor] - 10https://gerrit.wikimedia.org/r/438018 (https://phabricator.wikimedia.org/T191298)
[11:20:56] <wikibugs>	 (03PS1) 10Ema: reload-vcl: port to python3 [puppet] - 10https://gerrit.wikimedia.org/r/439578
[11:21:45] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1103:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439577 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui)
[11:21:45] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s6 on db2046 is OK: OK slave_sql_lag Replication lag: 0.12 seconds
[11:22:00] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Add initial Debianisation of debmonitor-client [software/debmonitor] - 10https://gerrit.wikimedia.org/r/438018 (https://phabricator.wikimedia.org/T191298) (owner: 10Muehlenhoff)
[11:22:40] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Depool db1103:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439577 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui)
[11:22:53] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1103:3314 for alter table (duration: 00m 50s)
[11:22:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:22:59] <marostegui>	 !log Deploy schema change on db1103:3314 T191316 T192926 T89737 T195193
[11:23:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:23:06] <stashbot>	 T89737: Make several mediawiki table fields unsigned ints on wmf databases - https://phabricator.wikimedia.org/T89737
[11:23:06] <stashbot>	 T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926
[11:23:07] <stashbot>	 T195193: Schema change for ct_tag_id field to change_tag - https://phabricator.wikimedia.org/T195193
[11:23:07] <stashbot>	 T191316: Schema change to make archive.ar_rev_id NOT NULL - https://phabricator.wikimedia.org/T191316
[11:24:47] <wikibugs>	 (03CR) 10Volans: "nitpick inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/439578 (owner: 10Ema)
[11:25:05] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db2074 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 426.71 seconds
[11:29:05] <wikibugs>	 (03PS1) 10Volans: debmonitor: client side setup [puppet] - 10https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300)
[11:29:36] <icinga-wm>	 RECOVERY - puppet last run on cp3041 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[11:29:51] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] debmonitor: client side setup [puppet] - 10https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans)
[11:30:11] <logmsgbot>	 !log ppchelko@deploy1001 Started deploy [cpjobqueue/deploy@b5396cd]: Tune cirrus jobs concurrencies
[11:30:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:30:44] <wikibugs>	 (03PS1) 10Aklapper: phabricator weekly project changes email: Add mysql slave port parameter [puppet] - 10https://gerrit.wikimedia.org/r/439581 (https://phabricator.wikimedia.org/T196604)
[11:30:52] <logmsgbot>	 !log ppchelko@deploy1001 Finished deploy [cpjobqueue/deploy@b5396cd]: Tune cirrus jobs concurrencies (duration: 00m 42s)
[11:30:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:31:22] <wikibugs>	 (03CR) 10Hoo man: "Please only change this temporary. 10m are quite a lot…" [puppet] - 10https://gerrit.wikimedia.org/r/439528 (https://phabricator.wikimedia.org/T194602) (owner: 10Addshore)
[11:31:45] <wikibugs>	 (03CR) 10Aklapper: "Review carefully as I have no clue what I'm doing here" [puppet] - 10https://gerrit.wikimedia.org/r/439581 (https://phabricator.wikimedia.org/T196604) (owner: 10Aklapper)
[11:33:34] <wikibugs>	 (03PS2) 10Ema: reload-vcl: port to python3 [puppet] - 10https://gerrit.wikimedia.org/r/439578
[11:34:59] <wikibugs>	 (03PS7) 10Arturo Borrero Gonzalez: openstack: eqiad1: enable more components for labcontrol boxes (keystone) [puppet] - 10https://gerrit.wikimedia.org/r/438220 (https://phabricator.wikimedia.org/T196633)
[11:35:26] <wikibugs>	 10Operations, 10Gerrit, 10Release-Engineering-Team (Kanban): In Gerrit some users are reporting problems saving there preferences - https://phabricator.wikimedia.org/T196869#4271709 (10hashar) a:03Dzahn Fixed mutante ! Danke.  My test case was to go to https://gerrit.wikimedia.org/r/settings/ and try to fi...
[11:37:00] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 032] openstack: eqiad1: enable more components for labcontrol boxes (keystone) [puppet] - 10https://gerrit.wikimedia.org/r/438220 (https://phabricator.wikimedia.org/T196633) (owner: 10Arturo Borrero Gonzalez)
[11:37:36] <hashar>	 volans: I have no idea. I haven't looked at it yet.  I guess I will just upgrade tox across the fleet of containers
[11:38:27] <arturo>	 !log T196633 deploy keystone to labcontrol100[3,4].wikimedia.org. Dormant daemon, no DB yet
[11:38:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:38:32] <stashbot>	 T196633: cloudvps: eqiad1 deployment - https://phabricator.wikimedia.org/T196633
[11:43:35] <icinga-wm>	 PROBLEM - DPKG on labcontrol1004 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[11:43:45] <icinga-wm>	 PROBLEM - DPKG on labcontrol1003 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[11:43:46] <icinga-wm>	 PROBLEM - puppet last run on labcontrol1003 is CRITICAL: CRITICAL: Puppet has 38 failures. Last run 51 seconds ago with 38 failures. Failed resources (up to 3 shown): Package[keystone],Package[alembic],Package[python-castellan],Package[python-concurrent.futures]
[11:44:55] <icinga-wm>	 PROBLEM - puppet last run on labcontrol1004 is CRITICAL: CRITICAL: Puppet has 38 failures. Last run 1 minute ago with 38 failures. Failed resources (up to 3 shown): Package[keystone],Package[alembic],Package[python-castellan],Package[python-concurrent.futures]
[11:45:05] <arturo>	 ouch
[11:45:45] <arturo>	 !log T196633 downtime labcontrol100[3,4] due to unexpected puppet errors on installation of keystone
[11:45:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:45:50] <stashbot>	 T196633: cloudvps: eqiad1 deployment - https://phabricator.wikimedia.org/T196633
[11:46:59] <wikibugs>	 (03Abandoned) 10Hoo man: Include php5 packages on canary hosts [puppet] - 10https://gerrit.wikimedia.org/r/391045 (owner: 10Hoo man)
[11:48:55] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s7 on db2061 is OK: OK slave_sql_lag Replication lag: 46.97 seconds
[11:52:02] <wikibugs>	 (03CR) 10Muehlenhoff: Add initial Debianisation of debmonitor-client (033 comments) [software/debmonitor] - 10https://gerrit.wikimedia.org/r/438018 (https://phabricator.wikimedia.org/T191298) (owner: 10Muehlenhoff)
[11:52:41] <volans>	 hashar: ack, thanks! no hurry, just wanted to know if it is something we'll see in the near future or not ;)
[11:53:10] <wikibugs>	 (03PS3) 10Urbanecm: Revert "Change bewikiquote logo" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439477 (https://phabricator.wikimedia.org/T196134)
[11:53:30] <wikibugs>	 (03PS2) 10Urbanecm: Change logo files for bewikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439478 (https://phabricator.wikimedia.org/T196134)
[11:53:42] <wikibugs>	 (03PS3) 10Urbanecm: Use uploaded HD logo for bewikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439479 (https://phabricator.wikimedia.org/T196134)
[11:55:45] <icinga-wm>	 RECOVERY - DPKG on labcontrol1004 is OK: All packages OK
[12:00:05] <icinga-wm>	 RECOVERY - puppet last run on labcontrol1004 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[12:00:37] <wikibugs>	 10Operations, 10Citoid, 10Code-Stewardship-Reviews, 10VisualEditor, 10Services (watching): zotero translation server: code stewardship request - https://phabricator.wikimedia.org/T187194#4271773 (10danstillman) Not sure what you're planning, but the initial version of our Node port is up:  https://github...
[12:03:29] <wikibugs>	 (03PS5) 10Muehlenhoff: Add initial Debianisation of debmonitor-client [software/debmonitor] - 10https://gerrit.wikimedia.org/r/438018 (https://phabricator.wikimedia.org/T191298)
[12:03:42] <wikibugs>	 (03CR) 10Urbanecm: [C: 031] "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437777 (https://phabricator.wikimedia.org/T196488) (owner: 10Sau226)
[12:04:16] <icinga-wm>	 RECOVERY - DPKG on labcontrol1003 is OK: All packages OK
[12:05:01] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Add initial Debianisation of debmonitor-client [software/debmonitor] - 10https://gerrit.wikimedia.org/r/438018 (https://phabricator.wikimedia.org/T191298) (owner: 10Muehlenhoff)
[12:08:08] <wikibugs>	 (03CR) 10Volans: "Replies inline" (033 comments) [software/debmonitor] - 10https://gerrit.wikimedia.org/r/438018 (https://phabricator.wikimedia.org/T191298) (owner: 10Muehlenhoff)
[12:08:30] <wikibugs>	 10Operations, 10Gerrit, 10Release-Engineering-Team (Kanban): In Gerrit some users are reporting problems saving there preferences - https://phabricator.wikimedia.org/T196869#4271807 (10Dzahn) 05Open>03Resolved
[12:08:52] <Urbanecm>	 jouncebot, next
[12:08:52] <jouncebot>	 In 0 hour(s) and 51 minute(s): European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180611T1300)
[12:10:15] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1002 is CRITICAL: instance=kubernetes1002.eqiad.wmnet operation_type={create_container,start_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[12:10:54] <wikibugs>	 (03CR) 10Volans: [C: 031] "LGTM (although I'm not familiar with this script ;) )" [puppet] - 10https://gerrit.wikimedia.org/r/439578 (owner: 10Ema)
[12:11:16] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1002 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[12:12:35] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db1124 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 406.54 seconds
[12:16:11] <wikibugs>	 (03PS6) 10Muehlenhoff: Add initial Debianisation of debmonitor-client [software/debmonitor] - 10https://gerrit.wikimedia.org/r/438018 (https://phabricator.wikimedia.org/T191298)
[12:16:51] <wikibugs>	 (03PS1) 10Volans: Add scap/log to .gitignore [software/debmonitor/deploy] - 10https://gerrit.wikimedia.org/r/439584
[12:16:53] <wikibugs>	 (03PS1) 10Volans: Updated src to v0.1.2 [software/debmonitor/deploy] - 10https://gerrit.wikimedia.org/r/439585 (https://phabricator.wikimedia.org/T191299)
[12:16:55] <wikibugs>	 (03PS1) 10Volans: Built wheels for v0.1.2 [software/debmonitor/deploy] - 10https://gerrit.wikimedia.org/r/439586 (https://phabricator.wikimedia.org/T191299)
[12:17:36] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Add initial Debianisation of debmonitor-client [software/debmonitor] - 10https://gerrit.wikimedia.org/r/438018 (https://phabricator.wikimedia.org/T191298) (owner: 10Muehlenhoff)
[12:18:51] <wikibugs>	 (03CR) 10Volans: [V: 032 C: 032] Add scap/log to .gitignore [software/debmonitor/deploy] - 10https://gerrit.wikimedia.org/r/439584 (owner: 10Volans)
[12:20:06] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db1124 is OK: OK slave_sql_lag Replication lag: 0.53 seconds
[12:22:42] <wikibugs>	 (03CR) 10Volans: [V: 032 C: 032] Updated src to v0.1.2 [software/debmonitor/deploy] - 10https://gerrit.wikimedia.org/r/439585 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans)
[12:24:02] <wikibugs>	 (03CR) 10Volans: [V: 032 C: 032] Built wheels for v0.1.2 [software/debmonitor/deploy] - 10https://gerrit.wikimedia.org/r/439586 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans)
[12:29:46] <wikibugs>	 (03PS3) 10Paladox: Rename wikimedia-polygerrit-style.html to gerrit-theme.html [puppet] - 10https://gerrit.wikimedia.org/r/439504 (https://phabricator.wikimedia.org/T196835)
[12:29:51] <wikibugs>	 (03PS4) 10Paladox: Gerrit: Add CoC and privacy policy to footer [puppet] - 10https://gerrit.wikimedia.org/r/439483 (https://phabricator.wikimedia.org/T196835)
[12:30:07] <wikibugs>	 10Operations, 10Citoid, 10Code-Stewardship-Reviews, 10VisualEditor, 10Services (watching): zotero translation server: code stewardship request - https://phabricator.wikimedia.org/T187194#4271879 (10Mvolz) >>! In T187194#4271773, @danstillman wrote: > Not sure what you're planning, but the initial version...
[12:36:35] <wikibugs>	 10Operations, 10cloud-services-team: cloud vps: disable system-wide apt pinning for OpenStack jessie hosts - https://phabricator.wikimedia.org/T196659#4271885 (10aborrero)
[12:41:37] <wikibugs>	 (03PS1) 10Volans: debmonitor: fix typo in nginx config [puppet] - 10https://gerrit.wikimedia.org/r/439587 (https://phabricator.wikimedia.org/T191299)
[12:42:21] <wikibugs>	 (03CR) 10Paladox: "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/439444 (https://phabricator.wikimedia.org/T196812) (owner: 10Paladox)
[12:42:29] <wikibugs>	 (03CR) 10Volans: [C: 032] debmonitor: fix typo in nginx config [puppet] - 10https://gerrit.wikimedia.org/r/439587 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans)
[12:42:51] <wikibugs>	 (03CR) 10Vgutierrez: [C: 031] "Sorry I've missed that :(" [puppet] - 10https://gerrit.wikimedia.org/r/439587 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans)
[12:48:17] <wikibugs>	 10Operations, 10cloud-services-team: cloud vps: disable system-wide apt pinning for OpenStack jessie hosts - https://phabricator.wikimedia.org/T196659#4264894 (10aborrero) This is our usecase, for example with keystone (T196633).  We need the equivalent of `apt-get install -t jessie-backports keystone`. This i...
[12:54:10] <logmsgbot>	 !log volans@deploy1001 Started deploy [debmonitor/deploy@81d7333]: Release v0.1.2
[12:54:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:58:14] <wikibugs>	 (03PS7) 10Muehlenhoff: Add initial Debianisation of debmonitor-client [software/debmonitor] - 10https://gerrit.wikimedia.org/r/438018 (https://phabricator.wikimedia.org/T191298)
[12:59:23] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Add initial Debianisation of debmonitor-client [software/debmonitor] - 10https://gerrit.wikimedia.org/r/438018 (https://phabricator.wikimedia.org/T191298) (owner: 10Muehlenhoff)
[13:00:05] <jouncebot>	 addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: (Dis)respected human, time to deploy European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180611T1300). Please do the needful.
[13:00:05] <jouncebot>	 Daimona and Urbanecm: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:00:07] <Urbanecm>	 Here
[13:00:19] <Daimona>	 Hey
[13:01:11] <Urbanecm>	 Who will SWAT today? zeljkof? hashar? :)
[13:01:20] <hashar>	 I will hello :)
[13:01:25] <Urbanecm>	 Hi hashar!
[13:01:27] <logmsgbot>	 !log volans@deploy1001 Finished deploy [debmonitor/deploy@81d7333]: Release v0.1.2 (duration: 07m 16s)
[13:01:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:02:14] <wikibugs>	 (03CR) 10Volans: "Same here, debmonitor/deploy/.git deploys are blocked by this." [puppet] - 10https://gerrit.wikimedia.org/r/438121 (https://phabricator.wikimedia.org/T196710) (owner: 10Thcipriani)
[13:03:05] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: openstack: keystone: use install_options to install from jessie-backports [puppet] - 10https://gerrit.wikimedia.org/r/439589 (https://phabricator.wikimedia.org/T196633)
[13:03:23] <hashar>	 arrgh
[13:03:52] <Urbanecm>	 What's happening?
[13:04:42] <hashar>	 well I am looking at https://gerrit.wikimedia.org/r/c/mediawiki/core/+/437924
[13:04:50] <hashar>	 guess I will need to update l10n cache somehow
[13:05:00] <Daimona>	 Probably
[13:05:12] <Daimona>	 Not sure tho
[13:05:41] <Urbanecm>	 hashar, maybe https://wikitech.wikimedia.org/wiki/LocalisationUpdate can help
[13:05:46] <hashar>	 I would rather deploy that with the rest of the train. I am sure I am going to screw it up somehow
[13:07:11] <Daimona>	 If that's the case, no problem :-)
[13:09:01] <hashar>	 Daimona: yeah sorry. I have little time to babysit the swat after the deployment and I am not confident with this one :^\
[13:09:12] <hashar>	 it does not seem too urgent though, that will start being deployed tomorrow anyway
[13:09:26] <Daimona>	 Sure
[13:09:34] <wikibugs>	 (03CR) 10Hashar: [C: 032] Revert "Change bewikiquote logo" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439477 (https://phabricator.wikimedia.org/T196134) (owner: 10Urbanecm)
[13:09:39] <Daimona>	 Yeah, this actually missed last week's train
[13:09:52] <Daimona>	 But I guess waiting another couple of days won't be a big deal
[13:10:08] <hashar>	 yeah that is how I understand it :]
[13:10:18] <hashar>	 Daimona: but as part of the train, it will be straightforward/easy
[13:10:51] <Daimona>	 Indeed
[13:10:54] <Daimona>	 Thanks anyway :-)
[13:11:14] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "Change bewikiquote logo" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439477 (https://phabricator.wikimedia.org/T196134) (owner: 10Urbanecm)
[13:12:11] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db1124 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 374.78 seconds
[13:12:37] <marostegui>	 checking that
[13:12:42] <hashar>	 Urbanecm: I am syncing the logo change
[13:12:47] <Urbanecm>	 ack
[13:13:18] <Urbanecm>	 (are you syncing only the first patch? first 3 patches are for the same problem)
[13:13:21] <logmsgbot>	 !log hashar@deploy1001 Synchronized static/images/project-logos: Revert "Change bewikiquote logo" - T196134 (duration: 00m 51s)
[13:13:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:13:26] <stashbot>	 T196134: Change bewikiquote logo - https://phabricator.wikimedia.org/T196134
[13:14:02] <hashar>	 Urbanecm: and I have purged the logos
[13:14:08] <Urbanecm>	 thx
[13:14:38] <wikibugs>	 (03CR) 10Hashar: [C: 032] Change logo files for bewikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439478 (https://phabricator.wikimedia.org/T196134) (owner: 10Urbanecm)
[13:15:06] <wikibugs>	 (03CR) 10Hashar: [C: 032] Use uploaded HD logo for bewikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439479 (https://phabricator.wikimedia.org/T196134) (owner: 10Urbanecm)
[13:15:24] <hashar>	 I will deploy them one by one
[13:15:32] <hashar>	 logos png files first then the IS.php file
[13:16:03] <Urbanecm>	 Ok, ack
[13:16:03] <wikibugs>	 (03Merged) 10jenkins-bot: Change logo files for bewikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439478 (https://phabricator.wikimedia.org/T196134) (owner: 10Urbanecm)
[13:16:51] <wikibugs>	 (03Merged) 10jenkins-bot: Use uploaded HD logo for bewikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439479 (https://phabricator.wikimedia.org/T196134) (owner: 10Urbanecm)
[13:17:12] <wikibugs>	 (03CR) 10jenkins-bot: Revert "Change bewikiquote logo" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439477 (https://phabricator.wikimedia.org/T196134) (owner: 10Urbanecm)
[13:17:14] <wikibugs>	 (03CR) 10jenkins-bot: Change logo files for bewikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439478 (https://phabricator.wikimedia.org/T196134) (owner: 10Urbanecm)
[13:17:16] <wikibugs>	 (03CR) 10jenkins-bot: Use uploaded HD logo for bewikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439479 (https://phabricator.wikimedia.org/T196134) (owner: 10Urbanecm)
[13:17:21] <icinga-wm>	 RECOVERY - Check systemd state on proton1001 is OK: OK - running: The system is fully operational
[13:17:27] <logmsgbot>	 !log hashar@deploy1001 Synchronized static/images/project-logos: Change logo files for bewikiquote - T196134 (duration: 00m 50s)
[13:17:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:18:31] <hashar>	 Urbanecm: I have deployed the new bewikiquote logos and purged the URL. Now doing the IS change
[13:18:37] <Urbanecm>	 ack
[13:19:41] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db1124 is OK: OK slave_sql_lag Replication lag: 0.01 seconds
[13:20:05] <logmsgbot>	 !log hashar@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Use uploaded HD logo for bewikiquote - T196134 (duration: 00m 50s)
[13:20:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:20:11] <stashbot>	 T196134: Change bewikiquote logo - https://phabricator.wikimedia.org/T196134
[13:20:32] <wikibugs>	 (03CR) 10Ottomata: [C: 031] ":)" [puppet] - 10https://gerrit.wikimedia.org/r/438243 (https://phabricator.wikimedia.org/T196158) (owner: 10Elukey)
[13:20:43] <wikibugs>	 (03CR) 10Vgutierrez: [C: 032] update-ocsp: Actually use --time-offset-end argument [puppet] - 10https://gerrit.wikimedia.org/r/436485 (https://phabricator.wikimedia.org/T163541) (owner: 10Vgutierrez)
[13:20:46] <logmsgbot>	 !log volans@deploy1001 Started deploy [debmonitor/deploy@81d7333]: Release v0.1.2
[13:20:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:21:28] <wikibugs>	 (03PS3) 10Vgutierrez: update-ocsp: Actually use --time-offset-end argument [puppet] - 10https://gerrit.wikimedia.org/r/436485 (https://phabricator.wikimedia.org/T163541)
[13:21:42] <logmsgbot>	 !log volans@deploy1001 Finished deploy [debmonitor/deploy@81d7333]: Release v0.1.2 (duration: 00m 56s)
[13:21:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:22:10] <wikibugs>	 (03CR) 10Ottomata: "Does it need the API version set at all anymore?  Can it just negotiate?" [puppet] - 10https://gerrit.wikimedia.org/r/438211 (owner: 10Elukey)
[13:23:00] <hashar>	 Urbanecm: statistics updated https://phabricator.wikimedia.org/T196788#4272009
[13:23:08] <Urbanecm>	 hashar, thx
[13:23:30] <Urbanecm>	 hashar, seems we have time, do you think I can add more patches?
[13:23:32] <wikibugs>	 (03CR) 10Vgutierrez: [C: 032] update-ocsp: Fix cert_get_issuer_filename error handling [puppet] - 10https://gerrit.wikimedia.org/r/439557 (https://phabricator.wikimedia.org/T163541) (owner: 10Vgutierrez)
[13:23:41] <icinga-wm>	 RECOVERY - Check systemd state on proton1002 is OK: OK - running: The system is fully operational
[13:23:50] <Urbanecm>	 (btw please update stats for idwikimedia as well hashar)
[13:23:56] <wikibugs>	 (03CR) 10Hashar: [C: 032] Fix wgMetaNamespace for pswikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439457 (https://phabricator.wikimedia.org/T196837) (owner: 10Urbanecm)
[13:24:01] <icinga-wm>	 RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy
[13:24:23] <wikibugs>	 (03PS3) 10Vgutierrez: update-ocsp: Fix cert_get_issuer_filename error handling [puppet] - 10https://gerrit.wikimedia.org/r/439557 (https://phabricator.wikimedia.org/T163541)
[13:25:04] <wikibugs>	 (03PS1) 10Urbanecm: Use 1x lgoo for bewikiquote in IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439594 (https://phabricator.wikimedia.org/T196134)
[13:25:14] <hashar>	 Urbanecm: idwikimedia done
[13:25:16] <Urbanecm>	 thx
[13:26:18] <wikibugs>	 (03PS2) 10Urbanecm: Use 1x logo for bewikiquote in IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439594 (https://phabricator.wikimedia.org/T196134)
[13:26:32] <hashar>	 Urbanecm: and https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/439457 ends up in a merge conflict somehow
[13:26:34] <hashar>	 rebasing
[13:26:40] <wikibugs>	 (03PS2) 10Hashar: Fix wgMetaNamespace for pswikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439457 (https://phabricator.wikimedia.org/T196837) (owner: 10Urbanecm)
[13:26:56] <wikibugs>	 (03CR) 10Hashar: [C: 032] Fix wgMetaNamespace for pswikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439457 (https://phabricator.wikimedia.org/T196837) (owner: 10Urbanecm)
[13:28:11] <wikibugs>	 (03PS8) 10Urbanecm: id_internalwikimedia: Initial configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/438279
[13:28:33] <wikibugs>	 (03Merged) 10jenkins-bot: Fix wgMetaNamespace for pswikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439457 (https://phabricator.wikimedia.org/T196837) (owner: 10Urbanecm)
[13:28:37] <wikibugs>	 (03CR) 10Vgutierrez: [C: 032] update-ocsp: Make pylint happy [puppet] - 10https://gerrit.wikimedia.org/r/439558 (https://phabricator.wikimedia.org/T163541) (owner: 10Vgutierrez)
[13:28:46] <wikibugs>	 (03CR) 10jenkins-bot: Fix wgMetaNamespace for pswikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439457 (https://phabricator.wikimedia.org/T196837) (owner: 10Urbanecm)
[13:29:13] <hashar>	 Urbanecm: syncing the pswikivoyage change
[13:29:18] <Urbanecm>	 ack
[13:29:37] <Urbanecm>	 hashar, do we have time for other patches as well?
[13:29:51] <logmsgbot>	 !log hashar@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Fix wgMetaNamespace for pswikivoyage - T196837 (duration: 00m 50s)
[13:29:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:29:56] <stashbot>	 T196837: Fix wgMetaNamespace for pswikivoyage - https://phabricator.wikimedia.org/T196837
[13:30:01] <wikibugs>	 (03PS2) 10Vgutierrez: update-ocsp: Make pylint happy [puppet] - 10https://gerrit.wikimedia.org/r/439558 (https://phabricator.wikimedia.org/T163541)
[13:30:27] <wikibugs>	 (03CR) 10Elukey: [C: 032] "> Does it need the API version set at all anymore?  Can it just" [puppet] - 10https://gerrit.wikimedia.org/r/438211 (owner: 10Elukey)
[13:30:33] <hashar>	 Urbanecm: such as https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/439594/2/wmf-config/InitialiseSettings.php  ? :)
[13:31:06] <Urbanecm>	 Such as this one, but there's 9 tasks assigned to me waiting for SWAT, so... :D
[13:31:21] <wikibugs>	 (03PS3) 10Hashar: Use 1x logo for bewikiquote in IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439594 (https://phabricator.wikimedia.org/T196134) (owner: 10Urbanecm)
[13:31:43] <hashar>	 I will do that one then stop. I have some code to complete before doing my weekly conf calls
[13:31:56] <wikibugs>	 (03CR) 10Hashar: [C: 032] Use 1x logo for bewikiquote in IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439594 (https://phabricator.wikimedia.org/T196134) (owner: 10Urbanecm)
[13:31:58] <Urbanecm>	 Ok, ack
[13:32:06] <icinga-wm>	 ACKNOWLEDGEMENT - IPMI Sensor Status on maps1002 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [Power Supply 2 = Critical, Power Supplies = Critical] Gehel followed in https://phabricator.wikimedia.org/T196897
[13:32:17] <wikibugs>	 (03CR) 10Ottomata: statistics::discovery: re-enable cron job (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/438125 (https://phabricator.wikimedia.org/T170494) (owner: 10Bearloga)
[13:32:18] <hashar>	 Urbanecm: I should have more time tomorrow or Thursday
[13:32:40] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: Power supply issue on maps1002 - https://phabricator.wikimedia.org/T196897#4272061 (10Gehel)
[13:32:44] <Urbanecm>	 Ok :)
[13:32:54] <wikibugs>	 (03CR) 10Ottomata: "Def not a big issue!  Just wondering." [puppet] - 10https://gerrit.wikimedia.org/r/438211 (owner: 10Elukey)
[13:33:15] <wikibugs>	 (03Merged) 10jenkins-bot: Use 1x logo for bewikiquote in IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439594 (https://phabricator.wikimedia.org/T196134) (owner: 10Urbanecm)
[13:33:30] <wikibugs>	 (03CR) 10jenkins-bot: Use 1x logo for bewikiquote in IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439594 (https://phabricator.wikimedia.org/T196134) (owner: 10Urbanecm)
[13:35:38] <logmsgbot>	 !log hashar@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Fix wgMetaNamespace for pswikivoyage - T196837 (duration: 00m 49s)
[13:35:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:35:43] <stashbot>	 T196837: Fix wgMetaNamespace for pswikivoyage - https://phabricator.wikimedia.org/T196837
[13:38:08] <hashar>	 Urbanecm: done
[13:38:11] <Urbanecm>	 thx
[13:43:43] <wikibugs>	 (03CR) 10Rush: "seems good, does labtestn need the same option set?" [puppet] - 10https://gerrit.wikimedia.org/r/439589 (https://phabricator.wikimedia.org/T196633) (owner: 10Arturo Borrero Gonzalez)
[13:44:54] <marostegui>	 hashar: swat done?
[13:45:04] <wikibugs>	 (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1103:3314" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439598
[13:47:12] <hashar>	 marostegui: yes
[13:47:16] <hashar>	 !log European SWAT completed
[13:47:17] <hashar>	 sorry
[13:47:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:47:25] <marostegui>	 \o/
[13:47:26] <marostegui>	 thanks!
[13:47:52] <wikibugs>	 (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1103:3314" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439598 (owner: 10Marostegui)
[13:49:32] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1103:3314" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439598 (owner: 10Marostegui)
[13:49:45] <wikibugs>	 (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1103:3314" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439598 (owner: 10Marostegui)
[13:50:40] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1103:3314 after alter table (duration: 00m 50s)
[13:50:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:51:05] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Depool db1081 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439600 (https://phabricator.wikimedia.org/T191316)
[13:52:11] <wikibugs>	 (03PS4) 10Zoranzoki21: Add sites to the wgCopyUploadsDomains whitelist of Wikimedia Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436211 (https://phabricator.wikimedia.org/T195270)
[13:52:55] <logmsgbot>	 !log otto@deploy1001 Started deploy [eventlogging/eventbus@08a1dff]: Producing events with kafka timestamp set to event time - T196407
[13:52:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:53:00] <stashbot>	 T196407: EventBus should produce messages to Kafka with event time set to meta.dt - https://phabricator.wikimedia.org/T196407
[13:54:00] <wikibugs>	 (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1081 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439600 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui)
[13:54:06] <icinga-wm>	 ACKNOWLEDGEMENT - Device not healthy -SMART- on mw1230 is CRITICAL: cluster=api_appserver device=sda instance=mw1230:9100 job=node site=eqiad Giuseppe Lavagetto T196881 https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=mw1230&var-datasource=eqiad%2520prometheus%252Fops
[13:54:06] <icinga-wm>	 ACKNOWLEDGEMENT - mediawiki-installation DSH group on mw1230 is CRITICAL: Host mw1230 is not in mediawiki-installation dsh group Giuseppe Lavagetto T196881
[13:54:50] <logmsgbot>	 !log otto@deploy1001 Finished deploy [eventlogging/eventbus@08a1dff]: Producing events with kafka timestamp set to event time - T196407 (duration: 01m 55s)
[13:54:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:55:37] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1081 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439600 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui)
[13:56:24] <logmsgbot>	 !log otto@deploy1001 Started deploy [eventlogging/analytics@08a1dff]: Producing events with kafka timestamp set to event time - T196407
[13:56:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:56:28] <logmsgbot>	 !log otto@deploy1001 Finished deploy [eventlogging/analytics@08a1dff]: Producing events with kafka timestamp set to event time - T196407 (duration: 00m 04s)
[13:56:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:56:43] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1081 for alter table (duration: 00m 48s)
[13:56:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:58:03] <marostegui>	 !log Deploy schema change on db1081 T191316 T192926 T89737 T195193
[13:58:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:58:10] <stashbot>	 T89737: Make several mediawiki table fields unsigned ints on wmf databases - https://phabricator.wikimedia.org/T89737
[13:58:10] <stashbot>	 T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926
[13:58:10] <stashbot>	 T195193: Schema change for ct_tag_id field to change_tag - https://phabricator.wikimedia.org/T195193
[13:58:10] <stashbot>	 T191316: Schema change to make archive.ar_rev_id NOT NULL - https://phabricator.wikimedia.org/T191316
[13:59:16] <wikibugs>	 10Operations, 10cloud-services-team, 10Patch-For-Review: cloud vps: disable system-wide apt pinning for OpenStack jessie hosts - https://phabricator.wikimedia.org/T196659#4272176 (10Andrew) @aborrero doesn't pinning work if we pin the keystone package and all dependencies?  Like in openstack::jessie_mitaka_c...
[13:59:18] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Depool db1081 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439600 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui)
[13:59:21] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team: labstore1003 SMART failure (again) - https://phabricator.wikimedia.org/T196704#4272177 (10chasemp) 05Open>03Resolved ```root@labstore1003:~# /usr/local/lib/nagios/plugins/check_raid megacli OK: optimal, 5 logical, 34 physical OK```  Thanks!
[14:00:14] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: Replace memory bank on scb1002 - https://phabricator.wikimedia.org/T196901#4272179 (10Joe)
[14:00:42] <icinga-wm>	 ACKNOWLEDGEMENT - Memory correctable errors -EDAC- on scb1002 is CRITICAL: 5.001 ge 4 Giuseppe Lavagetto T196901 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=scb1002&var-datasource=eqiad%2520prometheus%252Fops
[14:06:37] <wikibugs>	 (03CR) 10Muehlenhoff: debmonitor: client side setup (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans)
[14:06:40] <wikibugs>	 10Operations, 10ops-eqiad, 10netops: replace mr1-eqiad - https://phabricator.wikimedia.org/T185171#4272218 (10ayounsi)
[14:06:49] <wikibugs>	 10Operations, 10ops-eqiad, 10netops: replace mr1-eqiad - https://phabricator.wikimedia.org/T185171#3908273 (10ayounsi)
[14:08:22] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db2074 is OK: OK slave_sql_lag Replication lag: 0.21 seconds
[14:08:41] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db2043 is OK: OK slave_sql_lag Replication lag: 0.05 seconds
[14:08:41] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db2050 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[14:08:42] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db2036 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[14:08:51] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db2057 is OK: OK slave_sql_lag Replication lag: 0.33 seconds
[14:10:58] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 031] debmonitor: client side setup (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans)
[14:13:12] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db2094 is OK: OK slave_sql_lag Replication lag: 39.95 seconds
[14:13:55] <wikibugs>	 (03CR) 10Hoo man: [C: 032] "Tested" [dumps/dcat] - 10https://gerrit.wikimedia.org/r/425987 (owner: 10Lokal Profil)
[14:14:24] <wikibugs>	 (03CR) 10Muehlenhoff: debmonitor: client side setup (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans)
[14:14:31] <wikibugs>	 (03Merged) 10jenkins-bot: Allow prefix to override "all" [dumps/dcat] - 10https://gerrit.wikimedia.org/r/425987 (owner: 10Lokal Profil)
[14:17:25] <wikibugs>	 (03PS1) 10Volans: Revert "debmonitor: set TLS cipher suite for LDAP" [puppet] - 10https://gerrit.wikimedia.org/r/439603
[14:17:36] <wikibugs>	 (03PS2) 10Volans: Revert "debmonitor: set TLS cipher suite for LDAP" [puppet] - 10https://gerrit.wikimedia.org/r/439603
[14:18:25] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Revert "debmonitor: set TLS cipher suite for LDAP" [puppet] - 10https://gerrit.wikimedia.org/r/439603 (owner: 10Volans)
[14:19:28] <wikibugs>	 (03PS3) 10Volans: Revert "debmonitor: set TLS cipher suite for LDAP" [puppet] - 10https://gerrit.wikimedia.org/r/439603
[14:20:31] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Revert "debmonitor: set TLS cipher suite for LDAP" [puppet] - 10https://gerrit.wikimedia.org/r/439603 (owner: 10Volans)
[14:21:14] <wikibugs>	 (03PS17) 10Zoranzoki21: Enable Extension:Newsletter on hewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381537 (https://phabricator.wikimedia.org/T177151)
[14:21:52] <hoo>	 !log Updated operations/dumps/dcat (536bd5b..559dee3) on snapshot1008
[14:21:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:22:26] <wikibugs>	 (03PS4) 10Volans: Revert "debmonitor: set TLS cipher suite for LDAP" [puppet] - 10https://gerrit.wikimedia.org/r/439603
[14:22:49] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 032] deployment_server/package_builder: use ::profile::base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/439544 (owner: 10Dzahn)
[14:22:56] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: deployment_server/package_builder: use ::profile::base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/439544 (owner: 10Dzahn)
[14:22:59] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] deployment_server/package_builder: use ::profile::base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/439544 (owner: 10Dzahn)
[14:23:09] <wikibugs>	 (03CR) 10Hoo man: [C: 031] "Should be good to merge now (but didn't test yet)" [puppet] - 10https://gerrit.wikimedia.org/r/424291 (https://phabricator.wikimedia.org/T163328) (owner: 10Lokal Profil)
[14:23:27] <wikibugs>	 (03PS5) 10Volans: Revert "debmonitor: set TLS cipher suite for LDAP" [puppet] - 10https://gerrit.wikimedia.org/r/439603
[14:23:42] <godog>	 thcipriani: I'm ok to go with scap upgrade btw, if you are around
[14:24:20] <thcipriani>	 godog: awesome, thank you! I'm around.
[14:24:40] <wikibugs>	 (03CR) 10Volans: [C: 032] Revert "debmonitor: set TLS cipher suite for LDAP" [puppet] - 10https://gerrit.wikimedia.org/r/439603 (owner: 10Volans)
[14:25:07] <godog>	 !log upload scap 3.8.2-1 - T196710
[14:25:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:25:12] <stashbot>	 T196710: Update Debian Package for Scap3 to 3.8.2-1 - https://phabricator.wikimedia.org/T196710
[14:25:23] <wikibugs>	 (03PS2) 10Filippo Giunchedi: Scap: Bump version to 3.8.2-1 [puppet] - 10https://gerrit.wikimedia.org/r/438121 (https://phabricator.wikimedia.org/T196710) (owner: 10Thcipriani)
[14:26:13] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 032] Scap: Bump version to 3.8.2-1 [puppet] - 10https://gerrit.wikimedia.org/r/438121 (https://phabricator.wikimedia.org/T196710) (owner: 10Thcipriani)
[14:26:57] <logmsgbot>	 !log akosiaris@deploy1001 Started deploy [proton/deploy@97ec4bf]: (no justification provided)
[14:27:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:27:43] <wikibugs>	 (03CR) 10Hoo man: [C: 031] "Tested on sn1008, output diff looks good to me." [puppet] - 10https://gerrit.wikimedia.org/r/424291 (https://phabricator.wikimedia.org/T163328) (owner: 10Lokal Profil)
[14:28:22] <icinga-wm>	 RECOVERY - proton endpoints health on proton2002 is OK: All endpoints are healthy
[14:28:48] <godog>	 thcipriani: deploy1001 upgraded
[14:29:21] * thcipriani looks
[14:30:04] <wikibugs>	 10Operations, 10ops-codfw, 10DBA: replace bad disk in db2059 - https://phabricator.wikimedia.org/T196709#4272315 (10Papaul) a:05Papaul>03Marostegui Disk replaced
[14:31:05] <wikibugs>	 10Operations, 10ops-codfw, 10DBA: replace bad disk in db2059 - https://phabricator.wikimedia.org/T196709#4272318 (10Marostegui) Thanks! ```       physicaldrive 1I:1:12 (port 1I:box 1:bay 12, SAS, 600 GB, Rebuilding) ```  Will report back once it is done
[14:31:18] <thcipriani>	 godog: yep, I see the change, it was a small one-liner. I don't have a repo to test deploy for this particular change, but I was able to recreate locally. Once puppet is run on all the targets it should unblock a few folks. Thanks again for all your help!
[14:32:37] <godog>	 thcipriani: np!
[14:32:52] <icinga-wm>	 PROBLEM - puppet last run on proton1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[proton/deploy]
[14:33:42] <akosiaris>	 that proton thing is failing with a different error every time
[14:34:13] <godog>	 that's how you get to 100% coverage
[14:34:25] <akosiaris>	 lol
[14:34:32] <_joe_>	 akosiaris: so that you don't get bored and burnt out by the repetitiveness of your work
[14:35:56] <akosiaris>	 !log reboot mx1001, poolcounter1001 for kernel upgrades and spec-ctrl enabling
[14:35:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:36:15] <wikibugs>	 10Operations, 10Scap, 10Patch-For-Review: Update Debian Package for Scap3 to 3.8.2-1 - https://phabricator.wikimedia.org/T196710#4272339 (10thcipriani) 05Open>03Resolved a:03fgiunchedi New package was uploaded and puppet should be setting it up on targets with the next run.
[14:36:18] <wikibugs>	 10Operations, 10Electron-PDFs, 10Proton, 10Readers-Web-Backlog, and 2 others: New service request: chromium-render/deploy - https://phabricator.wikimedia.org/T186748#4272343 (10thcipriani)
[14:38:02] <icinga-wm>	 PROBLEM - puppet last run on restbase1007 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 1 minute ago with 1 failures. Failed resources (up to 3 shown): Package[scap]
[14:41:01] <icinga-wm>	 RECOVERY - Check systemd state on restbase-dev1004 is OK: OK - running: The system is fully operational
[14:42:06] <wikibugs>	 (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1081" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439606
[14:42:21] <wikibugs>	 (03PS3) 10Muehlenhoff: Enable base::service_auto_restart for mcelog [puppet] - 10https://gerrit.wikimedia.org/r/439565 (https://phabricator.wikimedia.org/T135991)
[14:42:41] <icinga-wm>	 RECOVERY - Check systemd state on restbase-dev1006 is OK: OK - running: The system is fully operational
[14:43:44] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 032] Enable base::service_auto_restart for mcelog [puppet] - 10https://gerrit.wikimedia.org/r/439565 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[14:45:30] <urandom>	 godog: welcome back :)
[14:46:08] <godog>	 urandom: \o/ thanks!
[14:46:12] <urandom>	 godog: thcipriani: should scap 3.8.2-2 exist in apt?
[14:46:31] <thcipriani>	 urandom: I think it should be 3.8.2-1
[14:46:59] <urandom>	 oh, right
[14:47:01] <urandom>	 brain-o
[14:47:23] <urandom>	 but i see only 3.8.1-1 available
[14:47:38] <wikibugs>	 (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1081" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439606 (owner: 10Marostegui)
[14:48:02] <urandom>	 nevermind...
[14:48:12] <icinga-wm>	 RECOVERY - puppet last run on restbase1007 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[14:48:15] <thcipriani>	 :)
[14:48:27] * urandom needs coffee
[14:49:04] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1081" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439606 (owner: 10Marostegui)
[14:49:16] <wikibugs>	 (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1081" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439606 (owner: 10Marostegui)
[14:49:48] <urandom>	 thcipriani: as godog pointed out elsewhere, our cron-job runs apt-get update before puppet, and i issued a manual run
[14:49:56] <wikibugs>	 (03PS2) 10Herron: add SPF record to disallow email for all parked domains [dns] - 10https://gerrit.wikimedia.org/r/429874 (https://phabricator.wikimedia.org/T193408) (owner: 10Dzahn)
[14:50:32] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1081 after alter table (duration: 00m 50s)
[14:50:35] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Depool db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439607 (https://phabricator.wikimedia.org/T191316)
[14:50:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:51:12] <thcipriani>	 urandom: ah, yep, that's what I started typing before you said nevermind :)
[14:51:20] <thcipriani>	 but I'm a slow typist.
[14:51:22] <wikibugs>	 (03CR) 10Herron: [C: 032] add SPF record to disallow email for all parked domains [dns] - 10https://gerrit.wikimedia.org/r/429874 (https://phabricator.wikimedia.org/T193408) (owner: 10Dzahn)
[14:52:34] <wikibugs>	 (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439607 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui)
[14:53:14] <wikibugs>	 10Operations, 10Cassandra, 10User-Eevans: Add Cassandra 3.11.2 package to internal APT repository - https://phabricator.wikimedia.org/T196745#4272414 (10Eevans) 05stalled>03Open p:05Low>03Normal
[14:53:17] <akosiaris>	 !log reboot bohrium for kernel upgrades and spec-ctrl enabling. Manually stopped mysql beforehand
[14:53:19] <akosiaris>	 elukey: ^
[14:53:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:53:41] <icinga-wm>	 PROBLEM - Check systemd state on restbase-dev1006 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[14:54:01] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439607 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui)
[14:54:13] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Depool db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439607 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui)
[14:54:27] <wikibugs>	 10Operations, 10Cassandra, 10User-Eevans: Add Cassandra 3.11.2 package to internal APT repository - https://phabricator.wikimedia.org/T196745#4267217 (10Eevans)
[14:55:13] <elukey>	 akosiaris: thanks!
[14:55:16] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1084 for alter table (duration: 00m 50s)
[14:55:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:55:24] <marostegui>	 !log Deploy schema change on db1084 T191316 T192926 T89737 T195193
[14:55:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:55:30] <stashbot>	 T89737: Make several mediawiki table fields unsigned ints on wmf databases - https://phabricator.wikimedia.org/T89737
[14:55:31] <stashbot>	 T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926
[14:55:31] <stashbot>	 T195193: Schema change for ct_tag_id field to change_tag - https://phabricator.wikimedia.org/T195193
[14:55:31] <stashbot>	 T191316: Schema change to make archive.ar_rev_id NOT NULL - https://phabricator.wikimedia.org/T191316
[14:57:21] <icinga-wm>	 RECOVERY - Device not healthy -SMART- on db2059 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db2059&var-datasource=codfw%2520prometheus%252Fops
[14:57:40] <wikibugs>	 (03PS5) 10Herron: Prep to tighten PuppetDB access control - log client certificate details [puppet] - 10https://gerrit.wikimedia.org/r/437057 (https://phabricator.wikimedia.org/T194962) (owner: 10Alex Monk)
[14:57:46] <wikibugs>	 (03PS6) 10Herron: Prep to tighten PuppetDB access control - log client certificate details [puppet] - 10https://gerrit.wikimedia.org/r/437057 (https://phabricator.wikimedia.org/T194962) (owner: 10Alex Monk)
[15:02:22] <icinga-wm>	 RECOVERY - proton endpoints health on proton2001 is OK: All endpoints are healthy
[15:02:30] <logmsgbot>	 !log akosiaris@deploy1001 Finished deploy [proton/deploy@97ec4bf]: (no justification provided) (duration: 35m 33s)
[15:02:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:03:32] <icinga-wm>	 RECOVERY - Check systemd state on restbase-dev1006 is OK: OK - running: The system is fully operational
[15:13:08] <wikibugs>	 10Operations, 10ops-eqiad: ms-be1036 in power off status, not responsive to power on commands - https://phabricator.wikimedia.org/T196873#4271185 (10Cmjohnson) ms-be1036 will not power back manually either. I tried pulling the PSUs out, waiting several minutes and all I get is a flashing green light on the pow...
[15:18:35] <wikibugs>	 10Operations, 10monitoring, 10Patch-For-Review: Evaluate Grafana's LDAP group options and deprecate grafana-admin if possible - https://phabricator.wikimedia.org/T170150#4272537 (10akosiaris) All of my tests went fine. Scheduling this for Wednesday June 27th. I'll send an email to wikitech-l as well
[15:22:01] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: Power supply issue on maps1002 - https://phabricator.wikimedia.org/T196897#4272551 (10Cmjohnson) I checked the power cable, no issue, removed the PSU and re-inserted. Plugged power cable back in. A green light appeared for a second and then went dark again.  This AHS so t...
[15:25:01] <icinga-wm>	 PROBLEM - Host mw1230 is DOWN: PING CRITICAL - Packet loss = 100%
[15:25:37] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: Replace disk on mw1230 - https://phabricator.wikimedia.org/T196881#4271440 (10Cmjohnson) @joe mw1230 disks replaced, needs reinstall
[15:28:31] <icinga-wm>	 RECOVERY - Host mw1230 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms
[15:29:32] <icinga-wm>	 PROBLEM - High CPU load on API appserver on mw1230 is CRITICAL: Return code of 255 is out of bounds
[15:30:52] <icinga-wm>	 PROBLEM - Disk space on mw1230 is CRITICAL: Return code of 255 is out of bounds
[15:30:52] <icinga-wm>	 PROBLEM - HHVM processes on mw1230 is CRITICAL: Return code of 255 is out of bounds
[15:31:02] <icinga-wm>	 PROBLEM - nutcracker port on mw1230 is CRITICAL: Return code of 255 is out of bounds
[15:31:12] <icinga-wm>	 PROBLEM - HHVM rendering on mw1230 is CRITICAL: connect to address 10.64.48.65 and port 80: Connection refused
[15:31:14] <marostegui>	 !log Set offline disk 32:1 on db1065 - T196806
[15:31:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:31:19] <stashbot>	 T196806: Bad disk on db1065 - https://phabricator.wikimedia.org/T196806
[15:31:21] <icinga-wm>	 PROBLEM - Check size of conntrack table on mw1230 is CRITICAL: Return code of 255 is out of bounds
[15:31:22] <icinga-wm>	 PROBLEM - mcrouter process on mw1230 is CRITICAL: Return code of 255 is out of bounds
[15:31:22] <icinga-wm>	 PROBLEM - DPKG on mw1230 is CRITICAL: Return code of 255 is out of bounds
[15:31:31] <icinga-wm>	 PROBLEM - configured eth on mw1230 is CRITICAL: Return code of 255 is out of bounds
[15:31:41] <icinga-wm>	 PROBLEM - dhclient process on mw1230 is CRITICAL: Return code of 255 is out of bounds
[15:31:41] <icinga-wm>	 PROBLEM - nutcracker process on mw1230 is CRITICAL: Return code of 255 is out of bounds
[15:31:41] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1230 is CRITICAL: connect to address 10.64.48.65 and port 443: Connection refused
[15:31:42] <icinga-wm>	 PROBLEM - Apache HTTP on mw1230 is CRITICAL: connect to address 10.64.48.65 and port 80: Connection refused
[15:31:42] <icinga-wm>	 PROBLEM - MD RAID on mw1230 is CRITICAL: Return code of 255 is out of bounds
[15:31:51] <icinga-wm>	 PROBLEM - Check systemd state on mw1230 is CRITICAL: Return code of 255 is out of bounds
[15:31:51] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on mw1230 is CRITICAL: Return code of 255 is out of bounds
[15:31:52] <marostegui>	 !log Set offline disk 32:3 on db1063 - T196806
[15:31:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:32:01] <icinga-wm>	 PROBLEM - puppet last run on mw1230 is CRITICAL: Return code of 255 is out of bounds
[15:33:35] <logmsgbot>	 !log akosiaris@deploy1001 Started deploy [proton/deploy@97ec4bf]: (no justification provided)
[15:33:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:33:56] <elukey>	 what happened to poor mw1230? 
[15:33:57] <logmsgbot>	 !log akosiaris@deploy1001 Finished deploy [proton/deploy@97ec4bf]: (no justification provided) (duration: 00m 22s)
[15:34:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:34:02] <icinga-wm>	 PROBLEM - puppet last run on proton1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[proton/deploy]
[15:34:02] <icinga-wm>	 RECOVERY - puppet last run on proton1001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[15:34:37] <wikibugs>	 10Operations, 10Analytics-Cluster, 10Analytics-Kanban, 10Traffic, and 2 others: TLS security review of the Kafka stack - https://phabricator.wikimedia.org/T182993#4272582 (10Ottomata) @Vgutierrez from what I can tell: the only blocker to removing IPSec is deploying a new version of librdkafka with your pat...
[15:35:27] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA: Bad disk on db1063 - https://phabricator.wikimedia.org/T196804#4272583 (10Marostegui) Disk replaced by @Cmjohnson and RAID rebuilding: ``` root@db1063:~# megacli -PDRbld -ShowProg -PhysDrv [32:3] -aALL  Rebuild Progress on Device at Enclosure 32, Slot 3 Completed 2% in 1 Min...
[15:35:41] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on labstore1003 - https://phabricator.wikimedia.org/T196757#4272585 (10Cmjohnson) 05Open>03Resolved a:03Cmjohnson cmjohnson@labstore1003:~$ sudo /usr/local/lib/nagios/plugins/check_raid megacli OK: optimal, 5 logical, 34 physical OK
[15:36:12] <icinga-wm>	 PROBLEM - Check the NTP synchronisation status of timesyncd on mw1230 is CRITICAL: Return code of 255 is out of bounds
[15:37:18] <wikibugs>	 (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1084" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439612
[15:38:44] <no_justification>	 Can someone drop me from the moderators list on this channel? Don't need it :)
[15:38:59] <wikibugs>	 (03PS3) 10Ema: reload-vcl: port to python3 [puppet] - 10https://gerrit.wikimedia.org/r/439578
[15:39:11] <icinga-wm>	 RECOVERY - puppet last run on proton1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[15:39:21] <wikibugs>	 10Operations, 10ops-eqiad, 10Cloud-VPS: rack upgraded storage capacity in labstore100[67].eqiad.wmnet - https://phabricator.wikimedia.org/T196651#4272603 (10Cmjohnson)
[15:39:32] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA: Bad disk on db1065 - https://phabricator.wikimedia.org/T196806#4272605 (10Marostegui) Disk replaced by @Cmjohnson and now rebuilding: ``` root@db1065:~# megacli -PDRbld -ShowProg -PhysDrv [32:1] -aALL  Rebuild Progress on Device at Enclosure 32, Slot 1 Completed 1% in 1 Minu...
[15:40:49] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on ms-be1034 - https://phabricator.wikimedia.org/T195569#4272615 (10Cmjohnson) I need an update to this task. If we do not need the new disk I can send it back.   Thanks!
[15:40:57] <wikibugs>	 (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1084" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439612 (owner: 10Marostegui)
[15:41:22] <icinga-wm>	 PROBLEM - puppet last run on proton1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[proton/deploy]
[15:42:30] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1084" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439612 (owner: 10Marostegui)
[15:42:45] <wikibugs>	 (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1084" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439612 (owner: 10Marostegui)
[15:43:33] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1084 after alter table (duration: 00m 50s)
[15:43:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:43:56] <logmsgbot>	 !log akosiaris@deploy1001 Started deploy [proton/deploy@97ec4bf]: (no justification provided)
[15:43:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:44:12] <icinga-wm>	 PROBLEM - Host kafka-jumbo1005 is DOWN: PING CRITICAL - Packet loss = 100%
[15:44:28] <logmsgbot>	 !log akosiaris@deploy1001 Finished deploy [proton/deploy@97ec4bf]: (no justification provided) (duration: 00m 33s)
[15:44:32] <icinga-wm>	 RECOVERY - Host kafka-jumbo1005 is UP: PING WARNING - Packet loss = 86%, RTA = 0.25 ms
[15:44:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:44:42] <icinga-wm>	 RECOVERY - proton endpoints health on proton1001 is OK: All endpoints are healthy
[15:45:12] <wikibugs>	 (03CR) 10Ema: [C: 032] reload-vcl: port to python3 [puppet] - 10https://gerrit.wikimedia.org/r/439578 (owner: 10Ema)
[15:46:42] <icinga-wm>	 RECOVERY - Device not healthy -SMART- on mw1230 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=mw1230&var-datasource=eqiad%2520prometheus%252Fops
[15:49:11] <icinga-wm>	 PROBLEM - Host kafka-jumbo1005 is DOWN: PING CRITICAL - Packet loss = 100%
[15:50:20] <wikibugs>	 10Operations, 10Release-Engineering-Team: Phabricator outbound email seems to have a SPOF of mx1001 - https://phabricator.wikimedia.org/T196916#4272661 (10herron)
[15:50:30] <wikibugs>	 10Operations, 10Release-Engineering-Team: Phabricator outbound email seems to have a SPOF of mx1001 - https://phabricator.wikimedia.org/T196916#4272671 (10herron) What does the phabricator outbound mail config look like today?   Do we already have both mx1001 and mx2001 configured as outbound mail servers?
[15:50:42] <icinga-wm>	 RECOVERY - Host kafka-jumbo1005 is UP: PING WARNING - Packet loss = 93%, RTA = 0.21 ms
[15:51:11] <icinga-wm>	 PROBLEM - IPMI Sensor Status on mw1230 is CRITICAL: Return code of 255 is out of bounds
[15:51:32] <wikibugs>	 (03PS1) 10Volans: debmonitor: enforce LDAP TLS cipher suite [puppet] - 10https://gerrit.wikimedia.org/r/439616 (https://phabricator.wikimedia.org/T191299)
[15:52:31] <icinga-wm>	 PROBLEM - Host kafka-jumbo1005 is DOWN: PING CRITICAL - Packet loss = 100%
[15:53:04] <wikibugs>	 (03CR) 10Volans: "reply inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans)
[15:53:29] <wikibugs>	 10Operations, 10Analytics-Cluster, 10Analytics-Kanban, 10Traffic, and 2 others: TLS security review of the Kafka stack - https://phabricator.wikimedia.org/T182993#4272695 (10Vgutierrez) I think we can do it :).  BTW, right now we are enforcing AES ciphersuites in our TLS connections, and we are lucky that...
[15:53:31] <icinga-wm>	 PROBLEM - MegaRAID on db1063 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded)
[15:53:32] <icinga-wm>	 ACKNOWLEDGEMENT - MegaRAID on db1063 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T196918
[15:53:33] <icinga-wm>	 RECOVERY - Host kafka-jumbo1005 is UP: PING WARNING - Packet loss = 28%, RTA = 0.96 ms
[15:53:36] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on db1063 - https://phabricator.wikimedia.org/T196918#4272696 (10ops-monitoring-bot)
[15:54:12] <icinga-wm>	 PROBLEM - MegaRAID on db1065 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded)
[15:54:13] <icinga-wm>	 ACKNOWLEDGEMENT - MegaRAID on db1065 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T196919
[15:54:17] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on db1065 - https://phabricator.wikimedia.org/T196919#4272701 (10ops-monitoring-bot)
[15:54:39] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on db1063 - https://phabricator.wikimedia.org/T196918#4272707 (10Marostegui)
[15:54:41] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA: Bad disk on db1063 - https://phabricator.wikimedia.org/T196804#4272710 (10Marostegui)
[15:54:51] <icinga-wm>	 PROBLEM - Host kafka-jumbo1005 is DOWN: PING CRITICAL - Packet loss = 100%
[15:55:13] <wikibugs>	 Operations, ops-eqiad: Degraded RAID on db1065 - https://phabricator.wikimedia.org/T196919#4272712 (Marostegui)
[15:55:16] <wikibugs>	 Operations, ops-eqiad, DBA: Bad disk on db1065 - https://phabricator.wikimedia.org/T196806#4272715 (Marostegui)
[15:55:42] <icinga-wm>	 PROBLEM - IPsec on cp3047 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[15:55:42] <icinga-wm>	 PROBLEM - IPsec on cp3037 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[15:55:42] <icinga-wm>	 PROBLEM - IPsec on cp3042 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[15:55:42] <icinga-wm>	 PROBLEM - IPsec on cp3038 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[15:55:42] <icinga-wm>	 PROBLEM - IPsec on cp4024 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[15:55:51] <icinga-wm>	 PROBLEM - IPsec on cp5007 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[15:55:51] <icinga-wm>	 PROBLEM - IPsec on cp4022 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[15:55:51] <icinga-wm>	 PROBLEM - IPsec on cp4026 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[15:55:51] <icinga-wm>	 PROBLEM - IPsec on cp3030 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[15:55:51] <icinga-wm>	 PROBLEM - IPsec on cp3032 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[15:56:02] <elukey>	 spam incoming.. checking kafka-jumbo1005
[15:56:11] <icinga-wm>	 PROBLEM - Host mw1230 is DOWN: PING CRITICAL - Packet loss = 100%
[15:56:31] <icinga-wm>	 RECOVERY - Host mw1230 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms
[15:57:32] <icinga-wm>	 PROBLEM - High CPU load on API appserver on mw1230 is CRITICAL: Return code of 255 is out of bounds
[15:57:36] <wikibugs>	 Operations, ops-eqiad, DC-Ops: Replace disk on mw1230 - https://phabricator.wikimedia.org/T196881#4271440 (ops-monitoring-bot) Script wmf-auto-reimage was launched by oblivian on neodymium.eqiad.wmnet for hosts: ``` mw1230.eqiad.wmnet ``` The log can be found in `/var/log/wmf-auto-reimage/20180611155...
[15:58:01] <icinga-wm>	 PROBLEM - Kafka Broker Under Replicated Partitions on kafka-jumbo1006 is CRITICAL: 63 ge 10 https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1006
[15:58:12] <elukey>	 mmmm eno1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
[15:58:15] <wikibugs>	 Operations, ops-eqiad, DC-Ops: Replace disk on mw1230 - https://phabricator.wikimedia.org/T196881#4272728 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['mw1230.eqiad.wmnet'] ```  Of which those **FAILED**: ``` ['mw1230.eqiad.wmnet'] ```
[15:58:34] <wikibugs>	 (CR) Vgutierrez: [C: 1] "Awesome! LGTM" [puppet] - https://gerrit.wikimedia.org/r/439616 (https://phabricator.wikimedia.org/T191299) (owner: Volans)
[15:59:41] <icinga-wm>	 PROBLEM - Kafka Broker Under Replicated Partitions on kafka-jumbo1002 is CRITICAL: 17 ge 10 https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1002
[15:59:45] <wikibugs>	 Operations, ops-codfw, Traffic: rack/setup/install LVS200[7-10] - https://phabricator.wikimedia.org/T196560#4272735 (Papaul)
[15:59:51] <icinga-wm>	 PROBLEM - Check systemd state on restbase-dev1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[15:59:52] <elukey>	 XioNoX: you there by any chance?
[16:00:31] <XioNoX>	 elukey: ish, about to board a flight, what's up?
[16:00:44] <logmsgbot>	 !log akosiaris@deploy1001 Started deploy [proton/deploy@97ec4bf]: (no justification provided)
[16:00:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:01:00] <elukey>	 XioNoX: nothing then, nevermind :)
[16:01:02] <icinga-wm>	 PROBLEM - IPsec on cp2011 is CRITICAL: Strongswan CRITICAL - ok: 78 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:09] <wikibugs>	 Operations: Add email queueing/failover to services currently using mail_smarthost[0] - https://phabricator.wikimedia.org/T196920#4272739 (herron)
[16:01:12] <icinga-wm>	 PROBLEM - IPsec on cp2016 is CRITICAL: Strongswan CRITICAL - ok: 66 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:12] <icinga-wm>	 PROBLEM - IPsec on cp4023 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:21] <icinga-wm>	 PROBLEM - IPsec on cp2022 is CRITICAL: Strongswan CRITICAL - ok: 78 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:21] <icinga-wm>	 PROBLEM - IPsec on cp4032 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:21] <icinga-wm>	 PROBLEM - IPsec on cp4031 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:21] <icinga-wm>	 PROBLEM - IPsec on cp4025 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:22] <icinga-wm>	 PROBLEM - IPsec on cp3008 is CRITICAL: Strongswan CRITICAL - ok: 26 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:31] <icinga-wm>	 PROBLEM - IPsec on cp2005 is CRITICAL: Strongswan CRITICAL - ok: 78 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:31] <icinga-wm>	 PROBLEM - IPsec on cp2008 is CRITICAL: Strongswan CRITICAL - ok: 78 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:31] <icinga-wm>	 PROBLEM - IPsec on cp2006 is CRITICAL: Strongswan CRITICAL - ok: 24 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:32] <icinga-wm>	 PROBLEM - IPsec on cp3040 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:32] <icinga-wm>	 PROBLEM - IPsec on cp3031 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:32] <icinga-wm>	 PROBLEM - Kafka Broker Under Replicated Partitions on kafka-jumbo1003 is CRITICAL: 36 ge 10 https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1003
[16:01:32] <icinga-wm>	 PROBLEM - IPsec on cp2025 is CRITICAL: Strongswan CRITICAL - ok: 24 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:41] <icinga-wm>	 PROBLEM - IPsec on cp4029 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:42] <icinga-wm>	 PROBLEM - IPsec on cp3034 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:42] <icinga-wm>	 PROBLEM - IPsec on cp3007 is CRITICAL: Strongswan CRITICAL - ok: 26 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:42] <icinga-wm>	 PROBLEM - IPsec on cp5009 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:42] <icinga-wm>	 PROBLEM - IPsec on cp3049 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:42] <icinga-wm>	 PROBLEM - IPsec on cp3039 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:42] <icinga-wm>	 PROBLEM - IPsec on cp3046 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:42] <XioNoX>	 elukey: I see a lot of critical, let me know if I can help with the time I have
[16:01:43] <icinga-wm>	 PROBLEM - IPsec on cp3033 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:43] <icinga-wm>	 PROBLEM - IPsec on cp4021 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:44] <icinga-wm>	 PROBLEM - IPsec on cp5012 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:44] <icinga-wm>	 PROBLEM - IPsec on cp5001 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:51] <icinga-wm>	 PROBLEM - IPsec on cp2026 is CRITICAL: Strongswan CRITICAL - ok: 78 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:51] <icinga-wm>	 PROBLEM - IPsec on cp2007 is CRITICAL: Strongswan CRITICAL - ok: 66 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:51] <icinga-wm>	 PROBLEM - IPsec on cp3010 is CRITICAL: Strongswan CRITICAL - ok: 26 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:52] <icinga-wm>	 PROBLEM - IPsec on cp3044 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:52] <icinga-wm>	 PROBLEM - IPsec on cp4030 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:52] <icinga-wm>	 PROBLEM - IPsec on cp3036 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:52] <icinga-wm>	 PROBLEM - IPsec on cp5008 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:53] <icinga-wm>	 PROBLEM - IPsec on cp5011 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:53] <icinga-wm>	 PROBLEM - IPsec on cp5005 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:54] <icinga-wm>	 PROBLEM - IPsec on cp3048 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:54] <icinga-wm>	 PROBLEM - IPsec on cp3045 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:01:55] <icinga-wm>	 PROBLEM - IPsec on cp2002 is CRITICAL: Strongswan CRITICAL - ok: 78 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:02:01] <icinga-wm>	 PROBLEM - IPsec on cp3043 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:02:01] <icinga-wm>	 PROBLEM - IPsec on cp2004 is CRITICAL: Strongswan CRITICAL - ok: 66 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:02:01] <icinga-wm>	 PROBLEM - IPsec on cp2001 is CRITICAL: Strongswan CRITICAL - ok: 66 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:02:01] <icinga-wm>	 PROBLEM - IPsec on cp3035 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:02:01] <icinga-wm>	 PROBLEM - IPsec on cp3041 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:02:02] <icinga-wm>	 PROBLEM - IPsec on cp2010 is CRITICAL: Strongswan CRITICAL - ok: 66 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:02:02] <icinga-wm>	 PROBLEM - IPsec on cp2013 is CRITICAL: Strongswan CRITICAL - ok: 66 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:02:03] <elukey>	 XioNoX: so eno1 on kafka-jumbo1005 is listed by ip addr as DOWN
[16:02:03] <icinga-wm>	 PROBLEM - IPsec on cp2023 is CRITICAL: Strongswan CRITICAL - ok: 66 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:02:03] <icinga-wm>	 PROBLEM - IPsec on cp4028 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6
[16:02:17] <logmsgbot>	 !log akosiaris@deploy1001 Finished deploy [proton/deploy@97ec4bf]: (no justification provided) (duration: 01m 32s)
[16:02:20] <elukey>	 I am in the console and logged as root, the host works
[16:02:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:02:42] <icinga-wm>	 PROBLEM - Kafka Broker Under Replicated Partitions on kafka-jumbo1001 is CRITICAL: 67 ge 10 https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1001
[16:02:44] <elukey>	 XioNoX: it seems like the link is down
[16:03:38] <elukey>	 I am checking https://librenms.wikimedia.org/device/device=162/tab=port/port=16428/ but I don't see anything weird
[16:06:32] <icinga-wm>	 RECOVERY - Check systemd state on restbase-dev1004 is OK: OK - running: The system is fully operational
[16:07:00] <wikibugs>	 (PS2) Arturo Borrero Gonzalez: openstack: keystone: use install_options to install from jessie-backports [puppet] - https://gerrit.wikimedia.org/r/439589 (https://phabricator.wikimedia.org/T196633)
[16:07:07] <XioNoX>	 elukey: died 14min ago
[16:07:30] <elukey>	 yeah but is it on the switch side?
[16:07:31] <icinga-wm>	 RECOVERY - Device not healthy -SMART- on db1063 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db1063&var-datasource=eqiad%2520prometheus%252Fops
[16:07:59] <XioNoX>	 cmjohnson: can you replace the SFP-T on asw2-c-eqiad:ge-4/0/37 ? and verify with elukey why the interface is down? I'm about to board a flight
[16:08:48] <wikibugs>	 Operations, cloud-services-team, Patch-For-Review: cloud vps: disable system-wide apt pinning for OpenStack jessie hosts - https://phabricator.wikimedia.org/T196659#4272775 (aborrero) >>! In T196659#4272176, @Andrew wrote: > @aborrero doesn't pinning work if we pin the keystone package and all depend...
[16:10:02] <XioNoX>	 elukey: dunno, can be on either side, but the SFP-T would be my first guess. Logs show the link flapping many many times before going down
[16:10:53] <elukey>	 yeah saw it in the logs
[16:10:59] <elukey>	 I don't see anything weird on the host
[16:11:06] <wikibugs>	 Operations, Analytics-Cluster, Analytics-Kanban, Traffic, and 2 others: TLS security review of the Kafka stack - https://phabricator.wikimedia.org/T182993#4272798 (Vgutierrez) @Ottomata also I'm currently reviewing the TLS implementation on Kafka side, so far so good.
[16:22:21] <icinga-wm>	 RECOVERY - puppet last run on proton1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[16:29:11] <wikibugs>	 Operations, Mail, Phabricator, Release-Engineering-Team: Phabricator outbound email seems to have a SPOF of mx1001 - https://phabricator.wikimedia.org/T196916#4272884 (Aklapper) https://phabricator.wikimedia.org/config/all lists for `phpmailer.smtp-host` the value `mx1001.wikimedia.org;mx2001.wik...
[16:35:46] <wikibugs>	 (CR) Volans: [C: 2] debmonitor: enforce LDAP TLS cipher suite [puppet] - https://gerrit.wikimedia.org/r/439616 (https://phabricator.wikimedia.org/T191299) (owner: Volans)
[16:39:13] <wikibugs>	 Operations, Gerrit, Patch-For-Review, Release-Engineering-Team (Someday): Gerrit shows HTTP 500 error when pasting extended unicode characters - https://phabricator.wikimedia.org/T145885#4272917 (Jdlrobson) 😋😃
[16:41:09] <wikibugs>	 Operations, ops-eqiad, DBA: Bad disk on db1065 - https://phabricator.wikimedia.org/T196806#4272925 (Marostegui) The disk finished its rebuilt, but unfortunately has lots of errors and SMART alert too, so we need a new one :(  ``` Predictive Failure Count: 1 Last Predictive Failure Event Seq Number: 6...
[16:42:03] <marostegui>	 !log Set disk 32:1 offline on db1065 to get a new one - T196806
[16:42:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:42:09] <stashbot>	 T196806: Bad disk on db1065 - https://phabricator.wikimedia.org/T196806
[16:44:07] <wikibugs>	 (PS5) 20after4: Configuration for phabricator to use swift storage. [puppet] - https://gerrit.wikimedia.org/r/432528 (https://phabricator.wikimedia.org/T182085)
[16:44:16] <Hauskatze>	 marostegui: the new disk is also faulty?
[16:44:23] <marostegui>	 Yep
[16:44:37] <Hauskatze>	 :S
[16:44:53] <marostegui>	 It was a used one, so it is not strange
[16:45:02] <Hauskatze>	 ah, so no 'new'
[16:45:08] <marostegui>	 indeed :)
[16:45:10] <wikibugs>	 Operations, ops-codfw: rack/setup/install bast2002.wikimedia.org - https://phabricator.wikimedia.org/T196665#4272930 (Papaul)
[16:45:22] <wikibugs>	 (CR) 20after4: [C: 1] Configuration for phabricator to use swift storage. (2 comments) [puppet] - https://gerrit.wikimedia.org/r/432528 (https://phabricator.wikimedia.org/T182085) (owner: 20after4)
[16:45:49] <wikibugs>	 Operations, ops-eqiad, DBA: Bad disk on db1063 - https://phabricator.wikimedia.org/T196804#4272932 (Marostegui) Open>Resolved All looking good! ``` Drive has flagged a S.M.A.R.T alert : No Drive has flagged a S.M.A.R.T alert : No Drive has flagged a S.M.A.R.T alert : No Drive has flagged a S....
[16:49:12] <icinga-wm>	 PROBLEM - Check systemd state on restbase-dev1006 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[16:51:38] <benjamin94>	 hello
[16:51:57] <benjamin94>	 i wish to make a complaint against tony balloni'
[16:53:44] <wikibugs>	 Operations, SRE-Access-Requests, Release-Engineering-Team (Kanban): Add thcipriani and hashar to gerrit-root - https://phabricator.wikimedia.org/T196702#4272969 (Dzahn) a:Dzahn has been approved in SRE meeting
[16:53:55] <wikibugs>	 Operations, ops-codfw, DBA: replace bad disk in db2059 - https://phabricator.wikimedia.org/T196709#4272971 (Marostegui) Open>Resolved All went good! ```       logicaldrive 1 (3.3 TB, RAID 1+0, OK)        physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 600 GB, OK)       physicaldrive 1I:1:2 (p...
[16:54:08] <wikibugs>	 Operations, ops-codfw: rack/setup/install bast2002.wikimedia.org - https://phabricator.wikimedia.org/T196665#4272977 (Papaul)
[16:54:11] <icinga-wm>	 RECOVERY - MegaRAID on db1063 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy
[16:55:11] <wikibugs>	 (PS1) Herron: add thcipriani and hashar to group gerrit-root [puppet] - https://gerrit.wikimedia.org/r/439625 (https://phabricator.wikimedia.org/T196702)
[16:55:13] <wikibugs>	 (PS1) Herron: add thcipriani to group phabricator-roots [puppet] - https://gerrit.wikimedia.org/r/439626 (https://phabricator.wikimedia.org/T196703)
[16:58:00] <wikibugs>	 (PS1) Dzahn: admins: add hashar and thcipriani to gerrit-roots [puppet] - https://gerrit.wikimedia.org/r/439627 (https://phabricator.wikimedia.org/T196702)
[16:58:22] <mutante>	 herron: ah :) duplicate, heh
[16:58:52] <wikibugs>	 (Abandoned) Dzahn: admins: add hashar and thcipriani to gerrit-roots [puppet] - https://gerrit.wikimedia.org/r/439627 (https://phabricator.wikimedia.org/T196702) (owner: Dzahn)
[16:59:00] <herron>	 whoops!  
[16:59:13] <wikibugs>	 (CR) Dzahn: [C: 1] add thcipriani and hashar to group gerrit-root [puppet] - https://gerrit.wikimedia.org/r/439625 (https://phabricator.wikimedia.org/T196702) (owner: Herron)
[16:59:33] <wikibugs>	 (CR) Dzahn: [C: 1] add thcipriani to group phabricator-roots [puppet] - https://gerrit.wikimedia.org/r/439626 (https://phabricator.wikimedia.org/T196703) (owner: Herron)
[16:59:57] <wikibugs>	 (CR) Paladox: add thcipriani and hashar to group gerrit-root (1 comment) [puppet] - https://gerrit.wikimedia.org/r/439625 (https://phabricator.wikimedia.org/T196702) (owner: Herron)
[17:00:04] <jouncebot>	 gehel: Your horoscope predicts another unfortunate Wikidata Query Service weekly deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180611T1700).
[17:00:17] <gehel>	 jouncebot: o/
[17:00:59] <akosiaris>	 !log ganeti2008 reboot for microcode update
[17:01:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:01:55] <mutante>	 herron: paladox is right, probably should be removed from "Gerrit-admins"
[17:02:01] <icinga-wm>	 PROBLEM - Host ganeti2008 is DOWN: PING CRITICAL - Packet loss = 100%
[17:02:02] <mutante>	 since gerrit-roots is more
[17:02:39] <herron>	 ok that works
[17:02:51] <icinga-wm>	 RECOVERY - Host ganeti2008 is UP: PING OK - Packet loss = 0%, RTA = 36.16 ms
[17:03:15] <wikibugs>	 (PS1) Paladox: Gerrit: Cache groups [puppet] - https://gerrit.wikimedia.org/r/439628
[17:03:51] <icinga-wm>	 RECOVERY - Check systemd state on restbase-dev1006 is OK: OK - running: The system is fully operational
[17:05:33] <wikibugs>	 Operations, SRE-Access-Requests, Patch-For-Review, Reading-Infrastructure-Team-Backlog (Kanban): Requesting deployment access for jforrester - https://phabricator.wikimedia.org/T196566#4261509 (herron) Hi @Jdforrester-WMF this was approved during todays SRE meeting pending manager signoff.  Could...
[17:05:46] <wikibugs>	 Operations, SRE-Access-Requests, Patch-For-Review, Reading-Infrastructure-Team-Backlog (Kanban): Requesting deployment access for jforrester - https://phabricator.wikimedia.org/T196566#4273031 (herron)
[17:06:08] <wikibugs>	 (Abandoned) Paladox: Gerrit: Cache groups [puppet] - https://gerrit.wikimedia.org/r/439628 (owner: Paladox)
[17:11:10] <wikibugs>	 (PS2) Herron: add thcipriani and hashar to group gerrit-root [puppet] - https://gerrit.wikimedia.org/r/439625 (https://phabricator.wikimedia.org/T196702)
[17:11:33] <wikibugs>	 Operations, cloud-services-team, Patch-For-Review: cloud vps: disable system-wide apt pinning for OpenStack jessie hosts - https://phabricator.wikimedia.org/T196659#4273038 (aborrero) So, @Andrew questions had me wondering what was happening here. So I investigated a bit further, specially because `j...
[17:11:49] <wikibugs>	 (CR) Paladox: [C: 1] "😊" [puppet] - https://gerrit.wikimedia.org/r/439625 (https://phabricator.wikimedia.org/T196702) (owner: Herron)
[17:12:22] <wikibugs>	 (CR) Herron: [C: 2] add thcipriani and hashar to group gerrit-root (1 comment) [puppet] - https://gerrit.wikimedia.org/r/439625 (https://phabricator.wikimedia.org/T196702) (owner: Herron)
[17:12:27] <wikibugs>	 (PS3) Herron: add thcipriani and hashar to group gerrit-root [puppet] - https://gerrit.wikimedia.org/r/439625 (https://phabricator.wikimedia.org/T196702)
[17:13:18] <wikibugs>	 Operations, SRE-Access-Requests, Patch-For-Review, Release-Engineering-Team (Kanban): Add thcipriani and hashar to gerrit-root - https://phabricator.wikimedia.org/T196702#4266006 (herron) This was approved in the SRE meeting.  Moving forward with the patch now.
[17:15:33] <wikibugs>	 (PS2) Herron: add thcipriani to group phabricator-roots [puppet] - https://gerrit.wikimedia.org/r/439626 (https://phabricator.wikimedia.org/T196703)
[17:15:51] <wikibugs>	 Operations, SRE-Access-Requests, Patch-For-Review, Release-Engineering-Team (Kanban): Add thcipriani to phabricator-roots - https://phabricator.wikimedia.org/T196703#4266022 (herron) This was approved in the SRE meeting.  Moving forward with the patch now.
[17:16:26] <wikibugs>	 (CR) Herron: [C: 2] add thcipriani to group phabricator-roots [puppet] - https://gerrit.wikimedia.org/r/439626 (https://phabricator.wikimedia.org/T196703) (owner: Herron)
[17:16:55] <logmsgbot>	 !log gehel@deploy1001 Started deploy [wdqs/wdqs@37f6f32]: new version of wdqs GUI and updater (wdqs1009 only)
[17:16:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:17:21] <logmsgbot>	 !log gehel@deploy1001 Finished deploy [wdqs/wdqs@37f6f32]: new version of wdqs GUI and updater (wdqs1009 only) (duration: 00m 26s)
[17:17:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:18:31] <icinga-wm>	 RECOVERY - Check systemd state on kubernetes2003 is OK: OK - running: The system is fully operational
[17:19:25] <XioNoX>	 elukey: flight delayed a bit, but still on my phone. any luck with that interface?
[17:19:36] <logmsgbot>	 !log gehel@deploy1001 Started deploy [wdqs/wdqs@37f6f32]: new version of wdqs GUI and updater
[17:19:40] <logmsgbot>	 !log gehel@deploy1001 Finished deploy [wdqs/wdqs@37f6f32]: new version of wdqs GUI and updater (duration: 00m 03s)
[17:19:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:19:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:20:09] <logmsgbot>	 !log gehel@deploy1001 Started deploy [wdqs/wdqs@37f6f32]: new version of wdqs GUI and updater
[17:20:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:21:51] <icinga-wm>	 PROBLEM - Check systemd state on kubernetes2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[17:22:12] <elukey>	 XioNoX: handed over to ottomata, Chris is going to check soon IIUC 
[17:25:32] <twentyafterfour>	 !log Phabricator: deploying hotfix (D1067) refs T196840 T196860 T196855
[17:25:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:25:40] <stashbot>	 T196860: Ignore refs/changes/**/**/meta - https://phabricator.wikimedia.org/T196860
[17:25:40] <stashbot>	 D1067: Ignore refs/changes/**/**/meta - https://phabricator.wikimedia.org/D1067
[17:25:41] <stashbot>	 T196855: Diffusion commits stuck in 'Importing...' status for too long - https://phabricator.wikimedia.org/T196855
[17:25:41] <stashbot>	 T196840: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840
[17:27:48] <twentyafterfour>	 !log phabricator: restarting phd for D1067
[17:27:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:30:24] <wikibugs>	 Operations, ops-codfw, Traffic: rack/setup/install LVS200[7-10] - https://phabricator.wikimedia.org/T196560#4273095 (Papaul)
[17:32:14] <paladox>	 twentyafterfour i guess when it ignores refs it deletes them?
[17:32:16] <paladox>	 https://grafana.wikimedia.org/dashboard/db/mysql?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1072&var-port=9104&panelId=2&fullscreen&from=now-7d&to=now
[17:32:21] <paladox>	 deletes look to have gone up
[17:33:48] <logmsgbot>	 !log gehel@deploy1001 Finished deploy [wdqs/wdqs@37f6f32]: new version of wdqs GUI and updater (duration: 13m 38s)
[17:33:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:34:53] <gehel>	 SMalyshev: ^deploy completed, intermittent failures on the orderby and paris checks
[17:35:18] <SMalyshev>	 gehel: what kind of failures? timeouts or something else?
[17:35:32] <gehel>	 SMalyshev: checking right now...
[17:35:43] <gehel>	 but looks like timeout
[17:36:33] <wikibugs>	 Operations, SRE-Access-Requests, Patch-For-Review, Release-Engineering-Team (Kanban): Add thcipriani and hashar to gerrit-root - https://phabricator.wikimedia.org/T196702#4273119 (herron) Open>Resolved Access granted!  ``` cobalt:~$ id hashar uid=1010(hashar) gid=500(wikidev) groups=500(w...
[17:38:34] <wikibugs>	 Operations, SRE-Access-Requests, Patch-For-Review, Release-Engineering-Team (Kanban): Add thcipriani and hashar to gerrit-root - https://phabricator.wikimedia.org/T196702#4266006 (thcipriani) Awesome, thanks @Dzahn, access looks good to me!
[17:39:54] <gehel>	 SMalyshev: yep, timeout
[17:40:13] <SMalyshev>	 gehel: ok, I guess we need to check that the queries won't be too complex for testing
[17:40:25] <wikibugs>	 Operations, SRE-Access-Requests, Patch-For-Review, Release-Engineering-Team (Kanban): Add thcipriani to phabricator-roots - https://phabricator.wikimedia.org/T196703#4273125 (herron) Open>Resolved a:herron All set!  ``` phab1001:~$ id thcipriani uid=11634(thcipriani) gid=500(wikidev)...
[17:42:55] <twentyafterfour>	 paladox: I don't know
[17:43:00] <paladox>	 ok
[17:44:25] <wikibugs>	 Operations, SRE-Access-Requests, Patch-For-Review, Release-Engineering-Team (Kanban): Add thcipriani and hashar to gerrit-root - https://phabricator.wikimedia.org/T196702#4266006 (hashar) Works for me as well. Thank you @herron and @Dzahn
[17:50:58] <wikibugs>	 Operations, SRE-Access-Requests, Patch-For-Review, Reading-Infrastructure-Team-Backlog (Kanban): Requesting deployment access for jforrester - https://phabricator.wikimedia.org/T196566#4273176 (Jdforrester-WMF) >>! In T196566#4273029, @herron wrote: > Hi @Jdforrester-WMF this was approved during...
[17:51:06] <twentyafterfour>	 !log phabricator: rebuilding git parent caches 
[17:51:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:59:02] <wikibugs>	 Operations, DBA, Gerrit, Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4273195 (mmodell) I'm going to stop phd and attempt to clear out the backlog from the queue (it's a lot of useless updates that we don't need to write to the db ultimately)
[17:59:29] <twentyafterfour>	 !log phabricator: taking phd offline while I clear out the queue backlog (downtime is logged in icinga) see T196840
[17:59:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:59:34] <stashbot>	 T196840: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840
[17:59:51] <wikibugs>	 (CR) Imarlier: [C: 1] Add "memcached-mcrouter" to $wgObjectCaches as default for testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/436252 (owner: Aaron Schulz)
[18:00:04] <jouncebot>	 addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Morning SWAT (Max 6 patches) . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180611T1800).
[18:00:04] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[18:08:43] <wikibugs>	 (PS2) Volans: debmonitor: client side setup [puppet] - https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300)
[18:08:45] <wikibugs>	 (PS1) Volans: debmonitor: finetune Icinga checks [puppet] - https://gerrit.wikimedia.org/r/439640 (https://phabricator.wikimedia.org/T191299)
[18:08:47] <wikibugs>	 (PS1) Volans: debmonitor: install debmonitor-client [puppet] - https://gerrit.wikimedia.org/r/439641 (https://phabricator.wikimedia.org/T191300)
[18:09:48] <wikibugs>	 (CR) jerkins-bot: [V: -1] debmonitor: client side setup [puppet] - https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300) (owner: Volans)
[18:11:33] <wikibugs>	 (CR) Volans: "The failure is because of:" [puppet] - https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300) (owner: Volans)
[18:15:15] <urandom>	 !log convert timeline indices to time-windowed compaction - T196024
[18:15:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:15:20] <stashbot>	 T196024: Convert timeline keyspaces (indices) to time-windowed compaction - https://phabricator.wikimedia.org/T196024
[18:16:10] <wikibugs>	 (CR) Volans: [C: 2] "Merging to fix failing checks. If you have any comments let me know and I'll fix them tomorrow." [puppet] - https://gerrit.wikimedia.org/r/439640 (https://phabricator.wikimedia.org/T191299) (owner: Volans)
[18:19:56] <wikibugs>	 Operations, Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Import some Analytics git puppet submodules to operations/puppet - https://phabricator.wikimedia.org/T188377#4005170 (Andrew) I don't mind having to manually fix some puppetmasters, although it would be nice to do them al...
[18:20:06] <wikibugs>	 Operations, cloud-services-team, Patch-For-Review: cloud vps: disable system-wide apt pinning for OpenStack jessie hosts - https://phabricator.wikimedia.org/T196659#4273233 (Andrew) @aborrero, thanks for investigating.  I'm sure that the existing client_pinning file isn't complete, and that maki...
[18:26:11] <wikibugs>	 (PS3) Volans: debmonitor: client side setup [puppet] - https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300)
[18:26:13] <wikibugs>	 (PS2) Volans: debmonitor: install debmonitor-client [puppet] - https://gerrit.wikimedia.org/r/439641 (https://phabricator.wikimedia.org/T191300)
[18:26:53] <wikibugs>	 (CR) jerkins-bot: [V: -1] debmonitor: client side setup [puppet] - https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300) (owner: Volans)
[18:39:59] <wikibugs>	 (PS1) Paladox: phabricator: Make phd.taskmasters configurable with hiera [puppet] - https://gerrit.wikimedia.org/r/439645
[18:41:36] <wikibugs>	 (PS2) Paladox: phabricator: Make phd.taskmasters configurable with hiera [puppet] - https://gerrit.wikimedia.org/r/439645
[18:46:05] <wikibugs>	 Operations, SRE-Access-Requests, Patch-For-Review, Reading-Infrastructure-Team-Backlog (Kanban): Requesting deployment access for jforrester - https://phabricator.wikimedia.org/T196566#4261509 (Tnegrin) approved
[18:49:55] <logmsgbot>	 !log pnorman@deploy1001 Started deploy [tilerator/deploy@074d01a] (cleartables): Restore full config
[18:49:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:50:12] <logmsgbot>	 !log pnorman@deploy1001 Finished deploy [tilerator/deploy@074d01a] (cleartables): Restore full config (duration: 00m 16s)
[18:50:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:52:50] <wikibugs>	 (CR) Bearloga: statistics::discovery: re-enable cron job (1 comment) [puppet] - https://gerrit.wikimedia.org/r/438125 (https://phabricator.wikimedia.org/T170494) (owner: Bearloga)
[18:56:04] <icinga-wm>	 RECOVERY - IPsec on cp2006 is OK: Strongswan OK - 26 ESP OK
[18:56:04] <icinga-wm>	 RECOVERY - IPsec on cp5007 is OK: Strongswan OK - 44 ESP OK
[18:56:04] <icinga-wm>	 RECOVERY - IPsec on cp4027 is OK: Strongswan OK - 44 ESP OK
[18:56:04] <icinga-wm>	 RECOVERY - IPsec on cp2008 is OK: Strongswan OK - 80 ESP OK
[18:56:04] <icinga-wm>	 RECOVERY - IPsec on cp2025 is OK: Strongswan OK - 26 ESP OK
[18:56:14] <icinga-wm>	 RECOVERY - Host kafka-jumbo1005 is UP: PING WARNING - Packet loss = 50%, RTA = 3.77 ms
[18:56:14] <icinga-wm>	 RECOVERY - IPsec on cp4023 is OK: Strongswan OK - 54 ESP OK
[18:56:15] <icinga-wm>	 RECOVERY - IPsec on cp2020 is OK: Strongswan OK - 80 ESP OK
[18:56:15] <icinga-wm>	 RECOVERY - IPsec on cp5012 is OK: Strongswan OK - 44 ESP OK
[18:56:15] <icinga-wm>	 RECOVERY - IPsec on cp5001 is OK: Strongswan OK - 54 ESP OK
[18:56:15] <icinga-wm>	 RECOVERY - IPsec on cp4032 is OK: Strongswan OK - 44 ESP OK
[18:56:15] <icinga-wm>	 RECOVERY - IPsec on cp4025 is OK: Strongswan OK - 54 ESP OK
[18:56:15] <icinga-wm>	 RECOVERY - IPsec on cp4031 is OK: Strongswan OK - 44 ESP OK
[18:56:24] <icinga-wm>	 RECOVERY - IPsec on cp3049 is OK: Strongswan OK - 54 ESP OK
[18:56:24] <icinga-wm>	 RECOVERY - IPsec on cp3039 is OK: Strongswan OK - 54 ESP OK
[18:56:24] <icinga-wm>	 RECOVERY - IPsec on cp5009 is OK: Strongswan OK - 44 ESP OK
[18:56:24] <icinga-wm>	 RECOVERY - IPsec on cp2007 is OK: Strongswan OK - 68 ESP OK
[18:56:24] <icinga-wm>	 RECOVERY - IPsec on cp2026 is OK: Strongswan OK - 80 ESP OK
[18:56:24] <icinga-wm>	 RECOVERY - IPsec on cp4030 is OK: Strongswan OK - 44 ESP OK
[18:56:25] <icinga-wm>	 RECOVERY - IPsec on cp4022 is OK: Strongswan OK - 54 ESP OK
[18:56:25] <icinga-wm>	 RECOVERY - IPsec on cp4024 is OK: Strongswan OK - 54 ESP OK
[18:56:26] <icinga-wm>	 RECOVERY - IPsec on cp4026 is OK: Strongswan OK - 54 ESP OK
[18:56:26] <icinga-wm>	 RECOVERY - IPsec on cp2012 is OK: Strongswan OK - 26 ESP OK
[18:56:27] <icinga-wm>	 RECOVERY - IPsec on cp2014 is OK: Strongswan OK - 80 ESP OK
[18:56:27] <icinga-wm>	 RECOVERY - IPsec on cp2011 is OK: Strongswan OK - 80 ESP OK
[18:56:34] <icinga-wm>	 RECOVERY - IPsec on cp3030 is OK: Strongswan OK - 44 ESP OK
[18:56:34] <icinga-wm>	 RECOVERY - IPsec on cp3008 is OK: Strongswan OK - 28 ESP OK
[18:56:34] <icinga-wm>	 RECOVERY - IPsec on cp3037 is OK: Strongswan OK - 54 ESP OK
[18:56:34] <icinga-wm>	 RECOVERY - IPsec on cp3007 is OK: Strongswan OK - 28 ESP OK
[18:56:34] <icinga-wm>	 RECOVERY - IPsec on cp2001 is OK: Strongswan OK - 68 ESP OK
[18:56:34] <icinga-wm>	 RECOVERY - IPsec on cp2002 is OK: Strongswan OK - 80 ESP OK
[18:56:34] <icinga-wm>	 RECOVERY - IPsec on cp2004 is OK: Strongswan OK - 68 ESP OK
[18:56:35] <icinga-wm>	 RECOVERY - IPsec on cp3047 is OK: Strongswan OK - 54 ESP OK
[18:56:51] <wikibugs>	 Operations, Mail, Phabricator, Release-Engineering-Team: Phabricator outbound email seems to have a SPOF of mx1001 - https://phabricator.wikimedia.org/T196916#4272661 (Reedy) >>! In T196916#4272671, @herron wrote: > What does the phabricator outbound mail config look like today?  >  > Do we alrea...
[18:56:54] <icinga-wm>	 RECOVERY - IPsec on cp4021 is OK: Strongswan OK - 54 ESP OK
[18:56:54] <icinga-wm>	 RECOVERY - IPsec on cp3010 is OK: Strongswan OK - 28 ESP OK
[18:56:54] <icinga-wm>	 RECOVERY - IPsec on cp3044 is OK: Strongswan OK - 54 ESP OK
[18:56:54] <icinga-wm>	 RECOVERY - IPsec on cp3036 is OK: Strongswan OK - 54 ESP OK
[18:56:55] <icinga-wm>	 RECOVERY - IPsec on cp5003 is OK: Strongswan OK - 54 ESP OK
[18:56:55] <icinga-wm>	 RECOVERY - IPsec on cp2018 is OK: Strongswan OK - 26 ESP OK
[18:56:55] <icinga-wm>	 RECOVERY - IPsec on cp5010 is OK: Strongswan OK - 44 ESP OK
[18:56:55] <icinga-wm>	 RECOVERY - IPsec on cp5004 is OK: Strongswan OK - 54 ESP OK
[18:56:56] <icinga-wm>	 RECOVERY - IPsec on cp2022 is OK: Strongswan OK - 80 ESP OK
[18:56:56] <icinga-wm>	 RECOVERY - IPsec on cp3048 is OK: Strongswan OK - 54 ESP OK
[18:56:57] <icinga-wm>	 RECOVERY - IPsec on cp3035 is OK: Strongswan OK - 54 ESP OK
[18:56:57] <icinga-wm>	 RECOVERY - IPsec on cp3045 is OK: Strongswan OK - 54 ESP OK
[18:56:58] <icinga-wm>	 RECOVERY - IPsec on cp3043 is OK: Strongswan OK - 44 ESP OK
[18:56:58] <icinga-wm>	 RECOVERY - IPsec on cp5002 is OK: Strongswan OK - 54 ESP OK
[18:57:04] <icinga-wm>	 RECOVERY - IPsec on cp3034 is OK: Strongswan OK - 54 ESP OK
[18:57:04] <icinga-wm>	 RECOVERY - IPsec on cp3033 is OK: Strongswan OK - 44 ESP OK
[18:57:04] <icinga-wm>	 RECOVERY - IPsec on cp3041 is OK: Strongswan OK - 44 ESP OK
[18:57:04] <icinga-wm>	 RECOVERY - IPsec on cp4028 is OK: Strongswan OK - 44 ESP OK
[18:57:05] <icinga-wm>	 RECOVERY - IPsec on cp2005 is OK: Strongswan OK - 80 ESP OK
[19:01:15] <icinga-wm>	 PROBLEM - Kafka Broker Replica Max Lag on kafka-jumbo1005 is CRITICAL: 3.476e+07 ge 5e+06 https://grafana.wikimedia.org/dashboard/db/kafka?panelId=16&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1005
[19:01:39] <chasemp>	 ottomata: ^
[19:04:34] <wikibugs>	 (PS1) Imarlier: Remove /xhprof from performance.wikimedia.org apache config [puppet] - https://gerrit.wikimedia.org/r/439647 (https://phabricator.wikimedia.org/T196406)
[19:08:19] <ottomata>	 chasemp:  aye
[19:08:29] <ottomata>	 just got a NIC fixed
[19:08:33] <ottomata>	 it is catching back up now
[19:14:08] <wikibugs>	 (CR) Ottomata: statistics::discovery: re-enable cron job (1 comment) [puppet] - https://gerrit.wikimedia.org/r/438125 (https://phabricator.wikimedia.org/T170494) (owner: Bearloga)
[19:14:25] <icinga-wm>	 PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received
[19:14:44] <wikibugs>	 (PS1) Imarlier: Need to install mongodb on xhgui machines [puppet] - https://gerrit.wikimedia.org/r/439648 (https://phabricator.wikimedia.org/T158837)
[19:14:46] <wikibugs>	 (PS4) Volans: debmonitor: client side setup [puppet] - https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300)
[19:14:48] <wikibugs>	 (PS3) Volans: debmonitor: install debmonitor-client [puppet] - https://gerrit.wikimedia.org/r/439641 (https://phabricator.wikimedia.org/T191300)
[19:15:19] <wikibugs>	 (CR) jerkins-bot: [V: -1] Need to install mongodb on xhgui machines [puppet] - https://gerrit.wikimedia.org/r/439648 (https://phabricator.wikimedia.org/T158837) (owner: Imarlier)
[19:15:38] <wikibugs>	 (CR) jerkins-bot: [V: -1] debmonitor: client side setup [puppet] - https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300) (owner: Volans)
[19:16:35] <icinga-wm>	 RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy
[19:20:34] <icinga-wm>	 RECOVERY - Kafka Broker Under Replicated Partitions on kafka-jumbo1003 is OK: (C)10 ge (W)1 ge 0 https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1003
[19:20:55] <icinga-wm>	 RECOVERY - Kafka Broker Under Replicated Partitions on kafka-jumbo1002 is OK: (C)10 ge (W)1 ge 0 https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1002
[19:23:45] <icinga-wm>	 RECOVERY - Kafka Broker Under Replicated Partitions on kafka-jumbo1001 is OK: (C)10 ge (W)1 ge 0 https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1001
[19:26:55] <icinga-wm>	 RECOVERY - Kafka Broker Under Replicated Partitions on kafka-jumbo1006 is OK: (C)10 ge (W)1 ge 0 https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1006
[19:29:31] <logmsgbot>	 !log aaron@deploy1001 Synchronized php-1.32.0-wmf.7/includes/libs/rdbms/ChronologyProtector.php: 11e596776f940 - add some logging details (duration: 00m 53s)
[19:29:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:31:13] <wikibugs>	 (PS2) Imarlier: Need to install mongodb on xhgui machines [puppet] - https://gerrit.wikimedia.org/r/439648 (https://phabricator.wikimedia.org/T158837)
[19:31:47] <wikibugs>	 (CR) jerkins-bot: [V: -1] Need to install mongodb on xhgui machines [puppet] - https://gerrit.wikimedia.org/r/439648 (https://phabricator.wikimedia.org/T158837) (owner: Imarlier)
[19:35:51] <wikibugs>	 (CR) Reedy: "require_package ?" [puppet] - https://gerrit.wikimedia.org/r/439648 (https://phabricator.wikimedia.org/T158837) (owner: Imarlier)
[19:43:25] <wikibugs>	 Operations, netops: Rack/setup cr2-eqdfw - https://phabricator.wikimedia.org/T196941#4273504 (Papaul) p:Triage>Normal
[19:43:33] <wikibugs>	 Operations, ops-codfw, fundraising-tech-ops, Patch-For-Review: Rack/Setup frbast2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T196417#4273516 (Jgreen) a:Jgreen>None
[19:43:36] <wikibugs>	 Operations, ops-eqiad, fundraising-tech-ops: rack frdata1001 - https://phabricator.wikimedia.org/T187364#4273522 (Jgreen) a:Jgreen>None
[19:44:54] <wikibugs>	 Operations, fundraising-tech-ops, netops: adjust NAT mapping for frdata.wikimedia.org - https://phabricator.wikimedia.org/T196656#4273538 (Jgreen)
[19:44:57] <wikibugs>	 Operations, ops-eqiad, fundraising-tech-ops: rack frdata1001 - https://phabricator.wikimedia.org/T187364#3973195 (Jgreen)
[19:46:23] <wikibugs>	 (CR) Alex Monk: "yep" [puppet] - https://gerrit.wikimedia.org/r/402758 (https://phabricator.wikimedia.org/T184236) (owner: Alex Monk)
[19:51:04] <icinga-wm>	 RECOVERY - Kafka Broker Replica Max Lag on kafka-jumbo1005 is OK: (C)5e+06 ge (W)1e+06 ge 9.743e+05 https://grafana.wikimedia.org/dashboard/db/kafka?panelId=16&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1005
[19:51:38] <wikibugs>	 Operations, Wikimedia-Incident: Add email queueing/failover to services currently using mail_smarthost[0] - https://phabricator.wikimedia.org/T196920#4273568 (herron)
[19:52:08] <wikibugs>	 Operations, Mail, Phabricator, Release-Engineering-Team, Wikimedia-Incident: Phabricator outbound email seems to have a SPOF of mx1001 - https://phabricator.wikimedia.org/T196916#4273569 (herron)
[19:52:47] <wikibugs>	 (PS3) Imarlier: Need to install mongodb on xhgui machines [puppet] - https://gerrit.wikimedia.org/r/439648 (https://phabricator.wikimedia.org/T158837)
[19:53:05] <icinga-wm>	 PROBLEM - Varnishkafka Delivery Errors per minute on cp3032 is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/varnishkafka?panelId=20&fullscreen&orgId=1
[19:56:28] <ottomata>	 !log bouncing varnishkafka on cp3032
[19:56:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:57:34] <icinga-wm>	 RECOVERY - Varnishkafka Delivery Errors per minute on cp3032 is OK: OK: Less than 80.00% above the threshold [0.0] https://grafana.wikimedia.org/dashboard/db/varnishkafka?panelId=20&fullscreen&orgId=1
[19:57:35] <icinga-wm>	 PROBLEM - nutcracker process on mw1230 is CRITICAL: Return code of 255 is out of bounds
[19:57:35] <icinga-wm>	 PROBLEM - dhclient process on mw1230 is CRITICAL: Return code of 255 is out of bounds
[19:57:45] <icinga-wm>	 PROBLEM - MD RAID on mw1230 is CRITICAL: Return code of 255 is out of bounds
[19:57:54] <icinga-wm>	 PROBLEM - Check systemd state on mw1230 is CRITICAL: Return code of 255 is out of bounds
[19:57:54] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on mw1230 is CRITICAL: Return code of 255 is out of bounds
[19:58:07] <wikibugs>	 (CR) Krinkle: [C: 1] Remove /xhprof from performance.wikimedia.org apache config [puppet] - https://gerrit.wikimedia.org/r/439647 (https://phabricator.wikimedia.org/T196406) (owner: Imarlier)
[19:58:14] <icinga-wm>	 PROBLEM - Disk space on mw1230 is CRITICAL: Return code of 255 is out of bounds
[19:58:14] <icinga-wm>	 PROBLEM - HHVM processes on mw1230 is CRITICAL: Return code of 255 is out of bounds
[19:58:15] <wikibugs>	 (PS1) Ottomata: Switch evenstreams to main kafka in deployment-prep [puppet] - https://gerrit.wikimedia.org/r/439653 (https://phabricator.wikimedia.org/T185225)
[19:58:24] <icinga-wm>	 PROBLEM - nutcracker port on mw1230 is CRITICAL: Return code of 255 is out of bounds
[19:58:25] <icinga-wm>	 PROBLEM - DPKG on mw1230 is CRITICAL: Return code of 255 is out of bounds
[19:58:25] <icinga-wm>	 PROBLEM - mcrouter process on mw1230 is CRITICAL: Return code of 255 is out of bounds
[19:58:34] <icinga-wm>	 PROBLEM - Check size of conntrack table on mw1230 is CRITICAL: Return code of 255 is out of bounds
[19:58:35] <icinga-wm>	 PROBLEM - configured eth on mw1230 is CRITICAL: Return code of 255 is out of bounds
[19:59:18] <wikibugs>	 (CR) Ottomata: [C: 2] Switch evenstreams to main kafka in deployment-prep [puppet] - https://gerrit.wikimedia.org/r/439653 (https://phabricator.wikimedia.org/T185225) (owner: Ottomata)
[19:59:24] <icinga-wm>	 PROBLEM - Varnishkafka Delivery Errors per minute on cp3047 is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/varnishkafka?panelId=20&fullscreen&orgId=1
[19:59:38] <ottomata>	 hm
[20:00:04] <jouncebot>	 cscott, arlolra, subbu, bearND, halfak, and Amir1: #bothumor I � Unicode. All rise for Services – Parsoid / Citoid / Mobileapps / ORES / … deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180611T2000).
[20:00:13] <ottomata>	 not sure exactly, but i'm going to bounce the few vks with this proble
[20:00:15] <ottomata>	 m
[20:00:17] <ottomata>	 its only a few of them
[20:01:33] <bearND>	 nothing for mobileapps today
[20:01:35] <icinga-wm>	 RECOVERY - Varnishkafka Delivery Errors per minute on cp3047 is OK: OK: Less than 80.00% above the threshold [0.0] https://grafana.wikimedia.org/dashboard/db/varnishkafka?panelId=20&fullscreen&orgId=1
[20:01:55] <icinga-wm>	 PROBLEM - Varnishkafka Delivery Errors per minute on cp3039 is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/varnishkafka?panelId=20&fullscreen&orgId=1
[20:02:20] <ottomata>	 !log bouncing varnishkafka-webrequest on cp3039,cp3047,cp2007,cp3010
[20:02:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:04:04] <icinga-wm>	 RECOVERY - Device not healthy -SMART- on db1065 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db1065&var-datasource=eqiad%2520prometheus%252Fops
[20:04:14] <icinga-wm>	 RECOVERY - Varnishkafka Delivery Errors per minute on cp3039 is OK: OK: Less than 80.00% above the threshold [0.0] https://grafana.wikimedia.org/dashboard/db/varnishkafka?panelId=20&fullscreen&orgId=1
[20:05:25] <wikibugs>	 (PS5) Volans: debmonitor: client side setup [puppet] - https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300)
[20:05:27] <wikibugs>	 (PS4) Volans: debmonitor: install debmonitor-client [puppet] - https://gerrit.wikimedia.org/r/439641 (https://phabricator.wikimedia.org/T191300)
[20:06:16] <wikibugs>	 (CR) jerkins-bot: [V: -1] debmonitor: client side setup [puppet] - https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300) (owner: Volans)
[20:06:45] <icinga-wm>	 RECOVERY - MegaRAID on db1065 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy
[20:09:33] <wikibugs>	 Operations, ops-codfw, Traffic, netops: switch port configuration for lvs200[7-10] - https://phabricator.wikimedia.org/T196946#4273636 (Papaul) p:Triage>Normal
[20:17:47] <logmsgbot>	 !log otto@deploy1001 Started deploy [eventstreams/deploy@6b013f9]: Enable composite stream and timestamp since param - T196009 , T187418
[20:17:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:17:52] <stashbot>	 T196009: Support timestamp based consumption in KafkaSSE and EventStreams - https://phabricator.wikimedia.org/T196009
[20:17:53] <stashbot>	 T187418: Enable multiple topics in EventStreams URL - https://phabricator.wikimedia.org/T187418
[20:19:53] <wikibugs>	 (CR) Volans: "Quick compiler results available here:" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300) (owner: Volans)
[20:20:52] <halfak>	 Looks like awight won't be around for the window today and I'm not prepared.  ORES will have something for tomorrow's window
[20:21:05] <halfak>	 Bosnian, Basque, and Serbian -- Oh my!
[20:21:22] <halfak>	 You get an AI, and you get an AI.  EVERYONE GETS AN AI. 
[20:25:36] <Platonides>	 then the AI gets you
[20:25:48] <apergos>	 in soviet wmf...
[20:27:38] <logmsgbot>	 !log otto@deploy1001 Finished deploy [eventstreams/deploy@6b013f9]: Enable composite stream and timestamp since param - T196009 , T187418 (duration: 09m 52s)
[20:27:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:27:44] <stashbot>	 T196009: Support timestamp based consumption in KafkaSSE and EventStreams - https://phabricator.wikimedia.org/T196009
[20:27:44] <stashbot>	 T187418: Enable multiple topics in EventStreams URL - https://phabricator.wikimedia.org/T187418
[20:28:05] <wikibugs>	 (PS5) Volans: debmonitor: install debmonitor-client [puppet] - https://gerrit.wikimedia.org/r/439641 (https://phabricator.wikimedia.org/T191300)
[20:29:03] <wikibugs>	 (CR) Andrew Bogott: [C: -1] "A few comments inline." (2 comments) [software/tools-webservice] - https://gerrit.wikimedia.org/r/437164 (https://phabricator.wikimedia.org/T148872) (owner: Nehajha)
[20:29:44] <wikibugs>	 Operations, Beta-Cluster-Infrastructure, Patch-For-Review, Prometheus-metrics-monitoring, User-fgiunchedi: Move deployment-prep redis instances to stretch - https://phabricator.wikimedia.org/T179371#4273745 (Krenair) Alright. Leaving open pending deletion of the old redis hosts in a few weeks...
[20:32:25] <logmsgbot>	 !log arlolra@deploy1001 Started deploy [parsoid/deploy@97cdab8]: Updating Parsoid to 06b74d2
[20:32:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:34:38] <wikibugs>	 (CR) Volans: "Quick compiler results:" [puppet] - https://gerrit.wikimedia.org/r/439641 (https://phabricator.wikimedia.org/T191300) (owner: Volans)
[20:39:15] <icinga-wm>	 PROBLEM - Check systemd state on restbase-dev1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[20:49:34] <logmsgbot>	 !log arlolra@deploy1001 Finished deploy [parsoid/deploy@97cdab8]: Updating Parsoid to 06b74d2 (duration: 17m 09s)
[20:49:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:51:04] <wikibugs>	 (CR) Alex Monk: "So it turns out this line has been unused since I9c39889a" [puppet] - https://gerrit.wikimedia.org/r/436431 (https://phabricator.wikimedia.org/T184244) (owner: Alex Monk)
[20:57:56] <arlolra>	 !log Updated Parsoid to 06b74d2 (T191843)
[20:58:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:58:02] <stashbot>	 T191843: Cannot read property 'push' of undefined - https://phabricator.wikimedia.org/T191843
[21:00:04] <jouncebot>	 bawolff and Reedy: #bothumor I � Unicode. All rise for Weekly Security deployment window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180611T2100).
[21:02:35] <icinga-wm>	 PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) is CRITICAL: Test Print the Foo page from en.wp.org in letter format returned the unexpected status 503 (expecting: 200)
[21:02:36] <wikibugs>	 (PS1) Ottomata: Set eventstreams max_connections to 25 per varnish instance [puppet] - https://gerrit.wikimedia.org/r/439772 (https://phabricator.wikimedia.org/T196553)
[21:03:22] <wikibugs>	 (CR) Ottomata: "Ema, is this the right place?  I wasn't sure if this should be set here, or on the frontend stream.wikimedia.org  instance?" [puppet] - https://gerrit.wikimedia.org/r/439772 (https://phabricator.wikimedia.org/T196553) (owner: Ottomata)
[21:03:45] <icinga-wm>	 RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy
[21:05:44] <icinga-wm>	 RECOVERY - Check systemd state on restbase-dev1004 is OK: OK - running: The system is fully operational
[21:17:14] <icinga-wm>	 PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) is CRITICAL: Test Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices returned the unexpected status 503 (expecting: 200)
[21:19:25] <icinga-wm>	 PROBLEM - proton endpoints health on proton1001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) is CRITICAL: Test Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices returned the unexpected status 503 (expecting: 200)
[21:20:25] <icinga-wm>	 RECOVERY - proton endpoints health on proton1001 is OK: All endpoints are healthy
[21:20:34] <icinga-wm>	 RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy
[21:21:43] <wikibugs>	 (PS1) Alex Monk: Re-combine labs and production exim minimal config [puppet] - https://gerrit.wikimedia.org/r/439774
[21:22:18] <wikibugs>	 (CR) jerkins-bot: [V: -1] Re-combine labs and production exim minimal config [puppet] - https://gerrit.wikimedia.org/r/439774 (owner: Alex Monk)
[21:23:19] <wikibugs>	 (PS2) Alex Monk: Re-combine labs and production exim minimal config [puppet] - https://gerrit.wikimedia.org/r/439774
[21:23:55] <wikibugs>	 (CR) jerkins-bot: [V: -1] Re-combine labs and production exim minimal config [puppet] - https://gerrit.wikimedia.org/r/439774 (owner: Alex Monk)
[21:25:09] <wikibugs>	 (PS3) Bearloga: statistics::discovery: re-enable cron job [puppet] - https://gerrit.wikimedia.org/r/438125 (https://phabricator.wikimedia.org/T170494)
[21:26:04] <icinga-wm>	 PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) is CRITICAL: Test Print the Foo page from en.wp.org in letter format returned the unexpected status 503 (expecting: 200)
[21:27:14] <icinga-wm>	 RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy
[21:27:14] <icinga-wm>	 PROBLEM - proton endpoints health on proton1001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) is CRITICAL: Test Print the Foo page from en.wp.org in letter format returned the unexpected status 503 (expecting: 200)
[21:28:24] <icinga-wm>	 RECOVERY - proton endpoints health on proton1001 is OK: All endpoints are healthy
[21:30:22] <wikibugs>	 Operations, Mail, Patch-For-Review: Upgrade mx1001/mx2001 to stretch - https://phabricator.wikimedia.org/T175361#4273978 (Krenair) Related change to your standard::mail::sender changes above: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/439774/  >>! In T175361#4137331, @herron wrote: > Also...
[21:32:24] <awight>	 Reedy: Is the security window quiescent?  I was going to do a minor ORES update.
[21:32:29] <AaronSchulz>	 _joe_: hey. Can you give a quick sanity check to https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/436252/ ?
[21:34:42] <_joe_>	 AaronSchulz: this will use mcrouter but still not use prefixes for setting/deleting keys, right?
[21:35:14] <_joe_>	 because I think we need to change the routing handler for broadcast sets soon-ish
[21:35:39] <wikibugs>	 (CR) Alex Monk: "modules/standard/manifests/mail/sender.pp:2 wmf-style: Found hiera call in class 'standard::mail::sender' for 'standard::mail::sender::rou" [puppet] - https://gerrit.wikimedia.org/r/439774 (owner: Alex Monk)
[21:38:02] <AaronSchulz>	 _joe_: hmm, I can add the mcrouterAware flag there
[21:38:03] <wikibugs>	 (CR) Alex Monk: "Going after that one in I19a28579" [puppet] - https://gerrit.wikimedia.org/r/436431 (https://phabricator.wikimedia.org/T184244) (owner: Alex Monk)
[21:38:19] <_joe_>	 AaronSchulz: no, don't for now
[21:38:28] <_joe_>	 we can do it in a second pass
[21:38:33] <AaronSchulz>	 right
[21:39:02] <wikibugs>	 (PS2) Alex Monk: Followup If545182a: Actually use cert_name now [puppet] - https://gerrit.wikimedia.org/r/439451 (https://phabricator.wikimedia.org/T184244)
[21:39:16] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: 1] Add "memcached-mcrouter" to $wgObjectCaches as default for testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/436252 (owner: Aaron Schulz)
[21:43:17] <logmsgbot>	 !log awight@deploy1001 Started deploy [ores/deploy@6ee8775]: ORES: bswiki, euwiki, srwiki models
[21:43:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:43:46] <halfak>	 \o/
[21:43:59] <halfak>	 Oh wait.  that should go to beta first :| 
[21:44:07] * halfak sends it to beta asap :) 
[21:44:13] <awight>	 hargh
[21:44:18] <awight>	 ok well this is just canary for now
[21:44:18] <awight>	 ty
[21:44:36] <halfak>	 Should have specified that my tests were local :) 
[21:44:36] <awight>	 Mostly, I'm testing scap :)
[21:44:47] <awight>	 naw it was obvious, my fault
[21:45:05] <awight>	 midnight deployments might turn out to not be my thing.
[21:46:36] <halfak>	 Always midnight somewhere
[21:47:35] <awight>	 hehe
[21:47:46] <awight>	 halfak: ok we're live on ores1001, looking at the machine now
[21:48:17] <halfak>	 kk.  Just about to go to beta
[21:48:25] * halfak crosses fingers and toes
[21:50:43] <awight>	 \o/ LFS success
[21:50:48] <awight>	 happy scappy
[21:51:34] <awight>	 workers are healthy.
[21:52:08] <awight>	 halfak: If beta is good, I'll put this on the rest of the cluster.
[21:52:41] <halfak>	 still waiting on https://ores-beta.wmflabs.org/
[21:52:47] <halfak>	 Aha!  Alive
[21:53:09] <halfak>	 https://ores-beta.wmflabs.org/v3/scores/euwiki/345678
[21:53:13] <halfak>	 It's ALIVE
[21:53:18] <halfak>	 OK awight. Looks good
[21:53:44] <awight>	 +1, I ran through the 3 new models and they're all functional
[21:53:46] <awight>	 kk continuing
[21:56:29] <greg-g>	 yay for git-lfs
[22:01:34] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: m3 on db2042 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 313.53 seconds
[22:01:44] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: m3 on db2078 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 317.54 seconds
[22:17:15] <logmsgbot>	 !log awight@deploy1001 Finished deploy [ores/deploy@6ee8775]: ORES: bswiki, euwiki, srwiki models (duration: 33m 58s)
[22:17:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:21:32] <halfak>	 awight, confirmed that all looks good 
[22:21:51] <awight>	 +1, thanks!
[22:37:16] <wikibugs>	 (CR) Paladox: "This is being done next monday :)" [puppet] - https://gerrit.wikimedia.org/r/439444 (https://phabricator.wikimedia.org/T196812) (owner: Paladox)
[22:48:54] <wikibugs>	 (PS4) Paladox: Add gerrit-theme.html and also add footer links [software/gerrit] (deploy/wmf/stable-2.15) - https://gerrit.wikimedia.org/r/439503 (https://phabricator.wikimedia.org/T196835)
[22:49:03] <wikibugs>	 (PS5) Paladox: Add gerrit-theme.html and also add footer links [software/gerrit] (deploy/wmf/stable-2.15) - https://gerrit.wikimedia.org/r/439503 (https://phabricator.wikimedia.org/T196835)
[22:53:24] <icinga-wm>	 PROBLEM - Long running screen/tmux on mw1230 is CRITICAL: Return code of 255 is out of bounds
[23:00:04] <jouncebot>	 addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: I, the Bot under the Fountain, allow thee, The Deployer, to do Evening SWAT (Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180611T2300).
[23:00:04] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[23:01:15] <wikibugs>	 Operations, ops-codfw, Traffic, netops: switch port configuration for lvs200[7-10] - https://phabricator.wikimedia.org/T196946#4274128 (Papaul) a:Papaul>None
[23:01:23] <wikibugs>	 (PS1) Paladox: Gerrit: Add support for adding additional domains to alias in apache [puppet] - https://gerrit.wikimedia.org/r/439783
[23:02:30] <twentyafterfour>	 !log phabricator: restarting apache2 on phab1001 to free up apache workers
[23:02:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:03:27] <paladox>	 ah
[23:03:27] <paladox>	 https://phabricator.wikimedia.org/diffusion/
[23:03:29] <paladox>	 finally
[23:03:31] <paladox>	 twentyafterfour ^^
[23:03:44] <paladox>	 it's showing refs as in the full name of the commit now
[23:04:11] <paladox>	 it parsed https://phabricator.wikimedia.org/rGERRITDEPLOY94e8165abf7907965d9133d469c08054ad4a15d3 at least really quick just did that one
[23:05:40] <ebernhardson>	 i hadn't put my SWAT patch up but will now and can deploy it, should i wait for phab work to be done?
[23:06:20] <paladox>	 ebernhardson is it a gerrit patch or a phabricator patch?
[23:06:30] <wikibugs>	 (03PS2) 10EBernhardson: Promote MLR models from AB test to prod [mediawiki-config] - 10https://gerrit.wikimedia.org/r/430797 (https://phabricator.wikimedia.org/T187148)
[23:06:41] <twentyafterfour>	 ebernhardson: go ahead, phab work is ongoing and should not interfere with what you are doing
[23:06:54] <ebernhardson>	 ok perfect, thanks!
[23:07:22] <wikibugs>	 (03CR) 10EBernhardson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/430797 (https://phabricator.wikimedia.org/T187148) (owner: 10EBernhardson)
[23:09:05] <wikibugs>	 (03Merged) 10jenkins-bot: Promote MLR models from AB test to prod [mediawiki-config] - 10https://gerrit.wikimedia.org/r/430797 (https://phabricator.wikimedia.org/T187148) (owner: 10EBernhardson)
[23:09:20] <wikibugs>	 (03CR) 10jenkins-bot: Promote MLR models from AB test to prod [mediawiki-config] - 10https://gerrit.wikimedia.org/r/430797 (https://phabricator.wikimedia.org/T187148) (owner: 10EBernhardson)
[23:10:22] <wikibugs>	 (03PS3) 10EBernhardson: Tune CirrusSearch slow logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436848 (https://phabricator.wikimedia.org/T196180)
[23:18:06] <logmsgbot>	 !log ebernhardson@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Promote Cirrus MLR models from AB test to prod (duration: 00m 51s)
[23:18:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:18:54] <icinga-wm>	 PROBLEM - proton endpoints health on proton1001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) timed out before a response was received: /{domain}/v1/pdf/{title}/{format}/{type} (Respond file not found for a nonexistent title) timed out before a response was received
[23:19:55] <icinga-wm>	 RECOVERY - proton endpoints health on proton1001 is OK: All endpoints are healthy
[23:20:04] <icinga-wm>	 PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received
[23:22:05] <icinga-wm>	 RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy
[23:22:30] <wikibugs>	 (03CR) 10EBernhardson: [C: 032] Tune CirrusSearch slow logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436848 (https://phabricator.wikimedia.org/T196180) (owner: 10EBernhardson)
[23:23:56] <wikibugs>	 (03Merged) 10jenkins-bot: Tune CirrusSearch slow logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436848 (https://phabricator.wikimedia.org/T196180) (owner: 10EBernhardson)
[23:26:11] <logmsgbot>	 !log ebernhardson@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: Tune CirrusSearch slow logging (duration: 00m 48s)
[23:26:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:28:00] <wikibugs>	 (03CR) 10jenkins-bot: Tune CirrusSearch slow logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436848 (https://phabricator.wikimedia.org/T196180) (owner: 10EBernhardson)
[23:28:16] <wikibugs>	 (03CR) 10Greg Grossmeier: [C: 031] "Thanks." [puppet] - 10https://gerrit.wikimedia.org/r/439483 (https://phabricator.wikimedia.org/T196835) (owner: 10Paladox)
[23:31:35] <wikibugs>	 (03PS1) 10Papaul: DNS: Add mgmt and production DNS entries for bast2002 [dns] - 10https://gerrit.wikimedia.org/r/439786 (https://phabricator.wikimedia.org/T196665)
[23:31:42] <wikibugs>	 (03CR) 10EBernhardson: Lower CirrusSearch delayed job drop to 2 hours (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429233 (owner: 10EBernhardson)
[23:32:13] <wikibugs>	 (03PS2) 10EBernhardson: Lower CirrusSearch delayed job drop to 2 hours [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429233
[23:32:25] <wikibugs>	 (03CR) 10EBernhardson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429233 (owner: 10EBernhardson)
[23:33:23] <wikibugs>	 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install bast2002.wikimedia.org - https://phabricator.wikimedia.org/T196665#4274184 (10Papaul)
[23:33:42] <wikibugs>	 (03CR) 10jenkins-bot: [V: 04-1] Lower CirrusSearch delayed job drop to 2 hours [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429233 (owner: 10EBernhardson)
[23:34:45] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: m3 on db2042 is OK: OK slave_sql_lag Replication lag: 49.30 seconds
[23:34:55] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: m3 on db2078 is OK: OK slave_sql_lag Replication lag: 30.55 seconds
[23:35:48] <wikibugs>	 10Operations, 10ops-codfw, 10netops: switch port configuration for bast2002 - https://phabricator.wikimedia.org/T196957#4274185 (10Papaul) p:05Triage>03Normal
[23:37:48] <wikibugs>	 (03CR) 10Legoktm: [C: 04-1] Gerrit: Add CoC and privacy policy to footer (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/439483 (https://phabricator.wikimedia.org/T196835) (owner: 10Paladox)
[23:38:31] <wikibugs>	 (03PS5) 10Paladox: Gerrit: Add CoC and privacy policy to footer [puppet] - 10https://gerrit.wikimedia.org/r/439483 (https://phabricator.wikimedia.org/T196835)
[23:38:42] <wikibugs>	 (03CR) 10Paladox: Gerrit: Add CoC and privacy policy to footer (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/439483 (https://phabricator.wikimedia.org/T196835) (owner: 10Paladox)
[23:39:02] <wikibugs>	 (03PS6) 10Paladox: Add gerrit-theme.html and also add footer links [software/gerrit] (deploy/wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/439503 (https://phabricator.wikimedia.org/T196835)
[23:39:34] <wikibugs>	 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install bast2002.wikimedia.org - https://phabricator.wikimedia.org/T196665#4274201 (10Papaul)
[23:40:52] <wikibugs>	 (03PS3) 10EBernhardson: Lower CirrusSearch delayed job drop to 2 hours [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429233
[23:41:28] <wikibugs>	 10Operations, 10Beta-Cluster-Infrastructure: Mails through deployment-mx SPF & DKIM fails - https://phabricator.wikimedia.org/T87338#4274206 (10Krenair) Alright so I had some run-ins with Designate while trying to do DKIM (turns out you can't use a 2048 bit RSA key because that puts your public key over a leng...
[23:42:02] <wikibugs>	 (03CR) 10EBernhardson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429233 (owner: 10EBernhardson)
[23:43:39] <wikibugs>	 (03Merged) 10jenkins-bot: Lower CirrusSearch delayed job drop to 2 hours [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429233 (owner: 10EBernhardson)
[23:45:33] <logmsgbot>	 !log ebernhardson@deploy1001 Synchronized wmf-config/CirrusSearch-common.php: SWAT: Lower CirrusSearch delayed job drop timeout (duration: 00m 50s)
[23:45:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:46:25] <wikibugs>	 (03PS1) 10Alex Monk: exim: Permit DKIM domain to be changed by hiera [puppet] - 10https://gerrit.wikimedia.org/r/439791 (https://phabricator.wikimedia.org/T87338)
[23:48:14] <wikibugs>	 (03CR) 10jenkins-bot: Lower CirrusSearch delayed job drop to 2 hours [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429233 (owner: 10EBernhardson)
[23:49:47] <wikibugs>	 10Operations, 10Citoid, 10Code-Stewardship-Reviews, 10VisualEditor, 10Services (watching): zotero translation server: code stewardship request - https://phabricator.wikimedia.org/T187194#4274212 (10Jrbranaa) >>! In T187194#4271773, @danstillman wrote: > Not sure what you're planning, but the initial vers...
[23:49:49] <wikibugs>	 (03PS1) 10Papaul: DHCP: Add MAC address for bast2002 [puppet] - 10https://gerrit.wikimedia.org/r/439792 (https://phabricator.wikimedia.org/T196665)
[23:51:25] <wikibugs>	 (03PS2) 10Alex Monk: exim: Permit DKIM domain to be changed by hiera [puppet] - 10https://gerrit.wikimedia.org/r/439791 (https://phabricator.wikimedia.org/T87338)
[23:51:41] <wikibugs>	 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install bast2002.wikimedia.org - https://phabricator.wikimedia.org/T196665#4274215 (10Papaul)
[23:52:12] <wikibugs>	 10Operations, 10Citoid, 10Code-Stewardship-Reviews, 10VisualEditor, 10Services (watching): zotero translation server: code stewardship request - https://phabricator.wikimedia.org/T187194#4274216 (10Jrbranaa) We're looking to have Audiences->Contributors->Editing be the Code Stewards for this moving forwa...
[23:59:25] <wikibugs>	 (03CR) 10Alex Monk: "should be good to go" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436430 (https://phabricator.wikimedia.org/T184244) (owner: 10Alex Monk)