[00:05:58] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1966 bytes in 0.096 second response time
[00:11:07] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1948 bytes in 0.079 second response time
[01:48:57] RECOVERY - Check systemd state on kubernetes2003 is OK: OK - running: The system is fully operational
[01:52:17] PROBLEM - Check systemd state on kubernetes2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[02:32:26] !log l10nupdate@deploy1001 scap sync-l10n completed (1.32.0-wmf.7) (duration: 13m 06s)
[02:32:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:38:17] PROBLEM - kubelet operational latencies on kubernetes1003 is CRITICAL: instance=kubernetes1003.eqiad.wmnet operation_type={create_container,image_status,podsandbox_status,remove_container,start_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[02:39:18] RECOVERY - kubelet operational latencies on kubernetes1003 is OK: All metrics within thresholds.
https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[02:42:42] !log l10nupdate@deploy1001 ResourceLoader cache refresh completed at Mon Jun 11 02:42:42 UTC 2018 (duration 10m 16s)
[02:42:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:25:57] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 694.21 seconds
[03:43:54] (CR) Chelsyx: [C: 1] statistics::discovery: re-enable cron job [puppet] - https://gerrit.wikimedia.org/r/438125 (https://phabricator.wikimedia.org/T170494) (owner: Bearloga)
[04:09:48] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 294.64 seconds
[04:17:14] (CR) Dzahn: "ack, the one that has been deleted was actually "deployment-deploy1001" not deployment-tin. corrrecting that. That deployment-deploy-01 di" [puppet] - https://gerrit.wikimedia.org/r/438001 (https://phabricator.wikimedia.org/T192071) (owner: Dzahn)
[04:29:35] (CR) Dzahn: "mariadb grants have been merged.
the cron job will appear once we switch phab1002 to the active server in Hiera" [puppet] - https://gerrit.wikimedia.org/r/437558 (https://phabricator.wikimedia.org/T196019) (owner: Dzahn)
[05:10:28] (PS6) Dzahn: analytics_cluster::webserver: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/416742
[05:15:57] !log Deploy schema change on s6 primary master (db1061) - T191316 T192926 T195193
[05:16:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:16:04] T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926
[05:16:04] T195193: Schema change for ct_tag_id field to change_tag - https://phabricator.wikimedia.org/T195193
[05:16:04] T191316: Schema change to make archive.ar_rev_id NOT NULL - https://phabricator.wikimedia.org/T191316
[05:16:50] (PS7) Dzahn: analytics_cluster::webserver: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/416742
[05:17:33] !log Stop MySQL and reboot pc2005 for intel-microcode update and final HW check - T196339
[05:17:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:17:40] T196339: pc2005 down - https://phabricator.wikimedia.org/T196339
[05:19:10] (CR) Dzahn: [C: 1] "found the missing part. it was turnilo.
now this compiles fines and should be noop on thorium: http://puppet-compiler.wmflabs.org/11435/th" [puppet] - https://gerrit.wikimedia.org/r/416742 (owner: Dzahn)
[05:19:48] (PS1) Marostegui: Revert "mariadb: Depool pc2005, hardware issues" [mediawiki-config] - https://gerrit.wikimedia.org/r/439523
[05:19:56] (PS2) Marostegui: Revert "mariadb: Depool pc2005, hardware issues" [mediawiki-config] - https://gerrit.wikimedia.org/r/439523
[05:22:11] (CR) Dzahn: [C: 1] "no parked domains are in the list of email domains at modules/role/files/exim/wikimedia_domains" [dns] - https://gerrit.wikimedia.org/r/429874 (https://phabricator.wikimedia.org/T193408) (owner: Dzahn)
[05:22:17] (PS1) Marostegui: realm.pp: Add id_internalwikimedia as private wiki [puppet] - https://gerrit.wikimedia.org/r/439524 (https://phabricator.wikimedia.org/T196748)
[05:22:37] RECOVERY - MariaDB Slave Lag: m3 on db2078 is OK: OK slave_sql_lag Replication lag: 16.84 seconds
[05:22:48] RECOVERY - MariaDB Slave Lag: m3 on db2042 is OK: OK slave_sql_lag Replication lag: 14.71 seconds
[05:23:49] (CR) Marostegui: [C: 2] realm.pp: Add id_internalwikimedia as private wiki [puppet] - https://gerrit.wikimedia.org/r/439524 (https://phabricator.wikimedia.org/T196748) (owner: Marostegui)
[05:25:12] !log Restart mysql on codfw sanitariums (db2094, db2095) to pick up new replication filters - T196748
[05:25:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:25:17] T196748: Prepare and check storage layer for id_internalwikimedia - https://phabricator.wikimedia.org/T196748
[05:25:37] (CR) Marostegui: [C: 2] Revert "mariadb: Depool pc2005, hardware issues" [mediawiki-config] - https://gerrit.wikimedia.org/r/439523 (owner: Marostegui)
[05:27:08] (Merged) jenkins-bot: Revert "mariadb: Depool pc2005, hardware issues" [mediawiki-config] - https://gerrit.wikimedia.org/r/439523 (owner: Marostegui)
[05:28:22] (CR) jenkins-bot:
Revert "mariadb: Depool pc2005, hardware issues" [mediawiki-config] - https://gerrit.wikimedia.org/r/439523 (owner: Marostegui)
[05:28:24] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Repool pc2005 - T196339 (duration: 00m 52s)
[05:28:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:28:29] T196339: pc2005 down - https://phabricator.wikimedia.org/T196339
[05:30:13] (PS5) Dzahn: Add id-internal.wikimedia.org for Wikimedia Indonesia [dns] - https://gerrit.wikimedia.org/r/438275 (https://phabricator.wikimedia.org/T196747) (owner: Urbanecm)
[05:30:51] (PS6) Dzahn: Add id-internal.wikimedia.org for Wikimedia Indonesia [dns] - https://gerrit.wikimedia.org/r/438275 (https://phabricator.wikimedia.org/T196747) (owner: Urbanecm)
[05:32:34] !log Restart mysql on codfw sanitariums (db1095, db1102, db1124, db1125) to pick up new replication filters - T196748
[05:32:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:32:39] T196748: Prepare and check storage layer for id_internalwikimedia - https://phabricator.wikimedia.org/T196748
[05:33:08] (CR) Dzahn: [C: 2] Add id-internal.wikimedia.org for Wikimedia Indonesia [dns] - https://gerrit.wikimedia.org/r/438275 (https://phabricator.wikimedia.org/T196747) (owner: Urbanecm)
[05:35:10] (PS6) Dzahn: id_internalwikimedia: add Apache configuration [puppet] - https://gerrit.wikimedia.org/r/438276 (https://phabricator.wikimedia.org/T196747) (owner: Urbanecm)
[05:36:01] (PS7) Dzahn: mediawiki/apache: add id-internal.wikimedia.org server alias [puppet] - https://gerrit.wikimedia.org/r/438276 (https://phabricator.wikimedia.org/T196747) (owner: Urbanecm)
[05:40:58] (CR) Dzahn: [C: 2] mediawiki/apache: add id-internal.wikimedia.org server alias [puppet] - https://gerrit.wikimedia.org/r/438276 (https://phabricator.wikimedia.org/T196747) (owner: Urbanecm)
[05:45:50] (CR) Dzahn: "yep, this one works MUCH
faster:" [puppet] - https://gerrit.wikimedia.org/r/435984 (https://phabricator.wikimedia.org/T195780) (owner: Aklapper)
[05:46:54] (PS5) Dzahn: phabricator: List new and recent assignees [puppet] - https://gerrit.wikimedia.org/r/435984 (https://phabricator.wikimedia.org/T195780) (owner: Aklapper)
[05:48:47] (CR) Dzahn: [C: 2] phabricator: List new and recent assignees [puppet] - https://gerrit.wikimedia.org/r/435984 (https://phabricator.wikimedia.org/T195780) (owner: Aklapper)
[05:51:24] (CR) Dzahn: [C: -2] "no more php5 on deployment hosts now" [puppet] - https://gerrit.wikimedia.org/r/391045 (owner: Hoo man)
[05:53:46] Operations, DBA, Goal, Patch-For-Review: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704#4270902 (Marostegui) Open>Resolved a:Marostegui Everything has been fine for more than a week now (including the...
[05:55:14] Operations, DBA, Goal, Patch-For-Review: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704#4270909 (Marostegui)
[05:56:03] Operations, DBA, Goal, Patch-For-Review: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704#4245083 (Marostegui)
[05:58:37] PROBLEM - MariaDB Slave Lag: s6 on dbstore2001 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 309.62 seconds
[05:58:38] PROBLEM - MariaDB Slave Lag: s6 on db2046 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 311.37 seconds
[05:58:47] PROBLEM - MariaDB Slave Lag: s6 on db2087 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 313.78 seconds
[05:58:48] PROBLEM - MariaDB Slave Lag: s6 on db2067 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 315.42 seconds
[05:58:48] PROBLEM - MariaDB Slave Lag: s6 on db2039 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 315.76
seconds
[05:59:08] PROBLEM - MariaDB Slave Lag: s6 on db2076 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 323.51 seconds
[05:59:08] PROBLEM - MariaDB Slave Lag: s6 on db2089 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 323.57 seconds
[05:59:17] PROBLEM - MariaDB Slave Lag: s6 on db2053 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 327.09 seconds
[05:59:18] PROBLEM - MariaDB Slave Lag: s6 on db2060 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 328.91 seconds
[06:03:22] ^ that is anomi.e's script hitting s6
[06:10:40] !log Stop replication on db2095 to update triggers - T192926
[06:10:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:10:45] T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926
[06:14:53] !log Deploy schema change on s4 codfw master (db2051) this will generate lag on codfw - T191316 T192926 T89737 T195193
[06:14:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:15:00] T89737: Make several mediawiki table fields unsigned ints on wmf databases - https://phabricator.wikimedia.org/T89737
[06:15:00] T195193: Schema change for ct_tag_id field to change_tag - https://phabricator.wikimedia.org/T195193
[06:15:00] T191316: Schema change to make archive.ar_rev_id NOT NULL - https://phabricator.wikimedia.org/T191316
[06:19:04] Operations, Dumps-Generation, Wikimedia-log-errors: High rate of "Memcached error .. CONNECTION FAILURE" on snapshot hosts - https://phabricator.wikimedia.org/T196303#4270934 (ArielGlenn) Open>Resolved These messages have disappeared from logstash after the deployment of the BagOStuff fixes....
[06:29:57] PROBLEM - puppet last run on mw1289 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures.
Failed resources (up to 3 shown): File[/etc/profile.d/field.sh],File[/etc/ssl/localcerts/api.svc.eqiad.wmnet.crt]
[06:30:34] (CR) Elukey: analytics_cluster::webserver: apache -> httpd module (1 comment) [puppet] - https://gerrit.wikimedia.org/r/416742 (owner: Dzahn)
[06:37:17] PROBLEM - Host ms-be1036 is DOWN: PING CRITICAL - Packet loss = 100%
[06:38:06] * elukey waves to marostegui doing alter tables
[06:38:33] elukey o/ !!
[06:39:45] !log restart pdfrender on scb1002
[06:39:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:40:57] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.007 second response time
[06:41:24] checking ms-be's console
[06:42:31] console frozen, can't get a tty
[06:44:44] ah nice " The server is not powered on. The Virtual Serial Port is not available."
[06:45:19] I checked sal but didn't find anything, re-checking
[06:55:08] didn't find anything, also no iLO system/console logs for this event
[06:55:18] RECOVERY - puppet last run on mw1289 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[06:58:11] Operations, ops-eqiad: mw1280: CPU error - https://phabricator.wikimedia.org/T195734#4270962 (MoritzMuehlenhoff) Open>Resolved No new errors have been logged in SEL and the server appears stable, closing the task.
[07:00:41] so before powering it up I'll wait for another human being to check, monday morning and lack of enough caffeine might be a bad compromise :)
[07:02:20] elukey: having a second look at ms-be1036
[07:03:53] thanks!
[07:07:18] nothing logged indeed, the last system event is from May
[07:08:25] it seems a brutal power off, somebody tripped in the rack's cabling? :P
[07:13:58] (CR) Ayounsi: [C: 1] "A brief restart of Netbox anytime is fine."
[puppet] - https://gerrit.wikimedia.org/r/438219 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff)
[07:22:40] Operations, fundraising-tech-ops, netops: adjust NAT mapping for frdata.wikimedia.org - https://phabricator.wikimedia.org/T196656#4270975 (ayounsi) a:ayounsi This needs to be pushed for the NAT change: ```lang=diff [edit security nat static rule-set static-nat rule public-reporting then static-na...
[07:26:09] (PS1) Addshore: Switch from 5 mins to 10 mins for wikidata dispatch check [puppet] - https://gerrit.wikimedia.org/r/439528 (https://phabricator.wikimedia.org/T194602)
[07:27:57] (PS1) Elukey: profile::geowiki: remove unused/old crons [puppet] - https://gerrit.wikimedia.org/r/439529
[07:28:27] (PS2) Addshore: Switch from 5 mins to 10 mins for wikidata dispatch check [puppet] - https://gerrit.wikimedia.org/r/439528 (https://phabricator.wikimedia.org/T194602)
[07:29:18] !log installing openjdk-7 security updates
[07:29:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:36:58] !log Deploy schema change on dbstore1002:s4 T191316 T192926 T89737 T195193
[07:37:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:37:06] T89737: Make several mediawiki table fields unsigned ints on wmf databases - https://phabricator.wikimedia.org/T89737
[07:37:06] T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926
[07:37:06] T195193: Schema change for ct_tag_id field to change_tag - https://phabricator.wikimedia.org/T195193
[07:37:06] T191316: Schema change to make archive.ar_rev_id NOT NULL - https://phabricator.wikimedia.org/T191316
[07:42:32] (PS1) Marostegui: mariadb: Promote db1066 to master [puppet] - https://gerrit.wikimedia.org/r/439530 (https://phabricator.wikimedia.org/T194870)
[07:52:21] !log installing gnupg security updates
[07:52:25] Logged the message at
https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:55:43] (PS1) Marostegui: db-eqiad.php: Set s2 as read only [mediawiki-config] - https://gerrit.wikimedia.org/r/439531 (https://phabricator.wikimedia.org/T194870)
[07:56:04] (CR) Marostegui: [C: -2] "Compiler looks good: https://puppet-compiler.wmflabs.org/compiler02/11436/" [puppet] - https://gerrit.wikimedia.org/r/439530 (https://phabricator.wikimedia.org/T194870) (owner: Marostegui)
[07:57:12] (CR) Marostegui: [C: -2] "Do not merge until the day of the failover" [mediawiki-config] - https://gerrit.wikimedia.org/r/439531 (https://phabricator.wikimedia.org/T194870) (owner: Marostegui)
[08:00:43] (PS1) Marostegui: db-eqiad.php: Promote db1066 to master and remove read only [mediawiki-config] - https://gerrit.wikimedia.org/r/439532 (https://phabricator.wikimedia.org/T194870)
[08:01:23] (CR) Marostegui: [C: -2] "Do not merge until the failover date" [mediawiki-config] - https://gerrit.wikimedia.org/r/439532 (https://phabricator.wikimedia.org/T194870) (owner: Marostegui)
[08:05:19] (PS1) Marostegui: wmnet: Update s2-master CNAME [dns] - https://gerrit.wikimedia.org/r/439533 (https://phabricator.wikimedia.org/T194870)
[08:06:27] (CR) Marostegui: [C: -2] "Do not submit till the day of the failover" [dns] - https://gerrit.wikimedia.org/r/439533 (https://phabricator.wikimedia.org/T194870) (owner: Marostegui)
[08:06:51] (CR) Dzahn: [C: -1] "@Paladox let's just add 2 crons in puppet, one to delete old files and one to gzip all files?"
[puppet] - https://gerrit.wikimedia.org/r/434605 (owner: Paladox)
[08:07:49] (PS1) Marostegui: s2.hosts: db1066 is now s2 primary master [software] - https://gerrit.wikimedia.org/r/439534 (https://phabricator.wikimedia.org/T194870)
[08:08:01] (CR) Marostegui: [C: -2] "Do not submit till the failover day" [software] - https://gerrit.wikimedia.org/r/439534 (https://phabricator.wikimedia.org/T194870) (owner: Marostegui)
[08:09:06] (CR) Dzahn: "before Gerrit UI changes there should be an announcement on mailing lists with a reminder that this is coming and explanation how to previ" [puppet] - https://gerrit.wikimedia.org/r/439444 (https://phabricator.wikimedia.org/T196812) (owner: Paladox)
[08:14:17] (CR) Dzahn: [C: -1] "should not use base::service_unit anymore. nowadays we want systemd::service" [puppet] - https://gerrit.wikimedia.org/r/362455 (owner: Paladox)
[08:15:26] (CR) Dzahn: [C: -1] servermon: Add gunicorn.service systemd script (1 comment) [puppet] - https://gerrit.wikimedia.org/r/362455 (owner: Paladox)
[08:21:10] (CR) Dzahn: "re: paladox: on that other change i commented let's just add 2 cron jobs, one to gzip them, instead of the hack to rename files to .gz tha" [puppet] - https://gerrit.wikimedia.org/r/423794 (owner: Chad)
[08:22:00] (CR) Ema: "Those hostnames are there for testing purposes only. They don't need to reflect any actual machine name and thus there's no need to update" [puppet] - https://gerrit.wikimedia.org/r/437625 (https://phabricator.wikimedia.org/T196019) (owner: Dzahn)
[08:23:47] (CR) Dzahn: "Thank you Ema, that kind of confirmation was what i was after with this review :) will abandon and glad to have it checked" [puppet] - https://gerrit.wikimedia.org/r/437625 (https://phabricator.wikimedia.org/T196019) (owner: Dzahn)
[08:24:19] (Abandoned) Dzahn: mtail: replace phab1001 with phab1002?
[puppet] - https://gerrit.wikimedia.org/r/437625 (https://phabricator.wikimedia.org/T196019) (owner: Dzahn)
[08:25:45] !log installing gnupg1 security updates
[08:25:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:27:32] !log restart elastic1020 to enable G1 GC - T156137
[08:27:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:27:36] T156137: Reduce impact of GC pauses on elasticsearch response time - https://phabricator.wikimedia.org/T156137
[08:28:08] (CR) Dzahn: [C: -1] "my suggestion: abandon this change, merge change that moves log files to /var/log/, add new change that adds cron job that gzips uncompres" [puppet] - https://gerrit.wikimedia.org/r/434605 (owner: Paladox)
[08:32:01] !log installing gnupg2 security updates
[08:32:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:36:31] (PS1) Dzahn: site: include ::base::firewall -> ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/439535
[08:38:03] (CR) Dzahn: [C: 2] "comments only" [puppet] - https://gerrit.wikimedia.org/r/439535 (owner: Dzahn)
[08:39:19] (PS1) Marostegui: db1102.yaml: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/439536 (https://phabricator.wikimedia.org/T196527)
[08:40:00] (PS2) Marostegui: db1102.yaml: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/439536 (https://phabricator.wikimedia.org/T196527)
[08:40:46] (CR) Marostegui: [C: 2] db1102.yaml: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/439536 (https://phabricator.wikimedia.org/T196527) (owner: Marostegui)
[08:42:23] (CR) Volans: [C: 2] "Got agreement on IRC" [puppet/nginx] - https://gerrit.wikimedia.org/r/437968 (owner: Volans)
[08:42:49] (Merged) jenkins-bot: Bump Gemfile dependencies [puppet/nginx] - https://gerrit.wikimedia.org/r/437968 (owner: Volans)
[08:43:22] (CR) Volans: [C: 2] "Got
additional agreement on IRC" [puppet/nginx] - https://gerrit.wikimedia.org/r/437761 (owner: Volans)
[08:43:32] (PS1) Dzahn: maps-test: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/439537
[08:43:37] (Merged) jenkins-bot: Add nginx::snippet define [puppet/nginx] - https://gerrit.wikimedia.org/r/437761 (owner: Volans)
[08:44:57] (PS2) Dzahn: maps-test: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/439537
[08:46:40] (CR) Dzahn: [C: 2] "wmf-style: total violations delta -2" [puppet] - https://gerrit.wikimedia.org/r/439537 (owner: Dzahn)
[08:50:13] Operations, Traffic, Patch-For-Review: Merge cache_misc into cache_text functionally - https://phabricator.wikimedia.org/T164609#4271132 (ema)
[08:51:48] (PS5) Volans: debmonitor: add basic HTTP Icinga checks [puppet] - https://gerrit.wikimedia.org/r/436509 (https://phabricator.wikimedia.org/T191299)
[08:51:50] (PS1) Volans: nginx: updated git submodule [puppet] - https://gerrit.wikimedia.org/r/439539 (https://phabricator.wikimedia.org/T191299)
[08:51:52] (PS1) Volans: debmonitor: use Nginx snippets [puppet] - https://gerrit.wikimedia.org/r/439540 (https://phabricator.wikimedia.org/T191299)
[08:51:54] (PS1) Volans: debmonitor: use new cache control setting [puppet] - https://gerrit.wikimedia.org/r/439541 (https://phabricator.wikimedia.org/T191299)
[08:52:04] (CR) Dzahn: [C: 2] "noop on maps-test2001/2" [puppet] - https://gerrit.wikimedia.org/r/439537 (owner: Dzahn)
[08:52:53] (PS1) Gehel: maps: remove "style" parameter [puppet] - https://gerrit.wikimedia.org/r/439543
[08:53:33] (CR) jerkins-bot: [V: -1] maps: remove "style" parameter [puppet] - https://gerrit.wikimedia.org/r/439543 (owner: Gehel)
[08:54:40] (CR) Vgutierrez: [C: 1] Fine tune security settings [software/debmonitor] - https://gerrit.wikimedia.org/r/437954 (https://phabricator.wikimedia.org/T191299) (owner:
Volans)
[08:54:59] (PS2) Gehel: maps: remove "style" parameter [puppet] - https://gerrit.wikimedia.org/r/439543
[08:55:30] (CR) Volans: [C: 2] Fine tune security settings [software/debmonitor] - https://gerrit.wikimedia.org/r/437954 (https://phabricator.wikimedia.org/T191299) (owner: Volans)
[08:56:07] (CR) Volans: "Compiler result:" [puppet] - https://gerrit.wikimedia.org/r/439540 (https://phabricator.wikimedia.org/T191299) (owner: Volans)
[08:56:43] (Merged) jenkins-bot: Fine tune security settings [software/debmonitor] - https://gerrit.wikimedia.org/r/437954 (https://phabricator.wikimedia.org/T191299) (owner: Volans)
[08:58:55] (PS1) Dzahn: deployment_server/package_builder: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/439544
[09:02:02] (PS1) Marostegui: db-eqiad.php: Depool db1097:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/439545 (https://phabricator.wikimedia.org/T191316)
[09:03:53] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1097:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/439545 (https://phabricator.wikimedia.org/T191316) (owner: Marostegui)
[09:05:18] (Merged) jenkins-bot: db-eqiad.php: Depool db1097:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/439545 (https://phabricator.wikimedia.org/T191316) (owner: Marostegui)
[09:06:31] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1097:3314 for alter table (duration: 00m 52s)
[09:06:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:06:55] Operations, ops-eqiad: ms-be1036 in power off status, not responsive to power on commands - https://phabricator.wikimedia.org/T196873#4271185 (elukey) p:Triage>Normal
[09:07:21] !log Deploy schema change on db1097:3314 T191316 T192926 T89737 T195193
[09:07:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:07:28] T89737: Make several mediawiki table
fields unsigned ints on wmf databases - https://phabricator.wikimedia.org/T89737
[09:07:28] T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926
[09:07:28] T195193: Schema change for ct_tag_id field to change_tag - https://phabricator.wikimedia.org/T195193
[09:07:28] T191316: Schema change to make archive.ar_rev_id NOT NULL - https://phabricator.wikimedia.org/T191316
[09:08:25] (CR) jenkins-bot: db-eqiad.php: Depool db1097:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/439545 (https://phabricator.wikimedia.org/T191316) (owner: Marostegui)
[09:09:11] (PS1) Volans: debmonitor: add explicit dependency on libldap [puppet] - https://gerrit.wikimedia.org/r/439546 (https://phabricator.wikimedia.org/T191299)
[09:10:03] (CR) Ema: [C: 1] nginx: updated git submodule [puppet] - https://gerrit.wikimedia.org/r/439539 (https://phabricator.wikimedia.org/T191299) (owner: Volans)
[09:10:53] PROBLEM - IPMI Sensor Status on maps1002 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [Power Supply 2 = Critical, Power Supplies = Critical]
[09:15:52] (PS1) Muehlenhoff: Create component/hhvm324 [puppet] - https://gerrit.wikimedia.org/r/439548
[09:21:47] (PS1) Ema: reload-vcl: fix get_cmd_output error handling [puppet] - https://gerrit.wikimedia.org/r/439550
[09:24:40] (PS2) Vgutierrez: update-ocsp: Actually use --time-offset-end argument [puppet] - https://gerrit.wikimedia.org/r/436485 (https://phabricator.wikimedia.org/T163541)
[09:24:45] (CR) Muehlenhoff: [C: 1] debmonitor: add explicit dependency on libldap [puppet] - https://gerrit.wikimedia.org/r/439546 (https://phabricator.wikimedia.org/T191299) (owner: Volans)
[09:28:44] (PS1) Dzahn: network::monitor: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/439554
[09:32:54] (PS1) Dzahn: kubernetes: use ::profile::base::firewall [puppet] -
https://gerrit.wikimedia.org/r/439556
[09:33:14] (PS1) Vgutierrez: update-ocsp: Fix cert_get_issuer_filename error handling [puppet] - https://gerrit.wikimedia.org/r/439557 (https://phabricator.wikimedia.org/T163541)
[09:33:16] (PS1) Vgutierrez: update-ocsp: Make pylint happy [puppet] - https://gerrit.wikimedia.org/r/439558 (https://phabricator.wikimedia.org/T163541)
[09:35:09] (CR) Ema: [C: 1] update-ocsp: Actually use --time-offset-end argument [puppet] - https://gerrit.wikimedia.org/r/436485 (https://phabricator.wikimedia.org/T163541) (owner: Vgutierrez)
[09:36:10] Operations, Analytics, hardware-requests: eqiad: (1) new stat box to offload users from stat1005 - https://phabricator.wikimedia.org/T196345#4271243 (elukey) Hi Rob! So the spare looks great but I am a bit afraid about having "only" 32G for that machine, in fact I was about to ask even more than wha...
[09:36:37] (CR) Ema: "Please mention what's wrong with the current code in the commit message."
[puppet] - https://gerrit.wikimedia.org/r/439557 (https://phabricator.wikimedia.org/T163541) (owner: Vgutierrez)
[09:37:10] (CR) Ema: [C: 1] update-ocsp: Make pylint happy [puppet] - https://gerrit.wikimedia.org/r/439558 (https://phabricator.wikimedia.org/T163541) (owner: Vgutierrez)
[09:38:46] (CR) Dzahn: "part of Change-Id I4a30e491f5861aa00" [puppet] - https://gerrit.wikimedia.org/r/439556 (owner: Dzahn)
[09:39:44] PROBLEM - MariaDB Slave Lag: s3 on db2057 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 301.16 seconds
[09:39:53] PROBLEM - MariaDB Slave Lag: s3 on db2050 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 302.06 seconds
[09:39:54] PROBLEM - MariaDB Slave Lag: s3 on db2036 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 302.11 seconds
[09:39:54] PROBLEM - MariaDB Slave Lag: s3 on db2043 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 302.13 seconds
[09:40:14] PROBLEM - MariaDB Slave Lag: s3 on db2094 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 306.70 seconds
[09:40:24] PROBLEM - MariaDB Slave Lag: s3 on db2074 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 309.34 seconds
[09:40:45] anomi.e ^ script probably
[09:40:53] (PS2) Vgutierrez: update-ocsp: Fix cert_get_issuer_filename error handling [puppet] - https://gerrit.wikimedia.org/r/439557 (https://phabricator.wikimedia.org/T163541)
[09:41:17] yep, it is
[09:43:22] (CR) Ema: [C: 1] update-ocsp: Fix cert_get_issuer_filename error handling [puppet] - https://gerrit.wikimedia.org/r/439557 (https://phabricator.wikimedia.org/T163541) (owner: Vgutierrez)
[09:48:56] (CR) Vgutierrez: [C: 1] reload-vcl: fix get_cmd_output error handling [puppet] - https://gerrit.wikimedia.org/r/439550 (owner: Ema)
[09:49:43] (CR) Filippo Giunchedi: [C: 1] Mark repository as read only [software/tessera] (refs/meta/config) - https://gerrit.wikimedia.org/r/439469 (https://phabricator.wikimedia.org/T186096) (owner: MarcoAurelio)
[09:50:00]
(CR) Filippo Giunchedi: [C: 1] Archive the operations/software/tessera repository [software/tessera] - https://gerrit.wikimedia.org/r/439467 (https://phabricator.wikimedia.org/T186096) (owner: MarcoAurelio)
[09:51:16] (CR) Filippo Giunchedi: [C: 2] Add .gitreview file [debs/python-logstash] - https://gerrit.wikimedia.org/r/430306 (owner: Gilles)
[09:51:57] (CR) Filippo Giunchedi: [C: 1] Enable base::service_auto_restart for smartd [puppet] - https://gerrit.wikimedia.org/r/419769 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff)
[09:55:02] (PS4) Muehlenhoff: Enable base::service_auto_restart for smartd [puppet] - https://gerrit.wikimedia.org/r/419769 (https://phabricator.wikimedia.org/T135991)
[09:56:29] (CR) Muehlenhoff: [C: 2] Enable base::service_auto_restart for smartd [puppet] - https://gerrit.wikimedia.org/r/419769 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff)
[09:56:40] (CR) Filippo Giunchedi: "Already cherry-picked in beta?" [puppet] - https://gerrit.wikimedia.org/r/402758 (https://phabricator.wikimedia.org/T184236) (owner: Alex Monk)
[09:58:34] (PS2) Volans: Client CLI: read configuration file.
[software/debmonitor] - https://gerrit.wikimedia.org/r/437957 (https://phabricator.wikimedia.org/T191300)
[09:58:37] (PS2) Volans: Client CLI: add CA bundle for server validation [software/debmonitor] - https://gerrit.wikimedia.org/r/437958 (https://phabricator.wikimedia.org/T167504)
[09:58:39] (PS1) Volans: Client CLI: bump version to 1.2.0 [software/debmonitor] - https://gerrit.wikimedia.org/r/439560 (https://phabricator.wikimedia.org/T167504)
[09:58:41] (PS1) Volans: Allow to set AUTH_LDAP_GLOBAL_OPTIONS [software/debmonitor] - https://gerrit.wikimedia.org/r/439561 (https://phabricator.wikimedia.org/T191299)
[09:58:57] (PS1) Jdrewniak: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/439562 (https://phabricator.wikimedia.org/T128546)
[09:59:49] (CR) jerkins-bot: [V: -1] Client CLI: add CA bundle for server validation [software/debmonitor] - https://gerrit.wikimedia.org/r/437958 (https://phabricator.wikimedia.org/T167504) (owner: Volans)
[09:59:54] (CR) jerkins-bot: [V: -1] Client CLI: bump version to 1.2.0 [software/debmonitor] - https://gerrit.wikimedia.org/r/439560 (https://phabricator.wikimedia.org/T167504) (owner: Volans)
[09:59:57] (CR) jerkins-bot: [V: -1] Allow to set AUTH_LDAP_GLOBAL_OPTIONS [software/debmonitor] - https://gerrit.wikimedia.org/r/439561 (https://phabricator.wikimedia.org/T191299) (owner: Volans)
[10:00:06] (CR) jerkins-bot: [V: -1] Client CLI: read configuration file.
[software/debmonitor] - 10https://gerrit.wikimedia.org/r/437957 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans) [10:00:21] sorry for the spam, that's me, but because of T196628 ;) [10:00:21] T196628: CI: upgrade tox, currently running 2.6.0 - https://phabricator.wikimedia.org/T196628 [10:01:02] (03CR) 10Filippo Giunchedi: [C: 031] monitoring: Remove unused 'graphite_anomaly' command [puppet] - 10https://gerrit.wikimedia.org/r/437365 (owner: 10Krinkle) [10:01:35] (03CR) 10Vgutierrez: [C: 031] Allow to set AUTH_LDAP_GLOBAL_OPTIONS [software/debmonitor] - 10https://gerrit.wikimedia.org/r/439561 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans) [10:01:37] (03PS2) 10Volans: nginx: updated git submodule [puppet] - 10https://gerrit.wikimedia.org/r/439539 (https://phabricator.wikimedia.org/T191299) [10:01:43] (03CR) 10Filippo Giunchedi: [C: 031] Enable base::service_auto_restart for mcelog [puppet] - 10https://gerrit.wikimedia.org/r/438002 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff) [10:01:54] (03CR) 10Filippo Giunchedi: [C: 031] Enable base::service_auto_restart for PDNS recursor Prometheus exporters [puppet] - 10https://gerrit.wikimedia.org/r/437949 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff) [10:02:25] (03CR) 10Volans: [C: 032] nginx: updated git submodule [puppet] - 10https://gerrit.wikimedia.org/r/439539 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans) [10:04:44] RECOVERY - MariaDB Slave Lag: s3 on db2074 is OK: OK slave_sql_lag Replication lag: 3.07 seconds [10:04:48] (03CR) 10Jdrewniak: [C: 032] Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439562 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [10:04:52] (03CR) 10Filippo Giunchedi: Configuration for phabricator to use swift storage. 
(032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/432528 (https://phabricator.wikimedia.org/T182085) (owner: 1020after4) [10:05:14] RECOVERY - MariaDB Slave Lag: s3 on db2057 is OK: OK slave_sql_lag Replication lag: 0.15 seconds [10:05:23] RECOVERY - MariaDB Slave Lag: s3 on db2050 is OK: OK slave_sql_lag Replication lag: 0.30 seconds [10:05:24] RECOVERY - MariaDB Slave Lag: s3 on db2036 is OK: OK slave_sql_lag Replication lag: 0.00 seconds [10:05:24] RECOVERY - MariaDB Slave Lag: s3 on db2043 is OK: OK slave_sql_lag Replication lag: 0.20 seconds [10:05:53] (03PS3) 10Muehlenhoff: Enable base::service_auto_restart for uwsgi-netbox [puppet] - 10https://gerrit.wikimedia.org/r/438219 (https://phabricator.wikimedia.org/T135991) [10:06:13] PROBLEM - MariaDB Slave Lag: s7 on db2061 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 426.25 seconds [10:06:27] (03Merged) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439562 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [10:07:41] (03CR) 10Filippo Giunchedi: [C: 031] Allow removing Diamond gradually [puppet] - 10https://gerrit.wikimedia.org/r/429389 (https://phabricator.wikimedia.org/T183454) (owner: 10Muehlenhoff) [10:07:59] (03PS2) 10Volans: debmonitor: use Nginx snippets [puppet] - 10https://gerrit.wikimedia.org/r/439540 (https://phabricator.wikimedia.org/T191299) [10:08:40] (03CR) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439562 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [10:09:04] RECOVERY - MariaDB Slave Lag: s3 on db2094 is OK: OK slave_sql_lag Replication lag: 0.08 seconds [10:10:07] did any of those page? 
because I got none [10:10:14] !log jdrewniak@deploy1001 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:439562|Bumping portals to master (T128546)]] (duration: 00m 51s) [10:10:16] (03CR) 10Muehlenhoff: [C: 032] Enable base::service_auto_restart for uwsgi-netbox [puppet] - 10https://gerrit.wikimedia.org/r/438219 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff) [10:10:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:10:20] T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546 [10:10:21] apergos: they don't page [10:10:35] ok good (means phone is not in a weird state), thanks! [10:10:41] :) [10:10:55] godog: is it possible to merge https://gerrit.wikimedia.org/r/#/c/operations/software/tessera/+/439467/ then now that you're okay with the repo content to go? [10:11:05] !log jdrewniak@deploy1001 Synchronized portals: Wikimedia Portals Update: [[gerrit:439562|Bumping portals to master (T128546)]] (duration: 00m 50s) [10:11:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:11:31] Hauskatze: yup, good to merge, I can go ahead if you don't have permissions otherwise feel free [10:12:16] godog: nope, I don't have privs, feel free to merge that and https://gerrit.wikimedia.org/r/#/c/operations/software/tessera/+/439469/ afterwards [10:12:18] thank you [10:13:02] (03CR) 10Filippo Giunchedi: [C: 032] Archive the operations/software/tessera repository [software/tessera] - 10https://gerrit.wikimedia.org/r/439467 (https://phabricator.wikimedia.org/T186096) (owner: 10MarcoAurelio) [10:13:06] (03CR) 10Filippo Giunchedi: [V: 032 C: 032] Archive the operations/software/tessera repository [software/tessera] - 10https://gerrit.wikimedia.org/r/439467 (https://phabricator.wikimedia.org/T186096) (owner: 10MarcoAurelio) [10:13:15] (03CR) 10Filippo Giunchedi: [C: 032] Mark repository as read only 
[software/tessera] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/439469 (https://phabricator.wikimedia.org/T186096) (owner: 10MarcoAurelio) [10:13:21] (03CR) 10Filippo Giunchedi: [V: 032 C: 032] Mark repository as read only [software/tessera] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/439469 (https://phabricator.wikimedia.org/T186096) (owner: 10MarcoAurelio) [10:13:29] Hauskatze: {{done}} [10:13:41] {{thank you}} --~~~~ [10:13:50] np! thanks for taking care of that [10:15:48] (03Abandoned) 10Filippo Giunchedi: labs: use new redis servers for locks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/387570 (https://phabricator.wikimedia.org/T179371) (owner: 10Filippo Giunchedi) [10:15:50] my pleasure [10:16:40] (03Abandoned) 10Filippo Giunchedi: hieradata: add redis stretch deployment-prep instances [puppet] - 10https://gerrit.wikimedia.org/r/386869 (https://phabricator.wikimedia.org/T179371) (owner: 10Filippo Giunchedi) [10:17:09] (03Abandoned) 10Filippo Giunchedi: hieradata: use deployment-redis05 for labs jobrunner [puppet] - 10https://gerrit.wikimedia.org/r/387579 (https://phabricator.wikimedia.org/T179371) (owner: 10Filippo Giunchedi) [10:17:38] ACKNOWLEDGEMENT - puppet last run on phab1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 16 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[phabricator/deployment] daniel_zahn debugging - replacement server not in prod [10:20:37] (03PS1) 10Ema: reload-vcl: fix shell injection [puppet] - 10https://gerrit.wikimedia.org/r/439563 [10:20:58] (03PS2) 10Muehlenhoff: Enable base::service_auto_restart for mcelog [puppet] - 10https://gerrit.wikimedia.org/r/438002 (https://phabricator.wikimedia.org/T135991) [10:23:36] (03CR) 10Muehlenhoff: [C: 032] Enable base::service_auto_restart for mcelog [puppet] - 10https://gerrit.wikimedia.org/r/438002 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff) [10:25:31] hashar: by any chance do you have a rough ETA for T196628? [10:25:32] T196628: CI: upgrade tox, currently running 2.6.0 - https://phabricator.wikimedia.org/T196628 [10:25:34] (03PS1) 10Muehlenhoff: Revert "Enable base::service_auto_restart for mcelog" [puppet] - 10https://gerrit.wikimedia.org/r/439564 [10:25:46] * volans sends cookies to hashar :) [10:25:58] <_joe_> uhm thumbor failing right now on lvs [10:27:12] (03CR) 10Muehlenhoff: [C: 032] Revert "Enable base::service_auto_restart for mcelog" [puppet] - 10https://gerrit.wikimedia.org/r/439564 (owner: 10Muehlenhoff) [10:27:21] <_joe_> it recovered, but this is not the first time it happens [10:28:21] 10Operations, 10ops-eqiad, 10DC-Ops: Replace disk on mw1230 - https://phabricator.wikimedia.org/T196881#4271440 (10Joe) [10:29:06] <_joe_> !log depooling permanently mw1230 for disk replacement, T196881 [10:29:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:29:11] T196881: Replace disk on mw1230 - https://phabricator.wikimedia.org/T196881 [10:29:45] (03PS1) 10Muehlenhoff: Enable base::service_auto_restart for mcelog [puppet] - 10https://gerrit.wikimedia.org/r/439565 (https://phabricator.wikimedia.org/T135991) [10:30:50] (03CR) 10Vgutierrez: [C: 031] debmonitor: use Nginx snippets [puppet] - 10https://gerrit.wikimedia.org/r/439540 
(https://phabricator.wikimedia.org/T191299) (owner: 10Volans) [10:30:59] (03CR) 10jerkins-bot: [V: 04-1] Enable base::service_auto_restart for mcelog [puppet] - 10https://gerrit.wikimedia.org/r/439565 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff) [10:31:32] (03PS2) 10Ema: reload-vcl: fix shell injection, add .py suffix [puppet] - 10https://gerrit.wikimedia.org/r/439563 (https://phabricator.wikimedia.org/T144169) [10:31:44] (03PS3) 10Volans: debmonitor: use Nginx snippets [puppet] - 10https://gerrit.wikimedia.org/r/439540 (https://phabricator.wikimedia.org/T191299) [10:32:01] (03CR) 10jerkins-bot: [V: 04-1] reload-vcl: fix shell injection, add .py suffix [puppet] - 10https://gerrit.wikimedia.org/r/439563 (https://phabricator.wikimedia.org/T144169) (owner: 10Ema) [10:32:14] (03CR) 10Awight: [C: 031] "Chiming in just to say that this upgrade will be useful to my team ASAP, thank you!" [puppet] - 10https://gerrit.wikimedia.org/r/438121 (https://phabricator.wikimedia.org/T196710) (owner: 10Thcipriani) [10:32:32] (03CR) 10Volans: [C: 032] debmonitor: use Nginx snippets [puppet] - 10https://gerrit.wikimedia.org/r/439540 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans) [10:33:32] moritzm: there are 2 patches from you on puppet-merge, what should I do? 
[10:33:41] (03PS3) 10Ema: reload-vcl: fix shell injection, add .py suffix [puppet] - 10https://gerrit.wikimedia.org/r/439563 (https://phabricator.wikimedia.org/T144169) [10:34:12] (03CR) 10jerkins-bot: [V: 04-1] reload-vcl: fix shell injection, add .py suffix [puppet] - 10https://gerrit.wikimedia.org/r/439563 (https://phabricator.wikimedia.org/T144169) (owner: 10Ema) [10:34:16] the second one is a revert of the first; when I tried puppet-merge before, it quit because there was no diff [10:34:25] so safe to merge if it's now showing the changes [10:34:40] lol, yeah it shows only my diffs [10:34:48] we could open a bug to fix it though [10:35:14] (03PS4) 10Ema: reload-vcl: fix shell injection, add .py suffix [puppet] - 10https://gerrit.wikimedia.org/r/439563 (https://phabricator.wikimedia.org/T144169) [10:35:44] moritzm: done ;) [10:35:46] (03PS2) 10Muehlenhoff: Enable base::service_auto_restart for mcelog [puppet] - 10https://gerrit.wikimedia.org/r/439565 (https://phabricator.wikimedia.org/T135991) [10:35:56] volans: don't bother, it's a corner case which hits us maybe 1-2 times per year... 
[10:36:12] (03PS2) 10Volans: debmonitor: use new cache control setting [puppet] - 10https://gerrit.wikimedia.org/r/439541 (https://phabricator.wikimedia.org/T191299) [10:36:16] ack [10:37:03] RECOVERY - puppet last run on phab1002 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [10:37:33] (03CR) 10Volans: [C: 032] debmonitor: use new cache control setting [puppet] - 10https://gerrit.wikimedia.org/r/439541 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans) [10:37:54] (03PS6) 10Volans: debmonitor: add basic HTTP Icinga checks [puppet] - 10https://gerrit.wikimedia.org/r/436509 (https://phabricator.wikimedia.org/T191299) [10:38:06] (03CR) 10Vgutierrez: [C: 031] reload-vcl: fix shell injection, add .py suffix [puppet] - 10https://gerrit.wikimedia.org/r/439563 (https://phabricator.wikimedia.org/T144169) (owner: 10Ema) [10:39:03] (03CR) 10Volans: [C: 032] debmonitor: add basic HTTP Icinga checks [puppet] - 10https://gerrit.wikimedia.org/r/436509 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans) [10:39:13] (03PS2) 10Volans: debmonitor: add explicit dependency on libldap [puppet] - 10https://gerrit.wikimedia.org/r/439546 (https://phabricator.wikimedia.org/T191299) [10:40:02] (03CR) 10Volans: [C: 032] debmonitor: add explicit dependency on libldap [puppet] - 10https://gerrit.wikimedia.org/r/439546 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans) [10:42:44] (03CR) 10Filippo Giunchedi: [C: 031] Enable base::service_auto_restart for mcelog [puppet] - 10https://gerrit.wikimedia.org/r/439565 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff) [10:52:44] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1097:3314" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439567 [10:52:48] !log phab1002 - editing cached scap config /srv/deployment/phabricator/deployment-cache/.config to replace tin.eqiad with deploy1001.eqiad deployment server, run puppet. 
other options: run scap with --refresh-config, delet cached .config file (T196019) (T175288) [10:52:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:52:54] T196019: setup/install phab1002(WMF4727) - https://phabricator.wikimedia.org/T196019 [10:52:54] T175288: setup/install/deploy deploy1001 as deployment server - https://phabricator.wikimedia.org/T175288 [10:52:55] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Minor nitpick, otherwise LGTM" (031 comment) [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437957 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans) [10:55:35] 10Operations, 10monitoring: SMART checks fail on wtp1043's sda - https://phabricator.wikimedia.org/T196886#4271553 (10Joe) [10:56:28] ACKNOWLEDGEMENT - Device not healthy -SMART- on wtp1043 is CRITICAL: cluster=parsoid device=sda instance=wtp1043:9100 job=node site=eqiad Giuseppe Lavagetto T196886 https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=wtp1043&var-datasource=eqiad%2520prometheus%252Fops [10:57:09] (03PS5) 10Ema: reload-vcl: fix shell injection, add .py suffix [puppet] - 10https://gerrit.wikimedia.org/r/439563 (https://phabricator.wikimedia.org/T144169) [10:57:33] ACKNOWLEDGEMENT - Device not healthy -SMART- on wtp1043 is CRITICAL: cluster=parsoid device=sda instance=wtp1043:9100 job=node site=eqiad Giuseppe Lavagetto T196881 https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=wtp1043&var-datasource=eqiad%2520prometheus%252Fops [10:57:47] 10Operations, 10Gerrit, 10Release-Engineering-Team (Kanban): In Gerrit some users are reporting problems saving there preferences - https://phabricator.wikimedia.org/T196869#4271568 (10hashar) We need #operations to fix up permissions on cobalt.wikimedia.org Files under `/srv/gerrit/git/All-Users.git/` bei... 
[10:57:54] (03CR) 10Ema: [C: 032] reload-vcl: fix shell injection, add .py suffix [puppet] - 10https://gerrit.wikimedia.org/r/439563 (https://phabricator.wikimedia.org/T144169) (owner: 10Ema) [10:58:43] 10Operations, 10Gerrit, 10Release-Engineering-Team (Kanban): In Gerrit some users are reporting problems saving there preferences - https://phabricator.wikimedia.org/T196869#4271583 (10Paladox) sudo chrown gerrit2:gerrit2 /srv/gerrit/git [11:00:06] jan_drewniak: I, the Bot under the Fountain, allow thee, The Deployer, to do Wikimedia Portals Update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180611T1100). [11:03:22] (03CR) 10Alexandros Kosiaris: [C: 031] Bumped django-auth-ldap to v1.6.1 [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437956 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans) [11:04:04] (03PS1) 10Giuseppe Lavagetto: profile::mediawiki::jobrunner_tls: fix monitoring definitions [puppet] - 10https://gerrit.wikimedia.org/r/439569 [11:04:06] (03CR) 10Alexandros Kosiaris: [C: 031] Client CLI: add CA bundle for server validation [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437958 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans) [11:04:14] PROBLEM - puppet last run on cp3041 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. 
Failed resources (up to 3 shown): File[/usr/share/varnish/reload-vcl] [11:04:19] (03CR) 10Alexandros Kosiaris: [C: 031] Client CLI: bump version to 1.2.0 [software/debmonitor] - 10https://gerrit.wikimedia.org/r/439560 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans) [11:04:24] PROBLEM - MariaDB Slave Lag: s3 on db2094 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 319.75 seconds [11:04:43] (03Abandoned) 10Ema: reload-vcl: fix get_cmd_output error handling [puppet] - 10https://gerrit.wikimedia.org/r/439550 (owner: 10Ema) [11:05:14] <_joe_> volans: ^^ [11:05:31] (03CR) 10Alexandros Kosiaris: [C: 031] Allow to set AUTH_LDAP_GLOBAL_OPTIONS [software/debmonitor] - 10https://gerrit.wikimedia.org/r/439561 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans) [11:05:34] (03CR) 10Giuseppe Lavagetto: [C: 032] profile::mediawiki::jobrunner_tls: fix monitoring definitions [puppet] - 10https://gerrit.wikimedia.org/r/439569 (owner: 10Giuseppe Lavagetto) [11:05:53] PROBLEM - mediawiki-installation DSH group on mw1230 is CRITICAL: Host mw1230 is not in mediawiki-installation dsh group [11:06:13] (03CR) 10Volans: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/439569 (owner: 10Giuseppe Lavagetto) [11:06:35] 10Operations, 10DBA, 10Gerrit, 10Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4271595 (10Marostegui) [11:06:38] <_joe_> mw1230 is expected, I'll ack that alert [11:07:40] 10Operations, 10DBA, 10Gerrit, 10Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4271598 (10Paladox) This https://phabricator.wikimedia.org/D1067 will fix it so no more new notedb refs are cloned. [11:08:12] (03PS1) 10Volans: debmonitor: set TLS cipher suite for LDAP [puppet] - 10https://gerrit.wikimedia.org/r/439571 (https://phabricator.wikimedia.org/T191299) [11:08:16] (03PS3) 10Volans: Client CLI: read configuration file. 
[software/debmonitor] - 10https://gerrit.wikimedia.org/r/437957 (https://phabricator.wikimedia.org/T191300) [11:08:18] (03PS3) 10Volans: Client CLI: add CA bundle for server validation [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437958 (https://phabricator.wikimedia.org/T167504) [11:08:20] (03PS2) 10Volans: Client CLI: bump version to 1.2.0 [software/debmonitor] - 10https://gerrit.wikimedia.org/r/439560 (https://phabricator.wikimedia.org/T167504) [11:08:22] (03PS2) 10Volans: Allow to set AUTH_LDAP_GLOBAL_OPTIONS [software/debmonitor] - 10https://gerrit.wikimedia.org/r/439561 (https://phabricator.wikimedia.org/T191299) [11:08:47] 10Operations, 10ops-codfw, 10netops: Switch port configuration for backup2001 - https://phabricator.wikimedia.org/T196782#4268246 (10ayounsi) ```lang=diff [edit interfaces interface-range vlan-private1-d-codfw] member ge-3/0/10 { ... } + member xe-2/0/11; [edit interfaces] + xe-2/0/11 { + des... [11:09:12] 10Operations, 10DBA, 10Gerrit, 10Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4271605 (10Marostegui) >>! In T196840#4271598, @Paladox wrote: > This https://phabricator.wikimedia.org/D1067 will fix it so no more new notedb refs are cloned. When are you p... [11:09:17] 10Operations, 10ops-codfw, 10netops: Switch port configuration for backup2001 - https://phabricator.wikimedia.org/T196782#4271606 (10ayounsi) 05Open>03Resolved a:05RobH>03ayounsi [11:09:19] (03CR) 10Volans: Client CLI: read configuration file. 
(031 comment) [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437957 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans) [11:09:21] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install backup2001 - https://phabricator.wikimedia.org/T196477#4271608 (10ayounsi) [11:09:28] 10Operations, 10Gerrit, 10Release-Engineering-Team (Kanban): In Gerrit some users are reporting problems saving there preferences - https://phabricator.wikimedia.org/T196869#4271040 (10Dzahn) >>! In T196869#4271568, @hashar wrote: > We need #operations to fix up permissions on cobalt.wikimedia.org > > File... [11:09:31] (03CR) 10jerkins-bot: [V: 04-1] Client CLI: read configuration file. [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437957 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans) [11:09:35] (03CR) 10jerkins-bot: [V: 04-1] Allow to set AUTH_LDAP_GLOBAL_OPTIONS [software/debmonitor] - 10https://gerrit.wikimedia.org/r/439561 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans) [11:09:37] (03CR) 10jerkins-bot: [V: 04-1] Client CLI: bump version to 1.2.0 [software/debmonitor] - 10https://gerrit.wikimedia.org/r/439560 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans) [11:09:46] (03CR) 10jerkins-bot: [V: 04-1] Client CLI: add CA bundle for server validation [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437958 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans) [11:09:51] (03CR) 10Alexandros Kosiaris: [C: 031] Client CLI: read configuration file. [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437957 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans) [11:09:55] 10Operations, 10DBA, 10Gerrit, 10Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4271614 (10Paladox) Need someone to approve it, merge it and then i think @mmodell would have to deploy it. 
[11:10:28] 10Operations, 10DBA, 10Gerrit, 10Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4271615 (10Marostegui) Excellent - thanks! :) [11:10:31] 10Operations, 10Gerrit, 10Release-Engineering-Team (Kanban): In Gerrit some users are reporting problems saving there preferences - https://phabricator.wikimedia.org/T196869#4271616 (10Paladox) @dzahn though the objects inside that folder could be owned by root. [11:10:49] (03PS1) 10ArielGlenn: ignore blank lines in xml dumps cleanup config files [puppet] - 10https://gerrit.wikimedia.org/r/439574 [11:11:36] (03CR) 10Alexandros Kosiaris: [C: 032] kubernetes: use ::profile::base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/439556 (owner: 10Dzahn) [11:12:03] PROBLEM - MariaDB Slave Lag: s3 on db1124 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 364.85 seconds [11:12:08] (03PS2) 10Alexandros Kosiaris: kubernetes: use ::profile::base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/439556 (owner: 10Dzahn) [11:12:41] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] kubernetes: use ::profile::base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/439556 (owner: 10Dzahn) [11:12:54] PROBLEM - MariaDB Slave Lag: s3 on db2057 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 303.83 seconds [11:13:03] PROBLEM - MariaDB Slave Lag: s3 on db2050 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 305.37 seconds [11:13:03] PROBLEM - MariaDB Slave Lag: s3 on db2036 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 304.67 seconds [11:13:03] PROBLEM - MariaDB Slave Lag: s3 on db2043 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 305.67 seconds [11:13:17] (03CR) 10ArielGlenn: [C: 032] ignore blank lines in xml dumps cleanup config files [puppet] - 10https://gerrit.wikimedia.org/r/439574 (owner: 10ArielGlenn) [11:13:26] (03PS2) 10ArielGlenn: ignore blank lines in xml dumps cleanup config files [puppet] - 10https://gerrit.wikimedia.org/r/439574 [11:13:48] 
(03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1097:3314" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439567 (owner: 10Marostegui) [11:14:22] (03PS2) 10Volans: debmonitor: set TLS cipher suite for LDAP [puppet] - 10https://gerrit.wikimedia.org/r/439571 (https://phabricator.wikimedia.org/T191299) [11:15:14] !Log gerrit (cobalt) - fixing root-owned files in gerrit All-Userrs.git objects ( affects saved preferences of some users) (T196869) [11:15:15] T196869: In Gerrit some users are reporting problems saving there preferences - https://phabricator.wikimedia.org/T196869 [11:15:15] (03CR) 10Volans: [C: 032] debmonitor: set TLS cipher suite for LDAP [puppet] - 10https://gerrit.wikimedia.org/r/439571 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans) [11:15:31] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1097:3314" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439567 (owner: 10Marostegui) [11:16:31] (03PS1) 10Marostegui: db-eqiad.php: Depool db1103:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439577 (https://phabricator.wikimedia.org/T191316) [11:16:44] 10Operations, 10Gerrit, 10Release-Engineering-Team (Kanban): In Gerrit some users are reporting problems saving there preferences - https://phabricator.wikimedia.org/T196869#4271646 (10Dzahn) >>! In T196869#4271568, @hashar wrote: > We need #operations to fix up permissions on cobalt.wikimedia.org Fixed.... 
[11:17:02] (03CR) 10Volans: [V: 032 C: 032] "Overriding CI due to T196628 (only py27 test is failing)" [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437955 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans) [11:17:55] (03CR) 10Volans: [V: 032 C: 032] "Overriding CI due to T196628 (only py27 test is failing)" [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437956 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans) [11:18:06] RECOVERY - MariaDB Slave Lag: s6 on db2076 is OK: OK slave_sql_lag Replication lag: 57.79 seconds [11:18:06] RECOVERY - MariaDB Slave Lag: s6 on db2089 is OK: OK slave_sql_lag Replication lag: 45.44 seconds [11:18:10] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1097:3314" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439567 (owner: 10Marostegui) [11:18:26] RECOVERY - MariaDB Slave Lag: s6 on db2053 is OK: OK slave_sql_lag Replication lag: 0.22 seconds [11:18:26] RECOVERY - MariaDB Slave Lag: s6 on db2060 is OK: OK slave_sql_lag Replication lag: 0.37 seconds [11:18:29] (03CR) 10Volans: [V: 032 C: 032] "Overriding CI due to T196628 (only py27 test is failing)" [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437957 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans) [11:18:45] RECOVERY - MariaDB Slave Lag: s6 on db2067 is OK: OK slave_sql_lag Replication lag: 0.07 seconds [11:18:46] RECOVERY - MariaDB Slave Lag: s6 on db2039 is OK: OK slave_sql_lag Replication lag: 0.25 seconds [11:19:06] RECOVERY - MariaDB Slave Lag: s6 on dbstore2001 is OK: OK slave_sql_lag Replication lag: 54.16 seconds [11:19:18] (03CR) 10Volans: [V: 032 C: 032] "Overriding CI due to T196628 (only py27 test is failing)" [software/debmonitor] - 10https://gerrit.wikimedia.org/r/437958 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans) [11:19:25] RECOVERY - MariaDB Slave Lag: s3 on db1124 is OK: OK slave_sql_lag Replication lag: 43.56 seconds [11:19:59] (03CR) 10Marostegui: [C: 032] 
db-eqiad.php: Depool db1103:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439577 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui) [11:20:01] (03CR) 10Volans: [V: 032 C: 032] "Overriding CI due to T196628 (only py27 test is failing)" [software/debmonitor] - 10https://gerrit.wikimedia.org/r/439560 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans) [11:20:06] RECOVERY - MariaDB Slave Lag: s6 on db2087 is OK: OK slave_sql_lag Replication lag: 0.00 seconds [11:20:39] (03CR) 10Volans: [V: 032 C: 032] "Overriding CI due to T196628 (only py27 test is failing)" [software/debmonitor] - 10https://gerrit.wikimedia.org/r/439561 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans) [11:20:41] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1097:3314 after alter table (duration: 00m 51s) [11:20:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:20:49] (03PS4) 10Muehlenhoff: Add initial Debianisation of debmonitor-client [software/debmonitor] - 10https://gerrit.wikimedia.org/r/438018 (https://phabricator.wikimedia.org/T191298) [11:20:56] (03PS1) 10Ema: reload-vcl: port to python3 [puppet] - 10https://gerrit.wikimedia.org/r/439578 [11:21:45] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1103:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439577 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui) [11:21:45] RECOVERY - MariaDB Slave Lag: s6 on db2046 is OK: OK slave_sql_lag Replication lag: 0.12 seconds [11:22:00] (03CR) 10jerkins-bot: [V: 04-1] Add initial Debianisation of debmonitor-client [software/debmonitor] - 10https://gerrit.wikimedia.org/r/438018 (https://phabricator.wikimedia.org/T191298) (owner: 10Muehlenhoff) [11:22:40] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1103:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439577 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui) [11:22:53] !log marostegui@deploy1001 
Synchronized wmf-config/db-eqiad.php: Depool db1103:3314 for alter table (duration: 00m 50s) [11:22:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:22:59] !log Deploy schema change on db1103:3314 T191316 T192926 T89737 T195193 [11:23:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:23:06] T89737: Make several mediawiki table fields unsigned ints on wmf databases - https://phabricator.wikimedia.org/T89737 [11:23:06] T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926 [11:23:07] T195193: Schema change for ct_tag_id field to change_tag - https://phabricator.wikimedia.org/T195193 [11:23:07] T191316: Schema change to make archive.ar_rev_id NOT NULL - https://phabricator.wikimedia.org/T191316 [11:24:47] (03CR) 10Volans: "nitpick inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/439578 (owner: 10Ema) [11:25:05] PROBLEM - MariaDB Slave Lag: s3 on db2074 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 426.71 seconds [11:29:05] (03PS1) 10Volans: debmonitor: client side setup [puppet] - 10https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300) [11:29:36] RECOVERY - puppet last run on cp3041 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [11:29:51] (03CR) 10jerkins-bot: [V: 04-1] debmonitor: client side setup [puppet] - 10https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans) [11:30:11] !log ppchelko@deploy1001 Started deploy [cpjobqueue/deploy@b5396cd]: Tune cirrus jobs concurrencies [11:30:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:30:44] (03PS1) 10Aklapper: phabricator weekly project changes email: Add mysql slave port parameter [puppet] - 10https://gerrit.wikimedia.org/r/439581 (https://phabricator.wikimedia.org/T196604) [11:30:52] !log ppchelko@deploy1001 Finished deploy 
[cpjobqueue/deploy@b5396cd]: Tune cirrus jobs concurrencies (duration: 00m 42s) [11:30:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:31:22] (03CR) 10Hoo man: "Please only change this temporary. 10m are quite a lot…" [puppet] - 10https://gerrit.wikimedia.org/r/439528 (https://phabricator.wikimedia.org/T194602) (owner: 10Addshore) [11:31:45] (03CR) 10Aklapper: "Review carefully as I have no clue what I'm doing here" [puppet] - 10https://gerrit.wikimedia.org/r/439581 (https://phabricator.wikimedia.org/T196604) (owner: 10Aklapper) [11:33:34] (03PS2) 10Ema: reload-vcl: port to python3 [puppet] - 10https://gerrit.wikimedia.org/r/439578 [11:34:59] (03PS7) 10Arturo Borrero Gonzalez: openstack: eqiad1: enable more components for labcontrol boxes (keystone) [puppet] - 10https://gerrit.wikimedia.org/r/438220 (https://phabricator.wikimedia.org/T196633) [11:35:26] 10Operations, 10Gerrit, 10Release-Engineering-Team (Kanban): In Gerrit some users are reporting problems saving there preferences - https://phabricator.wikimedia.org/T196869#4271709 (10hashar) a:03Dzahn Fixed mutante ! Danke. My test case was to go to https://gerrit.wikimedia.org/r/settings/ and try to fi... [11:37:00] (03CR) 10Arturo Borrero Gonzalez: [C: 032] openstack: eqiad1: enable more components for labcontrol boxes (keystone) [puppet] - 10https://gerrit.wikimedia.org/r/438220 (https://phabricator.wikimedia.org/T196633) (owner: 10Arturo Borrero Gonzalez) [11:37:36] volans: I have no idea. I haven't looked at it yet. I guess I will just upgrade tox accross the fleet of containers [11:38:27] !log T196633 deploy keystone to labcontrol100[3,4].wikimedia.org. 
Dormant daemon, no DB yet [11:38:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:38:32] T196633: cloudvps: eqiad1 deployment - https://phabricator.wikimedia.org/T196633 [11:43:35] PROBLEM - DPKG on labcontrol1004 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [11:43:45] PROBLEM - DPKG on labcontrol1003 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [11:43:46] PROBLEM - puppet last run on labcontrol1003 is CRITICAL: CRITICAL: Puppet has 38 failures. Last run 51 seconds ago with 38 failures. Failed resources (up to 3 shown): Package[keystone],Package[alembic],Package[python-castellan],Package[python-concurrent.futures] [11:44:55] PROBLEM - puppet last run on labcontrol1004 is CRITICAL: CRITICAL: Puppet has 38 failures. Last run 1 minute ago with 38 failures. Failed resources (up to 3 shown): Package[keystone],Package[alembic],Package[python-castellan],Package[python-concurrent.futures] [11:45:05] ouch [11:45:45] !log T196633 downtime labcontrol100[3,4] due to unexpected puppet errors on installation of keystone [11:45:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:45:50] T196633: cloudvps: eqiad1 deployment - https://phabricator.wikimedia.org/T196633 [11:46:59] (03Abandoned) 10Hoo man: Include php5 packages on canary hosts [puppet] - 10https://gerrit.wikimedia.org/r/391045 (owner: 10Hoo man) [11:48:55] RECOVERY - MariaDB Slave Lag: s7 on db2061 is OK: OK slave_sql_lag Replication lag: 46.97 seconds [11:52:02] (03CR) 10Muehlenhoff: Add initial Debianisation of debmonitor-client (033 comments) [software/debmonitor] - 10https://gerrit.wikimedia.org/r/438018 (https://phabricator.wikimedia.org/T191298) (owner: 10Muehlenhoff) [11:52:41] hashar: ack, thanks! 
no hurry, just wanted to know if it is something we'll see in the near future or not ;) [11:53:10] (03PS3) 10Urbanecm: Revert "Change bewikiquote logo" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439477 (https://phabricator.wikimedia.org/T196134) [11:53:30] (03PS2) 10Urbanecm: Change logo files for bewikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439478 (https://phabricator.wikimedia.org/T196134) [11:53:42] (03PS3) 10Urbanecm: Use uploaded HD logo for bewikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439479 (https://phabricator.wikimedia.org/T196134) [11:55:45] RECOVERY - DPKG on labcontrol1004 is OK: All packages OK [12:00:05] RECOVERY - puppet last run on labcontrol1004 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [12:00:37] 10Operations, 10Citoid, 10Code-Stewardship-Reviews, 10VisualEditor, 10Services (watching): zotero translation server: code stewardship request - https://phabricator.wikimedia.org/T187194#4271773 (10danstillman) Not sure what you're planning, but the initial version of our Node port is up: https://github... 
[12:03:29] (03PS5) 10Muehlenhoff: Add initial Debianisation of debmonitor-client [software/debmonitor] - 10https://gerrit.wikimedia.org/r/438018 (https://phabricator.wikimedia.org/T191298) [12:03:42] (03CR) 10Urbanecm: [C: 031] "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437777 (https://phabricator.wikimedia.org/T196488) (owner: 10Sau226) [12:04:16] RECOVERY - DPKG on labcontrol1003 is OK: All packages OK [12:05:01] (03CR) 10jerkins-bot: [V: 04-1] Add initial Debianisation of debmonitor-client [software/debmonitor] - 10https://gerrit.wikimedia.org/r/438018 (https://phabricator.wikimedia.org/T191298) (owner: 10Muehlenhoff) [12:08:08] (03CR) 10Volans: "Replies inline" (033 comments) [software/debmonitor] - 10https://gerrit.wikimedia.org/r/438018 (https://phabricator.wikimedia.org/T191298) (owner: 10Muehlenhoff) [12:08:30] 10Operations, 10Gerrit, 10Release-Engineering-Team (Kanban): In Gerrit some users are reporting problems saving there preferences - https://phabricator.wikimedia.org/T196869#4271807 (10Dzahn) 05Open>03Resolved [12:08:52] jouncebot, next [12:08:52] In 0 hour(s) and 51 minute(s): European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180611T1300) [12:10:15] PROBLEM - kubelet operational latencies on kubernetes1002 is CRITICAL: instance=kubernetes1002.eqiad.wmnet operation_type={create_container,start_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [12:10:54] (03CR) 10Volans: [C: 031] "LGTM (although I'm not familiar with this script ;) )" [puppet] - 10https://gerrit.wikimedia.org/r/439578 (owner: 10Ema) [12:11:16] RECOVERY - kubelet operational latencies on kubernetes1002 is OK: All metrics within thresholds. 
https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [12:12:35] PROBLEM - MariaDB Slave Lag: s3 on db1124 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 406.54 seconds [12:16:11] (03PS6) 10Muehlenhoff: Add initial Debianisation of debmonitor-client [software/debmonitor] - 10https://gerrit.wikimedia.org/r/438018 (https://phabricator.wikimedia.org/T191298) [12:16:51] (03PS1) 10Volans: Add scap/log to .gitignore [software/debmonitor/deploy] - 10https://gerrit.wikimedia.org/r/439584 [12:16:53] (03PS1) 10Volans: Updated src to v0.1.2 [software/debmonitor/deploy] - 10https://gerrit.wikimedia.org/r/439585 (https://phabricator.wikimedia.org/T191299) [12:16:55] (03PS1) 10Volans: Built wheels for v0.1.2 [software/debmonitor/deploy] - 10https://gerrit.wikimedia.org/r/439586 (https://phabricator.wikimedia.org/T191299) [12:17:36] (03CR) 10jerkins-bot: [V: 04-1] Add initial Debianisation of debmonitor-client [software/debmonitor] - 10https://gerrit.wikimedia.org/r/438018 (https://phabricator.wikimedia.org/T191298) (owner: 10Muehlenhoff) [12:18:51] (03CR) 10Volans: [V: 032 C: 032] Add scap/log to .gitignore [software/debmonitor/deploy] - 10https://gerrit.wikimedia.org/r/439584 (owner: 10Volans) [12:20:06] RECOVERY - MariaDB Slave Lag: s3 on db1124 is OK: OK slave_sql_lag Replication lag: 0.53 seconds [12:22:42] (03CR) 10Volans: [V: 032 C: 032] Updated src to v0.1.2 [software/debmonitor/deploy] - 10https://gerrit.wikimedia.org/r/439585 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans) [12:24:02] (03CR) 10Volans: [V: 032 C: 032] Built wheels for v0.1.2 [software/debmonitor/deploy] - 10https://gerrit.wikimedia.org/r/439586 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans) [12:29:46] (03PS3) 10Paladox: Rename wikimedia-polygerrit-style.html to gerrit-theme.html [puppet] - 10https://gerrit.wikimedia.org/r/439504 (https://phabricator.wikimedia.org/T196835) [12:29:51] (03PS4) 10Paladox: Gerrit: Add CoC and privacy policy to footer [puppet] 
- 10https://gerrit.wikimedia.org/r/439483 (https://phabricator.wikimedia.org/T196835) [12:30:07] 10Operations, 10Citoid, 10Code-Stewardship-Reviews, 10VisualEditor, 10Services (watching): zotero translation server: code stewardship request - https://phabricator.wikimedia.org/T187194#4271879 (10Mvolz) >>! In T187194#4271773, @danstillman wrote: > Not sure what you're planning, but the initial version... [12:36:35] 10Operations, 10cloud-services-team: cloud vps: disable system-wide apt pinning for OpenStack jessie hosts - https://phabricator.wikimedia.org/T196659#4271885 (10aborrero) [12:41:37] (03PS1) 10Volans: debmonitor: fix typo in nginx config [puppet] - 10https://gerrit.wikimedia.org/r/439587 (https://phabricator.wikimedia.org/T191299) [12:42:21] (03CR) 10Paladox: "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/439444 (https://phabricator.wikimedia.org/T196812) (owner: 10Paladox) [12:42:29] (03CR) 10Volans: [C: 032] debmonitor: fix typo in nginx config [puppet] - 10https://gerrit.wikimedia.org/r/439587 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans) [12:42:51] (03CR) 10Vgutierrez: [C: 031] "Sorry I've missed that :(" [puppet] - 10https://gerrit.wikimedia.org/r/439587 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans) [12:48:17] 10Operations, 10cloud-services-team: cloud vps: disable system-wide apt pinning for OpenStack jessie hosts - https://phabricator.wikimedia.org/T196659#4264894 (10aborrero) This is our usecase, for example with keystone (T196633). We need the equivalent of `apt-get install -t jessie-backports keystone`. This i... 
[12:54:10] !log volans@deploy1001 Started deploy [debmonitor/deploy@81d7333]: Release v0.1.2 [12:54:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:58:14] (03PS7) 10Muehlenhoff: Add initial Debianisation of debmonitor-client [software/debmonitor] - 10https://gerrit.wikimedia.org/r/438018 (https://phabricator.wikimedia.org/T191298) [12:59:23] (03CR) 10jerkins-bot: [V: 04-1] Add initial Debianisation of debmonitor-client [software/debmonitor] - 10https://gerrit.wikimedia.org/r/438018 (https://phabricator.wikimedia.org/T191298) (owner: 10Muehlenhoff) [13:00:05] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: (Dis)respected human, time to deploy European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180611T1300). Please do the needful. [13:00:05] Daimona and Urbanecm: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [13:00:07] Here [13:00:19] Hey [13:01:11] Who will SWAT today? zeljkof? hashar? :) [13:01:20] I will hello :) [13:01:25] Hi hashar! [13:01:27] !log volans@deploy1001 Finished deploy [debmonitor/deploy@81d7333]: Release v0.1.2 (duration: 07m 16s) [13:01:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:02:14] (03CR) 10Volans: "Same here, debmonitor/deploy/.git deploys are blocked by this." [puppet] - 10https://gerrit.wikimedia.org/r/438121 (https://phabricator.wikimedia.org/T196710) (owner: 10Thcipriani) [13:03:05] (03PS1) 10Arturo Borrero Gonzalez: openstack: keystone: use install_options to install from jessie-backports [puppet] - 10https://gerrit.wikimedia.org/r/439589 (https://phabricator.wikimedia.org/T196633) [13:03:23] arrgh [13:03:52] What's happening? 
[13:04:42] well I am looking at https://gerrit.wikimedia.org/r/c/mediawiki/core/+/437924 [13:04:50] guess I will need to update l10n cache somehow [13:05:00] Probably [13:05:12] Not sure tho [13:05:41] hashar, maybe https://wikitech.wikimedia.org/wiki/LocalisationUpdate can help [13:05:46] I would rather deploy that with the rest of the train. I am sure I am going to screw it up somehow [13:07:11] If that's the case, no problem :-) [13:09:01] Daimona: yeah sorry. I have little time to babysit the swat after the deployment and I am not confident with this one :^\ [13:09:12] it does not seem too urgent though, that will start being deployed tomorrow anyway [13:09:26] Sure [13:09:34] (03CR) 10Hashar: [C: 032] Revert "Change bewikiquote logo" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439477 (https://phabricator.wikimedia.org/T196134) (owner: 10Urbanecm) [13:09:39] Yeah, this actually missed last week's train [13:09:52] But I guess waiting another couple of days won't be a big deal [13:10:08] yeah that is how I understand it :] [13:10:18] Daimona: but as part of the train, it will be straightforward/easy [13:10:51] Indeed [13:10:54] Thanks anyway :-) [13:11:14] (03Merged) 10jenkins-bot: Revert "Change bewikiquote logo" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439477 (https://phabricator.wikimedia.org/T196134) (owner: 10Urbanecm) [13:12:11] PROBLEM - MariaDB Slave Lag: s3 on db1124 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 374.78 seconds [13:12:37] checking that [13:12:42] Urbanecm: I am syncing the logo change [13:12:47] ack [13:13:18] (are you syncing only the first patch? 
first 3 patches are for the same problem) [13:13:21] !log hashar@deploy1001 Synchronized static/images/project-logos: Revert "Change bewikiquote logo" - T196134 (duration: 00m 51s) [13:13:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:13:26] T196134: Change bewikiquote logo - https://phabricator.wikimedia.org/T196134 [13:14:02] Urbanecm: and I have purged the logos [13:14:08] thx [13:14:38] (03CR) 10Hashar: [C: 032] Change logo files for bewikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439478 (https://phabricator.wikimedia.org/T196134) (owner: 10Urbanecm) [13:15:06] (03CR) 10Hashar: [C: 032] Use uploaded HD logo for bewikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439479 (https://phabricator.wikimedia.org/T196134) (owner: 10Urbanecm) [13:15:24] I will deploy them one by one [13:15:32] logos png files first then the IS.php file [13:16:03] Ok, ack [13:16:03] (03Merged) 10jenkins-bot: Change logo files for bewikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439478 (https://phabricator.wikimedia.org/T196134) (owner: 10Urbanecm) [13:16:51] (03Merged) 10jenkins-bot: Use uploaded HD logo for bewikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439479 (https://phabricator.wikimedia.org/T196134) (owner: 10Urbanecm) [13:17:12] (03CR) 10jenkins-bot: Revert "Change bewikiquote logo" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439477 (https://phabricator.wikimedia.org/T196134) (owner: 10Urbanecm) [13:17:14] (03CR) 10jenkins-bot: Change logo files for bewikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439478 (https://phabricator.wikimedia.org/T196134) (owner: 10Urbanecm) [13:17:16] (03CR) 10jenkins-bot: Use uploaded HD logo for bewikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439479 (https://phabricator.wikimedia.org/T196134) (owner: 10Urbanecm) [13:17:21] RECOVERY - Check systemd state on proton1001 is OK: OK - running: The system is fully 
operational [13:17:27] !log hashar@deploy1001 Synchronized static/images/project-logos: Change logo files for bewikiquote - T196134 (duration: 00m 50s) [13:17:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:18:31] Urbanecm: I have deployed the new bewikiquote logos and purged the URL. Now doing the IS change [13:18:37] ack [13:19:41] RECOVERY - MariaDB Slave Lag: s3 on db1124 is OK: OK slave_sql_lag Replication lag: 0.01 seconds [13:20:05] !log hashar@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Use uploaded HD logo for bewikiquote - T196134 (duration: 00m 50s) [13:20:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:20:11] T196134: Change bewikiquote logo - https://phabricator.wikimedia.org/T196134 [13:20:32] (03CR) 10Ottomata: [C: 031] ":)" [puppet] - 10https://gerrit.wikimedia.org/r/438243 (https://phabricator.wikimedia.org/T196158) (owner: 10Elukey) [13:20:43] (03CR) 10Vgutierrez: [C: 032] update-ocsp: Actually use --time-offset-end argument [puppet] - 10https://gerrit.wikimedia.org/r/436485 (https://phabricator.wikimedia.org/T163541) (owner: 10Vgutierrez) [13:20:46] !log volans@deploy1001 Started deploy [debmonitor/deploy@81d7333]: Release v0.1.2 [13:20:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:21:28] (03PS3) 10Vgutierrez: update-ocsp: Actually use --time-offset-end argument [puppet] - 10https://gerrit.wikimedia.org/r/436485 (https://phabricator.wikimedia.org/T163541) [13:21:42] !log volans@deploy1001 Finished deploy [debmonitor/deploy@81d7333]: Release v0.1.2 (duration: 00m 56s) [13:21:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:22:10] (03CR) 10Ottomata: "Does it need the API version set at all anymore? Can it just negotiate?" 
[puppet] - 10https://gerrit.wikimedia.org/r/438211 (owner: 10Elukey) [13:23:00] Urbanecm: statistics updated https://phabricator.wikimedia.org/T196788#4272009 [13:23:08] hashar, thx [13:23:30] hashar, seems we have time, do you think I can add more patches? [13:23:32] (03CR) 10Vgutierrez: [C: 032] update-ocsp: Fix cert_get_issuer_filename error handling [puppet] - 10https://gerrit.wikimedia.org/r/439557 (https://phabricator.wikimedia.org/T163541) (owner: 10Vgutierrez) [13:23:41] RECOVERY - Check systemd state on proton1002 is OK: OK - running: The system is fully operational [13:23:50] (btw please update stats for idwikimedia as well hashar) [13:23:56] (03CR) 10Hashar: [C: 032] Fix wgMetaNamespace for pswikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439457 (https://phabricator.wikimedia.org/T196837) (owner: 10Urbanecm) [13:24:01] RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy [13:24:23] (03PS3) 10Vgutierrez: update-ocsp: Fix cert_get_issuer_filename error handling [puppet] - 10https://gerrit.wikimedia.org/r/439557 (https://phabricator.wikimedia.org/T163541) [13:25:04] (03PS1) 10Urbanecm: Use 1x lgoo for bewikiquote in IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439594 (https://phabricator.wikimedia.org/T196134) [13:25:14] Urbanecm: idwikimedia done [13:25:16] thx [13:26:18] (03PS2) 10Urbanecm: Use 1x logo for bewikiquote in IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439594 (https://phabricator.wikimedia.org/T196134) [13:26:32] Urbanecm: and https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/439457 ends up in a merge conflict somehow [13:26:34] rebasing [13:26:40] (03PS2) 10Hashar: Fix wgMetaNamespace for pswikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439457 (https://phabricator.wikimedia.org/T196837) (owner: 10Urbanecm) [13:26:56] (03CR) 10Hashar: [C: 032] Fix wgMetaNamespace for pswikivoyage [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/439457 (https://phabricator.wikimedia.org/T196837) (owner: 10Urbanecm) [13:28:11] (03PS8) 10Urbanecm: id_internalwikimedia: Initial configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/438279 [13:28:33] (03Merged) 10jenkins-bot: Fix wgMetaNamespace for pswikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439457 (https://phabricator.wikimedia.org/T196837) (owner: 10Urbanecm) [13:28:37] (03CR) 10Vgutierrez: [C: 032] update-ocsp: Make pylint happy [puppet] - 10https://gerrit.wikimedia.org/r/439558 (https://phabricator.wikimedia.org/T163541) (owner: 10Vgutierrez) [13:28:46] (03CR) 10jenkins-bot: Fix wgMetaNamespace for pswikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439457 (https://phabricator.wikimedia.org/T196837) (owner: 10Urbanecm) [13:29:13] Urbanecm: syncing the pswikivoyage change [13:29:18] ack [13:29:37] hashar, do we have time for other patches as well? [13:29:51] !log hashar@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Fix wgMetaNamespace for pswikivoyage - T196837 (duration: 00m 50s) [13:29:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:29:56] T196837: Fix wgMetaNamespace for pswikivoyage - https://phabricator.wikimedia.org/T196837 [13:30:01] (03PS2) 10Vgutierrez: update-ocsp: Make pylint happy [puppet] - 10https://gerrit.wikimedia.org/r/439558 (https://phabricator.wikimedia.org/T163541) [13:30:27] (03CR) 10Elukey: [C: 032] "> Does it need the API version set at all anymore? Can it just" [puppet] - 10https://gerrit.wikimedia.org/r/438211 (owner: 10Elukey) [13:30:33] Urbanecm: such as https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/439594/2/wmf-config/InitialiseSettings.php ? :) [13:31:06] Such as this one, but there's 9 tasks assigned to me waiting for SWAT, so... 
:D [13:31:21] (03PS3) 10Hashar: Use 1x logo for bewikiquote in IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439594 (https://phabricator.wikimedia.org/T196134) (owner: 10Urbanecm) [13:31:43] I will do that one then stop. I have some code to complete before doing my weekly conf calls [13:31:56] (03CR) 10Hashar: [C: 032] Use 1x logo for bewikiquote in IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439594 (https://phabricator.wikimedia.org/T196134) (owner: 10Urbanecm) [13:31:58] Ok, ack [13:32:06] ACKNOWLEDGEMENT - IPMI Sensor Status on maps1002 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [Power Supply 2 = Critical, Power Supplies = Critical] Gehel followed in https://phabricator.wikimedia.org/T196897 [13:32:17] (03CR) 10Ottomata: statistics::discovery: re-enable cron job (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/438125 (https://phabricator.wikimedia.org/T170494) (owner: 10Bearloga) [13:32:18] Urbanecm: I should have more time tomorrow or Thursday [13:32:40] 10Operations, 10ops-eqiad, 10DC-Ops: Power supply issue on maps1002 - https://phabricator.wikimedia.org/T196897#4272061 (10Gehel) [13:32:44] Ok :) [13:32:54] (03CR) 10Ottomata: "Def not a big issue! Just wondering." 
[puppet] - 10https://gerrit.wikimedia.org/r/438211 (owner: 10Elukey) [13:33:15] (03Merged) 10jenkins-bot: Use 1x logo for bewikiquote in IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439594 (https://phabricator.wikimedia.org/T196134) (owner: 10Urbanecm) [13:33:30] (03CR) 10jenkins-bot: Use 1x logo for bewikiquote in IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439594 (https://phabricator.wikimedia.org/T196134) (owner: 10Urbanecm) [13:35:38] !log hashar@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Fix wgMetaNamespace for pswikivoyage - T196837 (duration: 00m 49s) [13:35:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:35:43] T196837: Fix wgMetaNamespace for pswikivoyage - https://phabricator.wikimedia.org/T196837 [13:38:08] Urbanecm: done [13:38:11] thx [13:43:43] (03CR) 10Rush: "seems good, does labtestn need the same option set?" [puppet] - 10https://gerrit.wikimedia.org/r/439589 (https://phabricator.wikimedia.org/T196633) (owner: 10Arturo Borrero Gonzalez) [13:44:54] hashar: swat done? [13:45:04] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1103:3314" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439598 [13:47:12] marostegui: yes [13:47:16] !log European SWAT completed [13:47:17] sorry [13:47:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:47:25] \o/ [13:47:26] thanks! 
[13:47:52] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1103:3314" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439598 (owner: 10Marostegui) [13:49:32] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1103:3314" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439598 (owner: 10Marostegui) [13:49:45] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1103:3314" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439598 (owner: 10Marostegui) [13:50:40] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1103:3314 after alter table (duration: 00m 50s) [13:50:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:51:05] (03PS1) 10Marostegui: db-eqiad.php: Depool db1081 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439600 (https://phabricator.wikimedia.org/T191316) [13:52:11] (03PS4) 10Zoranzoki21: Add sites to the wgCopyUploadsDomains whitelist of Wikimedia Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436211 (https://phabricator.wikimedia.org/T195270) [13:52:55] !log otto@deploy1001 Started deploy [eventlogging/eventbus@08a1dff]: Producing events with kafka timestamp set to event time - T196407 [13:52:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:53:00] T196407: EventBus should produce messages to Kafka with event time set to meta.dt - https://phabricator.wikimedia.org/T196407 [13:54:00] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1081 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439600 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui) [13:54:06] ACKNOWLEDGEMENT - Device not healthy -SMART- on mw1230 is CRITICAL: cluster=api_appserver device=sda instance=mw1230:9100 job=node site=eqiad Giuseppe Lavagetto T196881 https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=mw1230&var-datasource=eqiad%2520prometheus%252Fops [13:54:06] ACKNOWLEDGEMENT - mediawiki-installation DSH 
group on mw1230 is CRITICAL: Host mw1230 is not in mediawiki-installation dsh group Giuseppe Lavagetto T196881 [13:54:50] !log otto@deploy1001 Finished deploy [eventlogging/eventbus@08a1dff]: Producing events with kafka timestamp set to event time - T196407 (duration: 01m 55s) [13:54:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:55:37] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1081 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439600 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui) [13:56:24] !log otto@deploy1001 Started deploy [eventlogging/analytics@08a1dff]: Producing events with kafka timestamp set to event time - T196407 [13:56:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:56:28] !log otto@deploy1001 Finished deploy [eventlogging/analytics@08a1dff]: Producing events with kafka timestamp set to event time - T196407 (duration: 00m 04s) [13:56:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:56:43] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1081 for alter table (duration: 00m 48s) [13:56:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:58:03] !log Deploy schema change on db1081 T191316 T192926 T89737 T195193 [13:58:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:58:10] T89737: Make several mediawiki table fields unsigned ints on wmf databases - https://phabricator.wikimedia.org/T89737 [13:58:10] T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926 [13:58:10] T195193: Schema change for ct_tag_id field to change_tag - https://phabricator.wikimedia.org/T195193 [13:58:10] T191316: Schema change to make archive.ar_rev_id NOT NULL - https://phabricator.wikimedia.org/T191316 [13:59:16] 10Operations, 10cloud-services-team, 10Patch-For-Review: cloud vps: disable system-wide apt 
pinning for OpenStack jessie hosts - https://phabricator.wikimedia.org/T196659#4272176 (10Andrew) @aborrero doesn't pinning work if we pin the keystone package and all dependencies? Like in openstack::jessie_mitaka_c... [13:59:18] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1081 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439600 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui) [13:59:21] 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team: labstore1003 SMART failure (again) - https://phabricator.wikimedia.org/T196704#4272177 (10chasemp) 05Open>03Resolved ```root@labstore1003:~# /usr/local/lib/nagios/plugins/check_raid megacli OK: optimal, 5 logical, 34 physical OK``` Thanks! [14:00:14] 10Operations, 10ops-eqiad, 10DC-Ops: Replace memory bank on scb1002 - https://phabricator.wikimedia.org/T196901#4272179 (10Joe) [14:00:42] ACKNOWLEDGEMENT - Memory correctable errors -EDAC- on scb1002 is CRITICAL: 5.001 ge 4 Giuseppe Lavagetto T196901 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=scb1002&var-datasource=eqiad%2520prometheus%252Fops [14:06:37] (03CR) 10Muehlenhoff: debmonitor: client side setup (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans) [14:06:40] 10Operations, 10ops-eqiad, 10netops: replace mr1-eqiad - https://phabricator.wikimedia.org/T185171#4272218 (10ayounsi) [14:06:49] 10Operations, 10ops-eqiad, 10netops: replace mr1-eqiad - https://phabricator.wikimedia.org/T185171#3908273 (10ayounsi) [14:08:22] RECOVERY - MariaDB Slave Lag: s3 on db2074 is OK: OK slave_sql_lag Replication lag: 0.21 seconds [14:08:41] RECOVERY - MariaDB Slave Lag: s3 on db2043 is OK: OK slave_sql_lag Replication lag: 0.05 seconds [14:08:41] RECOVERY - MariaDB Slave Lag: s3 on db2050 is OK: OK slave_sql_lag Replication lag: 0.00 seconds [14:08:42] RECOVERY - MariaDB Slave Lag: s3 on db2036 is OK: OK slave_sql_lag Replication lag: 0.00 seconds 
[14:08:51] RECOVERY - MariaDB Slave Lag: s3 on db2057 is OK: OK slave_sql_lag Replication lag: 0.33 seconds [14:10:58] (03CR) 10Alexandros Kosiaris: [C: 031] debmonitor: client side setup (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans) [14:13:12] RECOVERY - MariaDB Slave Lag: s3 on db2094 is OK: OK slave_sql_lag Replication lag: 39.95 seconds [14:13:55] (03CR) 10Hoo man: [C: 032] "Tested" [dumps/dcat] - 10https://gerrit.wikimedia.org/r/425987 (owner: 10Lokal Profil) [14:14:24] (03CR) 10Muehlenhoff: debmonitor: client side setup (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans) [14:14:31] (03Merged) 10jenkins-bot: Allow prefix to override "all" [dumps/dcat] - 10https://gerrit.wikimedia.org/r/425987 (owner: 10Lokal Profil) [14:17:25] (03PS1) 10Volans: Revert "debmonitor: set TLS cipher suite for LDAP" [puppet] - 10https://gerrit.wikimedia.org/r/439603 [14:17:36] (03PS2) 10Volans: Revert "debmonitor: set TLS cipher suite for LDAP" [puppet] - 10https://gerrit.wikimedia.org/r/439603 [14:18:25] (03CR) 10jerkins-bot: [V: 04-1] Revert "debmonitor: set TLS cipher suite for LDAP" [puppet] - 10https://gerrit.wikimedia.org/r/439603 (owner: 10Volans) [14:19:28] (03PS3) 10Volans: Revert "debmonitor: set TLS cipher suite for LDAP" [puppet] - 10https://gerrit.wikimedia.org/r/439603 [14:20:31] (03CR) 10jerkins-bot: [V: 04-1] Revert "debmonitor: set TLS cipher suite for LDAP" [puppet] - 10https://gerrit.wikimedia.org/r/439603 (owner: 10Volans) [14:21:14] (03PS17) 10Zoranzoki21: Enable Extension:Newsletter on hewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381537 (https://phabricator.wikimedia.org/T177151) [14:21:52] !log Updated operations/dumps/dcat (536bd5b..559dee3) on snapshot1008 [14:21:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:22:26] (03PS4) 10Volans: Revert "debmonitor: 
set TLS cipher suite for LDAP" [puppet] - 10https://gerrit.wikimedia.org/r/439603 [14:22:49] (03CR) 10Alexandros Kosiaris: [C: 032] deployment_server/package_builder: use ::profile::base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/439544 (owner: 10Dzahn) [14:22:56] (03PS2) 10Alexandros Kosiaris: deployment_server/package_builder: use ::profile::base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/439544 (owner: 10Dzahn) [14:22:59] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] deployment_server/package_builder: use ::profile::base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/439544 (owner: 10Dzahn) [14:23:09] (03CR) 10Hoo man: [C: 031] "Should be good to merge now (but didn't test yet)" [puppet] - 10https://gerrit.wikimedia.org/r/424291 (https://phabricator.wikimedia.org/T163328) (owner: 10Lokal Profil) [14:23:27] (03PS5) 10Volans: Revert "debmonitor: set TLS cipher suite for LDAP" [puppet] - 10https://gerrit.wikimedia.org/r/439603 [14:23:42] thcipriani: I'm ok to go with scap upgrade btw, if you are around [14:24:20] godog: awesome, thank you! I'm around. 
[14:24:40] (03CR) 10Volans: [C: 032] Revert "debmonitor: set TLS cipher suite for LDAP" [puppet] - 10https://gerrit.wikimedia.org/r/439603 (owner: 10Volans) [14:25:07] !log upload scap 3.8.2-1 - T196710 [14:25:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:25:12] T196710: Update Debian Package for Scap3 to 3.8.2-1 - https://phabricator.wikimedia.org/T196710 [14:25:23] (03PS2) 10Filippo Giunchedi: Scap: Bump version to 3.8.2-1 [puppet] - 10https://gerrit.wikimedia.org/r/438121 (https://phabricator.wikimedia.org/T196710) (owner: 10Thcipriani) [14:26:13] (03CR) 10Filippo Giunchedi: [C: 032] Scap: Bump version to 3.8.2-1 [puppet] - 10https://gerrit.wikimedia.org/r/438121 (https://phabricator.wikimedia.org/T196710) (owner: 10Thcipriani) [14:26:57] !log akosiaris@deploy1001 Started deploy [proton/deploy@97ec4bf]: (no justification provided) [14:27:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:27:43] (03CR) 10Hoo man: [C: 031] "Tested on sn1008, output diff looks good to me." [puppet] - 10https://gerrit.wikimedia.org/r/424291 (https://phabricator.wikimedia.org/T163328) (owner: 10Lokal Profil) [14:28:22] RECOVERY - proton endpoints health on proton2002 is OK: All endpoints are healthy [14:28:48] thcipriani: deploy1001 upgraded [14:29:21] * thcipriani looks [14:30:04] 10Operations, 10ops-codfw, 10DBA: replace bad disk in db2059 - https://phabricator.wikimedia.org/T196709#4272315 (10Papaul) a:05Papaul>03Marostegui Disk replaced [14:31:05] 10Operations, 10ops-codfw, 10DBA: replace bad disk in db2059 - https://phabricator.wikimedia.org/T196709#4272318 (10Marostegui) Thanks! ``` physicaldrive 1I:1:12 (port 1I:box 1:bay 12, SAS, 600 GB, Rebuilding) ``` Will report back once it is done [14:31:18] godog: yep, I see the change, it was a small one-liner. I don't have a repo to test deploy for this particular change, but I was able to recreate locally. 
Once puppet is run on all the targets it should unblock a few folks. Thanks again for all your help! [14:32:37] thcipriani: np! [14:32:52] PROBLEM - puppet last run on proton1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[proton/deploy] [14:33:42] that proton thing is failing with a different error every time [14:34:13] that's how you get to 100% coverage [14:34:25] lol [14:34:32] <_joe_> akosiaris: so that you don't get bored and burnt out by the repetitiveness of your work [14:35:56] !log reboot mx1001, poolcounter1001 for kernel upgrades and spec-ctrl enabling [14:35:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:36:15] 10Operations, 10Scap, 10Patch-For-Review: Update Debian Package for Scap3 to 3.8.2-1 - https://phabricator.wikimedia.org/T196710#4272339 (10thcipriani) 05Open>03Resolved a:03fgiunchedi New package was uploaded and puppet should be setting it up on targets with the next run. [14:36:18] 10Operations, 10Electron-PDFs, 10Proton, 10Readers-Web-Backlog, and 2 others: New service request: chromium-render/deploy - https://phabricator.wikimedia.org/T186748#4272343 (10thcipriani) [14:38:02] PROBLEM - puppet last run on restbase1007 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 1 minute ago with 1 failures. 
Failed resources (up to 3 shown): Package[scap] [14:41:01] RECOVERY - Check systemd state on restbase-dev1004 is OK: OK - running: The system is fully operational [14:42:06] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1081" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439606 [14:42:21] (03PS3) 10Muehlenhoff: Enable base::service_auto_restart for mcelog [puppet] - 10https://gerrit.wikimedia.org/r/439565 (https://phabricator.wikimedia.org/T135991) [14:42:41] RECOVERY - Check systemd state on restbase-dev1006 is OK: OK - running: The system is fully operational [14:43:44] (03CR) 10Muehlenhoff: [C: 032] Enable base::service_auto_restart for mcelog [puppet] - 10https://gerrit.wikimedia.org/r/439565 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff) [14:45:30] godog: welcome back :) [14:46:08] urandom: \o/ thanks! [14:46:12] godog: thcipriani: should scap 3.8.2-2 exist in apt? [14:46:31] urandom: I think it should be 3.8.2-1 [14:46:59] oh, rightt [14:47:01] brain-o [14:47:23] but i see only 3.8.1-1 available [14:47:38] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1081" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439606 (owner: 10Marostegui) [14:48:02] nevermind... 
[14:48:12] RECOVERY - puppet last run on restbase1007 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:48:15] :) [14:48:27] * urandom needs coffee [14:49:04] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1081" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439606 (owner: 10Marostegui) [14:49:16] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1081" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439606 (owner: 10Marostegui) [14:49:48] thcipriani: as godog pointed out elsewhere, our cron-job runs apt-get update before puppet, and i issued a manual run [14:49:56] (03PS2) 10Herron: add SPF record to disallow email for all parked domains [dns] - 10https://gerrit.wikimedia.org/r/429874 (https://phabricator.wikimedia.org/T193408) (owner: 10Dzahn) [14:50:32] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1081 after alter table (duration: 00m 50s) [14:50:35] (03PS1) 10Marostegui: db-eqiad.php: Depool db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439607 (https://phabricator.wikimedia.org/T191316) [14:50:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:51:12] urandom: ah, yep, that's what I started typing before you said nevermind :) [14:51:20] but I'm a slow typist. [14:51:22] (03CR) 10Herron: [C: 032] add SPF record to disallow email for all parked domains [dns] - 10https://gerrit.wikimedia.org/r/429874 (https://phabricator.wikimedia.org/T193408) (owner: 10Dzahn) [14:52:34] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439607 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui) [14:53:14] 10Operations, 10Cassandra, 10User-Eevans: Add Cassandra 3.11.2 package to internal APT repository - https://phabricator.wikimedia.org/T196745#4272414 (10Eevans) 05stalled>03Open p:05Low>03Normal [14:53:17] !log reboot bohrium for kernel upgrades and spec-ctrl enabling. 
Manually stopped mysql beforehand [14:53:19] elukey: ^ [14:53:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:53:41] PROBLEM - Check systemd state on restbase-dev1006 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [14:54:01] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439607 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui) [14:54:13] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439607 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui) [14:54:27] 10Operations, 10Cassandra, 10User-Eevans: Add Cassandra 3.11.2 package to internal APT repository - https://phabricator.wikimedia.org/T196745#4267217 (10Eevans) [14:55:13] akosiaris: thanks! [14:55:16] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1084 for alter table (duration: 00m 50s) [14:55:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:55:24] !log Deploy schema change on db1084 T191316 T192926 T89737 T195193 [14:55:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:55:30] T89737: Make several mediawiki table fields unsigned ints on wmf databases - https://phabricator.wikimedia.org/T89737 [14:55:31] T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926 [14:55:31] T195193: Schema change for ct_tag_id field to change_tag - https://phabricator.wikimedia.org/T195193 [14:55:31] T191316: Schema change to make archive.ar_rev_id NOT NULL - https://phabricator.wikimedia.org/T191316 [14:57:21] RECOVERY - Device not healthy -SMART- on db2059 is OK: All metrics within thresholds. 
https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db2059&var-datasource=codfw%2520prometheus%252Fops [14:57:40] (03PS5) 10Herron: Prep to tighten PuppetDB access control - log client certificate details [puppet] - 10https://gerrit.wikimedia.org/r/437057 (https://phabricator.wikimedia.org/T194962) (owner: 10Alex Monk) [14:57:46] (03PS6) 10Herron: Prep to tighten PuppetDB access control - log client certificate details [puppet] - 10https://gerrit.wikimedia.org/r/437057 (https://phabricator.wikimedia.org/T194962) (owner: 10Alex Monk) [15:02:22] RECOVERY - proton endpoints health on proton2001 is OK: All endpoints are healthy [15:02:30] !log akosiaris@deploy1001 Finished deploy [proton/deploy@97ec4bf]: (no justification provided) (duration: 35m 33s) [15:02:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:03:32] RECOVERY - Check systemd state on restbase-dev1006 is OK: OK - running: The system is fully operational [15:13:08] 10Operations, 10ops-eqiad: ms-be1036 in power off status, not responsive to power on commands - https://phabricator.wikimedia.org/T196873#4271185 (10Cmjohnson) ms-be1036 will no power back manually either. I tried pulling the PSU"s out, waiting several minutes and all I get is a flashing green light on the pow... [15:18:35] 10Operations, 10monitoring, 10Patch-For-Review: Evaluate Grafana's LDAP group options and deprecate grafana-admin if possible - https://phabricator.wikimedia.org/T170150#4272537 (10akosiaris) All of my tests went fine. Scheduling this for Wednesday June 27th. I 'll send an email to wikitech-l as well [15:22:01] 10Operations, 10ops-eqiad, 10DC-Ops: Power supply issue on maps1002 - https://phabricator.wikimedia.org/T196897#4272551 (10Cmjohnson) I checked the power cable, no issue, removed the PSU and re-inserted. Plugged power cable back in. A green light appeared for a second and then went dark again. This AHS so t... 
[15:25:01] PROBLEM - Host mw1230 is DOWN: PING CRITICAL - Packet loss = 100% [15:25:37] 10Operations, 10ops-eqiad, 10DC-Ops: Replace disk on mw1230 - https://phabricator.wikimedia.org/T196881#4271440 (10Cmjohnson) @joe mw1230 disks replaced, needs reinstall [15:28:31] RECOVERY - Host mw1230 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms [15:29:32] PROBLEM - High CPU load on API appserver on mw1230 is CRITICAL: Return code of 255 is out of bounds [15:30:52] PROBLEM - Disk space on mw1230 is CRITICAL: Return code of 255 is out of bounds [15:30:52] PROBLEM - HHVM processes on mw1230 is CRITICAL: Return code of 255 is out of bounds [15:31:02] PROBLEM - nutcracker port on mw1230 is CRITICAL: Return code of 255 is out of bounds [15:31:12] PROBLEM - HHVM rendering on mw1230 is CRITICAL: connect to address 10.64.48.65 and port 80: Connection refused [15:31:14] !log Set offline disk 32:1 on db1065 - T196806 [15:31:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:31:19] T196806: Bad disk on db1065 - https://phabricator.wikimedia.org/T196806 [15:31:21] PROBLEM - Check size of conntrack table on mw1230 is CRITICAL: Return code of 255 is out of bounds [15:31:22] PROBLEM - mcrouter process on mw1230 is CRITICAL: Return code of 255 is out of bounds [15:31:22] PROBLEM - DPKG on mw1230 is CRITICAL: Return code of 255 is out of bounds [15:31:31] PROBLEM - configured eth on mw1230 is CRITICAL: Return code of 255 is out of bounds [15:31:41] PROBLEM - dhclient process on mw1230 is CRITICAL: Return code of 255 is out of bounds [15:31:41] PROBLEM - nutcracker process on mw1230 is CRITICAL: Return code of 255 is out of bounds [15:31:41] PROBLEM - Nginx local proxy to apache on mw1230 is CRITICAL: connect to address 10.64.48.65 and port 443: Connection refused [15:31:42] PROBLEM - Apache HTTP on mw1230 is CRITICAL: connect to address 10.64.48.65 and port 80: Connection refused [15:31:42] PROBLEM - MD RAID on mw1230 is CRITICAL: Return code of 255 is out 
of bounds [15:31:51] PROBLEM - Check systemd state on mw1230 is CRITICAL: Return code of 255 is out of bounds [15:31:51] PROBLEM - Check whether ferm is active by checking the default input chain on mw1230 is CRITICAL: Return code of 255 is out of bounds [15:31:52] !log Set offline disk 32:3 on db1063 - T196806 [15:31:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:32:01] PROBLEM - puppet last run on mw1230 is CRITICAL: Return code of 255 is out of bounds [15:33:35] !log akosiaris@deploy1001 Started deploy [proton/deploy@97ec4bf]: (no justification provided) [15:33:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:33:56] what happened to poor mw1230? [15:33:57] !log akosiaris@deploy1001 Finished deploy [proton/deploy@97ec4bf]: (no justification provided) (duration: 00m 22s) [15:34:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:34:02] PROBLEM - puppet last run on proton1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[proton/deploy] [15:34:02] RECOVERY - puppet last run on proton1001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [15:34:37] 10Operations, 10Analytics-Cluster, 10Analytics-Kanban, 10Traffic, and 2 others: TLS security review of the Kafka stack - https://phabricator.wikimedia.org/T182993#4272582 (10Ottomata) @Vgutierrez from what I can tell: the only blocker to removing IPSec is deploying a new version of librdkafka with your pat... [15:35:27] 10Operations, 10ops-eqiad, 10DBA: Bad disk on db1063 - https://phabricator.wikimedia.org/T196804#4272583 (10Marostegui) Disk replaced by @Cmjohnson and RAID rebuilding: ``` root@db1063:~# megacli -PDRbld -ShowProg -PhysDrv [32:3] -aALL Rebuild Progress on Device at Enclosure 32, Slot 3 Completed 2% in 1 Min... 
[15:35:41] 10Operations, 10ops-eqiad: Degraded RAID on labstore1003 - https://phabricator.wikimedia.org/T196757#4272585 (10Cmjohnson) 05Open>03Resolved a:03Cmjohnson cmjohnson@labstore1003:~$ sudo /usr/local/lib/nagios/plugins/check_raid megacli OK: optimal, 5 logical, 34 physical OK [15:36:12] PROBLEM - Check the NTP synchronisation status of timesyncd on mw1230 is CRITICAL: Return code of 255 is out of bounds [15:37:18] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1084" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439612 [15:38:44] Can someone drop me from the moderators list on this channel? Don't need it :) [15:38:59] (03PS3) 10Ema: reload-vcl: port to python3 [puppet] - 10https://gerrit.wikimedia.org/r/439578 [15:39:11] RECOVERY - puppet last run on proton1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:39:21] 10Operations, 10ops-eqiad, 10Cloud-VPS: rack upgraded storage capacity in labstore100[67].eqiad.wmnet - https://phabricator.wikimedia.org/T196651#4272603 (10Cmjohnson) [15:39:32] 10Operations, 10ops-eqiad, 10DBA: Bad disk on db1065 - https://phabricator.wikimedia.org/T196806#4272605 (10Marostegui) Disk replaced by @Cmjohnson and now rebuilding: ``` root@db1065:~# megacli -PDRbld -ShowProg -PhysDrv [32:1] -aALL Rebuild Progress on Device at Enclosure 32, Slot 1 Completed 1% in 1 Minu... [15:40:49] 10Operations, 10ops-eqiad: Degraded RAID on ms-be1034 - https://phabricator.wikimedia.org/T195569#4272615 (10Cmjohnson) I need an update to this task. If we do not need the new disk I can send it back. Thanks! [15:40:57] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1084" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439612 (owner: 10Marostegui) [15:41:22] PROBLEM - puppet last run on proton1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[proton/deploy] [15:42:30] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1084" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439612 (owner: 10Marostegui) [15:42:45] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1084" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/439612 (owner: 10Marostegui) [15:43:33] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1084 after alter table (duration: 00m 50s) [15:43:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:43:56] !log akosiaris@deploy1001 Started deploy [proton/deploy@97ec4bf]: (no justification provided) [15:43:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:44:12] PROBLEM - Host kafka-jumbo1005 is DOWN: PING CRITICAL - Packet loss = 100% [15:44:28] !log akosiaris@deploy1001 Finished deploy [proton/deploy@97ec4bf]: (no justification provided) (duration: 00m 33s) [15:44:32] RECOVERY - Host kafka-jumbo1005 is UP: PING WARNING - Packet loss = 86%, RTA = 0.25 ms [15:44:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:44:42] RECOVERY - proton endpoints health on proton1001 is OK: All endpoints are healthy [15:45:12] (03CR) 10Ema: [C: 032] reload-vcl: port to python3 [puppet] - 10https://gerrit.wikimedia.org/r/439578 (owner: 10Ema) [15:46:42] RECOVERY - Device not healthy -SMART- on mw1230 is OK: All metrics within thresholds. 
https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=mw1230&var-datasource=eqiad%2520prometheus%252Fops [15:49:11] PROBLEM - Host kafka-jumbo1005 is DOWN: PING CRITICAL - Packet loss = 100% [15:50:20] 10Operations, 10Release-Engineering-Team: Phabricator outbound email seems to have a SPOF of mx1001 - https://phabricator.wikimedia.org/T196916#4272661 (10herron) [15:50:30] 10Operations, 10Release-Engineering-Team: Phabricator outbound email seems to have a SPOF of mx1001 - https://phabricator.wikimedia.org/T196916#4272671 (10herron) What does the phabricator outbound mail config look like today? Do we already have both mx1001 and mx2001 configured as outbound mail servers? [15:50:42] RECOVERY - Host kafka-jumbo1005 is UP: PING WARNING - Packet loss = 93%, RTA = 0.21 ms [15:51:11] PROBLEM - IPMI Sensor Status on mw1230 is CRITICAL: Return code of 255 is out of bounds [15:51:32] (03PS1) 10Volans: debmonitor: enforce LDAP TLS cipher suite [puppet] - 10https://gerrit.wikimedia.org/r/439616 (https://phabricator.wikimedia.org/T191299) [15:52:31] PROBLEM - Host kafka-jumbo1005 is DOWN: PING CRITICAL - Packet loss = 100% [15:53:04] (03CR) 10Volans: "reply inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans) [15:53:29] 10Operations, 10Analytics-Cluster, 10Analytics-Kanban, 10Traffic, and 2 others: TLS security review of the Kafka stack - https://phabricator.wikimedia.org/T182993#4272695 (10Vgutierrez) I think we can do it :). BTW, right now we are enforcing AES ciphersuites in our TLS connections, and we are lucky that... 
[15:53:31] PROBLEM - MegaRAID on db1063 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) [15:53:32] ACKNOWLEDGEMENT - MegaRAID on db1063 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T196918 [15:53:33] RECOVERY - Host kafka-jumbo1005 is UP: PING WARNING - Packet loss = 28%, RTA = 0.96 ms [15:53:36] 10Operations, 10ops-eqiad: Degraded RAID on db1063 - https://phabricator.wikimedia.org/T196918#4272696 (10ops-monitoring-bot) [15:54:12] PROBLEM - MegaRAID on db1065 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) [15:54:13] ACKNOWLEDGEMENT - MegaRAID on db1065 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T196919 [15:54:17] 10Operations, 10ops-eqiad: Degraded RAID on db1065 - https://phabricator.wikimedia.org/T196919#4272701 (10ops-monitoring-bot) [15:54:39] 10Operations, 10ops-eqiad: Degraded RAID on db1063 - https://phabricator.wikimedia.org/T196918#4272707 (10Marostegui) [15:54:41] 10Operations, 10ops-eqiad, 10DBA: Bad disk on db1063 - https://phabricator.wikimedia.org/T196804#4272710 (10Marostegui) [15:54:51] PROBLEM - Host kafka-jumbo1005 is DOWN: PING CRITICAL - Packet loss = 100% [15:55:13] 10Operations, 10ops-eqiad: Degraded RAID on db1065 - https://phabricator.wikimedia.org/T196919#4272712 (10Marostegui) [15:55:16] 10Operations, 10ops-eqiad, 10DBA: Bad disk on db1065 - https://phabricator.wikimedia.org/T196806#4272715 (10Marostegui) [15:55:42] PROBLEM - IPsec on cp3047 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [15:55:42] PROBLEM - IPsec on cp3037 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [15:55:42] PROBLEM - IPsec on cp3042 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [15:55:42] PROBLEM - IPsec on cp3038 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: 
kafka-jumbo1005_v4,kafka-jumbo1005_v6 [15:55:42] PROBLEM - IPsec on cp4024 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [15:55:51] PROBLEM - IPsec on cp5007 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [15:55:51] PROBLEM - IPsec on cp4022 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [15:55:51] PROBLEM - IPsec on cp4026 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [15:55:51] PROBLEM - IPsec on cp3030 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [15:55:51] PROBLEM - IPsec on cp3032 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [15:56:02] spam incoming.. checking kafka-jumbo1005 [15:56:11] PROBLEM - Host mw1230 is DOWN: PING CRITICAL - Packet loss = 100% [15:56:31] RECOVERY - Host mw1230 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms [15:57:32] PROBLEM - High CPU load on API appserver on mw1230 is CRITICAL: Return code of 255 is out of bounds [15:57:36] 10Operations, 10ops-eqiad, 10DC-Ops: Replace disk on mw1230 - https://phabricator.wikimedia.org/T196881#4271440 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by oblivian on neodymium.eqiad.wmnet for hosts: ``` mw1230.eqiad.wmnet ``` The log can be found in `/var/log/wmf-auto-reimage/20180611155... 
[15:58:01] PROBLEM - Kafka Broker Under Replicated Partitions on kafka-jumbo1006 is CRITICAL: 63 ge 10 https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1006 [15:58:12] mmmm eno1: mtu 1500 qdisc mq state DOWN group default qlen 1000 [15:58:15] 10Operations, 10ops-eqiad, 10DC-Ops: Replace disk on mw1230 - https://phabricator.wikimedia.org/T196881#4272728 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['mw1230.eqiad.wmnet'] ``` Of which those **FAILED**: ``` ['mw1230.eqiad.wmnet'] ``` [15:58:34] (03CR) 10Vgutierrez: [C: 031] "Awesome! LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/439616 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans) [15:59:41] PROBLEM - Kafka Broker Under Replicated Partitions on kafka-jumbo1002 is CRITICAL: 17 ge 10 https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1002 [15:59:45] 10Operations, 10ops-codfw, 10Traffic: rack/setup/install LVS200[7-10] - https://phabricator.wikimedia.org/T196560#4272735 (10Papaul) [15:59:51] PROBLEM - Check systemd state on restbase-dev1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [15:59:52] XioNoX: you there by any chance? [16:00:31] elukey: ish, about to board a flight, what's up? 
[16:00:44] !log akosiaris@deploy1001 Started deploy [proton/deploy@97ec4bf]: (no justification provided) [16:00:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:01:00] XioNoX: nothing then, nevermind :) [16:01:02] PROBLEM - IPsec on cp2011 is CRITICAL: Strongswan CRITICAL - ok: 78 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:09] 10Operations: Add email queueing/failover to services currently using mail_smarthost[0] - https://phabricator.wikimedia.org/T196920#4272739 (10herron) [16:01:12] PROBLEM - IPsec on cp2016 is CRITICAL: Strongswan CRITICAL - ok: 66 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:12] PROBLEM - IPsec on cp4023 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:21] PROBLEM - IPsec on cp2022 is CRITICAL: Strongswan CRITICAL - ok: 78 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:21] PROBLEM - IPsec on cp4032 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:21] PROBLEM - IPsec on cp4031 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:21] PROBLEM - IPsec on cp4025 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:22] PROBLEM - IPsec on cp3008 is CRITICAL: Strongswan CRITICAL - ok: 26 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:31] PROBLEM - IPsec on cp2005 is CRITICAL: Strongswan CRITICAL - ok: 78 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:31] PROBLEM - IPsec on cp2008 is CRITICAL: Strongswan CRITICAL - ok: 78 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:31] PROBLEM - IPsec on cp2006 is CRITICAL: Strongswan CRITICAL - ok: 24 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:32] PROBLEM - IPsec on cp3040 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:32] PROBLEM - IPsec on cp3031 is CRITICAL: Strongswan CRITICAL - ok: 
42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:32] PROBLEM - Kafka Broker Under Replicated Partitions on kafka-jumbo1003 is CRITICAL: 36 ge 10 https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1003 [16:01:32] PROBLEM - IPsec on cp2025 is CRITICAL: Strongswan CRITICAL - ok: 24 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:41] PROBLEM - IPsec on cp4029 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:42] PROBLEM - IPsec on cp3034 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:42] PROBLEM - IPsec on cp3007 is CRITICAL: Strongswan CRITICAL - ok: 26 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:42] PROBLEM - IPsec on cp5009 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:42] PROBLEM - IPsec on cp3049 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:42] PROBLEM - IPsec on cp3039 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:42] PROBLEM - IPsec on cp3046 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:42] elukey: I see a lot of critical, let ne know if I can help with the time I have [16:01:43] PROBLEM - IPsec on cp3033 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:43] PROBLEM - IPsec on cp4021 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:44] PROBLEM - IPsec on cp5012 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:44] PROBLEM - IPsec on cp5001 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:51] PROBLEM - IPsec on cp2026 is CRITICAL: 
Strongswan CRITICAL - ok: 78 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:51] PROBLEM - IPsec on cp2007 is CRITICAL: Strongswan CRITICAL - ok: 66 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:51] PROBLEM - IPsec on cp3010 is CRITICAL: Strongswan CRITICAL - ok: 26 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:52] PROBLEM - IPsec on cp3044 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:52] PROBLEM - IPsec on cp4030 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:52] PROBLEM - IPsec on cp3036 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:52] PROBLEM - IPsec on cp5008 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:53] PROBLEM - IPsec on cp5011 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:53] PROBLEM - IPsec on cp5005 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:54] PROBLEM - IPsec on cp3048 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:54] PROBLEM - IPsec on cp3045 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:01:55] PROBLEM - IPsec on cp2002 is CRITICAL: Strongswan CRITICAL - ok: 78 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:02:01] PROBLEM - IPsec on cp3043 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:02:01] PROBLEM - IPsec on cp2004 is CRITICAL: Strongswan CRITICAL - ok: 66 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:02:01] PROBLEM - IPsec on cp2001 is CRITICAL: Strongswan CRITICAL - ok: 66 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:02:01] PROBLEM - IPsec on cp3035 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:02:01] 
PROBLEM - IPsec on cp3041 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:02:02] PROBLEM - IPsec on cp2010 is CRITICAL: Strongswan CRITICAL - ok: 66 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:02:02] PROBLEM - IPsec on cp2013 is CRITICAL: Strongswan CRITICAL - ok: 66 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:02:03] XioNoX: so eno1 on kafka-jumbo1005 is listed by ip addr as DOWN [16:02:03] PROBLEM - IPsec on cp2023 is CRITICAL: Strongswan CRITICAL - ok: 66 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:02:03] PROBLEM - IPsec on cp4028 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: kafka-jumbo1005_v4,kafka-jumbo1005_v6 [16:02:17] !log akosiaris@deploy1001 Finished deploy [proton/deploy@97ec4bf]: (no justification provided) (duration: 01m 32s) [16:02:20] I am in the console and logged as root, the host works [16:02:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:02:42] PROBLEM - Kafka Broker Under Replicated Partitions on kafka-jumbo1001 is CRITICAL: 67 ge 10 https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1001 [16:02:44] XioNoX: it seems like the link is down [16:03:38] I am checking https://librenms.wikimedia.org/device/device=162/tab=port/port=16428/ but I don't see anything weird [16:06:32] RECOVERY - Check systemd state on restbase-dev1004 is OK: OK - running: The system is fully operational [16:07:00] (03PS2) 10Arturo Borrero Gonzalez: openstack: keystone: use install_options to install from jessie-backports [puppet] - 10https://gerrit.wikimedia.org/r/439589 (https://phabricator.wikimedia.org/T196633) [16:07:07] elukey: died 14min ago [16:07:30] yeah but is it on the switch side ? [16:07:31] RECOVERY - Device not healthy -SMART- on db1063 is OK: All metrics within thresholds. 
https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db1063&var-datasource=eqiad%2520prometheus%252Fops [16:07:59] cmjohnson: can you replace the SFP-T on asw2-c-eqiad:ge-4/0/37 ? and verify with elukey why the interface is down? I'm about to board a flight [16:08:48] 10Operations, 10cloud-services-team, 10Patch-For-Review: cloud vps: disable system-wide apt pinning for OpenStack jessie hosts - https://phabricator.wikimedia.org/T196659#4272775 (10aborrero) >>! In T196659#4272176, @Andrew wrote: > @aborrero doesn't pinning work if we pin the keystone package and all depend... [16:10:02] elukey: dunno, can be on either side, but the SFP-T would be my first guess. Logs show the link flapping many many times before going down [16:10:53] yeah saw it in the logs [16:10:59] I don't see anything weird on the host [16:11:06] 10Operations, 10Analytics-Cluster, 10Analytics-Kanban, 10Traffic, and 2 others: TLS security review of the Kafka stack - https://phabricator.wikimedia.org/T182993#4272798 (10Vgutierrez) @Ottomata also I'm currently reviewing the TLS implementation on Kafka side, so far so good. [16:22:21] RECOVERY - puppet last run on proton1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [16:29:11] 10Operations, 10Mail, 10Phabricator, 10Release-Engineering-Team: Phabricator outbound email seems to have a SPOF of mx1001 - https://phabricator.wikimedia.org/T196916#4272884 (10Aklapper) https://phabricator.wikimedia.org/config/all lists for `phpmailer.smtp-host` the value `mx1001.wikimedia.org;mx2001.wik... 
[16:35:46] (03CR) 10Volans: [C: 032] debmonitor: enforce LDAP TLS cipher suite [puppet] - 10https://gerrit.wikimedia.org/r/439616 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans) [16:39:13] 10Operations, 10Gerrit, 10Patch-For-Review, 10Release-Engineering-Team (Someday): Gerrit shows HTTP 500 error when pasting extended unicode characters - https://phabricator.wikimedia.org/T145885#4272917 (10Jdlrobson) 😋😃 [16:41:09] 10Operations, 10ops-eqiad, 10DBA: Bad disk on db1065 - https://phabricator.wikimedia.org/T196806#4272925 (10Marostegui) The disk finished its rebuilt, but unfortunately has lots of errors and SMART alert too, so we need a new one :( ``` Predictive Failure Count: 1 Last Predictive Failure Event Seq Number: 6... [16:42:03] !log Set disk 32:1 offline on db1065 to get a new one - T196806 [16:42:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:42:09] T196806: Bad disk on db1065 - https://phabricator.wikimedia.org/T196806 [16:44:07] (03PS5) 1020after4: Configuration for phabricator to use swift storage. [puppet] - 10https://gerrit.wikimedia.org/r/432528 (https://phabricator.wikimedia.org/T182085) [16:44:16] marostegui: the new disk is also faulty? [16:44:23] Yep [16:44:37] :S [16:44:53] It was an used one, so it is not strange [16:45:02] ah, so no 'new' [16:45:08] indeed :) [16:45:10] 10Operations, 10ops-codfw: rack/setup/install bast2002.wikimedia.org - https://phabricator.wikimedia.org/T196665#4272930 (10Papaul) [16:45:22] (03CR) 1020after4: [C: 031] Configuration for phabricator to use swift storage. (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/432528 (https://phabricator.wikimedia.org/T182085) (owner: 1020after4) [16:45:49] 10Operations, 10ops-eqiad, 10DBA: Bad disk on db1063 - https://phabricator.wikimedia.org/T196804#4272932 (10Marostegui) 05Open>03Resolved All looking good! 
``` Drive has flagged a S.M.A.R.T alert : No Drive has flagged a S.M.A.R.T alert : No Drive has flagged a S.M.A.R.T alert : No Drive has flagged a S.... [16:49:12] PROBLEM - Check systemd state on restbase-dev1006 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:51:38] hello [16:51:57] i wish to make a complaint against tony balloni' [16:53:44] 10Operations, 10SRE-Access-Requests, 10Release-Engineering-Team (Kanban): Add thcipriani and hashar to gerrit-root - https://phabricator.wikimedia.org/T196702#4272969 (10Dzahn) a:03Dzahn has been approved in SRE meeting [16:53:55] 10Operations, 10ops-codfw, 10DBA: replace bad disk in db2059 - https://phabricator.wikimedia.org/T196709#4272971 (10Marostegui) 05Open>03Resolved All went good! ``` logicaldrive 1 (3.3 TB, RAID 1+0, OK) physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 600 GB, OK) physicaldrive 1I:1:2 (p... [16:54:08] 10Operations, 10ops-codfw: rack/setup/install bast2002.wikimedia.org - https://phabricator.wikimedia.org/T196665#4272977 (10Papaul) [16:54:11] RECOVERY - MegaRAID on db1063 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy [16:55:11] (03PS1) 10Herron: add thcipriani and hashar to group gerrit-root [puppet] - 10https://gerrit.wikimedia.org/r/439625 (https://phabricator.wikimedia.org/T196702) [16:55:13] (03PS1) 10Herron: add thcipriani to group phabricator-roots [puppet] - 10https://gerrit.wikimedia.org/r/439626 (https://phabricator.wikimedia.org/T196703) [16:58:00] (03PS1) 10Dzahn: admins: add hashar and thcipriani to gerrit-roots [puppet] - 10https://gerrit.wikimedia.org/r/439627 (https://phabricator.wikimedia.org/T196702) [16:58:22] herron: ah :) duplicate, heh [16:58:52] (03Abandoned) 10Dzahn: admins: add hashar and thcipriani to gerrit-roots [puppet] - 10https://gerrit.wikimedia.org/r/439627 (https://phabricator.wikimedia.org/T196702) (owner: 10Dzahn) [16:59:00] whoops! 
[16:59:13] (03CR) 10Dzahn: [C: 031] add thcipriani and hashar to group gerrit-root [puppet] - 10https://gerrit.wikimedia.org/r/439625 (https://phabricator.wikimedia.org/T196702) (owner: 10Herron) [16:59:33] (03CR) 10Dzahn: [C: 031] add thcipriani to group phabricator-roots [puppet] - 10https://gerrit.wikimedia.org/r/439626 (https://phabricator.wikimedia.org/T196703) (owner: 10Herron) [16:59:57] (03CR) 10Paladox: add thcipriani and hashar to group gerrit-root (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/439625 (https://phabricator.wikimedia.org/T196702) (owner: 10Herron) [17:00:04] gehel: Your horoscope predicts another unfortunate Wikidata Query Service weekly deploy deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180611T1700). [17:00:17] jouncebot: o/ [17:00:59] !log ganeti2008 reboot for microcode update [17:01:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:01:55] herron: paladox is right, probably should be removed from "Gerrit-admins" [17:02:01] PROBLEM - Host ganeti2008 is DOWN: PING CRITICAL - Packet loss = 100% [17:02:02] since gerrit-roots is more [17:02:39] ok that works [17:02:51] RECOVERY - Host ganeti2008 is UP: PING OK - Packet loss = 0%, RTA = 36.16 ms [17:03:15] (03PS1) 10Paladox: Gerrit: Cache groups [puppet] - 10https://gerrit.wikimedia.org/r/439628 [17:03:51] RECOVERY - Check systemd state on restbase-dev1006 is OK: OK - running: The system is fully operational [17:05:33] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review, 10Reading-Infrastructure-Team-Backlog (Kanban): Requesting deployment access for jforrester - https://phabricator.wikimedia.org/T196566#4261509 (10herron) Hi @Jdforrester-WMF this was approved during todays SRE meeting pending manager signoff. Could... 
[17:05:46] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review, 10Reading-Infrastructure-Team-Backlog (Kanban): Requesting deployment access for jforrester - https://phabricator.wikimedia.org/T196566#4273031 (10herron) [17:06:08] (03Abandoned) 10Paladox: Gerrit: Cache groups [puppet] - 10https://gerrit.wikimedia.org/r/439628 (owner: 10Paladox) [17:11:10] (03PS2) 10Herron: add thcipriani and hashar to group gerrit-root [puppet] - 10https://gerrit.wikimedia.org/r/439625 (https://phabricator.wikimedia.org/T196702) [17:11:33] 10Operations, 10cloud-services-team, 10Patch-For-Review: cloud vps: disable system-wide apt pinning for OpenStack jessie hosts - https://phabricator.wikimedia.org/T196659#4273038 (10aborrero) So, @Andrew questions had me wondering what was happening here. So I investigated a bit further, specially because `j... [17:11:49] (03CR) 10Paladox: [C: 031] "😊" [puppet] - 10https://gerrit.wikimedia.org/r/439625 (https://phabricator.wikimedia.org/T196702) (owner: 10Herron) [17:12:22] (03CR) 10Herron: [C: 032] add thcipriani and hashar to group gerrit-root (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/439625 (https://phabricator.wikimedia.org/T196702) (owner: 10Herron) [17:12:27] (03PS3) 10Herron: add thcipriani and hashar to group gerrit-root [puppet] - 10https://gerrit.wikimedia.org/r/439625 (https://phabricator.wikimedia.org/T196702) [17:13:18] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review, 10Release-Engineering-Team (Kanban): Add thcipriani and hashar to gerrit-root - https://phabricator.wikimedia.org/T196702#4266006 (10herron) This was approved in the SRE meeting. Moving forward with the patch now. 
[17:15:33] (03PS2) 10Herron: add thcipriani to group phabricator-roots [puppet] - 10https://gerrit.wikimedia.org/r/439626 (https://phabricator.wikimedia.org/T196703) [17:15:51] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review, 10Release-Engineering-Team (Kanban): Add thcipriani to phabricator-roots - https://phabricator.wikimedia.org/T196703#4266022 (10herron) This was approved in the SRE meeting. Moving forward with the patch now. [17:16:26] (03CR) 10Herron: [C: 032] add thcipriani to group phabricator-roots [puppet] - 10https://gerrit.wikimedia.org/r/439626 (https://phabricator.wikimedia.org/T196703) (owner: 10Herron) [17:16:55] !log gehel@deploy1001 Started deploy [wdqs/wdqs@37f6f32]: new version of wdqs GUI and updater (wdqs1009 only) [17:16:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:17:21] !log gehel@deploy1001 Finished deploy [wdqs/wdqs@37f6f32]: new version of wdqs GUI and updater (wdqs1009 only) (duration: 00m 26s) [17:17:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:18:31] RECOVERY - Check systemd state on kubernetes2003 is OK: OK - running: The system is fully operational [17:19:25] elukey: flight delayed a bit, but still on my phone. any luck with that interface? 
[17:19:36] !log gehel@deploy1001 Started deploy [wdqs/wdqs@37f6f32]: new version of wdqs GUI and updater [17:19:40] !log gehel@deploy1001 Finished deploy [wdqs/wdqs@37f6f32]: new version of wdqs GUI and updater (duration: 00m 03s) [17:19:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:19:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:20:09] !log gehel@deploy1001 Started deploy [wdqs/wdqs@37f6f32]: new version of wdqs GUI and updater [17:20:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:21:51] PROBLEM - Check systemd state on kubernetes2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [17:22:12] XioNoX: handed over to ottomata, Chris is going to check soon IIUC [17:25:32] !log Phabricator: deploying hotfix (D1067) refs T196840 T196860 T196855 [17:25:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:25:40] T196860: Ignore refs/changes/**/**/meta - https://phabricator.wikimedia.org/T196860 [17:25:40] D1067: Ignore refs/changes/**/**/meta - https://phabricator.wikimedia.org/D1067 [17:25:41] T196855: Diffusion commits stuck in 'Importing...' status for too long - https://phabricator.wikimedia.org/T196855 [17:25:41] T196840: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840 [17:27:48] !log phabricator: restarting phd for D1067 [17:27:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:30:24] 10Operations, 10ops-codfw, 10Traffic: rack/setup/install LVS200[7-10] - https://phabricator.wikimedia.org/T196560#4273095 (10Papaul) [17:32:14] twentyafterfour i guess when it ignores refs it deletes them? 
[17:32:16] https://grafana.wikimedia.org/dashboard/db/mysql?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1072&var-port=9104&panelId=2&fullscreen&from=now-7d&to=now [17:32:21] deletes look to have gone up [17:33:48] !log gehel@deploy1001 Finished deploy [wdqs/wdqs@37f6f32]: new version of wdqs GUI and updater (duration: 13m 38s) [17:33:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:34:53] SMalyshev: ^deploy completed, intermittent failures on the orderby and paris checks [17:35:18] gehel: what kind of failures? timeouts or something else? [17:35:32] SMalyshev: checking right now... [17:35:43] but looks like timeout [17:36:33] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review, 10Release-Engineering-Team (Kanban): Add thcipriani and hashar to gerrit-root - https://phabricator.wikimedia.org/T196702#4273119 (10herron) 05Open>03Resolved Access granted! ``` cobalt:~$ id hashar uid=1010(hashar) gid=500(wikidev) groups=500(w... [17:38:34] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review, 10Release-Engineering-Team (Kanban): Add thcipriani and hashar to gerrit-root - https://phabricator.wikimedia.org/T196702#4266006 (10thcipriani) Awesome, thanks @Dzahn, access looks good to me! [17:39:54] SMalyshev: yep, timeout [17:40:13] gehel: ok, I guess we need to check that the queries won't be too complex for testing [17:40:25] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review, 10Release-Engineering-Team (Kanban): Add thcipriani to phabricator-roots - https://phabricator.wikimedia.org/T196703#4273125 (10herron) 05Open>03Resolved a:03herron All set! ``` phab1001:~$ id thcipriani uid=11634(thcipriani) gid=500(wikidev)... [17:42:55] paladox: I don't know [17:43:00] ok [17:44:25] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review, 10Release-Engineering-Team (Kanban): Add thcipriani and hashar to gerrit-root - https://phabricator.wikimedia.org/T196702#4266006 (10hashar) Works for me as well. 
Thank you @herron and @Dzahn [17:50:58] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review, 10Reading-Infrastructure-Team-Backlog (Kanban): Requesting deployment access for jforrester - https://phabricator.wikimedia.org/T196566#4273176 (10Jdforrester-WMF) >>! In T196566#4273029, @herron wrote: > Hi @Jdforrester-WMF this was approved during... [17:51:06] !log phabricator: rebuilding git parent caches [17:51:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:59:02] 10Operations, 10DBA, 10Gerrit, 10Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4273195 (10mmodell) I'm going to stop phd and attempt to clear out the backlog from the queue (it's a lot of useless updates that we don't need to write to the db ultimately) [17:59:29] !log phabricator: taking phd offline while I clear out the queue backlog (downtime is logged in icinga) see T196840 [17:59:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:59:34] T196840: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840 [17:59:51] (03CR) 10Imarlier: [C: 031] Add "memcached-mcrouter" to $wgObjectCaches as default for testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436252 (owner: 10Aaron Schulz) [18:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Morning SWAT (Max 6 patches) . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180611T1800). [18:00:04] No GERRIT patches in the queue for this window AFAICS. 
[18:08:43] (03PS2) 10Volans: debmonitor: client side setup [puppet] - 10https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300) [18:08:45] (03PS1) 10Volans: debmonitor: finetune Icinga checks [puppet] - 10https://gerrit.wikimedia.org/r/439640 (https://phabricator.wikimedia.org/T191299) [18:08:47] (03PS1) 10Volans: debmonitor: install debmonitor-client [puppet] - 10https://gerrit.wikimedia.org/r/439641 (https://phabricator.wikimedia.org/T191300) [18:09:48] (03CR) 10jerkins-bot: [V: 04-1] debmonitor: client side setup [puppet] - 10https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans) [18:11:33] (03CR) 10Volans: "The failure is because of:" [puppet] - 10https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans) [18:15:15] !log convert timeline indices to time-windowed compaction - T196024 [18:15:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:15:20] T196024: Convert timeline keyspaces (indices) to time-windowed compaction - https://phabricator.wikimedia.org/T196024 [18:16:10] (03CR) 10Volans: [C: 032] "Merging to fix failing checks. If you have any comments let me know and I'll fix them tomorrow." [puppet] - 10https://gerrit.wikimedia.org/r/439640 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans) [18:19:56] 10Operations, 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Import some Analytics git puppet submodules to operations/puppet - https://phabricator.wikimedia.org/T188377#4005170 (10Andrew) I don't mind having to manually fix some puppetmasters, although it would be nice to do them al... [18:20:06] 10Operations, 10cloud-services-team, 10Patch-For-Review: cloud vps: disable system-wide apt pinning for OpenStack jessie hosts - https://phabricator.wikimedia.org/T196659#4273233 (10Andrew) @aborrero, thanks for investigating. 
I'm sure that that the existing client_pinning file isn't complete, and that maki... [18:26:11] (03PS3) 10Volans: debmonitor: client side setup [puppet] - 10https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300) [18:26:13] (03PS2) 10Volans: debmonitor: install debmonitor-client [puppet] - 10https://gerrit.wikimedia.org/r/439641 (https://phabricator.wikimedia.org/T191300) [18:26:53] (03CR) 10jerkins-bot: [V: 04-1] debmonitor: client side setup [puppet] - 10https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans) [18:39:59] (03PS1) 10Paladox: phabricator: Make phd.taskmasters configurable with hiera [puppet] - 10https://gerrit.wikimedia.org/r/439645 [18:41:36] (03PS2) 10Paladox: phabricator: Make phd.taskmasters configurable with hiera [puppet] - 10https://gerrit.wikimedia.org/r/439645 [18:46:05] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review, 10Reading-Infrastructure-Team-Backlog (Kanban): Requesting deployment access for jforrester - https://phabricator.wikimedia.org/T196566#4261509 (10Tnegrin) approved [18:49:55] !log pnorman@deploy1001 Started deploy [tilerator/deploy@074d01a] (cleartables): Restore full config [18:49:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:50:12] !log pnorman@deploy1001 Finished deploy [tilerator/deploy@074d01a] (cleartables): Restore full config (duration: 00m 16s) [18:50:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:52:50] (03CR) 10Bearloga: statistics::discovery: re-enable cron job (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/438125 (https://phabricator.wikimedia.org/T170494) (owner: 10Bearloga) [18:56:04] RECOVERY - IPsec on cp2006 is OK: Strongswan OK - 26 ESP OK [18:56:04] RECOVERY - IPsec on cp5007 is OK: Strongswan OK - 44 ESP OK [18:56:04] RECOVERY - IPsec on cp4027 is OK: Strongswan OK - 44 ESP OK [18:56:04] RECOVERY - IPsec on cp2008 is OK: Strongswan OK - 80 ESP OK 
[18:56:04] RECOVERY - IPsec on cp2025 is OK: Strongswan OK - 26 ESP OK [18:56:14] RECOVERY - Host kafka-jumbo1005 is UP: PING WARNING - Packet loss = 50%, RTA = 3.77 ms [18:56:14] RECOVERY - IPsec on cp4023 is OK: Strongswan OK - 54 ESP OK [18:56:15] RECOVERY - IPsec on cp2020 is OK: Strongswan OK - 80 ESP OK [18:56:15] RECOVERY - IPsec on cp5012 is OK: Strongswan OK - 44 ESP OK [18:56:15] RECOVERY - IPsec on cp5001 is OK: Strongswan OK - 54 ESP OK [18:56:15] RECOVERY - IPsec on cp4032 is OK: Strongswan OK - 44 ESP OK [18:56:15] RECOVERY - IPsec on cp4025 is OK: Strongswan OK - 54 ESP OK [18:56:15] RECOVERY - IPsec on cp4031 is OK: Strongswan OK - 44 ESP OK [18:56:24] RECOVERY - IPsec on cp3049 is OK: Strongswan OK - 54 ESP OK [18:56:24] RECOVERY - IPsec on cp3039 is OK: Strongswan OK - 54 ESP OK [18:56:24] RECOVERY - IPsec on cp5009 is OK: Strongswan OK - 44 ESP OK [18:56:24] RECOVERY - IPsec on cp2007 is OK: Strongswan OK - 68 ESP OK [18:56:24] RECOVERY - IPsec on cp2026 is OK: Strongswan OK - 80 ESP OK [18:56:24] RECOVERY - IPsec on cp4030 is OK: Strongswan OK - 44 ESP OK [18:56:25] RECOVERY - IPsec on cp4022 is OK: Strongswan OK - 54 ESP OK [18:56:25] RECOVERY - IPsec on cp4024 is OK: Strongswan OK - 54 ESP OK [18:56:26] RECOVERY - IPsec on cp4026 is OK: Strongswan OK - 54 ESP OK [18:56:26] RECOVERY - IPsec on cp2012 is OK: Strongswan OK - 26 ESP OK [18:56:27] RECOVERY - IPsec on cp2014 is OK: Strongswan OK - 80 ESP OK [18:56:27] RECOVERY - IPsec on cp2011 is OK: Strongswan OK - 80 ESP OK [18:56:34] RECOVERY - IPsec on cp3030 is OK: Strongswan OK - 44 ESP OK [18:56:34] RECOVERY - IPsec on cp3008 is OK: Strongswan OK - 28 ESP OK [18:56:34] RECOVERY - IPsec on cp3037 is OK: Strongswan OK - 54 ESP OK [18:56:34] RECOVERY - IPsec on cp3007 is OK: Strongswan OK - 28 ESP OK [18:56:34] RECOVERY - IPsec on cp2001 is OK: Strongswan OK - 68 ESP OK [18:56:34] RECOVERY - IPsec on cp2002 is OK: Strongswan OK - 80 ESP OK [18:56:34] RECOVERY - IPsec on cp2004 is OK: Strongswan 
OK - 68 ESP OK [18:56:35] RECOVERY - IPsec on cp3047 is OK: Strongswan OK - 54 ESP OK [18:56:51] 10Operations, 10Mail, 10Phabricator, 10Release-Engineering-Team: Phabricator outbound email seems to have a SPOF of mx1001 - https://phabricator.wikimedia.org/T196916#4272661 (10Reedy) >>! In T196916#4272671, @herron wrote: > What does the phabricator outbound mail config look like today? > > Do we alrea... [18:56:54] RECOVERY - IPsec on cp4021 is OK: Strongswan OK - 54 ESP OK [18:56:54] RECOVERY - IPsec on cp3010 is OK: Strongswan OK - 28 ESP OK [18:56:54] RECOVERY - IPsec on cp3044 is OK: Strongswan OK - 54 ESP OK [18:56:54] RECOVERY - IPsec on cp3036 is OK: Strongswan OK - 54 ESP OK [18:56:55] RECOVERY - IPsec on cp5003 is OK: Strongswan OK - 54 ESP OK [18:56:55] RECOVERY - IPsec on cp2018 is OK: Strongswan OK - 26 ESP OK [18:56:55] RECOVERY - IPsec on cp5010 is OK: Strongswan OK - 44 ESP OK [18:56:55] RECOVERY - IPsec on cp5004 is OK: Strongswan OK - 54 ESP OK [18:56:56] RECOVERY - IPsec on cp2022 is OK: Strongswan OK - 80 ESP OK [18:56:56] RECOVERY - IPsec on cp3048 is OK: Strongswan OK - 54 ESP OK [18:56:57] RECOVERY - IPsec on cp3035 is OK: Strongswan OK - 54 ESP OK [18:56:57] RECOVERY - IPsec on cp3045 is OK: Strongswan OK - 54 ESP OK [18:56:58] RECOVERY - IPsec on cp3043 is OK: Strongswan OK - 44 ESP OK [18:56:58] RECOVERY - IPsec on cp5002 is OK: Strongswan OK - 54 ESP OK [18:57:04] RECOVERY - IPsec on cp3034 is OK: Strongswan OK - 54 ESP OK [18:57:04] RECOVERY - IPsec on cp3033 is OK: Strongswan OK - 44 ESP OK [18:57:04] RECOVERY - IPsec on cp3041 is OK: Strongswan OK - 44 ESP OK [18:57:04] RECOVERY - IPsec on cp4028 is OK: Strongswan OK - 44 ESP OK [18:57:05] RECOVERY - IPsec on cp2005 is OK: Strongswan OK - 80 ESP OK [19:01:15] PROBLEM - Kafka Broker Replica Max Lag on kafka-jumbo1005 is CRITICAL: 3.476e+07 ge 5e+06 
https://grafana.wikimedia.org/dashboard/db/kafka?panelId=16&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1005 [19:01:39] ottomata: ^ [19:04:34] (03PS1) 10Imarlier: Remove /xhprof from performance.wikimedia.org apache config [puppet] - 10https://gerrit.wikimedia.org/r/439647 (https://phabricator.wikimedia.org/T196406) [19:08:19] chasemp: aye [19:08:29] just got a NIC fixed [19:08:33] it is catching back up now [19:14:08] (03CR) 10Ottomata: statistics::discovery: re-enable cron job (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/438125 (https://phabricator.wikimedia.org/T170494) (owner: 10Bearloga) [19:14:25] PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received [19:14:44] (03PS1) 10Imarlier: Need to install mongodb on xhgui machines [puppet] - 10https://gerrit.wikimedia.org/r/439648 (https://phabricator.wikimedia.org/T158837) [19:14:46] (03PS4) 10Volans: debmonitor: client side setup [puppet] - 10https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300) [19:14:48] (03PS3) 10Volans: debmonitor: install debmonitor-client [puppet] - 10https://gerrit.wikimedia.org/r/439641 (https://phabricator.wikimedia.org/T191300) [19:15:19] (03CR) 10jerkins-bot: [V: 04-1] Need to install mongodb on xhgui machines [puppet] - 10https://gerrit.wikimedia.org/r/439648 (https://phabricator.wikimedia.org/T158837) (owner: 10Imarlier) [19:15:38] (03CR) 10jerkins-bot: [V: 04-1] debmonitor: client side setup [puppet] - 10https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans) [19:16:35] RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy [19:20:34] RECOVERY - Kafka Broker Under Replicated Partitions on kafka-jumbo1003 is OK: (C)10 ge (W)1 ge 0 
https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1003 [19:20:55] RECOVERY - Kafka Broker Under Replicated Partitions on kafka-jumbo1002 is OK: (C)10 ge (W)1 ge 0 https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1002 [19:23:45] RECOVERY - Kafka Broker Under Replicated Partitions on kafka-jumbo1001 is OK: (C)10 ge (W)1 ge 0 https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1001 [19:26:55] RECOVERY - Kafka Broker Under Replicated Partitions on kafka-jumbo1006 is OK: (C)10 ge (W)1 ge 0 https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1006 [19:29:31] !log aaron@deploy1001 Synchronized php-1.32.0-wmf.7/includes/libs/rdbms/ChronologyProtector.php: 11e596776f940 - add some logging details (duration: 00m 53s) [19:29:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:31:13] (03PS2) 10Imarlier: Need to install mongodb on xhgui machines [puppet] - 10https://gerrit.wikimedia.org/r/439648 (https://phabricator.wikimedia.org/T158837) [19:31:47] (03CR) 10jerkins-bot: [V: 04-1] Need to install mongodb on xhgui machines [puppet] - 10https://gerrit.wikimedia.org/r/439648 (https://phabricator.wikimedia.org/T158837) (owner: 10Imarlier) [19:35:51] (03CR) 10Reedy: "require_package ?" 
[puppet] - 10https://gerrit.wikimedia.org/r/439648 (https://phabricator.wikimedia.org/T158837) (owner: 10Imarlier) [19:43:25] 10Operations, 10netops: Rack/setup cr2-eqdfw - https://phabricator.wikimedia.org/T196941#4273504 (10Papaul) p:05Triage>03Normal [19:43:33] 10Operations, 10ops-codfw, 10fundraising-tech-ops, 10Patch-For-Review: Rack/Setup frbast2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T196417#4273516 (10Jgreen) a:05Jgreen>03None [19:43:36] 10Operations, 10ops-eqiad, 10fundraising-tech-ops: rack frdata1001 - https://phabricator.wikimedia.org/T187364#4273522 (10Jgreen) a:05Jgreen>03None [19:44:54] 10Operations, 10fundraising-tech-ops, 10netops: adjust NAT mapping for frdata.wikimedia.org - https://phabricator.wikimedia.org/T196656#4273538 (10Jgreen) [19:44:57] 10Operations, 10ops-eqiad, 10fundraising-tech-ops: rack frdata1001 - https://phabricator.wikimedia.org/T187364#3973195 (10Jgreen) [19:46:23] (03CR) 10Alex Monk: "yep" [puppet] - 10https://gerrit.wikimedia.org/r/402758 (https://phabricator.wikimedia.org/T184236) (owner: 10Alex Monk) [19:51:04] RECOVERY - Kafka Broker Replica Max Lag on kafka-jumbo1005 is OK: (C)5e+06 ge (W)1e+06 ge 9.743e+05 https://grafana.wikimedia.org/dashboard/db/kafka?panelId=16&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1005 [19:51:38] 10Operations, 10Wikimedia-Incident: Add email queueing/failover to services currently using mail_smarthost[0] - https://phabricator.wikimedia.org/T196920#4273568 (10herron) [19:52:08] 10Operations, 10Mail, 10Phabricator, 10Release-Engineering-Team, 10Wikimedia-Incident: Phabricator outbound email seems to have a SPOF of mx1001 - https://phabricator.wikimedia.org/T196916#4273569 (10herron) [19:52:47] (03PS3) 10Imarlier: Need to install mongodb on xhgui machines [puppet] - 10https://gerrit.wikimedia.org/r/439648 (https://phabricator.wikimedia.org/T158837) [19:53:05] PROBLEM - Varnishkafka Delivery Errors per minute on 
cp3032 is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/varnishkafka?panelId=20&fullscreen&orgId=1 [19:56:28] !log bouncing varnishkafka on cp3032 [19:56:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:57:34] RECOVERY - Varnishkafka Delivery Errors per minute on cp3032 is OK: OK: Less than 80.00% above the threshold [0.0] https://grafana.wikimedia.org/dashboard/db/varnishkafka?panelId=20&fullscreen&orgId=1 [19:57:35] PROBLEM - nutcracker process on mw1230 is CRITICAL: Return code of 255 is out of bounds [19:57:35] PROBLEM - dhclient process on mw1230 is CRITICAL: Return code of 255 is out of bounds [19:57:45] PROBLEM - MD RAID on mw1230 is CRITICAL: Return code of 255 is out of bounds [19:57:54] PROBLEM - Check systemd state on mw1230 is CRITICAL: Return code of 255 is out of bounds [19:57:54] PROBLEM - Check whether ferm is active by checking the default input chain on mw1230 is CRITICAL: Return code of 255 is out of bounds [19:58:07] (03CR) 10Krinkle: [C: 031] Remove /xhprof from performance.wikimedia.org apache config [puppet] - 10https://gerrit.wikimedia.org/r/439647 (https://phabricator.wikimedia.org/T196406) (owner: 10Imarlier) [19:58:14] PROBLEM - Disk space on mw1230 is CRITICAL: Return code of 255 is out of bounds [19:58:14] PROBLEM - HHVM processes on mw1230 is CRITICAL: Return code of 255 is out of bounds [19:58:15] (03PS1) 10Ottomata: Switch evenstreams to main kafka in deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/439653 (https://phabricator.wikimedia.org/T185225) [19:58:24] PROBLEM - nutcracker port on mw1230 is CRITICAL: Return code of 255 is out of bounds [19:58:25] PROBLEM - DPKG on mw1230 is CRITICAL: Return code of 255 is out of bounds [19:58:25] PROBLEM - mcrouter process on mw1230 is CRITICAL: Return code of 255 is out of bounds [19:58:34] PROBLEM - Check size of conntrack table on mw1230 is CRITICAL: Return code of 255 is out 
of bounds [19:58:35] PROBLEM - configured eth on mw1230 is CRITICAL: Return code of 255 is out of bounds [19:59:18] (03CR) 10Ottomata: [C: 032] Switch evenstreams to main kafka in deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/439653 (https://phabricator.wikimedia.org/T185225) (owner: 10Ottomata) [19:59:24] PROBLEM - Varnishkafka Delivery Errors per minute on cp3047 is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/varnishkafka?panelId=20&fullscreen&orgId=1 [19:59:38] hm [20:00:04] cscott, arlolra, subbu, bearND, halfak, and Amir1: #bothumor I � Unicode. All rise for Services – Parsoid / Citoid / Mobileapps / ORES / … deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180611T2000). [20:00:13] not sure exactly, but i'm going to bounce the few vks with this proble [20:00:15] m [20:00:17] its only a few of them [20:01:33] nothing for mobileapps today [20:01:35] RECOVERY - Varnishkafka Delivery Errors per minute on cp3047 is OK: OK: Less than 80.00% above the threshold [0.0] https://grafana.wikimedia.org/dashboard/db/varnishkafka?panelId=20&fullscreen&orgId=1 [20:01:55] PROBLEM - Varnishkafka Delivery Errors per minute on cp3039 is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/varnishkafka?panelId=20&fullscreen&orgId=1 [20:02:20] !log bouncing varnishkafka-webrequest on cp3039,cp3047,cp2007,cp3010 [20:02:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:04:04] RECOVERY - Device not healthy -SMART- on db1065 is OK: All metrics within thresholds. 
https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db1065&var-datasource=eqiad%2520prometheus%252Fops [20:04:14] RECOVERY - Varnishkafka Delivery Errors per minute on cp3039 is OK: OK: Less than 80.00% above the threshold [0.0] https://grafana.wikimedia.org/dashboard/db/varnishkafka?panelId=20&fullscreen&orgId=1 [20:05:25] (03PS5) 10Volans: debmonitor: client side setup [puppet] - 10https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300) [20:05:27] (03PS4) 10Volans: debmonitor: install debmonitor-client [puppet] - 10https://gerrit.wikimedia.org/r/439641 (https://phabricator.wikimedia.org/T191300) [20:06:16] (03CR) 10jerkins-bot: [V: 04-1] debmonitor: client side setup [puppet] - 10https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans) [20:06:45] RECOVERY - MegaRAID on db1065 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy [20:09:33] 10Operations, 10ops-codfw, 10Traffic, 10netops: switch port configuration for lvs200[7-10] - https://phabricator.wikimedia.org/T196946#4273636 (10Papaul) p:05Triage>03Normal [20:17:47] !log otto@deploy1001 Started deploy [eventstreams/deploy@6b013f9]: Enable composite stream and timestamp since param - T196009 , T187418 [20:17:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:17:52] T196009: Support timestamp based consumption in KafkaSSE and EventStreams - https://phabricator.wikimedia.org/T196009 [20:17:53] T187418: Enable multiple topics in EventStreams URL - https://phabricator.wikimedia.org/T187418 [20:19:53] (03CR) 10Volans: "Quick compiler results available here:" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/439580 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans) [20:20:52] Looks like awight won't be around for the window today and I'm not prepared. ORES will have something for tomorrow's window [20:21:05] Bosnian, Basque, and Serbian -- Oh my! 
[20:21:22] You get an AI, and you get an AI. EVERYONE GETS AN AI. [20:25:36] then the AI gets you [20:25:48] in soviet wmf... [20:27:38] !log otto@deploy1001 Finished deploy [eventstreams/deploy@6b013f9]: Enable composite stream and timestamp since param - T196009 , T187418 (duration: 09m 52s) [20:27:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:27:44] T196009: Support timestamp based consumption in KafkaSSE and EventStreams - https://phabricator.wikimedia.org/T196009 [20:27:44] T187418: Enable multiple topics in EventStreams URL - https://phabricator.wikimedia.org/T187418 [20:28:05] (03PS5) 10Volans: debmonitor: install debmonitor-client [puppet] - 10https://gerrit.wikimedia.org/r/439641 (https://phabricator.wikimedia.org/T191300) [20:29:03] (03CR) 10Andrew Bogott: [C: 04-1] "A few comments inline." (032 comments) [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/437164 (https://phabricator.wikimedia.org/T148872) (owner: 10Nehajha) [20:29:44] 10Operations, 10Beta-Cluster-Infrastructure, 10Patch-For-Review, 10Prometheus-metrics-monitoring, 10User-fgiunchedi: Move deployment-prep redis instances to stretch - https://phabricator.wikimedia.org/T179371#4273745 (10Krenair) Alright. Leaving open pending deletion of the old redis hosts in a few weeks... [20:32:25] !log arlolra@deploy1001 Started deploy [parsoid/deploy@97cdab8]: Updating Parsoid to 06b74d2 [20:32:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:34:38] (03CR) 10Volans: "Quick compiler results:" [puppet] - 10https://gerrit.wikimedia.org/r/439641 (https://phabricator.wikimedia.org/T191300) (owner: 10Volans) [20:39:15] PROBLEM - Check systemd state on restbase-dev1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[20:49:34] !log arlolra@deploy1001 Finished deploy [parsoid/deploy@97cdab8]: Updating Parsoid to 06b74d2 (duration: 17m 09s) [20:49:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:51:04] (03CR) 10Alex Monk: "So it turns out this line has been unused since I9c39889a" [puppet] - 10https://gerrit.wikimedia.org/r/436431 (https://phabricator.wikimedia.org/T184244) (owner: 10Alex Monk) [20:57:56] !log Updated Parsoid to 06b74d2 (T191843) [20:58:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:58:02] T191843: Cannot read property 'push' of undefined - https://phabricator.wikimedia.org/T191843 [21:00:04] bawolff and Reedy: #bothumor I � Unicode. All rise for Weekly Security deployment window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180611T2100). [21:02:35] PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) is CRITICAL: Test Print the Foo page from en.wp.org in letter format returned the unexpected status 503 (expecting: 200) [21:02:36] (03PS1) 10Ottomata: Set eventstreams max_connections to 25 per varnish instance [puppet] - 10https://gerrit.wikimedia.org/r/439772 (https://phabricator.wikimedia.org/T196553) [21:03:22] (03CR) 10Ottomata: "Ema, is this the right place? I wasn't sure if this should be set here, or on the frontend stream.wikimedia.org instance?" 
[puppet] - 10https://gerrit.wikimedia.org/r/439772 (https://phabricator.wikimedia.org/T196553) (owner: 10Ottomata) [21:03:45] RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy [21:05:44] RECOVERY - Check systemd state on restbase-dev1004 is OK: OK - running: The system is fully operational [21:17:14] PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) is CRITICAL: Test Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices returned the unexpected status 503 (expecting: 200) [21:19:25] PROBLEM - proton endpoints health on proton1001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) is CRITICAL: Test Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices returned the unexpected status 503 (expecting: 200) [21:20:25] RECOVERY - proton endpoints health on proton1001 is OK: All endpoints are healthy [21:20:34] RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy [21:21:43] (03PS1) 10Alex Monk: Re-combine labs and production exim minimal config [puppet] - 10https://gerrit.wikimedia.org/r/439774 [21:22:18] (03CR) 10jerkins-bot: [V: 04-1] Re-combine labs and production exim minimal config [puppet] - 10https://gerrit.wikimedia.org/r/439774 (owner: 10Alex Monk) [21:23:19] (03PS2) 10Alex Monk: Re-combine labs and production exim minimal config [puppet] - 10https://gerrit.wikimedia.org/r/439774 [21:23:55] (03CR) 10jerkins-bot: [V: 04-1] Re-combine labs and production exim minimal config [puppet] - 10https://gerrit.wikimedia.org/r/439774 (owner: 10Alex Monk) [21:25:09] (03PS3) 10Bearloga: statistics::discovery: re-enable cron job [puppet] - 10https://gerrit.wikimedia.org/r/438125 
(https://phabricator.wikimedia.org/T170494) [21:26:04] PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) is CRITICAL: Test Print the Foo page from en.wp.org in letter format returned the unexpected status 503 (expecting: 200) [21:27:14] RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy [21:27:14] PROBLEM - proton endpoints health on proton1001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) is CRITICAL: Test Print the Foo page from en.wp.org in letter format returned the unexpected status 503 (expecting: 200) [21:28:24] RECOVERY - proton endpoints health on proton1001 is OK: All endpoints are healthy [21:30:22] 10Operations, 10Mail, 10Patch-For-Review: Upgrade mx1001/mx2001 to stretch - https://phabricator.wikimedia.org/T175361#4273978 (10Krenair) Related change to your standard::mail::sender changes above: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/439774/ >>! In T175361#4137331, @herron wrote: > Also... [21:32:24] Reedy: Is the security window quiescent? I was going to do a minor ORES update. [21:32:29] _joe_: hey. Can you give a quick sanity check to https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/436252/ ? [21:34:42] <_joe_> AaronSchulz: this will use mcrouter but still not use prefixes for setting/deleting keys, right? 
[21:35:14] <_joe_> because I think we need to change the routing handler for broadcast sets soon-ish [21:35:39] (03CR) 10Alex Monk: "modules/standard/manifests/mail/sender.pp:2 wmf-style: Found hiera call in class 'standard::mail::sender' for 'standard::mail::sender::rou" [puppet] - 10https://gerrit.wikimedia.org/r/439774 (owner: 10Alex Monk) [21:38:02] _joe_: hmm, I can add the mcrouterAware flag there [21:38:03] (03CR) 10Alex Monk: "Going after that one in I19a28579" [puppet] - 10https://gerrit.wikimedia.org/r/436431 (https://phabricator.wikimedia.org/T184244) (owner: 10Alex Monk) [21:38:19] <_joe_> AaronSchulz: no, don't for now [21:38:28] <_joe_> we can do it in a second pass [21:38:33] right [21:39:02] (03PS2) 10Alex Monk: Followup If545182a: Actually use cert_name now [puppet] - 10https://gerrit.wikimedia.org/r/439451 (https://phabricator.wikimedia.org/T184244) [21:39:16] (03CR) 10Giuseppe Lavagetto: [C: 031] Add "memcached-mcrouter" to $wgObjectCaches as default for testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436252 (owner: 10Aaron Schulz) [21:43:17] !log awight@deploy1001 Started deploy [ores/deploy@6ee8775]: ORES: bswiki, euwiki, srwiki models [21:43:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:43:46] \o/ [21:43:59] Oh wait. that should go to beta first :| [21:44:07] * halfak sends it to beta asap :) [21:44:13] hargh [21:44:18] ok well this is just canary for now [21:44:18] ty [21:44:36] Should have specified that my tests were local :) [21:44:36] Mostly, I'm testing scap :) [21:44:47] naw it was obvious, my fault [21:45:05] midnight deployments might turn out to not be my thing. [21:46:36] Always midnight somewhere [21:47:35] hehe [21:47:46] halfak: ok we're live on ores1001, looking at the machine now [21:48:17] kk. Just about to go to beta [21:48:25] * halfak crosses fingers and toes [21:50:43] \o/ LFS success [21:50:48] happy scappy [21:51:34] workers are healthy. 
[21:52:08] halfak: If beta is good, I'll put this on the rest of the cluster. [21:52:41] still waiting on https://ores-beta.wmflabs.org/ [21:52:47] Aha! Alive [21:53:09] https://ores-beta.wmflabs.org/v3/scores/euwiki/345678 [21:53:13] It's ALIVE [21:53:18] OK awight. Looks good [21:53:44] +1, I ran through the 3 new models and they're all functional [21:53:46] kk continuing [21:56:29] yay for git-lfs [22:01:34] PROBLEM - MariaDB Slave Lag: m3 on db2042 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 313.53 seconds [22:01:44] PROBLEM - MariaDB Slave Lag: m3 on db2078 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 317.54 seconds [22:17:15] !log awight@deploy1001 Finished deploy [ores/deploy@6ee8775]: ORES: bswiki, euwiki, srwiki models (duration: 33m 58s) [22:17:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:21:32] awight, confirmed that all looks good [22:21:51] +1, thanks! [22:37:16] (03CR) 10Paladox: "This is being done next monday :)" [puppet] - 10https://gerrit.wikimedia.org/r/439444 (https://phabricator.wikimedia.org/T196812) (owner: 10Paladox) [22:48:54] (03PS4) 10Paladox: Add gerrit-theme.html and also add footer links [software/gerrit] (deploy/wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/439503 (https://phabricator.wikimedia.org/T196835) [22:49:03] (03PS5) 10Paladox: Add gerrit-theme.html and also add footer links [software/gerrit] (deploy/wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/439503 (https://phabricator.wikimedia.org/T196835) [22:53:24] PROBLEM - Long running screen/tmux on mw1230 is CRITICAL: Return code of 255 is out of bounds [23:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: I, the Bot under the Fountain, allow thee, The Deployer, to do Evening SWAT (Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180611T2300). 
[23:00:04] No GERRIT patches in the queue for this window AFAICS. [23:01:15] 10Operations, 10ops-codfw, 10Traffic, 10netops: switch port configuration for lvs200[7-10] - https://phabricator.wikimedia.org/T196946#4274128 (10Papaul) a:05Papaul>03None [23:01:23] (03PS1) 10Paladox: Gerrit: Add support for adding additional domains to alias in apache [puppet] - 10https://gerrit.wikimedia.org/r/439783 [23:02:30] !log phabricator: restarting apache2 on phab1001 to free up apache workers [23:02:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:03:27] ah [23:03:27] https://phabricator.wikimedia.org/diffusion/ [23:03:29] finally [23:03:31] twentyafterfour ^^ [23:03:44] it's showing refs as in the full name of the commit now [23:04:11] it parsed https://phabricator.wikimedia.org/rGERRITDEPLOY94e8165abf7907965d9133d469c08054ad4a15d3 at least really quick just did that one [23:05:40] i hadn't put my SWAT patch up but will now and can deploy it, should i wait for phab work to be done? [23:06:20] ebernhardson is it a gerrit patch or a phabricator patch? [23:06:30] (03PS2) 10EBernhardson: Promote MLR models from AB test to prod [mediawiki-config] - 10https://gerrit.wikimedia.org/r/430797 (https://phabricator.wikimedia.org/T187148) [23:06:41] ebernhardson: go ahead, phab work is ongoing and should not interfere with what you are doing [23:06:54] ok perfect, thanks! 
[23:07:22] (03CR) 10EBernhardson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/430797 (https://phabricator.wikimedia.org/T187148) (owner: 10EBernhardson) [23:09:05] (03Merged) 10jenkins-bot: Promote MLR models from AB test to prod [mediawiki-config] - 10https://gerrit.wikimedia.org/r/430797 (https://phabricator.wikimedia.org/T187148) (owner: 10EBernhardson) [23:09:20] (03CR) 10jenkins-bot: Promote MLR models from AB test to prod [mediawiki-config] - 10https://gerrit.wikimedia.org/r/430797 (https://phabricator.wikimedia.org/T187148) (owner: 10EBernhardson) [23:10:22] (03PS3) 10EBernhardson: Tune CirrusSearch slow logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436848 (https://phabricator.wikimedia.org/T196180) [23:18:06] !log ebernhardson@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Promote Cirrus MLR models from AB test to prod (duration: 00m 51s) [23:18:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:18:54] PROBLEM - proton endpoints health on proton1001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) timed out before a response was received: /{domain}/v1/pdf/{title}/{format}/{type} (Respond file not found for a nonexistent title) timed out before a response was received [23:19:55] RECOVERY - proton endpoints health on proton1001 is OK: All endpoints are healthy [23:20:04] PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received [23:22:05] RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy [23:22:30] (03CR) 10EBernhardson: [C: 032] Tune CirrusSearch slow logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436848 (https://phabricator.wikimedia.org/T196180) (owner: 10EBernhardson) [23:23:56] (03Merged) 
10jenkins-bot: Tune CirrusSearch slow logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436848 (https://phabricator.wikimedia.org/T196180) (owner: 10EBernhardson) [23:26:11] !log ebernhardson@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: Tune CirrusSearch slow logging (duration: 00m 48s) [23:26:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:28:00] (03CR) 10jenkins-bot: Tune CirrusSearch slow logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436848 (https://phabricator.wikimedia.org/T196180) (owner: 10EBernhardson) [23:28:16] (03CR) 10Greg Grossmeier: [C: 031] "Thanks." [puppet] - 10https://gerrit.wikimedia.org/r/439483 (https://phabricator.wikimedia.org/T196835) (owner: 10Paladox) [23:31:35] (03PS1) 10Papaul: DNS: Add mgmt and productionn DNS entries for bast2002 [dns] - 10https://gerrit.wikimedia.org/r/439786 (https://phabricator.wikimedia.org/T196665) [23:31:42] (03CR) 10EBernhardson: Lower CirrusSearch delayed job drop to 2 hours (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429233 (owner: 10EBernhardson) [23:32:13] (03PS2) 10EBernhardson: Lower CirrusSearch delayed job drop to 2 hours [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429233 [23:32:25] (03CR) 10EBernhardson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429233 (owner: 10EBernhardson) [23:33:23] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install bast2002.wikimedia.org - https://phabricator.wikimedia.org/T196665#4274184 (10Papaul) [23:33:42] (03CR) 10jerkins-bot: [V: 04-1] Lower CirrusSearch delayed job drop to 2 hours [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429233 (owner: 10EBernhardson) [23:34:45] RECOVERY - MariaDB Slave Lag: m3 on db2042 is OK: OK slave_sql_lag Replication lag: 49.30 seconds [23:34:55] RECOVERY - MariaDB Slave Lag: m3 on db2078 is OK: OK slave_sql_lag Replication lag: 30.55 seconds [23:35:48] 10Operations, 10ops-codfw, 
10netops: switch port configuration for bast2002 - https://phabricator.wikimedia.org/T196957#4274185 (10Papaul) p:05Triage>03Normal [23:37:48] (03CR) 10Legoktm: [C: 04-1] Gerrit: Add CoC and privacy policy to footer (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/439483 (https://phabricator.wikimedia.org/T196835) (owner: 10Paladox) [23:38:31] (03PS5) 10Paladox: Gerrit: Add CoC and privacy policy to footer [puppet] - 10https://gerrit.wikimedia.org/r/439483 (https://phabricator.wikimedia.org/T196835) [23:38:42] (03CR) 10Paladox: Gerrit: Add CoC and privacy policy to footer (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/439483 (https://phabricator.wikimedia.org/T196835) (owner: 10Paladox) [23:39:02] (03PS6) 10Paladox: Add gerrit-theme.html and also add footer links [software/gerrit] (deploy/wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/439503 (https://phabricator.wikimedia.org/T196835) [23:39:34] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install bast2002.wikimedia.org - https://phabricator.wikimedia.org/T196665#4274201 (10Papaul) [23:40:52] (03PS3) 10EBernhardson: Lower CirrusSearch delayed job drop to 2 hours [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429233 [23:41:28] 10Operations, 10Beta-Cluster-Infrastructure: Mails through deployment-mx SPF & DKIM fails - https://phabricator.wikimedia.org/T87338#4274206 (10Krenair) Alright so I had some run-ins with Designate while trying to do DKIM (turns out you can't use a 2048 bit RSA key because that puts your public key over a leng... 
[23:42:02] (03CR) 10EBernhardson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429233 (owner: 10EBernhardson) [23:43:39] (03Merged) 10jenkins-bot: Lower CirrusSearch delayed job drop to 2 hours [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429233 (owner: 10EBernhardson) [23:45:33] !log ebernhardson@deploy1001 Synchronized wmf-config/CirrusSearch-common.php: SWAT: Lower CirrusSearch delayed job drop timeout (duration: 00m 50s) [23:45:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:46:25] (03PS1) 10Alex Monk: exim: Permit DKIM domain to be changed by hiera [puppet] - 10https://gerrit.wikimedia.org/r/439791 (https://phabricator.wikimedia.org/T87338) [23:48:14] (03CR) 10jenkins-bot: Lower CirrusSearch delayed job drop to 2 hours [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429233 (owner: 10EBernhardson) [23:49:47] 10Operations, 10Citoid, 10Code-Stewardship-Reviews, 10VisualEditor, 10Services (watching): zotero translation server: code stewardship request - https://phabricator.wikimedia.org/T187194#4274212 (10Jrbranaa) >>! In T187194#4271773, @danstillman wrote: > Not sure what you're planning, but the initial vers... 
[23:49:49] (03PS1) 10Papaul: DHCP: Add MAC address for bast2002 [puppet] - 10https://gerrit.wikimedia.org/r/439792 (https://phabricator.wikimedia.org/T196665) [23:51:25] (03PS2) 10Alex Monk: exim: Permit DKIM domain to be changed by hiera [puppet] - 10https://gerrit.wikimedia.org/r/439791 (https://phabricator.wikimedia.org/T87338) [23:51:41] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install bast2002.wikimedia.org - https://phabricator.wikimedia.org/T196665#4274215 (10Papaul) [23:52:12] 10Operations, 10Citoid, 10Code-Stewardship-Reviews, 10VisualEditor, 10Services (watching): zotero translation server: code stewardship request - https://phabricator.wikimedia.org/T187194#4274216 (10Jrbranaa) We're looking to have Audiences->Contributors->Editing be the Code Stewards for this moving forwa... [23:59:25] (03CR) 10Alex Monk: "should be good to go" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436430 (https://phabricator.wikimedia.org/T184244) (owner: 10Alex Monk)