[00:00:04] addshore, hashar, anomie, no_justification, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor My software never has bugs. It just develops random features. Rise for Evening SWAT (Max 8 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180123T0000).
[00:00:04] No GERRIT patches in the queue for this window AFAICS.
[00:02:02] Operations, Analytics, Code-Stewardship-Reviews, Tools, Wikimedia-IRC-RC-Server: IRC RecentChanges feed: code stewardship request - https://phabricator.wikimedia.org/T185319#3912816 (Platonides) > However, it's operating without any redundancy, in terms of both individual hardware failure, an...
[00:11:15] (CR) Hoo man: Fix killing dumpers in Wikidata entity dumpers (1 comment) [puppet] - https://gerrit.wikimedia.org/r/393923 (owner: Hoo man)
[00:11:19] (PS3) Hoo man: Fix killing dumpers in Wikidata entity dumpers [puppet] - https://gerrit.wikimedia.org/r/393923
[00:14:48] (PS2) Brion VIBBER: Update comment avconv -> ffmpeg [mediawiki-config] - https://gerrit.wikimedia.org/r/403470
[00:16:56] Operations, cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3919246 (Bstorm)
[00:24:57] Operations, Analytics, Code-Stewardship-Reviews, Tools, Wikimedia-IRC-RC-Server: IRC RecentChanges feed: code stewardship request - https://phabricator.wikimedia.org/T185319#3919263 (Platonides)
[00:33:24] Operations, Analytics, Code-Stewardship-Reviews, Tools, Wikimedia-IRC-RC-Server: IRC RecentChanges feed: code stewardship request - https://phabricator.wikimedia.org/T185319#3919279 (faidon) >>! In T185319#3919216, @Platonides wrote: >> However, it's operating without any redundancy, in terms...
[00:34:15] Operations, Analytics-Data-Quality, Traffic: Vet reliability of the response_size field for data analysis purposes - https://phabricator.wikimedia.org/T185350#3919281 (Tbayer) Same for November (ulsfo): ``` total_bytes requests 2124819017324015 65380125071 1 row selected (3809.329 second...
[00:38:06] Operations, Analytics-Data-Quality, Traffic: Vet reliability of the response_size field for data analysis purposes - https://phabricator.wikimedia.org/T185350#3919287 (faidon) Seems fairly consistent: LibreNMS has recorded November as 2.45PB. October is incomplete, unfortunately, so we can't compare...
[00:52:20] RECOVERY - Check Varnish expiry mailbox lag on cp4022 is OK: OK: expiry mailbox lag is 0
[01:13:28] Operations, Analytics-Data-Quality, Traffic: Vet reliability of the response_size field for data analysis purposes - https://phabricator.wikimedia.org/T185350#3919331 (Tbayer) >>! In T185350#3918346, @Tbayer wrote: > @ottomata notes that the response_size field should correspond to the "Size of respo...
[01:36:14] Operations, Continuous-Integration-Infrastructure, MediaWiki-Core-Tests, HHVM: HHVM 3.18.5+dfsg-1+wmf3 changes parse_url causing unit tests to fail - https://phabricator.wikimedia.org/T185024#3919380 (Krinkle)
[01:36:37] Operations, Continuous-Integration-Infrastructure, MediaWiki-Core-Tests, HHVM: HHVM 3.18.5+dfsg-1+wmf3 changes parse_url causing unit tests to fail - https://phabricator.wikimedia.org/T185024#3903647 (Krinkle) p:Normal>High This is breaking the HHVM build on Travis.
Operations, Continuous-Integration-Infrastructure, MediaWiki-Core-Tests, HHVM: HHVM 3.18.5+dfsg-1+wmf3 changes parse_url causing unit tests to fail - https://phabricator.wikimedia.org/T185024#3919404 (Krinkle) p:High>Unbreak!
[01:37:30] Operations, Continuous-Integration-Infrastructure, MediaWiki-Core-Tests, HHVM: HHVM 3.18.5+dfsg-1+wmf3 changes parse_url causing unit tests to fail - https://phabricator.wikimedia.org/T185024#3903647 (Legoktm) Let's skip the test on broken HHVM versions? :/
[01:43:08] Operations, Analytics-Data-Quality, Traffic: Vet reliability of the response_size field for data analysis purposes - https://phabricator.wikimedia.org/T185350#3919416 (Ottomata) Correct! :)
[01:48:14] Operations, Analytics-Data-Quality, Traffic: Vet reliability of the response_size field for data analysis purposes - https://phabricator.wikimedia.org/T185350#3919423 (Tbayer) Regarding 1.: We haven't yet heard back from @Milimetric about what the issue was back then, but after looking at [[https://g...
[01:51:06] Operations, Analytics, Code-Stewardship-Reviews, Tools, Wikimedia-IRC-RC-Server: IRC RecentChanges feed: code stewardship request - https://phabricator.wikimedia.org/T185319#3919427 (Platonides) I was just trying to address the stated concern, not make it a perfect solution (we will probably...
[01:56:47] Operations, cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3919438 (bd808) [x] Added @Bstorm to WMCS google drive [x] Added @Bstorm to WMCS team calendar [x] Scheduled weekly 1:1 for @Bstorm and @bd808
[02:01:19] Operations, Internet-Archive, Offline-Working-Group: Create backups of Wikimedia content in diverse geographic places - https://phabricator.wikimedia.org/T156544#3919440 (Pine) Backup improvements in general would be welcome. Perhaps this should be a subtask of a larger "campaign" regarding backups.
[02:01:34] Operations, Analytics, Code-Stewardship-Reviews, Tools, Wikimedia-IRC-RC-Server: IRC RecentChanges feed: code stewardship request - https://phabricator.wikimedia.org/T185319#3912816 (Krenair) Pretty sure MW has supported having multiple destinations for these streams for years now. So you cou...
[02:22:37] !log l10nupdate@tin scap sync-l10n completed (1.31.0-wmf.17) (duration: 05m 31s)
[02:22:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:32:00] PROBLEM - Postgres Replication Lag on maps2002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 25927384
[02:33:00] RECOVERY - Postgres Replication Lag on maps2002 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 52568
[02:39:41] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=ulsfo&var-cache_type=All&var-status_type=5
[02:39:42] PROBLEM - Upload HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=upload&var-status_type=5
[02:52:41] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=ulsfo&var-cache_type=All&var-status_type=5
[02:54:50] RECOVERY - Upload HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=upload&var-status_type=5
[03:25:30] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 736.01 seconds
[03:54:31] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 205.44 seconds
[05:12:11] PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - AS1299/IPv4: Active, AS1299/IPv6: Active
[05:22:20] RECOVERY - BGP status on cr2-eqiad is OK: BGP OK - up: 166, down: 0, shutdown: 4
[05:23:50] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 224, down: 1, dormant: 0, excluded: 0, unused: 0
[05:36:00] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 226, down: 0, dormant: 0, excluded: 0, unused: 0
[05:38:30] PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - AS1299/IPv4: Active, AS1299/IPv6: Active
[05:41:30] RECOVERY - BGP status on cr2-eqiad is OK: BGP OK - up: 166, down: 0, shutdown: 4
[05:43:00] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 224, down: 1, dormant: 0, excluded: 0, unused: 0
[05:43:28] (CR) TerraCodes: [C: 1] Add NS_MAIN to $wgNamespacesWithSubpages for cawikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/405755 (https://phabricator.wikimedia.org/T185436) (owner: Framawiki)
[06:06:30] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=ulsfo&var-cache_type=All&var-status_type=5
[06:07:31] PROBLEM - Upload HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=upload&var-status_type=5
[06:17:10] (PS1) Marostegui: db-eqiad.php: Depool db1105:3311 [mediawiki-config] - https://gerrit.wikimedia.org/r/405835 (https://phabricator.wikimedia.org/T162807)
[06:19:43] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1105:3311 [mediawiki-config] - https://gerrit.wikimedia.org/r/405835 (https://phabricator.wikimedia.org/T162807) (owner: Marostegui)
[06:21:09] (Merged) jenkins-bot: db-eqiad.php: Depool db1105:3311 [mediawiki-config] - https://gerrit.wikimedia.org/r/405835 (https://phabricator.wikimedia.org/T162807) (owner: Marostegui)
[06:21:21] (CR) jenkins-bot: db-eqiad.php: Depool db1105:3311 [mediawiki-config] - https://gerrit.wikimedia.org/r/405835 (https://phabricator.wikimedia.org/T162807) (owner: Marostegui)
[06:22:31] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1105:3311 - T162807 (duration: 00m 57s)
[06:22:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:22:44] T162807: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807
[06:23:38] !log Stop replication in sync db1089 and db1105:3311 - T162807
[06:23:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:30:31] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 226, down: 0, dormant: 0, excluded: 0, unused: 0
[06:32:50] PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - AS1299/IPv6: Active, AS1299/IPv4: Active
[06:41:51] RECOVERY - BGP status on cr2-eqiad is OK: BGP OK - up: 168, down: 0, shutdown: 2
[06:47:16] !log Stop replication in sync on db2048 and db1089 - T162807
[06:47:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:47:30] T162807: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807
[06:50:13] !log restart varnish backend on cp4021, 503s and mailbox lag
[06:50:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:52:51] RECOVERY - Check Varnish expiry mailbox lag on cp4021 is OK: OK: expiry mailbox lag is 0
[06:55:48] upload 503s are down as far as I can see, waiting for the icinga recovery
[06:59:38] the main issue seems to be that text is also mildly returning 503s
[06:59:39] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&from=now-3h&to=now&var-site=All&var-cache_type=text&var-status_type=5
[07:00:25] elukey: I am checking the DB layer
[07:00:29] but seems not a huge deal
[07:00:31] just in case
[07:00:33] marostegui: thanks!
[07:00:44] upload seemed to be varnish related
[07:01:45] there was a spike in errors at 6:49 because of the maintenance I was doing… on codfw slaves, which is weird
[07:01:51] as nothing should be selecting from there
[07:02:10] Not sure why mw2239 was selecting from a codfw slave
[07:02:41] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=ulsfo&var-cache_type=All&var-status_type=5
[07:02:44] but the text errors are there before this, so probably not related
[07:02:50] RECOVERY - Upload HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=upload&var-status_type=5
[07:03:35] marostegui: is it possible that the health check page for mw2239 triggered a db call?
[07:03:50] ah a slave, weird
[07:04:14] the errors for that host started at 6:49 and ended at 6:49 as well
[07:04:16] there were only 80 errors
[07:04:19] I think it is not related
[07:04:25] how dare you cause 80 errors?
[07:04:27] :D
[07:04:36] haha
[07:04:44] I am sure it was not even user facing
[07:04:46] it was a codfw slave
[07:05:48] and the text 503s seems to be esams related
[07:07:53] https://grafana.wikimedia.org/dashboard/db/varnish-failed-fetches?orgId=1&from=now-1h&to=now&var-datasource=esams%20prometheus%2Fops&var-cache_type=text&var-server=All
[07:08:42] there are some failed fetches for esams that are not that heavy, so I am inclined not to play with Varnish anymore and see how it evolves
[07:15:38] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1105:3311" [mediawiki-config] - https://gerrit.wikimedia.org/r/405840
[07:17:22] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1105:3311" [mediawiki-config] - https://gerrit.wikimedia.org/r/405840 (owner: Marostegui)
[07:18:52] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1105:3311" [mediawiki-config] - https://gerrit.wikimedia.org/r/405840 (owner: Marostegui)
[07:19:02] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1105:3311" [mediawiki-config] - https://gerrit.wikimedia.org/r/405840 (owner: Marostegui)
[07:20:16] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1105:3311 - T162807 (duration: 00m 56s)
[07:20:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:20:27] T162807: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807
[07:30:57] (PS1) Marostegui: db-eqiad.php: Repool db1089 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/405842
[07:33:12] (CR) Marostegui: [C: 2] db-eqiad.php: Repool db1089 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/405842 (owner: Marostegui)
[07:34:46] (Merged) jenkins-bot: db-eqiad.php: Repool db1089 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/405842 (owner: Marostegui)
[07:36:05] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Slowly repool db1089 (duration: 00m 56s)
[07:36:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:37:02] (CR) jenkins-bot: db-eqiad.php: Repool db1089 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/405842 (owner: Marostegui)
[08:23:14] (PS1) Marostegui: db-eqiad.php: Increase weight for db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/405854
[08:24:59] !log installing gdk-pixbuf security updates
[08:25:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:27:18] (CR) Marostegui: [C: 2] db-eqiad.php: Increase weight for db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/405854 (owner: Marostegui)
[08:28:03] Operations, Continuous-Integration-Infrastructure, MediaWiki-Core-Tests, HHVM: HHVM 3.18.5+dfsg-1+wmf3 changes parse_url causing unit tests to fail - https://phabricator.wikimedia.org/T185024#3919591 (MoritzMuehlenhoff) For the HHVM builds on apt.wikimedia.org this has been fixed in 3.18.5+dfsg-1...
[08:28:45] (Merged) jenkins-bot: db-eqiad.php: Increase weight for db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/405854 (owner: Marostegui)
[08:28:56] (CR) jenkins-bot: db-eqiad.php: Increase weight for db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/405854 (owner: Marostegui)
[08:30:04] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase weight for db1089 (duration: 00m 56s)
[08:30:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:41:00] PROBLEM - Check Varnish expiry mailbox lag on cp4024 is CRITICAL: CRITICAL: expiry mailbox lag is 2133172
[08:43:13] no failed fetches for the moment https://grafana.wikimedia.org/dashboard/db/varnish-failed-fetches?orgId=1&from=now-3h&to=now&var-datasource=ulsfo%20prometheus%2Fops&var-cache_type=upload&var-server=All
[08:59:28] (PS1) Marostegui: db-eqiad.php: Fully repool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/405857
[09:03:00] (CR) Marostegui: [C: 2] db-eqiad.php: Fully repool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/405857 (owner: Marostegui)
[09:04:26] (Merged) jenkins-bot: db-eqiad.php: Fully repool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/405857 (owner: Marostegui)
[09:05:43] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Fully repool db1089 (duration: 00m 56s)
[09:05:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:07:01] (CR) jenkins-bot: db-eqiad.php: Fully repool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/405857 (owner: Marostegui)
[09:36:56] (PS1) Muehlenhoff: Add ops-staff-group to exception list for ops membership consistency check [puppet] - https://gerrit.wikimedia.org/r/405859
[09:37:47] (PS2) Muehlenhoff: Add ops-staff-group to exception list for ops membership consistency check [puppet] - https://gerrit.wikimedia.org/r/405859
[09:38:21] (CR) Muehlenhoff: [C: 2] Add ops-staff-group to exception list for ops membership consistency check [puppet] - https://gerrit.wikimedia.org/r/405859 (owner: Muehlenhoff)
[09:42:03] (PS1) Muehlenhoff: Record extended MOUs for various researchers [puppet] - https://gerrit.wikimedia.org/r/405860
[09:42:52] (CR) Muehlenhoff: [C: 2] Record extended MOUs for various researchers [puppet] - https://gerrit.wikimedia.org/r/405860 (owner: Muehlenhoff)
[10:26:19] Operations, Citoid, VisualEditor, Services (watching), User-mobrovac: Wiley requests for DOI and some other publishers don't work in production - https://phabricator.wikimedia.org/T165105#3919661 (The_RedBurn) From Twitter above: * 10.1080/03014223.1975.9517878 (tandfonline.com) The delay for...
[10:30:30] PROBLEM - Router interfaces on cr1-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 66, down: 1, dormant: 0, excluded: 0, unused: 0
[10:30:50] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 35, down: 2, dormant: 0, excluded: 0, unused: 0
[10:31:01] PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0
[10:37:30] RECOVERY - Router interfaces on cr1-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 68, down: 0, dormant: 0, excluded: 0, unused: 0
[10:40:30] PROBLEM - Router interfaces on cr1-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 66, down: 1, dormant: 0, excluded: 0, unused: 0
[10:43:13] !log installing sudo security updates
[10:43:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:56:10] !log installing libx11 security updates
[10:56:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:03:21] (PS1) Muehlenhoff: Add library hint for libxi [puppet] - https://gerrit.wikimedia.org/r/405865
[11:05:14] (PS2) Muehlenhoff: Add library hints for libxi and libxtst [puppet] - https://gerrit.wikimedia.org/r/405865
[11:14:15] (CR) Muehlenhoff: [C: 2] Add library hints for libxi and libxtst [puppet] - https://gerrit.wikimedia.org/r/405865 (owner: Muehlenhoff)
[11:22:21] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0
[11:22:50] RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0
[11:23:10] RECOVERY - Router interfaces on cr1-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 68, down: 0, dormant: 0, excluded: 0, unused: 0
[11:37:40] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 35, down: 2, dormant: 0, excluded: 0, unused: 0
[11:38:00] PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0
[11:38:20] PROBLEM - Router interfaces on cr1-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 66, down: 1, dormant: 0, excluded: 0, unused: 0
[11:46:10] RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0
[11:49:41] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0
[11:50:21] RECOVERY - Router interfaces on cr1-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 68, down: 0, dormant: 0, excluded: 0, unused: 0
[12:21:50] PROBLEM - BGP status on cr2-ulsfo is CRITICAL: BGP CRITICAL - AS6939/IPv4: OpenSent, AS6939/IPv6: Active
[12:23:31] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 105 probes of 291 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[12:24:51] RECOVERY - BGP status on cr2-ulsfo is OK: BGP OK - up: 80, down: 4, shutdown: 2
[12:27:50] PROBLEM - BGP status on cr2-ulsfo is CRITICAL: BGP CRITICAL - AS6939/IPv6: OpenSent, AS6939/IPv4: Connect
[12:28:31] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 15 probes of 291 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[12:35:31] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 53 probes of 291 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[12:40:30] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 15 probes of 291 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[12:41:00] RECOVERY - BGP status on cr2-ulsfo is OK: BGP OK - up: 80, down: 4, shutdown: 2
[12:44:01] PROBLEM - BGP status on cr2-ulsfo is CRITICAL: BGP CRITICAL - AS6939/IPv6: Connect
[12:59:39] (PS1) Muehlenhoff: Add library hints for libxrandr and libxfixes [puppet] - https://gerrit.wikimedia.org/r/405871
[13:00:46] (CR) Muehlenhoff: [C: 2] Add library hints for libxrandr and libxfixes [puppet] - https://gerrit.wikimedia.org/r/405871 (owner: Muehlenhoff)
[13:03:45] !log installing libxtst, libxfixes, libxrandr, libxi security updates
[13:03:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:04:30] RECOVERY - BGP status on cr2-ulsfo is OK: BGP OK - up: 80, down: 4, shutdown: 2
[13:09:20] PROBLEM - puppet last run on mw2138 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[tzdata]
[13:10:40] PROBLEM - BGP status on cr2-ulsfo is CRITICAL: BGP CRITICAL - AS6939/IPv4: Connect, AS6939/IPv6: Connect
[13:16:41] RECOVERY - BGP status on cr2-ulsfo is OK: BGP OK - up: 78, down: 6, shutdown: 2
[13:20:50] PROBLEM - BGP status on cr2-ulsfo is CRITICAL: BGP CRITICAL - AS6939/IPv6: Connect, AS6939/IPv4: Connect
[13:23:40] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 22 probes of 291 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[13:28:40] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 14 probes of 291 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[13:33:00] PROBLEM - puppet last run on rhenium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:36:00] RECOVERY - BGP status on cr2-ulsfo is OK: BGP OK - up: 80, down: 4, shutdown: 2
[13:38:11] (PS2) Aklapper: Allow discourse-mediawiki.wmflabs.org RSS feed on mediawiki.org [mediawiki-config] - https://gerrit.wikimedia.org/r/404653 (https://phabricator.wikimedia.org/T185087)
[13:39:20] RECOVERY - puppet last run on mw2138 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[13:40:00] PROBLEM - BGP status on cr2-ulsfo is CRITICAL: BGP CRITICAL - AS6939/IPv6: Connect, AS6939/IPv4: OpenConfirm
[13:40:11] (CR) Aklapper: "Meh. Well, I still like the idea of allowing the "Latest Topic" RSS feed. (And I fail to update the commit message only to reflect that.)" [mediawiki-config] - https://gerrit.wikimedia.org/r/404653 (https://phabricator.wikimedia.org/T185087) (owner: Aklapper)
[13:40:40] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 91 probes of 291 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[13:45:40] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 14 probes of 291 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[13:48:48] (CR) Hashar: [C: 1] "Yup that looks good. Those classes are solely invoked for the role::ci::master so we can have the http modules applied at the role level." [puppet] - https://gerrit.wikimedia.org/r/403730 (owner: Dzahn)
[13:49:21] (CR) Hashar: [C: 1] contint: Lower caching length on doc.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/403401 (https://phabricator.wikimedia.org/T184255) (owner: Legoktm)
[13:53:45] Operations, cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3919836 (chasemp)
[13:58:00] RECOVERY - puppet last run on rhenium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[14:00:04] addshore, hashar, anomie, no_justification, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear deployers, time to do the European Mid-day SWAT(Max 8 patches) deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180123T1400).
[14:00:04] Lucas_WMDE and Biplab: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:00:32] o/
[14:01:04] * Lucas_WMDE waves
[14:01:26] I have +2ed https://gerrit.wikimedia.org/r/#/c/405728/ already and it got merged
[14:01:31] looks all straightforward to me
[14:01:56] oh, cool! thanks
[14:02:01] (I +2ed it in advance in order to have the patch merged by the time SWAT starts)
[14:02:02] hashar: if you could take a look at the other commit, there were some -1 commits earlier
[14:02:10] I can SWAT today
[14:02:33] Lucas_WMDE: do you want to deploy your own change? (I can do it, just asking)
[14:02:39] (PS6) Hashar: Update the project namespace in Nepali Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/404148 (https://phabricator.wikimedia.org/T184865) (owner: Biplab Anand)
[14:02:46] I still don’t have deploy rights, just like yesterday ;)
[14:02:55] I’m just a lowly code monkey, no NDA or anything :D
[14:03:11] Lucas_WMDE: ah sorry, did not remember the name
[14:03:16] (I guess eventually I’ll get around to it)
[14:03:21] np, thanks for asking :)
[14:10:59] Lucas_WMDE: the patch is at mwdebug1002, please test and let me know if I can deploy
[14:12:18] zeljkof: seems to work!
[14:12:28] Lucas_WMDE: ok, deploying
[14:13:37] !log zfilipin@tin Synchronized php-1.31.0-wmf.17/extensions/WikibaseQualityConstraints/: SWAT: [[gerrit:405728|Add missing DISTINCT to SPARQL query (T184705)]] (duration: 01m 02s)
[14:13:40] RECOVERY - BGP status on cr2-ulsfo is OK: BGP OK - up: 84, down: 0, shutdown: 2
[14:13:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:13:50] T184705: distinct values constraint lists items several times (days: 1) - https://phabricator.wikimedia.org/T184705
[14:13:55] Lucas_WMDE: deployed, please check and thanks for deploying with #releng ;)
[14:14:02] (CR) Zfilipin: "This is scheduled for EU SWAT today, but Biplab Anand was not available in #wikimedia-operations, so it was not deployed." [mediawiki-config] - https://gerrit.wikimedia.org/r/404148 (https://phabricator.wikimedia.org/T184865) (owner: Biplab Anand)
[14:14:04] and now it works without debug extension as well, thank you :)
[14:14:46] skipping 404148 since Biplab is not here
[14:14:53] !log EU SWAT finished
[14:15:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:15:39] hashar: https://gerrit.wikimedia.org/r/#/c/404148/ I can see.
[14:16:04] zeljkof: ^^^ :)
[14:16:21] Jayprakash12345: we can deploy it if you can verify it works :)
[14:16:37] Jayprakash12345: oh, do you want to be in charge of 404148?
[14:17:01] hashar: ok
[14:17:39] !log continuing EU SWAT
[14:17:41] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 26 probes of 291 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[14:17:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:18:07] Jayprakash12345: I will let you know in a few minutes when 404148 is at mwdebug1002, so you can test there
[14:18:07] hashar: Let me know when it is on mwdebug1002
[14:18:23] Jayprakash12345: I am doing SWAT today :)
[14:18:57] Zeljkof: Ok
[14:19:39] Jayprakash12345: do I need to run scripts after deployment? cc hashar
[14:19:52] looks like 404148 is changing namespaces
[14:19:54] Zeljkof: Yes
[14:20:16] Jayprakash12345: please copy/paste commands I need to run in gerrit
[14:22:24] zeljkof: php namespaceDupes.php --wiki newiki
[14:22:24] ah I see hashar left comments in phab https://phabricator.wikimedia.org/T184865#3919886
[14:22:37] https://wikitech.wikimedia.org/wiki/SWAT_deploys/Deployers#namespaceDupes :}
[14:22:40] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 14 probes of 291 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[14:22:52] yeah
[14:22:53] (CR) Zfilipin: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/404148 (https://phabricator.wikimedia.org/T184865) (owner: Biplab Anand)
[14:23:08] so usually what I do is to pull the patch onto tin
[14:23:16] then on terbium.eqiad.wmnet I would "scap pull"
[14:23:23] and run the namespaceDupes.php script there
[14:23:26] then sync
[14:23:28] then --fix
[14:23:38] really, I need to run scap pull on terbium?
[14:23:41] but the order probably doesn't matter much
[14:23:43] not just the script?
[14:23:54] so guess you can just deploy it as usual
[14:24:01] ok, will do :)
[14:24:09] yeah just do as usual
[14:24:14] sorry
[14:24:27] (Merged) jenkins-bot: Update the project namespace in Nepali Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/404148 (https://phabricator.wikimedia.org/T184865) (owner: Biplab Anand)
[14:25:58] Jayprakash12345_: the commit is at mwdebug1002, can you test there?
[14:26:18] zeljkof: mwscript namespaceDupes.php --wiki newiki
[14:26:27] but not sure
[14:26:48] Jayprakash12345: first, can you test at mwdebug?
[14:27:00] (CR) jenkins-bot: Update the project namespace in Nepali Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/404148 (https://phabricator.wikimedia.org/T184865) (owner: Biplab Anand)
[14:28:39] Jayprakash12345: the commit is at mwdebug1002, can you test there?
[14:29:13] zeljkof: please deploy
[14:29:37] zeljkof: everything is ok.
[14:30:00] Jayprakash12345: ok, deploying [14:31:28] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:404148|Update the project namespace in Nepali Wikipedia (T184865)]] (duration: 00m 56s) [14:31:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:31:40] T184865: Update the project namespace in Nepali Wikipedia - https://phabricator.wikimedia.org/T184865 [14:31:42] Jayprakash12345: it's deployed, running script [14:33:32] (03PS1) 10Marostegui: db-eqiad.php: Unify comments about db1095 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/405876 [14:34:30] PROBLEM - puppet last run on elastic1025 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:35:19] zeljkof: anything else for me [14:35:35] Jayprakash12345_: this is the output of the script https://phabricator.wikimedia.org/T184865#3919958 [14:35:36] (03PS2) 10Marostegui: db-eqiad.php: Unify comments about db1095 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/405876 [14:35:44] some things were not fixed by the script [14:35:49] you will have to fix them manually [14:36:24] zeljkof: due to be poor internet connectivty. i am being disconnect. [14:36:31] Jayprakash12345_: please check if things are ok [14:36:51] !log EU SWAT finished [14:37:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:37:36] (03PS1) 10Gehel: Metrics are exposed by Blazegraph directly [debs/prometheus-blazegraph-exporter] - 10https://gerrit.wikimedia.org/r/405878 (https://phabricator.wikimedia.org/T182857) [14:37:38] (03PS1) 10Volans: Migrate the server side to Python3 [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/405879 [14:37:38] zeljkof: Looks good, thanks [14:38:45] zeljkof: Thanks for being here. 
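hashar's suggested order for a patch that changes a namespace runs like this: pull the patch on tin, run `scap pull` on terbium, do a namespaceDupes.php dry run there, sync, and then rerun the script with `--fix`. It can be written down as an ordered plan. This is a minimal sketch for illustration only. The `Step` type and `namespace_change_plan` helper are hypothetical and are not WMF tooling. The host names and the `mwscript namespaceDupes.php --wiki newiki` form come from the log, and `git pull` stands in for however the patch is fetched on the deployment host. As hashar notes, the order is not strict. The point of the dry run is to report conflicting titles before anything is changed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    host: str            # host the command is run on
    command: List[str]   # argv-style command (illustrative, not executed here)


def namespace_change_plan(wiki: str) -> List[Step]:
    """Hypothetical ordering for deploying a namespace-renaming config patch."""
    dupes = ["mwscript", "namespaceDupes.php", "--wiki", wiki]
    return [
        # 1. Fetch the merged change into the staging checkout on the deployment host.
        Step("tin", ["git", "pull"]),
        # 2. Bring the maintenance host's copy of the code up to date.
        Step("terbium.eqiad.wmnet", ["scap", "pull"]),
        # 3. Dry run: list titles that would clash with the new namespace, change nothing.
        Step("terbium.eqiad.wmnet", dupes),
        # 4. The usual SWAT sync of the changed config file.
        Step("tin", ["scap", "sync-file", "wmf-config/InitialiseSettings.php"]),
        # 5. Apply the fixes once the new namespace name is live everywhere.
        Step("terbium.eqiad.wmnet", dupes + ["--fix"]),
    ]


if __name__ == "__main__":
    for step in namespace_change_plan("newiki"):
        print(f"{step.host}: {' '.join(step.command)}")
```

In the log itself the deployer simply did the normal sync and then ran the script. The script could not resolve some titles, and those were left for manual cleanup (see T184865).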
[14:40:42] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Unify comments about db1095 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/405876 (owner: 10Marostegui) [14:41:54] Jayprakash12345_: no problem, it's my job! :) [14:43:13] (03Merged) 10jenkins-bot: db-eqiad.php: Unify comments about db1095 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/405876 (owner: 10Marostegui) [14:43:27] (03CR) 10jenkins-bot: db-eqiad.php: Unify comments about db1095 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/405876 (owner: 10Marostegui) [14:44:51] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Unify comments about sanitarium masters (duration: 00m 56s) [14:45:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:46:10] (03CR) 10Muehlenhoff: [C: 031] "Looks fine, one nit." (031 comment) [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/405879 (owner: 10Volans) [14:50:20] (03PS1) 10Rush: cloud: add chicocvenancio to shinken [puppet] - 10https://gerrit.wikimedia.org/r/405880 (https://phabricator.wikimedia.org/T185273) [14:56:06] (03CR) 10Rush: [C: 032] cloud: add chicocvenancio to shinken [puppet] - 10https://gerrit.wikimedia.org/r/405880 (https://phabricator.wikimedia.org/T185273) (owner: 10Rush) [15:02:43] (03PS1) 10Rush: Revert "ircecho: Remove support for sysvinit script" [puppet] - 10https://gerrit.wikimedia.org/r/405882 [15:02:53] (03PS2) 10Rush: Revert "ircecho: Remove support for sysvinit script" [puppet] - 10https://gerrit.wikimedia.org/r/405882 [15:03:15] (03CR) 10jerkins-bot: [V: 04-1] Revert "ircecho: Remove support for sysvinit script" [puppet] - 10https://gerrit.wikimedia.org/r/405882 (owner: 10Rush) [15:04:28] RECOVERY - puppet last run on elastic1025 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [15:04:31] (03CR) 10Rush: [V: 032 C: 032] Revert "ircecho: Remove support for sysvinit script" [puppet] - 10https://gerrit.wikimedia.org/r/405882 (owner: 10Rush) [15:05:49] (03PS6) 
10Volans: Migration to Python 3 [software/cumin] - 10https://gerrit.wikimedia.org/r/402059 [15:05:51] (03PS2) 10Volans: Backends: add known hosts files backend [software/cumin] - 10https://gerrit.wikimedia.org/r/405719 [15:10:39] (03PS1) 10Rush: cloud: remove errant shinken line in contacts [puppet] - 10https://gerrit.wikimedia.org/r/405884 [15:10:56] (03PS2) 10Rush: cloud: remove errant shinken line in contacts [puppet] - 10https://gerrit.wikimedia.org/r/405884 [15:11:44] (03CR) 10Rush: [C: 032] cloud: remove errant shinken line in contacts [puppet] - 10https://gerrit.wikimedia.org/r/405884 (owner: 10Rush) [15:16:53] (03PS1) 10Rush: cloud: add aborrero to shinken contact groups [puppet] - 10https://gerrit.wikimedia.org/r/405885 (https://phabricator.wikimedia.org/T178807) [15:21:46] 10Operations, 10cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3920079 (10chasemp) [15:21:54] (03PS2) 10Rush: cloud: add aborrero to shinken contact groups [puppet] - 10https://gerrit.wikimedia.org/r/405885 (https://phabricator.wikimedia.org/T178807) [15:22:45] (03CR) 10Rush: [C: 032] cloud: add aborrero to shinken contact groups [puppet] - 10https://gerrit.wikimedia.org/r/405885 (https://phabricator.wikimedia.org/T178807) (owner: 10Rush) [15:26:32] (03PS1) 10Rush: cloud: add bstorm to shinken instance [puppet] - 10https://gerrit.wikimedia.org/r/405886 (https://phabricator.wikimedia.org/T185493) [15:28:31] 10Operations, 10Goal, 10Patch-For-Review, 10User-fgiunchedi, 10cloud-services-team (Kanban): Port non-deprecated Diamond collectors to Prometheus - https://phabricator.wikimedia.org/T177196#3920097 (10Gehel) [15:28:34] 10Operations, 10Goal, 10Patch-For-Review, 10User-fgiunchedi, 10cloud-services-team (Kanban): Create Prometheus exporter for wdqs-updater - https://phabricator.wikimedia.org/T182773#3920095 (10Gehel) 05Resolved>03Open I'm reopening this. With the current exporter, the metrics go through the following... 
[15:29:34] (03PS2) 10Rush: cloud: add bstorm to shinken instance [puppet] - 10https://gerrit.wikimedia.org/r/405886 (https://phabricator.wikimedia.org/T185493) [15:29:41] (03PS1) 10Gehel: wdqs: replace prometheus-wdqs-updater-exporter with prometheus-jmx-exporter [puppet] - 10https://gerrit.wikimedia.org/r/405887 (https://phabricator.wikimedia.org/T182773) [15:29:43] (03PS1) 10Gehel: wdqs: remove cleanup code after migrating to prometheus jmx exporter [puppet] - 10https://gerrit.wikimedia.org/r/405888 (https://phabricator.wikimedia.org/T182773) [15:30:09] (03CR) 10jerkins-bot: [V: 04-1] wdqs: replace prometheus-wdqs-updater-exporter with prometheus-jmx-exporter [puppet] - 10https://gerrit.wikimedia.org/r/405887 (https://phabricator.wikimedia.org/T182773) (owner: 10Gehel) [15:30:21] (03CR) 10jerkins-bot: [V: 04-1] wdqs: remove cleanup code after migrating to prometheus jmx exporter [puppet] - 10https://gerrit.wikimedia.org/r/405888 (https://phabricator.wikimedia.org/T182773) (owner: 10Gehel) [15:30:30] (03CR) 10Rush: [C: 032] cloud: add bstorm to shinken instance [puppet] - 10https://gerrit.wikimedia.org/r/405886 (https://phabricator.wikimedia.org/T185493) (owner: 10Rush) [15:32:02] (03PS2) 10Gehel: wdqs: replace prometheus-wdqs-updater-exporter with prometheus-jmx-exporter [puppet] - 10https://gerrit.wikimedia.org/r/405887 (https://phabricator.wikimedia.org/T182773) [15:32:04] (03PS2) 10Gehel: wdqs: remove cleanup code after migrating to prometheus jmx exporter [puppet] - 10https://gerrit.wikimedia.org/r/405888 (https://phabricator.wikimedia.org/T182773) [15:34:26] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3920133 (10chasemp) [15:36:12] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3917465 (10chasemp) [15:39:21] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Onboard bstorm 
to WMF - https://phabricator.wikimedia.org/T185493#3920144 (10chasemp) @bstorm for `Add to cloud-wide root` you can put up a patchset to this file `modules/passwords/templates/root-authorized-keys.erb` in this repo https://gerri... [15:41:11] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3920148 (10chasemp) [15:42:51] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3917465 (10chasemp) [15:46:19] (03PS1) 10Ottomata: Add IPv6 to Kafka Jumbo brokers [puppet] - 10https://gerrit.wikimedia.org/r/405891 (https://phabricator.wikimedia.org/T185262) [15:46:41] (03CR) 10jerkins-bot: [V: 04-1] Add IPv6 to Kafka Jumbo brokers [puppet] - 10https://gerrit.wikimedia.org/r/405891 (https://phabricator.wikimedia.org/T185262) (owner: 10Ottomata) [15:49:01] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3920171 (10Bstorm) [15:53:24] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3920174 (10Bstorm) [15:56:12] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3920175 (10Bstorm) [15:59:20] (03PS1) 10Elukey: package_builder: add dh-make-golang to the list of req. packages [puppet] - 10https://gerrit.wikimedia.org/r/405892 (https://phabricator.wikimedia.org/T180442) [15:59:54] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3920185 (10Bstorm) [16:04:58] (03CR) 10Alexandros Kosiaris: [C: 031] package_builder: add dh-make-golang to the list of req. 
packages [puppet] - 10https://gerrit.wikimedia.org/r/405892 (https://phabricator.wikimedia.org/T180442) (owner: 10Elukey) [16:05:16] (03PS2) 10Ottomata: Add IPv6 to Kafka Jumbo brokers [puppet] - 10https://gerrit.wikimedia.org/r/405891 (https://phabricator.wikimedia.org/T185262) [16:05:41] (03CR) 10jerkins-bot: [V: 04-1] Add IPv6 to Kafka Jumbo brokers [puppet] - 10https://gerrit.wikimedia.org/r/405891 (https://phabricator.wikimedia.org/T185262) (owner: 10Ottomata) [16:06:49] (03PS3) 10Ottomata: Add IPv6 to Kafka Jumbo brokers [puppet] - 10https://gerrit.wikimedia.org/r/405891 (https://phabricator.wikimedia.org/T185262) [16:07:22] (03CR) 10jerkins-bot: [V: 04-1] Add IPv6 to Kafka Jumbo brokers [puppet] - 10https://gerrit.wikimedia.org/r/405891 (https://phabricator.wikimedia.org/T185262) (owner: 10Ottomata) [16:07:27] (03PS1) 10Ottomata: 2.2.1 binary release for Hadoop 2.6 [debs/spark2] (debian) - 10https://gerrit.wikimedia.org/r/405894 (https://phabricator.wikimedia.org/T185581) [16:08:41] (03CR) 10Muehlenhoff: [C: 031] package_builder: add dh-make-golang to the list of req. packages [puppet] - 10https://gerrit.wikimedia.org/r/405892 (https://phabricator.wikimedia.org/T180442) (owner: 10Elukey) [16:08:50] (03PS2) 10Ottomata: 2.2.1 binary release for Hadoop 2.6 [debs/spark2] (debian) - 10https://gerrit.wikimedia.org/r/405894 (https://phabricator.wikimedia.org/T185581) [16:10:28] (03CR) 10Elukey: [C: 032] package_builder: add dh-make-golang to the list of req. 
packages [puppet] - 10https://gerrit.wikimedia.org/r/405892 (https://phabricator.wikimedia.org/T180442) (owner: 10Elukey) [16:11:22] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3920228 (10chasemp) [16:13:27] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3920241 (10chasemp) [16:17:11] (03CR) 10Muehlenhoff: [C: 031] Metrics are exposed by Blazegraph directly [debs/prometheus-blazegraph-exporter] - 10https://gerrit.wikimedia.org/r/405878 (https://phabricator.wikimedia.org/T182857) (owner: 10Gehel) [16:21:27] (03PS1) 10Muehlenhoff: Revoke SSH key for ezachte [puppet] - 10https://gerrit.wikimedia.org/r/405898 [16:21:52] (03CR) 10jerkins-bot: [V: 04-1] Revoke SSH key for ezachte [puppet] - 10https://gerrit.wikimedia.org/r/405898 (owner: 10Muehlenhoff) [16:23:38] (03PS2) 10Muehlenhoff: Revoke SSH key for ezachte [puppet] - 10https://gerrit.wikimedia.org/r/405898 [16:25:36] (03CR) 10Muehlenhoff: [C: 032] Revoke SSH key for ezachte [puppet] - 10https://gerrit.wikimedia.org/r/405898 (owner: 10Muehlenhoff) [16:29:48] (03PS3) 10Ottomata: 2.2.1 binary release for Hadoop 2.6 [debs/spark2] (debian) - 10https://gerrit.wikimedia.org/r/405894 (https://phabricator.wikimedia.org/T185581) [16:46:18] 10Operations, 10Ops-Access-Requests: Requesting access to stat1004, stat1005, stat1006 for mneisler - https://phabricator.wikimedia.org/T184838#3920283 (10RobH) p:05Triage>03Normal a:05MNeisler>03RobH [16:51:13] (03PS1) 10RobH: adding Megan Neisler to shell users [puppet] - 10https://gerrit.wikimedia.org/r/405900 (https://phabricator.wikimedia.org/T184838) [16:52:16] (03PS2) 10RobH: adding Megan Neisler to shell users [puppet] - 10https://gerrit.wikimedia.org/r/405900 (https://phabricator.wikimedia.org/T184838) [16:52:54] stupid typos. 
[16:53:02] (03PS3) 10RobH: adding Megan Neisler to shell users [puppet] - 10https://gerrit.wikimedia.org/r/405900 (https://phabricator.wikimedia.org/T184838) [16:54:05] (03PS4) 10RobH: adding Megan Neisler to shell users [puppet] - 10https://gerrit.wikimedia.org/r/405900 (https://phabricator.wikimedia.org/T184838) [16:54:31] (03CR) 10RobH: [C: 032] adding Megan Neisler to shell users [puppet] - 10https://gerrit.wikimedia.org/r/405900 (https://phabricator.wikimedia.org/T184838) (owner: 10RobH) [16:59:20] (03PS1) 10RobH: Adding Megan Neisler to groups [puppet] - 10https://gerrit.wikimedia.org/r/405901 (https://phabricator.wikimedia.org/T184838) [17:00:04] godog, moritzm, and _joe_: I, the Bot under the Fountain, allow thee, The Deployer, to do Puppet SWAT(Max 8 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180123T1700). [17:00:04] No GERRIT patches in the queue for this window AFAICS. [17:01:43] 10Operations, 10Ops-Access-Requests: Requesting access to bast1001, stat1005, stat1006 for risler - https://phabricator.wikimedia.org/T185356#3920314 (10RobH) a:03Ramsey-WMF I've assigned this back to @Ramsey-WMF for feedback. Ops Clinic Notes: This is WMF All Hands week, so there is a very real chance tha... [17:10:58] PROBLEM - Host mc2036 is DOWN: PING CRITICAL - Packet loss = 100% [17:16:37] PROBLEM - IPsec on mc1036 is CRITICAL: Strongswan CRITICAL - ok: 0 not-conn: mc2036_v4 [17:29:59] 10Operations, 10Goal, 10Patch-For-Review, 10User-fgiunchedi, 10cloud-services-team (Kanban): Create Prometheus exporter for wdqs-updater - https://phabricator.wikimedia.org/T182773#3920394 (10Gehel) Jolokia agent is not loaded anymore with https://gerrit.wikimedia.org/r/405907. We could also remove all r... 
[17:34:06] 10Operations, 10ops-codfw, 10netops: rack spare switches in c1-codfw - https://phabricator.wikimedia.org/T185336#3920415 (10Papaul) msw-c1-codfw port information ex4300-spare1-codfw = port 43 ex4300-spare2-codfw = port 44 qfx5100-spare1-codfw = port 45 qfx5100-spare2-codfw =port 46 [17:34:23] 10Operations, 10ops-codfw, 10netops: rack spare switches in c1-codfw - https://phabricator.wikimedia.org/T185336#3920416 (10Papaul) [17:34:27] 10Operations, 10ops-codfw: mc2036 mainboard fuse failure - https://phabricator.wikimedia.org/T185587#3920417 (10RobH) p:05Triage>03Normal [17:35:05] 10Operations, 10ops-codfw, 10netops: rack spare switches in c1-codfw - https://phabricator.wikimedia.org/T185336#3913206 (10Papaul) a:05Papaul>03ayounsi [17:35:48] ACKNOWLEDGEMENT - Host mc2036 is DOWN: PING CRITICAL - Packet loss = 100% rhalsell https://phabricator.wikimedia.org/T185587 [17:37:14] !log mc2036 offline until mainboard fix [17:37:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:38:22] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3920430 (10Bstorm) [17:39:44] (03CR) 10Krinkle: [C: 031] "@mutante @joe Ready to deploy." 
[puppet] - 10https://gerrit.wikimedia.org/r/402867 (https://phabricator.wikimedia.org/T181413) (owner: 10Imarlier) [17:41:47] (03CR) 10RobH: [C: 032] Adding Megan Neisler to groups [puppet] - 10https://gerrit.wikimedia.org/r/405901 (https://phabricator.wikimedia.org/T184838) (owner: 10RobH) [17:42:38] 10Operations, 10Ops-Access-Requests: Requesting access to stat1004, stat1005, stat1006 for mneisler - https://phabricator.wikimedia.org/T184838#3920439 (10RobH) a:05RobH>03None [17:43:34] 10Operations, 10Ops-Access-Requests: Requesting access to stat1004, stat1005, stat1006 for mneisler - https://phabricator.wikimedia.org/T184838#3898174 (10RobH) 05Open>03Resolved a:03RobH @MNeisler: This access request has been merged live, and will sync out to the affected hosts within 30 minutes. If... [17:47:13] 10Operations, 10ops-codfw, 10DBA, 10hardware-requests: Decommission db2016, db2017, db2018, db2019, db2023, db2028, db2029 - https://phabricator.wikimedia.org/T184090#3920447 (10Papaul) [18:07:14] does anyone know why https://www.wikidata.org/wiki/Special:Versions’s entry for WikibaseQualityConstraints doesn’t link to the commit that was backported earlier today? [18:07:28] (cc zeljkof who did the SWAT) [18:08:59] Hmm, if it was just scap sync-file, maybe that doesn't update the git cache [18:15:44] ^ scap sync-{file,dir} does not sync the git-cache info information (although I think it may update it). 
[18:16:55] I guess it wouldn't really make sense, since its not actually syncing the whole commit [18:29:54] 10Operations, 10Ops-Access-Requests: Requesting access to all for bstorm - https://phabricator.wikimedia.org/T185591#3920534 (10Bstorm) [18:30:53] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3920546 (10Bstorm) [18:33:23] (03CR) 10Dzahn: [C: 032] webperf: Handle oversamples differently than regular samples [puppet] - 10https://gerrit.wikimedia.org/r/402867 (https://phabricator.wikimedia.org/T181413) (owner: 10Imarlier) [18:33:39] (03PS3) 10Dzahn: webperf: Handle oversamples differently than regular samples [puppet] - 10https://gerrit.wikimedia.org/r/402867 (https://phabricator.wikimedia.org/T181413) (owner: 10Imarlier) [18:35:11] (03CR) 10Dzahn: "deployed on hafnium.eqiad.wmnet" [puppet] - 10https://gerrit.wikimedia.org/r/402867 (https://phabricator.wikimedia.org/T181413) (owner: 10Imarlier) [18:37:52] (03PS12) 10Dzahn: Gerrit: remove visibility: hidden; from [puppet] - 10https://gerrit.wikimedia.org/r/405368 (https://phabricator.wikimedia.org/T185506) (owner: 10Paladox) [18:41:46] (03PS1) 10Madhuvishy: Add bstorm to production shell users [puppet] - 10https://gerrit.wikimedia.org/r/405922 (https://phabricator.wikimedia.org/T185591) [18:43:27] (03PS1) 10Madhuvishy: Add user bstorm to group ops [puppet] - 10https://gerrit.wikimedia.org/r/405923 (https://phabricator.wikimedia.org/T185591) [18:44:49] (03CR) 10Dzahn: [C: 032] "it's a small partial revert of https://gerrit.wikimedia.org/r/#/c/402665/29" [puppet] - 10https://gerrit.wikimedia.org/r/405368 (https://phabricator.wikimedia.org/T185506) (owner: 10Paladox) [18:44:53] (03PS13) 10Dzahn: Gerrit: remove visibility: hidden; from [puppet] - 10https://gerrit.wikimedia.org/r/405368 (https://phabricator.wikimedia.org/T185506) (owner: 10Paladox) [18:45:16] 10Operations, 10Ops-Access-Requests, 10Gerrit: Access to create gerrit 
repos for Addshore - https://phabricator.wikimedia.org/T185594#3920600 (10Addshore) [18:45:34] 10Operations, 10Ops-Access-Requests, 10Gerrit, 10User-Addshore: Access to create gerrit repos for Addshore - https://phabricator.wikimedia.org/T185594#3920612 (10Addshore) [18:46:17] 10Operations, 10Ops-Access-Requests, 10Gerrit, 10User-Addshore: Access to create gerrit repos for Addshore - https://phabricator.wikimedia.org/T185594#3920600 (10Paladox) You need to be added here https://gerrit.wikimedia.org/r/#/admin/groups/119,members [18:47:01] 10Operations, 10Ops-Access-Requests, 10Gerrit, 10User-Addshore: Access to create gerrit repos for Addshore - https://phabricator.wikimedia.org/T185594#3920618 (10Paladox) I doint think this needs #operations or any ops tags as this just needs an admin to approve. [18:47:04] 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Requesting access to all for bstorm - https://phabricator.wikimedia.org/T185591#3920620 (10RobH) [18:47:06] mutante thanks :) [18:47:25] 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Requesting access to all for bstorm - https://phabricator.wikimedia.org/T185591#3920534 (10RobH) [18:47:36] 10Operations, 10Ops-Access-Requests, 10Gerrit, 10Release-Engineering-Team, 10User-Addshore: Access to create gerrit repos for Addshore - https://phabricator.wikimedia.org/T185594#3920622 (10Dzahn) [18:48:01] 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Requesting access to all for bstorm - https://phabricator.wikimedia.org/T185591#3920534 (10RobH) [18:48:33] 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Requesting access to all for bstorm - https://phabricator.wikimedia.org/T185591#3920534 (10RobH) [18:49:11] paladox: yw. 
deployed [18:49:11] 10Operations, 10Gerrit, 10Patch-For-Review, 10Performance: New gerrit login ui is causing performance problems when going through gerrit.wikimedia.org - https://phabricator.wikimedia.org/T185506#3920630 (10Paladox) 05Open>03Resolved a:03Paladox Should be resolved now :). Please reopen if it is not. [18:49:33] :) [18:49:58] 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Requesting access to all for bstorm - https://phabricator.wikimedia.org/T185591#3920636 (10RobH) So not everyone who is hired into the ops team is automatically added to the ops group. That particular group addition was NOT reviewed in the ops team m... [18:51:28] 10Operations, 10Ops-Access-Requests, 10Gerrit, 10Release-Engineering-Team, 10User-Addshore: Access to create gerrit repos for Addshore - https://phabricator.wikimedia.org/T185594#3920638 (10Legoktm) I don't think this needs an ops access request, we'd need to add you to https://gerrit.wikimedia.org/r/#/a... [18:52:21] 10Operations, 10Ops-Access-Requests, 10Gerrit, 10Release-Engineering-Team, 10User-Addshore: Access to create gerrit repos for Addshore - https://phabricator.wikimedia.org/T185594#3920600 (10Dzahn) Yea, historically this has been more of a Gerrit admin / releng thing than an ops thing. The people who curr... [18:54:07] 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Requesting access to all for bstorm - https://phabricator.wikimedia.org/T185591#3920534 (10madhuvishy) @RobH Got it, thank you. I can try and poke one of them to approve async some time over the next week, otherwise we'll wait until the next meeting. [18:55:59] 10Operations, 10Ops-Access-Requests: Requesting access to bast1001, stat1005, stat1006 for risler - https://phabricator.wikimedia.org/T185356#3920656 (10JKatzWMF) Access approved. 
[19:02:02] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3920666 (10Dzahn) [19:24:54] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3917465 (10Dzahn) - subscribed to ops mailing list, invited to private mailing list - added to Phab group "WMF-NDA requests" (https://phabricator.wikimedia.org/project/members... [19:36:09] 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Requesting access to all for bstorm - https://phabricator.wikimedia.org/T185591#3920710 (10RobH) I'm guessing we should also append them into the wmf ldap group when we give them shell access? [19:37:23] 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Requesting access to all for bstorm - https://phabricator.wikimedia.org/T185591#3920712 (10madhuvishy) @RobH Yup, +1. We are tracking the full list here at T185493 [19:41:43] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3920728 (10madhuvishy) [19:44:02] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3920735 (10madhuvishy) [19:46:32] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3920745 (10madhuvishy) [19:46:34] 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Requesting access to all for bstorm - https://phabricator.wikimedia.org/T185591#3920744 (10madhuvishy) [19:53:23] 10Operations, 10Goal, 10Patch-For-Review, 10User-fgiunchedi, 10cloud-services-team (Kanban): Create Prometheus exporter for wdqs-updater - https://phabricator.wikimedia.org/T182773#3920757 (10Smalyshev) I think if somebody needs jolokia for their own reporting, they can add it back through the options. J... 
[20:01:45] (03CR) 10Legoktm: [C: 031] "Hah. Not sure if this should use the base Wikimedia image?" [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/404084 (owner: 10Addshore) [20:02:10] (03PS1) 10Dzahn: admins: reactivate LDAP access for siddharth11 [puppet] - 10https://gerrit.wikimedia.org/r/405932 [20:04:04] 10Operations, 10ops-eqiad, 10netops: rack spare switches in c1-eqiad - https://phabricator.wikimedia.org/T185337#3920805 (10RobH) If these would be more easily racked and attached to serial in a different rack (perhaps in a1 with scs-a1-eqiad) that is also fine, use your best judgement. [20:04:35] (03PS2) 10Dzahn: admins: reactivate LDAP access for siddharth11 [puppet] - 10https://gerrit.wikimedia.org/r/405932 [20:07:25] (03PS3) 10Dzahn: admins: reactivate LDAP access for siddharth11 [puppet] - 10https://gerrit.wikimedia.org/r/405932 [20:07:32] (03CR) 10Smalyshev: [C: 031] "I don't understand half of it, but those part that I do understand are ok, so I'm fine with it" [puppet] - 10https://gerrit.wikimedia.org/r/405887 (https://phabricator.wikimedia.org/T182773) (owner: 10Gehel) [20:08:10] (03CR) 10Dzahn: [C: 032] admins: reactivate LDAP access for siddharth11 [puppet] - 10https://gerrit.wikimedia.org/r/405932 (owner: 10Dzahn) [20:13:31] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3920823 (10Bstorm) [20:13:41] 10Operations, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External), 10Scoring-platform-team (Current), 10Wikimedia-Incident: Cache ORES virtualenv within versioned source - https://phabricator.wikimedia.org/T181071#3920824 (10awight) @akosiaris Take your time, but I wanted to make sure yo... [20:15:55] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "Cool idea - I didn't think about windows of course when I wrote the software, I can see how this could be useful. 
I have a few issues with" (033 comments) [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/404084 (owner: 10Addshore) [20:29:28] 10Operations, 10Wikimedia-IRC-RC-Server, 10Patch-For-Review: Replace ircd-ratbox with something newer/maintained - https://phabricator.wikimedia.org/T134271#3920859 (10Peachey88) The patches we have for "only ops can create channels" and "only ops can talk" can both be done in core [[ http://www.inspircd.org... [20:31:05] 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Requesting access to ops group in admin for bstorm - https://phabricator.wikimedia.org/T185591#3920865 (10chasemp) [20:41:07] (03CR) 10Smalyshev: [C: 031] wdqs: cleanup JVM options for blazegraph [puppet] - 10https://gerrit.wikimedia.org/r/388026 (https://phabricator.wikimedia.org/T175919) (owner: 10Gehel) [20:47:29] (03CR) 10Addshore: Add basic Dockerfile to run docker-pkg (032 comments) [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/404084 (owner: 10Addshore) [20:49:48] 10Operations, 10MediaWiki-Containers: Homepage for https://docker-registry.wikimedia.org - https://phabricator.wikimedia.org/T179696#3920890 (10Addshore) Even just a redirect to https://docker-registry.wikimedia.org/v2/_catalog would be nice. [21:00:25] 10Operations, 10Gerrit, 10Patch-For-Review, 10Performance: New gerrit login ui is causing performance problems when going through gerrit.wikimedia.org - https://phabricator.wikimedia.org/T185506#3920908 (10Krinkle) 05Resolved>03Open It was created this way by TYPO3 to avoid a flash of unstyled content.... [21:03:02] 10Operations, 10Gerrit, 10Patch-For-Review, 10Performance: New gerrit login ui is causing performance problems when going through gerrit.wikimedia.org - https://phabricator.wikimedia.org/T185506#3920925 (10Paladox) @Krinkle yep but then it affected the whole of gerrit so it made it look slow when in fact i... [21:22:34] Hey there, anyone available to look at a mail flow issue? 
Mutante is unavailble right now [21:22:54] RobH? [21:23:15] i can attempt but mutante is usually more practiced [21:23:24] whats up? [21:23:40] Josephine got married and changed her name [21:24:00] I changed her address in LDAP and now mail is not flowing to her [21:24:13] old: jgulingan new: jcabanero [21:26:51] Error: 550 Previous (cached) callout verification failure [21:27:46] bryon I see those failures in the mx log as well until about 10 minutes ago, but the most recent messages to jcabanero were relayed on to gmail [21:27:58] yeah [21:28:04] i was about to paste the output into pastebin in phab [21:28:13] it was getting verification failures [21:28:31] byron: so perhaps it is now working? [21:28:42] i'll paste into phab and make it visible to you =] [21:29:37] Ok, testing and inquring with J0 [21:29:43] She got it [21:29:52] oh, then ill stop my paste, i was about to save but meh [21:29:53] works now [21:29:56] Did we just have to wait longer or was there another issue? [21:30:05] i think it just had to sync up [21:30:06] but not sure [21:30:20] failed on 2018-01-23 20:19:47 [21:30:27] then passed on 2018-01-23 21:15:35 [21:30:46] So, I think if no one touched anything on your end, we just needed to wait [21:30:54] I touched absolutely nothing [21:30:58] bryon just needed to wait until the callout cache entry on that address expired [21:30:58] just greped logs [21:31:08] Ok, roger [21:31:09] oh yes [21:31:16] rejected RCPT : Previous (cached) callout verification failure [21:31:24] What is the "callout cache" set to? 
[21:31:54] not sure, grepping about [21:32:32] its the callout_negative_expire entry, still lookin [21:32:52] iirc it’s the default of 2h [21:33:00] yeah im not finding any overrride [21:33:05] i didnt know what default was though [21:33:32] i wonder how often we have those queries [21:34:26] not often it seems the only ones in the current mainlog are for this user [21:34:47] we could likely shorten it, but i suppose that opens up a potential ddos vector to the system on cache hits [21:34:51] but 2hours is a long time [21:35:10] someone sends email to address that doesnt exist since its a new hire, and then its added, it would have this same issue right? [21:36:07] No, it seems it's only when a name change happens [21:36:19] I changed the email and cn [21:37:18] Does the MX server cache the value of the "cn"? [21:41:54] from the logs it looks like gmail was not set to accept the new address (550-5.1.1 The email account that you tried to reach does not exist) at the time the first message was received by the mx. adding the desired new email address as an alias on the gmail side before updating ldap should help avoid a negative cache entry on the mx [21:42:58] Ok, so make sure the alias works, then change the "cn"? [21:43:12] that should do the trick! [21:43:22] Ok, I'll test. 
Thanks [22:10:33] (03PS3) 10Dzahn: apache: rm helper_scripts class, mv script to deployment_server [puppet] - 10https://gerrit.wikimedia.org/r/405806 [22:11:24] (03CR) 10Dzahn: [C: 032] "http://puppet-compiler.wmflabs.org/9803/" [puppet] - 10https://gerrit.wikimedia.org/r/405806 (owner: 10Dzahn) [22:15:33] (03PS2) 10Dzahn: labs::nfs: move standard includes from site to roles [puppet] - 10https://gerrit.wikimedia.org/r/399704 [22:20:22] (03CR) 10Giuseppe Lavagetto: [C: 031] labs::nfs: move standard includes from site to roles [puppet] - 10https://gerrit.wikimedia.org/r/399704 (owner: 10Dzahn) [22:23:41] 10Operations, 10Ops-Access-Requests: Requesting access to bast1001, stat1005, stat1006 for risler - https://phabricator.wikimedia.org/T185356#3921006 (10Ramsey-WMF) Hi @RobH , to your questions: - As far as access groups, I think **statistics-privatedata-users** works. - the L3 doc says I've signed it. - "Yo... [22:23:49] jouncebot: next [22:23:49] In 1 hour(s) and 36 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180124T0000) [22:26:04] (03CR) 10Dzahn: [C: 032] "http://puppet-compiler.wmflabs.org/9805/" [puppet] - 10https://gerrit.wikimedia.org/r/399704 (owner: 10Dzahn) [22:26:23] <_joe_> addshore: I'd use python3-devel fwiw [22:26:58] <_joe_> and I agree on https://phabricator.wikimedia.org/T179696, but I have no time to work on it [22:27:01] <_joe_> :( [22:29:46] 10Operations, 10Ops-Access-Requests: Requesting access to bast1001, stat1005, stat1006 for risler - https://phabricator.wikimedia.org/T185356#3921008 (10RobH) [22:30:07] (03PS2) 10Dzahn: labtest: move firewall/standard includes to roles [puppet] - 10https://gerrit.wikimedia.org/r/404790 [22:31:48] well, thats interesting..... 
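The "Previous (cached) callout verification failure" rejections in the mail thread come from Exim caching a negative recipient-callout result. Exim's `callout_negative_expire` option defaults to 2h, which matches the value recalled in the discussion. The Python model below shows why the advice to create the Gmail alias before changing LDAP helps. Once a failure for an address is cached, mail to it is refused without a fresh callout until the entry expires. If the next hop already accepts the address, the first callout succeeds and nothing negative is ever cached. This is a toy model under stated assumptions. The class name and the `upstream_accepts` callback are hypothetical, and Exim's real cache lives in a hints database with a separate expiry for positive results.

```python
from typing import Callable, Dict

NEGATIVE_TTL = 2 * 60 * 60  # seconds; mirrors Exim's default callout_negative_expire (2h)


class NegativeCalloutCache:
    """Toy model of recipient verification with a negative-result cache."""

    def __init__(self, upstream_accepts: Callable[[str], bool],
                 ttl: float = NEGATIVE_TTL) -> None:
        self._upstream_accepts = upstream_accepts  # stands in for the SMTP callout
        self._ttl = ttl
        self._failed_at: Dict[str, float] = {}     # address -> time of cached failure

    def verify(self, address: str, now: float) -> str:
        failed = self._failed_at.get(address)
        if failed is not None and now - failed < self._ttl:
            # Cached failure still valid: reject without asking the next hop again.
            return "reject (cached)"
        if self._upstream_accepts(address):
            self._failed_at.pop(address, None)
            return "accept"
        self._failed_at[address] = now
        return "reject"


if __name__ == "__main__":
    aliases = set()
    cache = NegativeCalloutCache(lambda addr: addr in aliases)
    print(cache.verify("new@example.org", now=0))     # reject: alias not created yet
    aliases.add("new@example.org")
    print(cache.verify("new@example.org", now=600))   # reject (cached): fixed upstream, still refused
    print(cache.verify("new@example.org", now=7200))  # accept: cached failure has expired
```

If the alias is added to `aliases` before the first `verify` call, the first result is already `accept`. This is the ordering suggested in the thread.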
[22:32:00] (PS1) RobH: fixing past patchsets [puppet] - https://gerrit.wikimedia.org/r/405980
[22:32:02] i added to patches with ssh-rsa missing from the ssh key entry
[22:32:41] =P
[22:32:45] Operations, MediaWiki-Containers: Homepage for https://docker-registry.wikimedia.org - https://phabricator.wikimedia.org/T179696#3921012 (Joe) Portus seems like an interesting approach to two problems: -authn+authz and UI for a docker registry http://port.us.org/features.html I'll take a harder look as...
[22:32:50] i suppose they worked since ssh rsa is default ssh key type?
[22:33:20] (CR) RobH: [C: 2] fixing past patchsets [puppet] - https://gerrit.wikimedia.org/r/405980 (owner: RobH)
[22:37:26] (CR) Dzahn: [C: 2] "noop except a resource name: http://puppet-compiler.wmflabs.org/9806/" [puppet] - https://gerrit.wikimedia.org/r/404790 (owner: Dzahn)
[22:37:34] (PS3) Dzahn: labtest: move firewall/standard includes to roles [puppet] - https://gerrit.wikimedia.org/r/404790
[22:39:22] (PS1) RobH: adding new shell user Ramsey Isler [puppet] - https://gerrit.wikimedia.org/r/405981 (https://phabricator.wikimedia.org/T185356)
[22:40:08] Operations, Ops-Access-Requests, Patch-For-Review: Requesting access to bast1001, stat1005, stat1006 for risler - https://phabricator.wikimedia.org/T185356#3921023 (RobH) Yeah I'm not sure how I missed it, but I totally see it now. (Your wikitech user.) I'll create the patchsets and get it all merged.
[22:41:35] (PS1) RobH: adding Ramsey Isler to statistics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/405982 (https://phabricator.wikimedia.org/T185356)
[22:43:13] Operations, Ops-Access-Requests, Patch-For-Review: Requesting access to bast1001, stat1005, stat1006 for risler - https://phabricator.wikimedia.org/T185356#3921037 (RobH)
[22:43:25] Operations, Ops-Access-Requests, Patch-For-Review: Requesting access to bast1001, stat1005, stat1006 for risler - https://phabricator.wikimedia.org/T185356#3913933 (RobH) a:Ramsey-WMF>None
[22:59:04] (PS2) Giuseppe Lavagetto: contint: Lower caching length on doc.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/403401 (https://phabricator.wikimedia.org/T184255) (owner: Legoktm)
[23:00:01] (CR) Giuseppe Lavagetto: [C: 2] contint: Lower caching length on doc.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/403401 (https://phabricator.wikimedia.org/T184255) (owner: Legoktm)
[23:05:00] (PS1) Dzahn: site: remove duplicate firewall include on some appservers [puppet] - https://gerrit.wikimedia.org/r/405984
[23:07:17] (PS2) Dzahn: site: remove duplicate firewall include on some appservers [puppet] - https://gerrit.wikimedia.org/r/405984
[23:08:32] (CR) Dzahn: [C: 2] site: remove duplicate firewall include on some appservers [puppet] - https://gerrit.wikimedia.org/r/405984 (owner: Dzahn)
[23:14:08] (PS1) Dzahn: horizon: move standard/firewall include to role [puppet] - https://gerrit.wikimedia.org/r/405987
[23:16:29] (PS5) Gergő Tisza: Adding config for WikimediaEvents module for logging behaviour data [mediawiki-config] - https://gerrit.wikimedia.org/r/404910 (https://phabricator.wikimedia.org/T183869) (owner: Groovier1)
[23:19:53] (PS2) Dzahn: horizon: move standard/firewall include to role [puppet] - https://gerrit.wikimedia.org/r/405987
[23:20:01] (CR) Dzahn: [C: 2] "http://puppet-compiler.wmflabs.org/9807/californium.wikimedia.org/" [puppet] - https://gerrit.wikimedia.org/r/405987 (owner: Dzahn)
[23:25:44] (PS2) Zoranzoki21: Stop rewriting m.wikipedia.org and zero.wikipedia.org [puppet] - https://gerrit.wikimedia.org/r/404158 (https://phabricator.wikimedia.org/T69015) (owner: Mholloway)
[23:30:42] (PS1) Dzahn: wmcs::puppetmaster: move standard/firewall include to roles [puppet] - https://gerrit.wikimedia.org/r/405990
[23:32:38] (CR) Giuseppe Lavagetto: [C: 1] wmcs::puppetmaster: move standard/firewall include to roles [puppet] - https://gerrit.wikimedia.org/r/405990 (owner: Dzahn)
[23:37:07] (CR) Dzahn: "needs a fake secret in labs/private for compiler: invalid secret wmcspuppetmaster/puppet_pubkey.pem at /srv/jenkins-workspace/puppet-com" [puppet] - https://gerrit.wikimedia.org/r/405990 (owner: Dzahn)
[23:50:43] jouncebot: next
[23:50:44] In 0 hour(s) and 9 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180124T0000)
[23:50:59] huh, just me.
[23:54:52] Operations, Ops-Access-Requests, Patch-For-Review: Requesting access to ops group in admin for bstorm - https://phabricator.wikimedia.org/T185591#3920534 (bd808) +1 for the obligatory manager approval
[23:57:50] (PS1) Dzahn: add fake keys for wmcspuppetmaster to allow compiling [labs/private] - https://gerrit.wikimedia.org/r/406001
[23:59:19] (PS2) Dzahn: add fake keys for wmcspuppetmaster to allow compiling [labs/private] - https://gerrit.wikimedia.org/r/406001
[23:59:34] (CR) Dzahn: [V: 2 C: 2] add fake keys for wmcspuppetmaster to allow compiling [labs/private] - https://gerrit.wikimedia.org/r/406001 (owner: Dzahn)