[00:00:05] RoanKattouw ostriches Krenair: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160225T0000).
[00:00:40] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge.
[00:01:30] !log ori@tin Synchronized wmf-config/StartProfiler.php: I016e23d81: xhgui: Sample fewer requests (1:100k instead of 1:10k) (duration: 01m 58s)
[00:01:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:02:29] thcipriani: when is the scap mtime fix going out?
[00:03:23] (PS1) Andrew Bogott: Consolidate labs pdns settings into a hiera dict. [puppet] - https://gerrit.wikimedia.org/r/273144
[00:03:45] !log ori@tin Synchronized wmf-config/CommonSettings.php: I4cc836f3ca: Fully-qualify EventLoggingBaseUri (duration: 01m 40s)
[00:03:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:03:52] ori: package is up-to-date. Need to make a puppet patch with to update ensure, get it on beta to ensure everything works, then merge.
[00:04:21] *puppet patch to update the package 'ensure'
[00:05:29] I'll get beta done this evening (as long as deployment-tin isn't broken). Could put puppet patch up for swat tomorrow.
[00:10:26] (PS3) Dzahn: wikistats: crons for db backup [puppet] - https://gerrit.wikimedia.org/r/236238
[00:10:41] (PS1) Southparkfan: Import Bootstrap files to avoid external CDN usage [debs/wikistats] - https://gerrit.wikimedia.org/r/273145
[00:10:45] (PS1) GWicke: REST path escaping [puppet] - https://gerrit.wikimedia.org/r/273146
[00:11:21] (PS2) GWicke: REST path escaping [puppet] - https://gerrit.wikimedia.org/r/273146 (https://phabricator.wikimedia.org/T127387)
[00:11:41] thcipriani: cool, thanks!
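The 00:01:30 sync changed xhgui profiling from sampling 1 in 10,000 requests to 1 in 100,000. The actual StartProfiler.php logic is not shown in this log. The sketch below is a hypothetical Python illustration of 1-in-N request sampling; `should_profile` and `count_profiled` are invented names. It shows why the change cuts the number of profiled requests by roughly a factor of ten.

```python
import random


def should_profile(rng: random.Random, one_in: int) -> bool:
    """Return True for roughly one request in `one_in` (hypothetical helper)."""
    # randrange(one_in) is uniform over 0..one_in-1, so it equals 0
    # with probability 1/one_in.
    return rng.randrange(one_in) == 0


def count_profiled(requests: int, one_in: int, seed: int = 42) -> int:
    """Simulate `requests` requests and count how many would be profiled."""
    rng = random.Random(seed)
    return sum(should_profile(rng, one_in) for _ in range(requests))


if __name__ == "__main__":
    n = 500_000
    # Expected counts are about n / 10_000 = 50 and n / 100_000 = 5.
    print("1:10k  ->", count_profiled(n, 10_000))
    print("1:100k ->", count_profiled(n, 100_000))
```

The exact counts vary with the seed. Only the ratio between the two sampling rates matters here.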
[00:11:50] (CR) jenkins-bot: [V: -1] wikistats: crons for db backup [puppet] - https://gerrit.wikimedia.org/r/236238 (owner: Dzahn)
[00:12:05] (PS4) Dzahn: wikistats: crons for db backup [puppet] - https://gerrit.wikimedia.org/r/236238
[00:12:57] (PS2) Andrew Bogott: Consolidate labs pdns settings into a hiera dict. [puppet] - https://gerrit.wikimedia.org/r/273144
[00:14:14] (CR) Southparkfan: [C: -1] wikistats: crons for db backup (1 comment) [puppet] - https://gerrit.wikimedia.org/r/236238 (owner: Dzahn)
[00:14:20] (CR) jenkins-bot: [V: -1] wikistats: crons for db backup [puppet] - https://gerrit.wikimedia.org/r/236238 (owner: Dzahn)
[00:14:38] (CR) jenkins-bot: [V: -1] Consolidate labs pdns settings into a hiera dict. [puppet] - https://gerrit.wikimedia.org/r/273144 (owner: Andrew Bogott)
[00:16:47] (PS5) Dzahn: wikistats: crons for db backup [puppet] - https://gerrit.wikimedia.org/r/236238
[00:17:19] Operations, RESTBase, Services, Traffic, Mobile-Content-Service: Split slash decoding from general percent normalization in Varnish VCL - https://phabricator.wikimedia.org/T127387#2062034 (GWicke) @bblack: A patch implementing REST path encoding normalization based on a blacklist of delimiters i...
[00:17:28] (PS3) Andrew Bogott: Consolidate labs pdns settings into a hiera dict. [puppet] - https://gerrit.wikimedia.org/r/273144
[00:17:48] (PS6) Dzahn: wikistats: crons for db backup [puppet] - https://gerrit.wikimedia.org/r/236238
[00:19:11] (CR) jenkins-bot: [V: -1] wikistats: crons for db backup [puppet] - https://gerrit.wikimedia.org/r/236238 (owner: Dzahn)
[00:19:38] (PS7) Dzahn: wikistats: crons for db backup [puppet] - https://gerrit.wikimedia.org/r/236238
[00:19:44] (PS3) GWicke: REST path escaping normalization [puppet] - https://gerrit.wikimedia.org/r/273146 (https://phabricator.wikimedia.org/T127387)
[00:20:48] (PS8) Dzahn: wikistats: crons for db backup [puppet] - https://gerrit.wikimedia.org/r/236238
[00:21:03] (CR) jenkins-bot: [V: -1] wikistats: crons for db backup [puppet] - https://gerrit.wikimedia.org/r/236238 (owner: Dzahn)
[00:21:48] (PS2) Ori.livneh: remove gdash.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/272670 (https://phabricator.wikimedia.org/T104365)
[00:21:57] (CR) Ori.livneh: [C: 2] remove gdash.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/272670 (https://phabricator.wikimedia.org/T104365) (owner: Ori.livneh)
[00:22:10] (CR) jenkins-bot: [V: -1] wikistats: crons for db backup [puppet] - https://gerrit.wikimedia.org/r/236238 (owner: Dzahn)
[00:24:49] (PS4) Andrew Bogott: Consolidate labs pdns settings into a hiera dict. [puppet] - https://gerrit.wikimedia.org/r/273144
[00:26:25] (PS9) Dzahn: wikistats: crons for db backup [puppet] - https://gerrit.wikimedia.org/r/236238
[00:27:38] (CR) jenkins-bot: [V: -1] wikistats: crons for db backup [puppet] - https://gerrit.wikimedia.org/r/236238 (owner: Dzahn)
[00:29:58] (PS10) Dzahn: wikistats: crons for db backup [puppet] - https://gerrit.wikimedia.org/r/236238
[00:31:09] (CR) Dzahn: [C: 2 V: 2] Import Bootstrap files to avoid external CDN usage [debs/wikistats] - https://gerrit.wikimedia.org/r/273145 (owner: Southparkfan)
[00:31:37] (CR) Dzahn: [C: 2] wikistats: crons for db backup [puppet] - https://gerrit.wikimedia.org/r/236238 (owner: Dzahn)
[00:34:12] (CR) Andrew Bogott: "Puppet compiler confirms this is a no-op; labs testing confirms the same for labs instances." [puppet] - https://gerrit.wikimedia.org/r/273144 (owner: Andrew Bogott)
[00:40:10] (PS1) Southparkfan: Set td font-size to 90% [debs/wikistats] - https://gerrit.wikimedia.org/r/273151
[00:40:44] (CR) Dzahn: [C: 2 V: 2] Set td font-size to 90% [debs/wikistats] - https://gerrit.wikimedia.org/r/273151 (owner: Southparkfan)
[00:43:32] PROBLEM - logstash process on logstash1002 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 998 (logstash), command name java, args logstash
[00:54:00] (PS1) BBlack: vcl_layers: rewrite_proxy_urls fe-only [puppet] - https://gerrit.wikimedia.org/r/273152
[00:54:02] (PS1) BBlack: vcl_layers: move recv_purge [puppet] - https://gerrit.wikimedia.org/r/273153
[00:54:04] (PS1) BBlack: vcl_layers: consolidate 3x fe-only [puppet] - https://gerrit.wikimedia.org/r/273154
[00:54:06] (PS1) BBlack: vcl_layers: re-arrange fe_ip/xff/xcdis-clear [puppet] - https://gerrit.wikimedia.org/r/273155
[00:54:08] (PS1) BBlack: vcl_layers: https+fe layer-conditional fixups [puppet] - https://gerrit.wikimedia.org/r/273156
[00:54:10] (PS1) BBlack: vcl_layers: allowed_methods fe-only [puppet] - https://gerrit.wikimedia.org/r/273157
[00:54:12] (PS1) BBlack: vcl_layers: beacon should only be frontend [puppet] - https://gerrit.wikimedia.org/r/273158
[00:54:14] (PS1) BBlack: vcl_layers: varnishcheck can happen earlier [puppet] - https://gerrit.wikimedia.org/r/273159
[00:54:16] (PS1) BBlack: vcl_layers: move xcache up a bit [puppet] - https://gerrit.wikimedia.org/r/273160
[00:54:18] (PS1) BBlack: vcl_layers: more layer-conditional redundancy [puppet] - https://gerrit.wikimedia.org/r/273161
[00:54:20] (PS1) BBlack: vcl_layers: split wikimedia.vcl in puppet terms [puppet] - https://gerrit.wikimedia.org/r/273162
[00:54:22] (PS1) BBlack: vcl_layers: move basic fe-only things to fe [puppet] - https://gerrit.wikimedia.org/r/273163
[00:54:24] (PS1) BBlack: vcl_layers: split up vcl_recv itself [puppet] - https://gerrit.wikimedia.org/r/273164
[00:54:26] (PS1) BBlack: vcl_layers: split hit|miss|pass|fetch [puppet] - https://gerrit.wikimedia.org/r/273165
[00:54:28] (PS1) BBlack: vcl_layers: split deliver|error [puppet] - https://gerrit.wikimedia.org/r/273166
[00:57:42] RECOVERY - logstash process on logstash1002 is OK: PROCS OK: 1 process with UID = 998 (logstash), command name java, args logstash
[00:57:55] !log Started crashed Logstash process on logstash1002 (systemd doesn't restart automatically due to T127677)
[00:57:56] T127677: Auto generated Logstash unit file has "Restart=no" - https://phabricator.wikimedia.org/T127677
[00:58:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:59:30] Operations: Queires in Hue always return an empty result set - https://phabricator.wikimedia.org/T128039#2062100 (bmansurov)
[00:59:32] Operations, Wikimedia-Logstash: Auto generated Logstash unit file has "Restart=no" - https://phabricator.wikimedia.org/T127677#2062114 (bd808) Maybe some nice soul from #operations can give me tips on how to make a custom unit file to put in our Puppet config?
[00:59:44] (CR) jenkins-bot: [V: -1] vcl_layers: https+fe layer-conditional fixups [puppet] - https://gerrit.wikimedia.org/r/273156 (owner: BBlack)
[01:00:04] (CR) jenkins-bot: [V: -1] vcl_layers: allowed_methods fe-only [puppet] - https://gerrit.wikimedia.org/r/273157 (owner: BBlack)
[01:00:04] twentyafterfour: Dear anthropoid, the time has come. Please deploy Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160225T0100).
[01:01:17] (PS1) Dzahn: wikistats: fix date in filename for db dumps [puppet] - https://gerrit.wikimedia.org/r/273167
[01:03:09] (CR) jenkins-bot: [V: -1] vcl_layers: beacon should only be frontend [puppet] - https://gerrit.wikimedia.org/r/273158 (owner: BBlack)
[01:03:23] (CR) Dzahn: [C: 2] wikistats: fix date in filename for db dumps [puppet] - https://gerrit.wikimedia.org/r/273167 (owner: Dzahn)
[01:05:51] (PS2) BBlack: vcl_layers: https+fe layer-conditional fixups [puppet] - https://gerrit.wikimedia.org/r/273156
[01:05:53] (PS2) BBlack: vcl_layers: allowed_methods fe-only [puppet] - https://gerrit.wikimedia.org/r/273157
[01:05:55] (PS2) BBlack: vcl_layers: beacon should only be frontend [puppet] - https://gerrit.wikimedia.org/r/273158
[01:05:57] (PS2) BBlack: vcl_layers: varnishcheck can happen earlier [puppet] - https://gerrit.wikimedia.org/r/273159
[01:05:59] (PS2) BBlack: vcl_layers: move xcache up a bit [puppet] - https://gerrit.wikimedia.org/r/273160
[01:06:01] (PS2) BBlack: vcl_layers: more layer-conditional redundancy [puppet] - https://gerrit.wikimedia.org/r/273161
[01:06:03] (PS2) BBlack: vcl_layers: split wikimedia.vcl in puppet terms [puppet] - https://gerrit.wikimedia.org/r/273162
[01:06:05] (PS2) BBlack: vcl_layers: move basic fe-only things to fe [puppet] - https://gerrit.wikimedia.org/r/273163
[01:06:07] (PS2) BBlack: vcl_layers: split up vcl_recv itself [puppet] - https://gerrit.wikimedia.org/r/273164
[01:06:09] (PS2) BBlack: vcl_layers: split hit|miss|pass|fetch [puppet] - https://gerrit.wikimedia.org/r/273165
[01:06:11] (PS2) BBlack: vcl_layers: split deliver|error [puppet] - https://gerrit.wikimedia.org/r/273166
[01:06:37] (CR) jenkins-bot: [V: -1] vcl_layers: varnishcheck can happen earlier [puppet] - https://gerrit.wikimedia.org/r/273159 (owner: BBlack)
[01:07:00] (CR) jenkins-bot: [V: -1] vcl_layers: move xcache up a bit [puppet] - https://gerrit.wikimedia.org/r/273160 (owner: BBlack)
[01:07:04] (CR) jenkins-bot: [V: -1] vcl_layers: more layer-conditional redundancy [puppet] - https://gerrit.wikimedia.org/r/273161 (owner: BBlack)
[01:07:15] (CR) jenkins-bot: [V: -1] vcl_layers: split wikimedia.vcl in puppet terms [puppet] - https://gerrit.wikimedia.org/r/273162 (owner: BBlack)
[01:08:04] (CR) jenkins-bot: [V: -1] vcl_layers: move basic fe-only things to fe [puppet] - https://gerrit.wikimedia.org/r/273163 (owner: BBlack)
[01:09:17] (CR) jenkins-bot: [V: -1] vcl_layers: split up vcl_recv itself [puppet] - https://gerrit.wikimedia.org/r/273164 (owner: BBlack)
[01:13:05] (CR) Dzahn: [V: 2] wikistats: fix date in filename for db dumps [puppet] - https://gerrit.wikimedia.org/r/273167 (owner: Dzahn)
[01:17:07] Operations, Discovery, procurement: Refresh elastic10{01..16}.eqiad.wmnet servers - https://phabricator.wikimedia.org/T128000#2060818 (Deskana) p:Triage>Normal
[01:21:56] (Abandoned) BBlack: vcl_recv: single definition in wikimedia.vcl [puppet] - https://gerrit.wikimedia.org/r/271990 (https://phabricator.wikimedia.org/T127481) (owner: BBlack)
[01:22:07] (Abandoned) BBlack: vcl_(hit|miss|pass|deliver): single definition [puppet] - https://gerrit.wikimedia.org/r/271991 (https://phabricator.wikimedia.org/T127481) (owner: BBlack)
[01:25:42] (PS1) Dzahn: wikistats: fix date in mysqldump file name again [puppet] - https://gerrit.wikimedia.org/r/273168
[01:36:03] (PS1) Bmansurov: Run the survey at lowered rate [mediawiki-config] - https://gerrit.wikimedia.org/r/273169 (https://phabricator.wikimedia.org/T125946)
[01:39:11] (PS2) Bmansurov: Run the survey at lowered rate [mediawiki-config] - https://gerrit.wikimedia.org/r/273169 (https://phabricator.wikimedia.org/T125946)
[01:39:13] (PS3) Bmansurov: Run the survey at normal rate to test DNT [mediawiki-config] - https://gerrit.wikimedia.org/r/270792 (https://phabricator.wikimedia.org/T125946)
[01:59:07] (CR) Dzahn: [C: 2] wikistats: fix date in mysqldump file name again [puppet] - https://gerrit.wikimedia.org/r/273168 (owner: Dzahn)
[02:03:53] (PS1) Dzahn: wikistats: no interactive password for mysqldump [puppet] - https://gerrit.wikimedia.org/r/273171
[02:14:24] Operations, Traffic, Wiki-Loves-Monuments-General, HTTPS: configure https for www.wikilovesmonuments.org - https://phabricator.wikimedia.org/T118388#2062278 (Dzahn) I personally think it's a serious issue. Every user who goes to that site on https gets a fullscreen "wikilovesmonuments.org uses an i...
[02:18:41] Operations, DNS, Traffic, domains, Patch-For-Review: wikipedia.es - https://phabricator.wikimedia.org/T101060#2062292 (Dzahn) We still have an issue with getting replies on domain related tickets (T105829) etc.
[02:22:55] Operations, DNS, Traffic, Wikimedia-Site-Requests, and 3 others: Move oldwikisource on www.wikisource.org to mul.wikisource.org - https://phabricator.wikimedia.org/T64717#2062328 (Dzahn) I don't see why this is "Traffic". but it's declined anyways
[02:23:41] PROBLEM - citoid endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
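GWicke's patch (273146, T127387) normalizes percent-encoding in REST paths but leaves certain delimiters encoded. An encoded slash (`%2F`) inside a title then cannot be confused with a path separator. The real change is Varnish VCL, and its delimiter list is cut off in the log. The Python sketch below only illustrates the idea. The `PRESERVE_ENCODED` set and the `normalize_path` name are hypothetical.

```python
import re

# Hypothetical blacklist: characters that act as delimiters in a REST path
# and therefore must stay percent-encoded. "%" itself is kept encoded so a
# path is never decoded twice. The real list in the VCL patch is not shown.
PRESERVE_ENCODED = frozenset("/?#%")

_PCT_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_path(path: str) -> str:
    """Decode percent-escapes, except delimiters and non-printable/non-ASCII bytes."""

    def _fix(match):
        code = int(match.group(1), 16)
        char = chr(code)
        # Only decode printable ASCII (0x21-0x7E) that is not a delimiter;
        # UTF-8 bytes, spaces and control characters stay encoded.
        if 0x21 <= code <= 0x7E and char not in PRESERVE_ENCODED:
            return char
        # Keep the escape, but canonicalize the hex digits to upper case.
        return "%" + match.group(1).upper()

    return _PCT_ESCAPE.sub(_fix, path)


print(normalize_path("/api/rest_v1/page/html/AC%2FDC"))  # slash stays encoded
print(normalize_path("/wiki/Caf%c3%a9"))                 # UTF-8 bytes upper-cased
print(normalize_path("/wiki/%41bc"))                     # "%41" decoded to "A"
```

Upper-casing the hex digits means that equivalent spellings such as `%2f` and `%2F` produce the same cache key.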
[02:23:52] (CR) Dzahn: [C: 2] wikistats: no interactive password for mysqldump [puppet] - https://gerrit.wikimedia.org/r/273171 (owner: Dzahn)
[02:24:12] PROBLEM - citoid endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:25:52] RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[02:27:02] RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[02:35:21] RECOVERY - nutcracker process on mw1099 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[02:35:44] !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.13) (duration: 19m 03s)
[02:35:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:35:51] RECOVERY - nutcracker port on mw1099 is OK: TCP OK - 0.000 second response time on port 11212
[02:40:41] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[02:50:41] Operations, Traffic, HTTPS: implement Public Key Pinning (HPKP) for Wikimedia domains - https://phabricator.wikimedia.org/T92002#1101271 (csteipp) @Bblack, would it be possible to try Public-Key-Pins-Report-Only with a short max-age, just to see how much of an issue the cross-signing really is? I think...
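csteipp's suggestion (T92002) is to send `Public-Key-Pins-Report-Only` with a short `max-age`, so that pin violations are reported but not enforced. Under RFC 7469, each `pin-sha256` value is the base64 encoding of the SHA-256 digest of a DER-encoded SubjectPublicKeyInfo. The Python sketch below builds such a header. The SPKI bytes, report URI and max-age are placeholders, not values from the ticket.

```python
import base64
import hashlib


def spki_pin(spki_der: bytes) -> str:
    """pin-sha256 value: base64(SHA-256(DER-encoded SubjectPublicKeyInfo))."""
    return base64.b64encode(hashlib.sha256(spki_der).digest()).decode("ascii")


def hpkp_report_only_header(pins, max_age: int, report_uri: str) -> str:
    """Build a Public-Key-Pins-Report-Only header line in RFC 7469 syntax."""
    directives = [f'pin-sha256="{pin}"' for pin in pins]
    directives.append(f"max-age={max_age}")
    directives.append(f'report-uri="{report_uri}"')
    return "Public-Key-Pins-Report-Only: " + "; ".join(directives)


if __name__ == "__main__":
    # Placeholder key material: a real deployment would hash the actual SPKI
    # of the serving key and of any backup keys.
    primary = spki_pin(b"placeholder-primary-spki")
    backup = spki_pin(b"placeholder-backup-spki")
    print(hpkp_report_only_header(
        [primary, backup],
        max_age=600,  # "short" max-age; the exact value is an assumption
        report_uri="https://example.org/hpkp-report",  # hypothetical endpoint
    ))
```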
[02:50:42] PROBLEM - puppet last run on db2065 is CRITICAL: CRITICAL: puppet fail
[03:10:24] !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.14) (duration: 18m 00s)
[03:10:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:17:22] RECOVERY - puppet last run on db2065 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[03:19:30] !log l10nupdate@tin ResourceLoader cache refresh completed at Thu Feb 25 03:19:30 UTC 2016 (duration 9m 6s)
[03:19:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[04:26:01] PROBLEM - puppet last run on maerlant is CRITICAL: CRITICAL: puppet fail
[04:33:20] RECOVERY - cassandra-b CQL 10.64.32.195:9042 on restbase1008 is OK: TCP OK - 0.001 second response time on port 9042
[04:35:22] !log restarting restbase1008-a to cancel rebuild T108611 T119935
[04:35:24] T108611: perform initial (manual) repair of Cassandra cluster - https://phabricator.wikimedia.org/T108611
[04:35:24] T119935: Upgrade restbase100[7-9] to match restbase100[1-6] hardware - https://phabricator.wikimedia.org/T119935
[04:35:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[04:39:41] PROBLEM - cassandra-a CQL 10.64.32.187:9042 on restbase1008 is CRITICAL: Connection refused
[04:43:10] RECOVERY - cassandra-a CQL 10.64.32.187:9042 on restbase1008 is OK: TCP OK - 0.004 second response time on port 9042
[04:44:10] !log decommissioning Cassandra on restbase1008-a.eqiad.wmnet T119935
[04:44:11] T119935: Upgrade restbase100[7-9] to match restbase100[1-6] hardware - https://phabricator.wikimedia.org/T119935
[04:44:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[04:52:32] RECOVERY - puppet last run on maerlant is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[05:01:59] (PS1) Dzahn: use bootstrap/datatables for display.php as well [debs/wikistats] - https://gerrit.wikimedia.org/r/273176
[05:03:22] (CR) Dzahn: [C: 2 V: 2] "http://wikistats.wmflabs.org/display.php?t=wb" [debs/wikistats] - https://gerrit.wikimedia.org/r/273176 (owner: Dzahn)
[05:07:38] (PS1) Dzahn: set new classes for in grand total table [debs/wikistats] - https://gerrit.wikimedia.org/r/273177
[05:08:21] (CR) Dzahn: [C: 2 V: 2] set new classes for in grand total table [debs/wikistats] - https://gerrit.wikimedia.org/r/273177 (owner: Dzahn)
[06:20:41] PROBLEM - Kafka Broker Replica Max Lag on kafka1022 is CRITICAL: CRITICAL: 65.22% of data above the critical threshold [5000000.0]
[06:27:20] PROBLEM - Router interfaces on cr2-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 75, down: 1, dormant: 0, excluded: 0, unused: 0; xe-1/3/0: down - Core: cr1-codfw:xe-5/0/2 (Zayo, OGYX/124337//ZYO, 38.8ms) {#?} [10Gbps wave]
[06:28:41] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0; xe-5/0/2: down - Core: cr2-ulsfo:xe-1/3/0 (Zayo, OGYX/124337//ZYO, 38.8ms) {#11541} [10Gbps wave]
[06:30:11] PROBLEM - puppet last run on holmium is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:53] PROBLEM - puppet last run on lvs2002 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:00] PROBLEM - puppet last run on cp2013 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:02] PROBLEM - puppet last run on mw2050 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:10] PROBLEM - puppet last run on mw2023 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:22] RECOVERY - Kafka Broker Replica Max Lag on kafka1022 is OK: OK: Less than 50.00% above the threshold [1000000.0]
[06:31:22] PROBLEM - puppet last run on eventlog2001 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:30] PROBLEM - puppet last run on mw2021 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:01] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:01] PROBLEM - puppet last run on mw1158 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:11] PROBLEM - puppet last run on wtp2015 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:42] PROBLEM - puppet last run on mw2120 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:51] PROBLEM - puppet last run on mw2073 is CRITICAL: CRITICAL: Puppet has 1 failures
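The Kafka Broker Replica Max Lag alerts report what share of recent datapoints is above a threshold. The check went CRITICAL when 65.22% of points exceeded 5,000,000 and returned to OK once fewer than 50% exceeded 1,000,000. The check's source is not part of this log. The Python sketch below is an assumed reimplementation that reproduces the message wording and treats Graphite nulls as missing data. A figure of 65.22% would correspond, for example, to 15 of 23 non-null points above the critical threshold.

```python
from typing import Optional, Sequence


def threshold_status(
    datapoints: Sequence[Optional[float]],
    warning: float,
    critical: float,
    percentage: float = 50.0,
) -> str:
    """Classify a Graphite series by the share of points above each threshold.

    Assumed logic: CRITICAL if at least `percentage` percent of the non-null
    points exceed `critical`, WARNING likewise for `warning`, otherwise OK.
    """
    values = [v for v in datapoints if v is not None]  # drop Graphite nulls
    if not values:
        return "UNKNOWN: no valid datapoints"

    def percent_above(limit: float) -> float:
        return 100.0 * sum(1 for v in values if v > limit) / len(values)

    crit_pct = percent_above(critical)
    if crit_pct >= percentage:
        return (f"CRITICAL: {crit_pct:.2f}% of data above the critical "
                f"threshold [{float(critical)}]")
    warn_pct = percent_above(warning)
    if warn_pct >= percentage:
        return (f"WARNING: {warn_pct:.2f}% of data above the warning "
                f"threshold [{float(warning)}]")
    return f"OK: Less than {percentage:.2f}% above the threshold [{float(warning)}]"


if __name__ == "__main__":
    lagging = [6_000_000.0] * 15 + [0.0] * 8  # 15 of 23 points above 5,000,000
    print(threshold_status(lagging, warning=1_000_000, critical=5_000_000))
```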
[06:32:52] PROBLEM - puppet last run on mw2126 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:52:26] Operations, DNS, Traffic, domains, Patch-For-Review: wikipedia.es - https://phabricator.wikimedia.org/T101060#2062564 (Nemo_bis) >>! In T101060#2062292, @Dzahn wrote: > We still have an issue with getting replies on domain related tickets (T105829) etc. >>! In T101060#1698243, @Dzahn wrote: >...
[06:56:21] RECOVERY - puppet last run on holmium is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[06:56:30] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[06:56:30] RECOVERY - puppet last run on mw1158 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[06:57:11] RECOVERY - puppet last run on lvs2002 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[06:57:12] RECOVERY - puppet last run on cp2013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:21] RECOVERY - puppet last run on mw2073 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[06:57:21] RECOVERY - puppet last run on mw2050 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[06:57:22] RECOVERY - puppet last run on mw2023 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[06:57:22] RECOVERY - puppet last run on mw2126 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:41] RECOVERY - puppet last run on eventlog2001 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[06:57:41] RECOVERY - puppet last run on mw2021 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:31] RECOVERY - puppet last run on wtp2015 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:59:00] RECOVERY - puppet last run on mw2120 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[07:16:35] Operations, MediaWiki-Logging, Wikimedia-IRC-RC-Server, Wikimedia-Stream, and 3 others: Verify that logs, irc, rcstream changes can flow from codfw to eqiad - https://phabricator.wikimedia.org/T126472#2062615 (Joe) @gehel good job, one small correction: rcs* use a local redis, which is installed...
[07:33:28] Operations, MediaWiki-Logging, Wikimedia-IRC-RC-Server, Wikimedia-Stream, and 3 others: Verify that logs, irc, rcstream changes can flow from codfw to eqiad - https://phabricator.wikimedia.org/T126472#2062627 (Joe) I just verified that: # logstash receives correctly messages from codfw # udp2log...
[07:33:38] Operations, MediaWiki-Logging, Wikimedia-IRC-RC-Server, Wikimedia-Stream, and 3 others: Verify that logs, irc, rcstream changes can flow from codfw to eqiad - https://phabricator.wikimedia.org/T126472#2062628 (Joe) Open>Resolved p:Triage>Normal
[07:43:57] (PS1) Giuseppe Lavagetto: varnishkafka: update for logrotate [puppet] - https://gerrit.wikimedia.org/r/273187
[07:49:37] (CR) Giuseppe Lavagetto: [C: 2] varnishkafka: update for logrotate [puppet] - https://gerrit.wikimedia.org/r/273187 (owner: Giuseppe Lavagetto)
[07:53:51] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[07:58:41] <_joe_> this ^^ is because no one gives a damn about alerts
[07:59:07] <_joe_> so we stayed with the two puppetmasters in an inconsistent state for one night
[07:59:12] <_joe_> GRRRRR
[08:14:11] Operations, Wikimedia-Logstash: Auto generated Logstash unit file has "Restart=no" - https://phabricator.wikimedia.org/T127677#2050308 (MoritzMuehlenhoff) I wrote an systemd unit based on the current init script. It's totally untested, though! https://phabricator.wikimedia.org/P2671 Some notes: - In the...
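T127677 concerns the generated Logstash unit having `Restart=no`. That is also systemd's default when `Restart=` is absent, so a crashed process stays down, as happened on logstash1002 at 00:43. The unit MoritzMuehlenhoff proposed (P2671) is not reproduced in the log. As a hedged illustration, the Python sketch below reads the effective `Restart=` value from a unit file's text using `configparser`. It is only an approximation, since systemd's own parser differs in details such as backslash continuations. `restart_policy` is a hypothetical helper, and the unit text is a placeholder.

```python
import configparser


def restart_policy(unit_text: str) -> str:
    """Return the [Service] Restart= value, defaulting to systemd's "no"."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep key case as written in the unit file
    parser.read_string(unit_text)
    # With strict=False a repeated key does not raise; the last assignment
    # wins, which matches how systemd treats repeated Restart= lines.
    return parser.get("Service", "Restart", fallback="no")


generated = """\
[Unit]
Description=logstash

[Service]
# Placeholder command; not the real Logstash invocation.
ExecStart=/usr/local/bin/example-logstash
Restart=no
"""

# One possible fix (an assumption, not necessarily what P2671 does):
fixed = generated.replace("Restart=no", "Restart=on-failure")

print(restart_policy(generated))  # no
print(restart_policy(fixed))      # on-failure
```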
[08:21:07] Operations, ops-codfw: db2018 failed disk (degraded RAID) - https://phabricator.wikimedia.org/T128057#2062657 (jcrespo)
[08:42:11] ACKNOWLEDGEMENT - RAID on db2018 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) Jcrespo https://phabricator.wikimedia.org/T128057
[08:52:51] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0
[08:53:32] RECOVERY - Router interfaces on cr2-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 77, down: 0, dormant: 0, excluded: 0, unused: 0
[08:54:05] Operations, DNS, Traffic, Wikimedia-Site-Requests, and 3 others: Move oldwikisource on www.wikisource.org to mul.wikisource.org - https://phabricator.wikimedia.org/T64717#2062666 (Candalua) >>! In T64717#2062328, @Dzahn wrote: > I don't see why this is "Traffic". but it's declined anyways Why is...
[09:19:31] PROBLEM - HTTPS-toolserver on www.toolserver.org is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:IO::Socket::SSL: connect: Connection timed out
[09:20:04] (PS1) Elukey: Remove mc1014 from memcached/redis pools for maintenance. [puppet] - https://gerrit.wikimedia.org/r/273191 (https://phabricator.wikimedia.org/T123711)
[09:21:21] RECOVERY - HTTPS-toolserver on www.toolserver.org is OK: SSL OK - Certificate toolserver.org valid until 2016-06-30 17:56:02 +0000 (expires in 126 days)
[09:26:20] (CR) Giuseppe Lavagetto: [C: 1] Remove mc1014 from memcached/redis pools for maintenance. [puppet] - https://gerrit.wikimedia.org/r/273191 (https://phabricator.wikimedia.org/T123711) (owner: Elukey)
[09:27:00] (CR) Elukey: [C: 2] Remove mc1014 from memcached/redis pools for maintenance. [puppet] - https://gerrit.wikimedia.org/r/273191 (https://phabricator.wikimedia.org/T123711) (owner: Elukey)
[09:29:51] !log removed mc1014.eqiad from the redis/memcached pool for maintenance
[09:29:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:40:23] (CR) Filippo Giunchedi: [C: 1] "LGTM, to be merged on monday perhaps?" [mediawiki-config] - https://gerrit.wikimedia.org/r/273007 (owner: Aaron Schulz)
[09:45:38] (CR) Ema: [C: 2 V: 2] New WMF version: 4.1.1-1wm2 [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/272468 (https://phabricator.wikimedia.org/T124279) (owner: Ema)
[09:48:49] Operations, RESTBase, hardware-requests: normalize eqiad restbase cluster - replace restbase1001-1006 - https://phabricator.wikimedia.org/T125842#2062696 (fgiunchedi) @Cmjohnson thanks! seems fine to go with row A to me too, let me know how I can help
[09:57:36] !log Stopping Jenkins
[09:57:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:58:29] (CR) Ema: "LGTM, we can merge this tomorrow (2016-02-26) as per https://wikitech.wikimedia.org/wiki/Ops_Clinic_Duty#Access_requests" [puppet] - https://gerrit.wikimedia.org/r/273038 (https://phabricator.wikimedia.org/T127808) (owner: Dzahn)
[09:58:59] Operations, Patch-For-Review, audits-data-retention: Broken log rotation for many services (was nginx and varnishkafka on cpXXXX) - https://phabricator.wikimedia.org/T127025#2062729 (Joe) Open>Resolved
[10:03:11] !log starting Jenkins
[10:03:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:05:10] PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 606
[10:08:10] Operations, DBA: db1024 (s2 master) will run out of disk space in ~4 months - https://phabricator.wikimedia.org/T122048#2062745 (jcrespo) Open>Resolved
[10:15:10] PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 772
[10:23:31] PROBLEM - Kafka Broker Replica Max Lag on kafka1022 is CRITICAL: CRITICAL: 64.00% of data above the critical threshold [5000000.0]
[10:25:10] RECOVERY - check_mysql on db1008 is OK: Uptime: 3178014 Threads: 2 Questions: 26347596 Slow queries: 21180 Opens: 7358 Flush tables: 2 Open tables: 406 Queries per second avg: 8.290 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[10:29:05] (PS1) Jcrespo: Repool db1021 and db1024, both with low/non critical load [mediawiki-config] - https://gerrit.wikimedia.org/r/273196
[10:34:22] RECOVERY - Kafka Broker Replica Max Lag on kafka1022 is OK: OK: Less than 50.00% above the threshold [1000000.0]
[10:40:22] (PS1) Giuseppe Lavagetto: nutcracker: do not declare redis_codfw in labs [puppet] - https://gerrit.wikimedia.org/r/273197 (https://phabricator.wikimedia.org/T127845)
[10:41:31] !log set xff to 0.01 for graphite metrics swift.*.containers (was 0.5)
[10:41:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:48:31] PROBLEM - HTTPS on ms1001 is CRITICAL: SSL CRITICAL - Certificate dumps.wikimedia.org valid until 2016-03-26 10:47:38 +0000 (expires in 29 days)
[10:48:40] PROBLEM - Host cp2010 is DOWN: PING CRITICAL - Packet loss = 100%
[10:51:23] is that scheduled?
[10:52:42] cp2010 ? not sure, ema ?
[10:52:55] don't think so
[10:53:35] I do not see a spike of errors, but it may just be the load balancer
[10:53:41] checking
[10:53:56] godog: checking
[10:54:33] checking via mgmt, seems to be a hardware problem:
[10:55:00] the only output I get via the serial console is
[10:55:02] [ 0.000000 ] I
[10:55:17] so seems to have locked up really early in the boot
[10:56:30] maybe reboot, see the backlog?
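At 10:41 the xFilesFactor (xff) for the swift.*.containers Graphite metrics was lowered from 0.5 to 0.01. When Whisper rolls datapoints up into a coarser archive, it writes the aggregate only if the fraction of non-null points in the interval is at least xff. Otherwise the coarser slot stays empty. The Python sketch below mimics that rule with average aggregation. It is a simplification of Whisper's propagation step, not its code. With sparse data such as one value in ten slots, xff=0.5 drops the rollup and xff=0.01 keeps it, which is presumably why the setting was lowered for these metrics.

```python
from typing import Callable, Optional, Sequence


def average(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def rollup(
    points: Sequence[Optional[float]],
    xff: float,
    aggregate: Callable[[Sequence[float]], float] = average,
) -> Optional[float]:
    """Aggregate one coarser-archive interval, Whisper-style.

    Returns None (no datapoint) when no values are known, or when the known
    fraction of the interval is below the xFilesFactor.
    """
    known = [p for p in points if p is not None]
    if not known:
        return None
    if len(known) / len(points) < xff:
        return None
    return aggregate(known)


sparse = [7.0] + [None] * 9  # one reported value out of ten slots
print(rollup(sparse, xff=0.5))   # None: 10% known < 50%
print(rollup(sparse, xff=0.01))  # 7.0: 10% known >= 1%
```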
[10:56:45] it could not be any worse
[10:57:09] I will let it in control of whoever has the session in use
[10:57:17] yeah, will powercycle it
[10:57:24] +1
[10:58:44] !log powercycling cp2010
[10:58:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:59:01] PROBLEM - IPsec on cp1067 is CRITICAL: Strongswan CRITICAL - ok: 56 not-conn: cp2010_v4, cp2010_v6
[10:59:02] PROBLEM - IPsec on kafka1013 is CRITICAL: Strongswan CRITICAL - ok: 144 not-conn: cp2010_v4, cp2010_v6
[10:59:02] PROBLEM - Host mc1014 is DOWN: PING CRITICAL - Packet loss = 100%
[10:59:20] PROBLEM - IPsec on cp1066 is CRITICAL: Strongswan CRITICAL - ok: 56 not-conn: cp2010_v4, cp2010_v6
[10:59:36] that is normal an expected
[10:59:40] all the IPSec issues are related to cp2010, yes
[10:59:41] PROBLEM - IPsec on cp1052 is CRITICAL: Strongswan CRITICAL - ok: 56 not-conn: cp2010_v4, cp2010_v6
[10:59:49] *and
[10:59:51] PROBLEM - IPsec on kafka1012 is CRITICAL: Strongswan CRITICAL - ok: 144 not-conn: cp2010_v4, cp2010_v6
[10:59:57] and also I hate my laptop keyboard
[11:00:01] PROBLEM - IPsec on kafka1020 is CRITICAL: Strongswan CRITICAL - ok: 144 not-conn: cp2010_v4, cp2010_v6
[11:00:01] PROBLEM - IPsec on cp1065 is CRITICAL: Strongswan CRITICAL - ok: 56 not-conn: cp2010_v4, cp2010_v6
[11:00:22] !sal
[11:00:22] https://wikitech.wikimedia.org/wiki/Server_Admin_Log https://tools.wmflabs.org/sal/production See it and you will know all you need.
[11:00:31] RECOVERY - Host cp2010 is UP: PING OK - Packet loss = 0%, RTA = 36.35 ms
[11:01:01] RECOVERY - IPsec on cp1067 is OK: Strongswan OK - 58 ESP OK
[11:01:01] RECOVERY - IPsec on kafka1013 is OK: Strongswan OK - 146 ESP OK
[11:01:11] RECOVERY - IPsec on cp1066 is OK: Strongswan OK - 58 ESP OK
[11:01:23] hashar, it is not caused by an admin action
[11:01:33] so do not expect a sal entry
[11:01:41] RECOVERY - IPsec on cp1052 is OK: Strongswan OK - 58 ESP OK
[11:01:51] RECOVERY - IPsec on kafka1012 is OK: Strongswan OK - 146 ESP OK
[11:02:01] RECOVERY - IPsec on kafka1020 is OK: Strongswan OK - 146 ESP OK
[11:02:02] RECOVERY - IPsec on cp1065 is OK: Strongswan OK - 58 ESP OK
[11:02:10] ^looks good
[11:02:39] oh, misunderstood hashar, my apologies
[11:03:33] jynus: oh, I invoked sal for a different reason
[11:03:54] yes, saw that, too late, I misunderstood !sal with sal!
[11:03:57] (PS1) Filippo Giunchedi: codfw: add statsd service entry [dns] - https://gerrit.wikimedia.org/r/273199 (https://phabricator.wikimedia.org/T127976)
[11:03:58] :-)
[11:04:08] annnnarhrhrhrhrhr left over cherry pick / undeployed grbmblblb
[11:04:56] I was about to make use of it, do you want me to check it?
[11:05:41] na it is on the mediawiki wmf branches
[11:05:59] https://gerrit.wikimedia.org/r/#/q/I17386b97e229b492723b46db1e1ae16fd4b0fc5a,n,z "SessionBackend: skip isUserSessionPrevented check for anons"
[11:06:56] ok, I was only going to do filesync a db-eqiad.php, but will wait for your green light just in case
[11:07:50] (CR) Jcrespo: [C: 1] Repool db1021 and db1024, both with low/non critical load [mediawiki-config] - https://gerrit.wikimedia.org/r/273196 (owner: Jcrespo)
[11:07:53] jynus: if you just sync-file the files you need it should be fine
[11:08:04] will dig in code to look at the status
[11:08:12] and if it has not been deployed will revert
[11:08:24] ok, then. Yes, that is the safest option.
[11:08:28] I have some issues getting DHCP working for mc1014 after PXE boot.. anybody that can help?
[11:08:46] elukey, does it timeout?
[11:09:02] or you mean on the installer itself?
[11:09:20] installer itself, it fails when using DHCP
[11:09:33] is it trying the right ethernet card?
[11:09:56] I tried both, one fails the PXE boot the other one fails at the DHCP step
[11:10:09] maybe I can check the mac addresses
[11:10:24] against puppet/modules/install_server/files/dhcpd/linux-host-entries ?
[11:11:12] also make sure it is applied on carbon correctly
[11:11:48] ah yes right, good suggestion
[11:11:51] RECOVERY - Host mc1014 is UP: PING OK - Packet loss = 0%, RTA = 2.43 ms
[11:13:18] yeah that's a pitfall I keep falling into heh
[11:13:27] thanks btw for cp2010
[11:15:00] <_joe_> elukey: grep the carbon logs too
[11:15:09] <_joe_> for the mac address
[11:15:20] jynus: mmm DHCP request looks good, the mac address is the one in linux-host-entries and there is a DHCPACK
[11:15:24] <_joe_> if it's not there, then maybe you had the wrong one
[11:15:33] <_joe_> ugh
[11:16:38] if the installed did not go through, maybe restart it and check it works from the previous os to discard hardware issues?
[11:18:07] already done, everything boots correctly again with Ubuntu
[11:18:49] (CR) Jcrespo: [C: 2] Repool db1021 and db1024, both with low/non critical load [mediawiki-config] - https://gerrit.wikimedia.org/r/273196 (owner: Jcrespo)
[11:19:02] either I am stupid or the host is joking with me (or both)
[11:23:04] did you try dhcp from there ( I know that will break the host, but you are reimaging anyway)
[11:23:44] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1021 and db1024 (duration: 01m 45s)
[11:23:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:24:43] jynus I tried dhclient eth0 and it worked, powercycled, pinged palladium, everything's fine
[11:25:55] then what I would do is go to the installer, go to the shell, and test again from there
[11:26:04] also enable the logs
[11:26:17] so we can spot the error
[11:26:37] I'll follow your suggestion thanks :)
[11:26:47] (I think they are enabled by default, but saving them :-))
[11:27:33] maybe there is a kernel module having problems, etc.
[11:28:00] check what is different with the other nodes
[11:29:07] (PS1) Giuseppe Lavagetto: apache-fast-test: fix pybal url, add codfw and options [puppet] - https://gerrit.wikimedia.org/r/273200
[11:29:43] (CR) Giuseppe Lavagetto: [C: 2] nutcracker: do not declare redis_codfw in labs [puppet] - https://gerrit.wikimedia.org/r/273197 (https://phabricator.wikimedia.org/T127845) (owner: Giuseppe Lavagetto)
[11:32:21] while checking the logs, I saw a small, but noticible amount of "No such file or directory in /srv/mediawiki/php-1.27.0-wmf.13/includes/filebackend/FileBackendStore.php"
[11:33:19] !log depool ms-fe1004 for trusty dist-upgrade
[11:33:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:41:19] so looking into carbon's complete syslog the DHCP works before jessie-installer but not after, I don't see the DHCP REQUEST landing to carbon
[11:43:42] Operations, Puppet, Beta-Cluster-Infrastructure, Release-Engineering-Team, Patch-For-Review: deployment-tin puppet Error 400 on SERVER: Failed to parse template nutcracker/nutcracker.yml.erb - https://phabricator.wikimedia.org/T127845#2062919 (Joe) Open>Resolved
[11:47:32] !log Reverting session manager cherry picks from wmf branches ( https://gerrit.wikimedia.org/r/#/c/273201/ and https://gerrit.wikimedia.org/r/#/c/273202/ ) they have not been deployed after they got merged
[11:47:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:50:56] elukey: is it working now? I saw the dhcp requests on carbon
[11:51:44] godog: I ran dhclient manually on ubuntu as a test (again), it is super weird that after the PXE logs of jessie-installer I don't see anything
[11:55:06] indeed, no news from tftp
[11:57:09] retrying again the PXE boot
[12:05:10] yes confirmed: the first DHCP request before atftpd works fine, but then nothing
[12:05:34] so it is either the installer or some other config that doesn't work (maybe?_
[12:08:26] elukey: I can take a look on the console
[12:12:14] godog: all yours, you will be promted in the debian installer
[12:12:16] thanks!
[12:24:11] (Abandoned) Ladsgroup: Move ORES settings to beta features part [mediawiki-config] - https://gerrit.wikimedia.org/r/272526 (owner: Ladsgroup)
[12:24:44] elukey: takes forever to reboot heh
[12:25:09] godog: ah yes it is a bit slow :P
[12:36:23] PROBLEM - citoid endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:36:32] PROBLEM - citoid endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:38:10] RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[12:38:21] RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[12:39:52] so there's a mariadb slave lag alert in icinga for dbstore1002
[12:40:19] I've been discussing it for a bit with jynus, maybe someone here has an idea for how to go about it
[12:41:00] paravoid: hey, around? I've a question regarding ORES moving to prod cluster. https://phabricator.wikimedia.org/T125562
[12:41:34] I'm around but not very familiar with the setup
[12:43:00] thanks, what do you think is needed? aksorias is in vacation
[12:43:08] for what?
[12:44:19] to finish this migration from labs to prod.
[12:44:40] that's fairly generic :)
[12:45:02] I'd have to dive in and I can't do that right now
[12:45:42] I thought Yuvi was also familiar with all that?
[12:46:21] he's also on VAC, coming back next week IIRC
[12:46:42] I'll ask them both when they get back, how does that sound?
[12:46:54] https://phabricator.wikimedia.org/T106867
[12:47:14] sorry, my connection is driving me crazy
[12:48:11] paravoid: yuvi is in vacation too, his help would be a great asset but I'm not sure if he is the only person needed (it's more a team work)
[12:48:37] it's kind of urgent for us, blocked deployment of the extension in several wikis
[12:49:25] more people can and will get involved but it's kinda hard to start from the beginning without any institutional knowledge just so that we don't wait for a few days
[12:50:18] Operations: mc1014 fails to pxe-boot with jessie - https://phabricator.wikimedia.org/T128068#2063002 (fgiunchedi)
[12:50:22] elukey: ^
[12:50:37] :(
[12:51:00] <_joe_> Amir1: I have to say that tickets as vague as https://phabricator.wikimedia.org/T124199 don't help evaluating what needs to be done
[12:51:18] I see, in the meantime, is there anything we can do to speed up the process once they come back?
[12:51:38] elukey: that's as far as I can go now though, but should be a lead
[12:51:51] godog: allow_unsupported_sfps=*0*?
[12:51:52] or =1?
[12:52:03] thanks for chiming _joe_. I will look into this
[12:52:26] *chiming in
[12:53:00] godog: it is far more than my discoveries, thanks!
[12:53:04] <_joe_> Amir1: I'm looking at the tasks, a few of them are pretty obvious to me too, but others have no info and it's hard to help tbh;
[12:53:10] these parts I guess is the after setup that should be done by halfak himself
[12:53:28] <_joe_> Amir1: I don't think so
[12:53:37] maybe I can help with that
[12:53:50] paravoid: err, =1 of course! fixing...
[12:53:51] Operations: mc1014 fails to pxe-boot with jessie - https://phabricator.wikimedia.org/T128068#2063014 (fgiunchedi)
[12:54:07] _joe_: have you checked this? https://phabricator.wikimedia.org/T124199#1964618
[12:54:09] <_joe_> Amir1: this won't go in production before yuvi/alex are back anyways
[12:54:22] Modify puppet to use debs instead of pip
[12:54:30] Operations: mc1014 fails to pxe-boot with jessie - https://phabricator.wikimedia.org/T128068#2063002 (MoritzMuehlenhoff) Since this is a re-image; was the same hardware supported with the 3.2 kernel from precise? Or is that a new NIC?
[12:55:01] <_joe_> yeah that means that packaging of dependencies was the chosen option; but they seem to have changed their minds
[12:55:12] I understand it definitely, I was thinking if it's possible to do what we can do and let them to finish the job
[12:55:13] <_joe_> so, I can speak with haflak later when he is around
[12:55:20] reducing the workload
[12:55:51] <_joe_> but tbh, this means I need to catch up this evening
[12:56:10] <_joe_> and maybe start doing some work on friday, stepping on other people's work
[12:56:28] <_joe_> so that on monday I have to debrief them on what I've done
[12:56:40] <_joe_> I don't think this is going to gain us so much
[12:56:58] <_joe_> and it's not like I don't have other urgent things to attend to
[12:57:45] I understand
[12:58:11] <_joe_> Amir1: just to understand this better: when is ORES planned to be released on the wikis?
[12:58:29] <_joe_> the extension, I mean
[12:58:31] ORES is still hosted in labs
[12:58:53] first open window after the ores migrated to prod
[12:58:56] Operations: mc1014 fails to pxe-boot with jessie - https://phabricator.wikimedia.org/T128068#2063002 (faidon) So, some background here: we've had this issue back when we first installed precise on these boxes. At the time there was no such module option, so I had to write a patch for ixgbe, build it (separate...
[12:59:30] mark: yeah, the service is being migrated to production cluster (so ores.wmflabs.org -> ores.wikimedia.org)
[13:03:20] (PS1) Giuseppe Lavagetto: standard: move to own module [puppet] - https://gerrit.wikimedia.org/r/273209 (https://phabricator.wikimedia.org/T119042)
[13:05:59] Operations, MediaWiki-Interface, Traffic, MW-1.27-release, and 4 others: Incorrect TOC and section edit links rendering in Vector due to ParserCache corruption via ParserOutput::setText( ParserOutput::getText() ) - https://phabricator.wikimedia.org/T124356#2063059 (Danny_B) spotted now again: https...
[13:08:45] (PS1) Hashar: beta: $wmgUseRC2UDP = false [mediawiki-config] - https://gerrit.wikimedia.org/r/273211 (https://phabricator.wikimedia.org/T128006)
[13:16:01] (CR) Hashar: [C: 2] beta: $wmgUseRC2UDP = false [mediawiki-config] - https://gerrit.wikimedia.org/r/273211 (https://phabricator.wikimedia.org/T128006) (owner: Hashar)
[13:16:44] (Merged) jenkins-bot: beta: $wmgUseRC2UDP = false [mediawiki-config] - https://gerrit.wikimedia.org/r/273211 (https://phabricator.wikimedia.org/T128006) (owner: Hashar)
[13:28:14] (PS2) BBlack: vcl_layers: rewrite_proxy_urls fe-only [puppet] - https://gerrit.wikimedia.org/r/273152
[13:28:16] (PS2) BBlack: vcl_layers: move recv_purge [puppet] - https://gerrit.wikimedia.org/r/273153
[13:28:18] (PS2) BBlack: vcl_layers: consolidate 3x fe-only [puppet] - https://gerrit.wikimedia.org/r/273154
[13:28:20] (PS2) BBlack: vcl_layers: re-arrange fe_ip/xff/xcdis-clear [puppet] - https://gerrit.wikimedia.org/r/273155
[13:28:22] (PS3) BBlack: vcl_layers: https+fe layer-conditional fixups [puppet] - https://gerrit.wikimedia.org/r/273156
[13:28:24] (PS3) BBlack: vcl_layers: move backend 403 check earlier [puppet] - https://gerrit.wikimedia.org/r/273157
[13:28:26] (PS3) BBlack: vcl_layers: beacon should only be frontend [puppet] - https://gerrit.wikimedia.org/r/273158
[13:28:28] (PS3) BBlack: vcl_layers: varnishcheck can happen earlier [puppet] - https://gerrit.wikimedia.org/r/273159
[13:28:30] (PS3) BBlack: vcl_layers: move xcache up a bit [puppet] - https://gerrit.wikimedia.org/r/273160
[13:28:32] (PS3) BBlack: vcl_layers: more layer-conditional redundancy [puppet] - https://gerrit.wikimedia.org/r/273161
[13:28:34] (PS3) BBlack: vcl_layers: split wikimedia.vcl in puppet terms [puppet] - https://gerrit.wikimedia.org/r/273162
[13:28:36] (PS3) BBlack: vcl_layers: move basic fe-only things to fe [puppet] - https://gerrit.wikimedia.org/r/273163
[13:28:38] (PS3) BBlack: vcl_layers: split up vcl_recv itself [puppet] - https://gerrit.wikimedia.org/r/273164
[13:28:40] (PS3) BBlack: vcl_layers: split hit|miss|pass|fetch [puppet] - https://gerrit.wikimedia.org/r/273165
[13:28:42] (PS3) BBlack: vcl_layers: split deliver|error [puppet] - https://gerrit.wikimedia.org/r/273166
[13:28:44] (PS1) BBlack: wikimedia.vcl: trailing whitespace fixups [puppet] - https://gerrit.wikimedia.org/r/273213
[13:28:46] (PS1) BBlack: vcl_layers: consolidate fe-only sub blocks [puppet] - https://gerrit.wikimedia.org/r/273214
[13:28:48] (PS1) BBlack: vcl_layers: move cluster include to layer files [puppet] - https://gerrit.wikimedia.org/r/273215
[13:28:50] (PS1) BBlack: upload VCL: refactor a bit [puppet] - https://gerrit.wikimedia.org/r/273216
[13:28:52] (PS1) BBlack: VCL: re-arrange subs into standard ordering [puppet] - https://gerrit.wikimedia.org/r/273217
[13:28:54] (PS1) BBlack: vcl_recv: single definition in wikimedia.vcl [puppet] - https://gerrit.wikimedia.org/r/273218
[13:28:56] (PS1) BBlack: vcl_(hash|hit|miss|pass|fetch|deliver): single definition [puppet] - https://gerrit.wikimedia.org/r/273219
[13:28:58] (PS1) BBlack: vcl_error: single definition [puppet] - https://gerrit.wikimedia.org/r/273220
[13:29:00] (PS1) BBlack: VCL: move role include up [puppet] - https://gerrit.wikimedia.org/r/273221
[13:29:02] (PS1) BBlack: VCL: rename cluster-common funcs for clarity [puppet] - https://gerrit.wikimedia.org/r/273222
[13:29:04] (PS1) BBlack: text VCL: clean up 1be misspass mangling [puppet] - https://gerrit.wikimedia.org/r/273223
[13:29:10] just a regular day at the office, bblack?
[13:29:20] :P
[13:29:22] oh man
[13:29:43] the end result is worth it, I hope! :)
[13:32:25] (which is things like: "sub vcl_foo" is only define once for any daemon's VCL, and control flows downwards from wikimedia.vcl->cluster.vcl always, and subroutine names make sense, and layer==whatever conditionals are gone from code, and backend VCL behavior is much easier to understand)
[13:32:42] \o/
[13:32:50] Operations: mc1014 fails to pxe-boot with jessie - https://phabricator.wikimedia.org/T128068#2063143 (elukey) On the installer's console: rmmod ixgbe modprobe ixgbe allow_unsupported_sfp=1 dmesg: ``` [ 2944.105012] ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver - version 3.19.1-k [ 2944.105014] ixg...
[13:33:51] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/).
[13:33:52] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/).
[13:59:52] Operations: mc1014 fails to pxe-boot with jessie - https://phabricator.wikimedia.org/T128068#2063221 (elukey) Rebooting and issuing the above commands on the shell, I can obtain the following config: ``` ~ # ip a 1: lo: mtu 65536 qdisc noqueue link/loopback 00:00:00:00:00:00 brd 0...
[14:03:01] good {morning,afternoon,evening} all
[14:05:21] PROBLEM - Kafka Broker Replica Max Lag on kafka1012 is CRITICAL: CRITICAL: 59.09% of data above the critical threshold [5000000.0]
[14:10:11] (CR) Ema: [C: 1] wikimedia.vcl: trailing whitespace fixups [puppet] - https://gerrit.wikimedia.org/r/273213 (owner: BBlack)
[14:11:47] Operations, Puppet, Beta-Cluster-Infrastructure, Release-Engineering-Team, Patch-For-Review: deployment-tin puppet Error 400 on SERVER: Failed to parse template nutcracker/nutcracker.yml.erb - https://phabricator.wikimedia.org/T127845#2063230 (hashar) Thank you @joe !
[14:15:06] (CR) Ema: [C: 1] vcl_layers: consolidate 3x fe-only [puppet] - https://gerrit.wikimedia.org/r/273154 (owner: BBlack)
[14:16:40] RECOVERY - Kafka Broker Replica Max Lag on kafka1012 is OK: OK: Less than 50.00% above the threshold [1000000.0]
[14:22:53] PROBLEM - citoid endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:25:32] RECOVERY - HTTPS on ms1001 is OK: SSL OK - Certificate dumps.wikimedia.org valid until 2017-04-26 10:47:38 +0000 (expires in 425 days)
[14:26:03] (CR) Ema: [C: 1] vcl_layers: https+fe layer-conditional fixups [puppet] - https://gerrit.wikimedia.org/r/273156 (owner: BBlack)
[14:30:21] RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[14:32:29] (CR) Ema: [C: 1] vcl_layers: re-arrange fe_ip/xff/xcdis-clear [puppet] - https://gerrit.wikimedia.org/r/273155 (owner: BBlack)
[14:33:23] Operations, Wikimedia-Logstash: Auto generated Logstash unit file has "Restart=no" - https://phabricator.wikimedia.org/T127677#2063289 (bd808) >>! In T127677#2062644, @MoritzMuehlenhoff wrote: > I wrote an systemd unit based on the current init script. It's totally untested, though! > https://phabricator...
[14:35:40] Operations, CirrusSearch, Discovery, Discovery-Search-Sprint, and 3 others: Look into encrypting Elasticsearch traffic - https://phabricator.wikimedia.org/T124444#2063305 (Gehel) Some constraints about certs are exposed in T111654, it probably applies here as well.
[14:42:24] Operations, CirrusSearch, Discovery, Discovery-Search-Sprint, and 3 others: Create a PKI that can be used by Puppet and for general purpose certificates - https://phabricator.wikimedia.org/T128077#2063317 (Gehel)
[14:42:51] RECOVERY - Unmerged changes on repository puppet on labcontrol1002 is OK: No changes to merge.
[14:56:22] (CR) Ema: [C: 1] VCL: rename cluster-common funcs for clarity [puppet] - https://gerrit.wikimedia.org/r/273222 (owner: BBlack)
[15:00:44] (CR) Ema: [C: 1] vcl_layers: rewrite_proxy_urls fe-only [puppet] - https://gerrit.wikimedia.org/r/273152 (owner: BBlack)
[15:08:15] (PS5) Andrew Bogott: Consolidate labs pdns settings into a hiera dict. [puppet] - https://gerrit.wikimedia.org/r/273144
[15:09:04] ostriches: around?
[15:10:17] or greg-g ?
[15:11:19] Operations, DNS, Traffic, domains, Patch-For-Review: wikipedia.es - https://phabricator.wikimedia.org/T101060#2063386 (Dzahn) Looks like this question has solved itself because meanwhile nameservers have been updated: ``` ;; ANSWER SECTION: wikipedia.es. 86338 IN NS ns2.wikimedia.org. wikip...
[15:11:22] Operations, DNS, Traffic, domains, Patch-For-Review: wikipedia.es - https://phabricator.wikimedia.org/T101060#2063387 (Dzahn) Open>Resolved
[15:12:34] (CR) Andrew Bogott: [C: 2] Consolidate labs pdns settings into a hiera dict. [puppet] - https://gerrit.wikimedia.org/r/273144 (owner: Andrew Bogott)
[15:12:49] aude: I'm out today but briefly checking in. Sup?
[15:12:51] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0; xe-0/0/1: down - Core: cr1-ulsfo:xe-1/2/0 (Telia, IC-313592, 51ms) {#11372} [10Gbps wave]
[15:13:02] Operations, DNS, Traffic, Wikimedia-Site-Requests, and 3 others: Move oldwikisource on www.wikisource.org to mul.wikisource.org - https://phabricator.wikimedia.org/T64717#2063405 (Dzahn) >>! In T64717#2062666, @Candalua wrote: >>>! In T64717#2062328, @Dzahn wrote: >> I don't see why this is "Traff...
[15:13:30] PROBLEM - Router interfaces on cr1-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 70, down: 1, dormant: 0, excluded: 0, unused: 0; xe-1/2/0: down - Core: cr1-eqord:xe-0/0/1 (Telia, IC-313592, 51ms) {#1502} [10Gbps wave]
[15:13:36] ostriches: i would like to put wikidata back on wmf13 because creating new items is broken
[15:13:54] (PS1) Ottomata: Remove symlinks made for $cdh::hadoop::gelf_logging_enabled when it is disabled [puppet] - https://gerrit.wikimedia.org/r/273239
[15:14:44] i am not sure if updateWikiversions can be used for this or exactly how
[15:14:56] (CR) Ottomata: [C: 2] Remove symlinks made for $cdh::hadoop::gelf_logging_enabled when it is disabled [puppet] - https://gerrit.wikimedia.org/r/273239 (owner: Ottomata)
[15:15:08] aude: Just edit wikiversions.json and sync-wikiversions
[15:15:17] ok
[15:15:18] That's fine by me, just let thcipriani|afk know, he's doing the train this afternoon.
[15:15:20] and then i have to commit it
[15:15:28] I can do it real quick one sec
[15:15:32] ok, thanks
[15:15:40] testwikidata too or just wikidata?
[15:15:42] https://phabricator.wikimedia.org/T128075
[15:15:43] just wikiata
[15:15:48] wikidata*
[15:16:02] when we apply the fix, then we can try it on test.wikidata etc
[15:16:06] okie dokie
[15:16:09] thanks
[15:16:53] (PS1) Chad: wikidata back to wmf.13, creating new items has a regression [mediawiki-config] - https://gerrit.wikimedia.org/r/273240
[15:17:03] ^^^^ aude
[15:17:32] thanks
[15:19:37] (CR) Chad: [C: 2] wikidata back to wmf.13, creating new items has a regression [mediawiki-config] - https://gerrit.wikimedia.org/r/273240 (owner: Chad)
[15:20:15] Operations: UnicodeDecodeError invalid continuation byte on ms-fe1004 - https://phabricator.wikimedia.org/T128081#2063443 (fgiunchedi)
[15:20:19] (Merged) jenkins-bot: wikidata back to wmf.13, creating new items has a regression [mediawiki-config] - https://gerrit.wikimedia.org/r/273240 (owner: Chad)
[15:21:19] !log demon@tin rebuilt wikiversions.php and synchronized wikiversions files: wikidata back to wmf.13 for now
[15:21:22] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge.
[15:21:25] aude: You're done.
[15:21:50] thanks
[15:22:08] np
[15:22:11] soon as james or someone is around who can help with oojs, we will do backport
[15:22:33] (CR) Dzahn: [C: -1] "the URL is mentioned on https://cy.wikipedia.org/wiki/Wicipedia_Cymraeg .. we should ask the "cy" community" [dns] - https://gerrit.wikimedia.org/r/254055 (owner: Dzahn)
[15:27:01] (PS1) Faidon Liambotis: autoinstall: add ixgbe.allow_unsupported_sfp=1 to jessie [puppet] - https://gerrit.wikimedia.org/r/273242 (https://phabricator.wikimedia.org/T128068)
[15:27:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:27:01] (CR) Faidon Liambotis: [C: 2] autoinstall: add ixgbe.allow_unsupported_sfp=1 to jessie [puppet] - https://gerrit.wikimedia.org/r/273242 (https://phabricator.wikimedia.org/T128068) (owner: Faidon Liambotis)
[15:27:02] Operations, ops-codfw, Labs: Figure out what labstore hardware is viable in codfw - https://phabricator.wikimedia.org/T128083#2063474 (chasemp)
[15:27:32] PROBLEM - citoid endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:27:41] PROBLEM - citoid endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:29:02] Operations, DNS, Traffic, Wikimedia-Site-Requests, and 3 others: Move oldwikisource on www.wikisource.org to mul.wikisource.org - https://phabricator.wikimedia.org/T64717#2063499 (Candalua) OK. I'm opening a formal discussion on the project to gather consensus about the move: https://wikisource.o...
[15:29:31] RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[15:30:40] Operations, ops-codfw, Labs: Figure out what labstore hardware is viable in codfw - https://phabricator.wikimedia.org/T128083#2063519 (Papaul) @chasemp yes all those boxes are still in place but nerve got configured. see ticket below T102626
[15:31:21] RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[15:31:58] Operations, Traffic, domains: figure out if we can park wicipediacymraeg.org - https://phabricator.wikimedia.org/T128085#2063521 (Dzahn)
[15:32:24] (PS2) Dzahn: deactivate wicipediacymraeg.org [dns] - https://gerrit.wikimedia.org/r/254055 (https://phabricator.wikimedia.org/T128085)
[15:32:32] !log stop cassandra/restbase on restbase2001 to finish raid0 grow
[15:34:01] ACKNOWLEDGEMENT - cassandra-a CQL 10.192.16.162:9042 on restbase2001 is CRITICAL: Connection refused Filippo Giunchedi raid grow
[15:34:02] ACKNOWLEDGEMENT - cassandra-a service on restbase2001 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed Filippo Giunchedi raid grow
[15:34:02] ACKNOWLEDGEMENT - cassandra-b CQL 10.192.16.163:9042 on restbase2001 is CRITICAL: Connection refused Filippo Giunchedi raid grow
[15:34:02] ACKNOWLEDGEMENT - cassandra-c CQL 10.192.16.164:9042 on restbase2001 is CRITICAL: Connection refused Filippo Giunchedi raid grow
[15:34:02] ACKNOWLEDGEMENT - cassandra-c service on restbase2001 is CRITICAL: CRITICAL - Expecting active but unit cassandra-c is failed Filippo Giunchedi raid grow
[15:34:31] ACKNOWLEDGEMENT - cassandra-a CQL 10.192.16.162:9042 on restbase2001 is CRITICAL: Connection refused Filippo Giunchedi raid grow
[15:34:31] ACKNOWLEDGEMENT - cassandra-a service on restbase2001 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed Filippo Giunchedi raid grow
[15:34:31] ACKNOWLEDGEMENT - cassandra-b CQL 10.192.16.163:9042 on restbase2001 is CRITICAL: Connection refused Filippo Giunchedi raid grow
[15:34:31] ACKNOWLEDGEMENT - cassandra-c CQL 10.192.16.164:9042 on restbase2001 is CRITICAL: Connection refused Filippo Giunchedi raid grow
[15:34:31] ACKNOWLEDGEMENT - cassandra-c service on restbase2001 is CRITICAL: CRITICAL - Expecting active but unit cassandra-c is failed Filippo Giunchedi raid grow
[15:34:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:35:21] PROBLEM - citoid endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:37:00] PROBLEM - citoid endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:39:16] Operations, DNS, Traffic, Wikimedia-Site-Requests, and 3 others: Move oldwikisource on www.wikisource.org to mul.wikisource.org - https://phabricator.wikimedia.org/T64717#2063575 (Vogone) >>! In T64717#2063405, @Dzahn wrote: >>>! In T64717#2062666, @Candalua wrote: >>>>! In T64717#2062328, @Dzahn...
[15:41:59] (PS1) Giuseppe Lavagetto: role::ntp: rename standard::ntp, move to the standard module [puppet] - https://gerrit.wikimedia.org/r/273246
[15:42:01] (PS1) Giuseppe Lavagetto: ntp: further reorg, split of client and server code [puppet] - https://gerrit.wikimedia.org/r/273247
[15:42:03] (PS1) Giuseppe Lavagetto: role::diamond: move to standard::diamond [puppet] - https://gerrit.wikimedia.org/r/273248
[15:43:58] (CR) jenkins-bot: [V: -1] ntp: further reorg, split of client and server code [puppet] - https://gerrit.wikimedia.org/r/273247 (owner: Giuseppe Lavagetto)
[15:44:24] (CR) jenkins-bot: [V: -1] role::diamond: move to standard::diamond [puppet] - https://gerrit.wikimedia.org/r/273248 (owner: Giuseppe Lavagetto)
[15:48:00] (PS2) Giuseppe Lavagetto: role::diamond: move to standard::diamond [puppet] - https://gerrit.wikimedia.org/r/273248
[15:48:02] (PS2) Giuseppe Lavagetto: ntp: further reorg, split of client and server code [puppet] - https://gerrit.wikimedia.org/r/273247
[15:48:04] (PS2) Giuseppe Lavagetto: standard: move to own module [puppet] - https://gerrit.wikimedia.org/r/273209 (https://phabricator.wikimedia.org/T119042)
[15:48:06] (PS2) Giuseppe Lavagetto: role::ntp: rename standard::ntp, move to the standard module [puppet] - https://gerrit.wikimedia.org/r/273246
[15:50:30] (CR) jenkins-bot: [V: -1] role::diamond: move to standard::diamond [puppet] - https://gerrit.wikimedia.org/r/273248 (owner: Giuseppe Lavagetto)
[15:50:43] (CR) jenkins-bot: [V: -1] ntp: further reorg, split of client and server code [puppet] - https://gerrit.wikimedia.org/r/273247 (owner: Giuseppe Lavagetto)
[15:51:02] <_joe_> wat? ok my puppet lint and the one on our ci disagree...
[15:51:21] RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[15:51:31] RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[15:53:22] (PS3) Giuseppe Lavagetto: role::diamond: move to standard::diamond [puppet] - https://gerrit.wikimedia.org/r/273248
[15:53:24] (PS3) Giuseppe Lavagetto: ntp: further reorg, split of client and server code [puppet] - https://gerrit.wikimedia.org/r/273247
[15:56:20] (CR) Qgil: [C: -1] "-1 as per Ariel's last comment." [dumps/html/deploy] - https://gerrit.wikimedia.org/r/204964 (https://phabricator.wikimedia.org/T94457) (owner: GWicke)
[15:57:19] (PS1) Thcipriani: Update scap to 3.0.2-1 [puppet] - https://gerrit.wikimedia.org/r/273253
[16:00:04] anomie ostriches thcipriani marktraceur Krenair: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160225T1600). Please do the needful.
[16:00:04] James_F bmansurov: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[16:00:09] Yeah yeah.
[16:00:19] Who's around to do SWAT?
[16:00:22] I can SWAT
[16:00:26] Awesome.
[16:00:34] Hey thcipriani.
[16:00:49] howdy James_F :)
[16:01:59] (CR) DCausse: Enable 'popqual' (quality+pageviews) scoring method for the completion suggester (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/272963 (https://phabricator.wikimedia.org/T127943) (owner: DCausse)
[16:02:02] Operations, CirrusSearch, Discovery, Discovery-Search-Sprint, and 3 others: Create a PKI that can be used by Puppet and for general purpose certificates - https://phabricator.wikimedia.org/T128077#2063688 (Gehel)
[16:02:54] (PS4) DCausse: Enable 'popqual' (quality+pageviews) scoring method for the completion suggester [mediawiki-config] - https://gerrit.wikimedia.org/r/272963 (https://phabricator.wikimedia.org/T127943)
[16:03:23] Operations, CirrusSearch, Discovery, Discovery-Search-Sprint, and 3 others: Create a PKI that can be used by Puppet and for general purpose certificates - https://phabricator.wikimedia.org/T128077#2063317 (Gehel) @Volans is working on a similar problematic for mysql traffic encryption.
[16:03:39] i'm here btw
[16:04:15] bmansurov: ack, hi!
[16:04:22] thcipriani: hi
[16:04:46] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/270792 (https://phabricator.wikimedia.org/T125946) (owner: Bmansurov)
[16:05:08] * aude waves
[16:05:29] (Merged) jenkins-bot: Run the survey at normal rate to test DNT [mediawiki-config] - https://gerrit.wikimedia.org/r/270792 (https://phabricator.wikimedia.org/T125946) (owner: Bmansurov)
[16:05:40] Operations, DBA, Patch-For-Review: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#2063709 (jcrespo)
[16:05:42] Operations, CirrusSearch, Discovery, Discovery-Search-Sprint, and 3 others: Create a PKI that can be used by Puppet and for general purpose certificates - https://phabricator.wikimedia.org/T128077#2063710 (jcrespo)
[16:05:51] Blocked-on-Operations, Operations, RESTBase-Cassandra: expand raid0 in restbase200[1-6] - https://phabricator.wikimedia.org/T127951#2063711 (fgiunchedi) restbase2001 completed, the level remained at raid4 after reshape finished, however to fix it a `mdadm --grow /dev/md2 --level 0` is enough, also `pvr...
[16:06:06] Operations, DBA, Patch-For-Review: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#1612781 (jcrespo) a:jcrespo>None
[16:07:09] thcipriani: can we revert?
[16:07:23] thcipriani: I forgot about the dependency
[16:07:27] sorry about it
[16:07:34] bmansurov: sure, np
[16:07:38] thank you
[16:07:48] thcipriani: no need to deploy the other patch either
[16:07:55] bmansurov: ack
[16:08:03] (PS1) Gehel: Expose elasticsearch through HTTP [puppet] - https://gerrit.wikimedia.org/r/273254 (https://phabricator.wikimedia.org/T124444)
[16:10:17] (PS1) Ottomata: Install spark on hadoop worker nodes [puppet] - https://gerrit.wikimedia.org/r/273255
[16:10:30] (Abandoned) Aude: Don't yet include wikidatasparql for graphoid [puppet] - https://gerrit.wikimedia.org/r/271334 (owner: Aude)
[16:10:33] (PS1) Thcipriani: Revert "Run the survey at normal rate to test DNT" [mediawiki-config] - https://gerrit.wikimedia.org/r/273256
[16:10:53] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/273256 (owner: Thcipriani)
[16:10:56] (CR) Ottomata: [C: 2 V: 2] Install spark on hadoop worker nodes [puppet] - https://gerrit.wikimedia.org/r/273255 (owner: Ottomata)
[16:11:29] (Merged) jenkins-bot: Revert "Run the survey at normal rate to test DNT" [mediawiki-config] - https://gerrit.wikimedia.org/r/273256 (owner: Thcipriani)
[16:11:57] (PS1) Elukey: Add mc1014.eqiad back to redis/memcached pool after maintenance. [puppet] - https://gerrit.wikimedia.org/r/273257 (https://phabricator.wikimedia.org/T123711)
[16:13:21] thcipriani: Is it OK if I add another patch to the queue now, or is it too late?
[16:13:38] (PS2) Elukey: Add mc1014.eqiad back to redis/memcached pool after maintenance. [puppet] - https://gerrit.wikimedia.org/r/273257 (https://phabricator.wikimedia.org/T123711)
[16:13:43] bmansurov: which patch?
[16:13:54] https://gerrit.wikimedia.org/r/#/c/273036/3
[16:14:02] that's the dependency, thcipriani
[16:14:12] after that we'll have to deploy the reverted patch
[16:14:56] (PS12) BBlack: Maps VCL forward-port to Varnish 4 [puppet] - https://gerrit.wikimedia.org/r/269466 (https://phabricator.wikimedia.org/T124279) (owner: Ema)
[16:15:36] (CR) Elukey: [C: 2] Add mc1014.eqiad back to redis/memcached pool after maintenance. [puppet] - https://gerrit.wikimedia.org/r/273257 (https://phabricator.wikimedia.org/T123711) (owner: Elukey)
[16:15:47] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge.
[16:16:43] !log thcipriani@tin Synchronized php-1.27.0-wmf.14/resources/lib/oojs-ui/oojs-ui-core.js: SWAT: OOjs UI: Fix #gatherPreInfuseState called incorrectly, causing TypeErrors [[gerrit:273250]] (duration: 01m 42s)
[16:16:47] !log added mc1014 back to the redis/memcached pool after maintenance
[16:16:49] ^ James_F check please
[16:16:58] thcipriani: James_F looking
[16:17:02] Thanks thcipriani.
[16:17:19] looks good on test.wikidata
[16:17:26] thcipriani: It's not obviously and catastrophically broken. :-)
[16:17:28] bmansurov: sure, go ahead and add that patch. Rebase the reverted patch, too, please.
[16:17:34] James_F: :D
[16:17:42] thcipriani: ok thanks
[16:17:44] thcipriani: can you put wikidata back on wmf14?
[16:17:53] we switched it back to wmf13 because of this issue
[16:17:57] aude: Sorry about all this.
[16:18:17] James_F: somehow this is missing in our tests and we should have this covered
[16:18:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:18:32] aude: sure.
[16:18:38] thcipriani: thanks
[16:18:45] (PS1) Jforrester: Revert "wikidata back to wmf.13, creating new items has a regression" [mediawiki-config] - https://gerrit.wikimedia.org/r/273259
[16:18:49] ^^^
[16:20:03] (PS1) Bmansurov: Run the survey at normal rate to test DNT [mediawiki-config] - https://gerrit.wikimedia.org/r/273261 (https://phabricator.wikimedia.org/T125946)
[16:20:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:20:18] (PS1) Thcipriani: Wikidatawiki to wmf.14 [mediawiki-config] - https://gerrit.wikimedia.org/r/273262
[16:20:31] (PS3) Bmansurov: Run the survey at lowered rate [mediawiki-config] - https://gerrit.wikimedia.org/r/273169 (https://phabricator.wikimedia.org/T125946)
[16:20:59] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/273259 (owner: Jforrester)
[16:21:17] (Abandoned) Thcipriani: Wikidatawiki to wmf.14 [mediawiki-config] - https://gerrit.wikimedia.org/r/273262 (owner: Thcipriani)
[16:21:23] thcipriani: done
[16:21:33] bmansurov: okie doke.
[16:22:22] (Merged) jenkins-bot: Revert "wikidata back to wmf.13, creating new items has a regression" [mediawiki-config] - https://gerrit.wikimedia.org/r/273259 (owner: Jforrester)
[16:23:22] !log thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: SWAT: wikidatawiki to wmf.14
[16:23:29] ^ aude should be on wmf.14 again
[16:23:34] thanks
[16:23:46] James_F: aude thanks!
[16:26:02] bmansurov: I'm not sure if we have time to do the 30 minute revert after test :\ Have to merge change to extension master, backport to wmf.13 and wmf.14, sync then sync the config change.
[16:26:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:26:26] bmansurov: I can get the initial js out to wmf.13 and wmf.14 does that sound ok?
[16:26:32] thcipriani: ok
[16:27:02] i think i am seeig some caching issues
[16:27:26] works good in incognito mode
[16:30:13] bmansurov: can I get you to doublecheck the two backports? https://gerrit.wikimedia.org/r/#/c/273268/1 https://gerrit.wikimedia.org/r/#/c/273267/ ?
[16:30:27] ok, checking
[16:31:20] thcipriani: done
[16:32:42] (CR) Ema: [C: 1] vcl_layers: consolidate fe-only sub blocks [puppet] - https://gerrit.wikimedia.org/r/273214 (owner: BBlack)
[16:32:55] (CR) Ema: [C: 1] vcl_layers: move recv_purge [puppet] - https://gerrit.wikimedia.org/r/273153 (owner: BBlack)
[16:33:56] i keep getting an outdated version of https://www.wikidata.org/static/1.27.0-wmf.14/resources/lib/oojs-ui/oojs-ui-core.js
[16:34:05] maybe i have to wait some minutes more
[16:37:48] (CR) Ema: [C: 1] vcl_layers: move backend 403 check earlier [puppet] - https://gerrit.wikimedia.org/r/273157 (owner: BBlack)
[16:38:57] (CR) Ema: [C: 1] vcl_layers: beacon should only be frontend [puppet] - https://gerrit.wikimedia.org/r/273158 (owner: BBlack)
[16:39:26] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There are 2 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/).
[16:41:00] :\ there are not 2 unmerged changes on mira.
[16:41:42] Operations, Mail: remove bugzilla related mail aliases - https://phabricator.wikimedia.org/T127491#2063825 (Aklapper) bugzilla-admin@ was used for the "audit_log" emails (RT #4802). Kill it.
[16:42:53] (CR) Ema: [C: 1] vcl_layers: varnishcheck can happen earlier [puppet] - https://gerrit.wikimedia.org/r/273159 (owner: BBlack)
[16:44:03] (CR) Ema: [C: 1] vcl_layers: move xcache up a bit [puppet] - https://gerrit.wikimedia.org/r/273160 (owner: BBlack)
[16:48:14] (PS2) BBlack: vcl_error: single definition [puppet] - https://gerrit.wikimedia.org/r/273220
[16:48:16] (PS2) BBlack: VCL: move role include up [puppet] - https://gerrit.wikimedia.org/r/273221
[16:48:18] (PS2) BBlack: VCL: rename cluster-common funcs for clarity [puppet] - https://gerrit.wikimedia.org/r/273222
[16:48:20] (PS2) BBlack: text VCL: clean up 1be misspass mangling [puppet] - https://gerrit.wikimedia.org/r/273223
[16:48:22] (PS2) BBlack: upload VCL: refactor a bit [puppet] - https://gerrit.wikimedia.org/r/273216
[16:48:24] (PS2) BBlack: VCL: re-arrange subs into standard ordering [puppet] - https://gerrit.wikimedia.org/r/273217
[16:48:26] (PS2) BBlack: vcl_recv: single definition in wikimedia.vcl [puppet] - https://gerrit.wikimedia.org/r/273218
[16:48:29] (PS2) BBlack: vcl_(hash|hit|miss|pass|fetch|deliver): single definition [puppet] - https://gerrit.wikimedia.org/r/273219
[16:48:31] (PS4) BBlack: vcl_layers: split wikimedia.vcl in puppet terms [puppet] - https://gerrit.wikimedia.org/r/273162
[16:48:32] (PS4) BBlack: vcl_layers: move basic fe-only things to fe [puppet] - https://gerrit.wikimedia.org/r/273163
[16:48:34] (PS4) BBlack: vcl_layers: split up vcl_recv itself [puppet] - https://gerrit.wikimedia.org/r/273164
[16:48:36] (PS2) BBlack: vcl_layers: move cluster include to layer files [puppet] - https://gerrit.wikimedia.org/r/273215
[16:48:38] (PS4) BBlack: vcl_layers: split hit|miss|pass|fetch [puppet] - https://gerrit.wikimedia.org/r/273165
[16:48:40] (PS4) BBlack: vcl_layers: split deliver|error [puppet] - https://gerrit.wikimedia.org/r/273166
[16:48:44] (PS13) BBlack: Maps VCL forward-port to Varnish 4 [puppet] - https://gerrit.wikimedia.org/r/269466 (https://phabricator.wikimedia.org/T124279) (owner: Ema)
[16:51:34] !log thcipriani@tin Synchronized php-1.27.0-wmf.14/extensions/QuickSurveys/resources/ext.quicksurveys.init/init.js: SWAT: Do not show a survey if DNT is enabled [[gerrit:273267]] (duration: 01m 35s)
[16:51:40] ^ bmansurov check please
[16:51:57] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge.
[16:53:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:53:10] thcipriani: doesn't seem to be working
[16:53:21] (PS1) Elukey: Remove mc1015.eqiad.wmnet from the redis/memcached pool for maintenance. [puppet] - https://gerrit.wikimedia.org/r/273272 (https://phabricator.wikimedia.org/T123711)
[16:54:15] (CR) Elukey: [C: 2] Remove mc1015.eqiad.wmnet from the redis/memcached pool for maintenance. [puppet] - https://gerrit.wikimedia.org/r/273272 (https://phabricator.wikimedia.org/T123711) (owner: Elukey)
[16:54:19] bmansurov: :( keep in mind this is only on non *pedia wikis currently
[16:54:40] thcipriani: when will it be live for *pedia wikis?
[16:55:02] once wmf.13 backport merges and is sync'd
[16:55:26] thcipriani: ok
[16:56:00] !log applying schema change to officewiki:flow (s3)
[16:56:33] is anyone there?
[16:56:34] /win 13
[16:56:36] oops
[16:58:31] bmansurov: syncing wmf.13 now, FYI.
[16:58:37] cool
[16:59:11] (CR) Ema: [C: -1] vcl_layers: split wikimedia.vcl in puppet terms (1 comment) [puppet] - https://gerrit.wikimedia.org/r/273162 (owner: BBlack)
[16:59:14] morebots, hello
[16:59:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:59:31] I am a logbot running on tools-exec-1215.
[16:59:31] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log.
[16:59:31] To log a message, type !log .
[16:59:47] !log thcipriani@tin Synchronized php-1.27.0-wmf.13/extensions/QuickSurveys/resources/ext.quicksurveys.init/init.js: SWAT: Do not show a survey if DNT is enabled [[gerrit:273268]] (duration: 01m 31s)
[16:59:49] ^ bmansurov check please
[17:00:04] _joe_ godog: Respected human, time to deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160225T1700). Please do the needful.
[17:00:05] thcipriani SMalyshev: A patch you scheduled for Puppet SWAT(Max 8 patches) is about to be deployed. Please be available during the process.
[17:00:25] I do not know if the bot is lagging or it is me
[17:00:36] thcipriani: still not working
[17:01:59] <_joe_> thcipriani: give me 2 mins, I'm in a meeting
[17:02:11] (PS3) BBlack: vcl_error: single definition [puppet] - https://gerrit.wikimedia.org/r/273220
[17:02:13] (PS3) BBlack: VCL: move role include up [puppet] - https://gerrit.wikimedia.org/r/273221
[17:02:15] (PS3) BBlack: VCL: rename cluster-common funcs for clarity [puppet] - https://gerrit.wikimedia.org/r/273222
[17:02:17] (PS3) BBlack: text VCL: clean up 1be misspass mangling [puppet] - https://gerrit.wikimedia.org/r/273223
[17:02:19] (PS3) BBlack: upload VCL: refactor a bit [puppet] - https://gerrit.wikimedia.org/r/273216
[17:02:21] (PS3) BBlack: VCL: re-arrange subs into standard ordering [puppet] - https://gerrit.wikimedia.org/r/273217
[17:02:23] (PS3) BBlack: vcl_recv: single definition in wikimedia.vcl [puppet] - https://gerrit.wikimedia.org/r/273218
[17:02:25] (PS3) BBlack: vcl_(hash|hit|miss|pass|fetch|deliver): single definition [puppet] - https://gerrit.wikimedia.org/r/273219
[17:02:27] (PS5) BBlack: vcl_layers: split wikimedia.vcl in puppet terms [puppet] - https://gerrit.wikimedia.org/r/273162
[17:02:28] bmansurov: I can verify that the code made it to a few machines I spot-checked. Are you seeing the updated code at least?
[17:02:29] (PS5) BBlack: vcl_layers: move basic fe-only things to fe [puppet] - https://gerrit.wikimedia.org/r/273163
[17:02:31] (PS5) BBlack: vcl_layers: split up vcl_recv itself [puppet] - https://gerrit.wikimedia.org/r/273164
[17:02:33] (PS3) BBlack: vcl_layers: move cluster include to layer files [puppet] - https://gerrit.wikimedia.org/r/273215
[17:02:35] (PS5) BBlack: vcl_layers: split hit|miss|pass|fetch [puppet] - https://gerrit.wikimedia.org/r/273165
[17:02:37] (PS5) BBlack: vcl_layers: split deliver|error [puppet] - https://gerrit.wikimedia.org/r/273166
[17:02:39] (PS2) Elukey: Remove mc1015.eqiad.wmnet from the redis-memcached pool for maintenance. [puppet] - https://gerrit.wikimedia.org/r/273272 (https://phabricator.wikimedia.org/T123711)
[17:02:41] (PS14) BBlack: Maps VCL forward-port to Varnish 4 [puppet] - https://gerrit.wikimedia.org/r/269466 (https://phabricator.wikimedia.org/T124279) (owner: Ema)
[17:02:44] thcipriani: let me check the code
[17:02:47] bmansurov: if not, are you using ?debug=true?
[17:02:50] kk
[17:02:57] (CR) BBlack: vcl_layers: split wikimedia.vcl in puppet terms (1 comment) [puppet] - https://gerrit.wikimedia.org/r/273162 (owner: BBlack)
[17:03:00] thcipriani: no i'm not using debug=true, should I be?
[17:03:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:03:34] <_joe_> thcipriani: still on the regular SWAT?
[17:03:48] bmansurov: it'll help rule out some caching
[17:03:57] <_joe_> SMalyshev: you're in queue after thcipriani and the regular SWAT
[17:04:03] _joe_: yeah, just finished last sync, should only be another minute, sorry
[17:04:14] Operations, ops-eqiad: Failed drive in labstore1001 array - https://phabricator.wikimedia.org/T127076#2031522 (mark) so there tends to be a way to have a drive led blink, usually set using the raid controller management tool. Usually that also allows to figure out which physical slot is associated with it...
[17:04:23] thcipriani: https://en.wikipedia.org/static/1.27.0-wmf.13/extensions/QuickSurveys/resources/ext.quicksurveys.init/init.js shows the old file
[17:04:37] !log applying schema change to flowdb (x1)
[17:04:38] _joe_: I can take thcipriani's patch, it was meant for last week
[17:04:45] thcipriani: only after adding debug=true it was updated
[17:05:15] <_joe_> godog: don't worry, and btw I know it's cherry-picked in labs already
[17:05:31] (CR) jenkins-bot: [V: -1] vcl_layers: split wikimedia.vcl in puppet terms [puppet] - https://gerrit.wikimedia.org/r/273162 (owner: BBlack)
[17:05:54] _joe_: yeah, only thing outside labs was dnsmasq on labnet
[17:06:00] bmansurov: lemme touch and re-sync. godog lemme finish up some swat.
[17:06:28] * godog pulls thcipriani in all directions
[17:06:33] :D
[17:07:19] the worst part of it is, one patch for SWAT is for the scap update that will (hopefully) speed up sync-masters which makes sync-file take forever :P
[17:08:09] Operations, Wikimedia-Logstash: Auto generated Logstash unit file has "Restart=no" - https://phabricator.wikimedia.org/T127677#2063880 (MoritzMuehlenhoff) a:MoritzMuehlenhoff
[17:08:13] !log thcipriani@tin Synchronized php-1.27.0-wmf.13/extensions/QuickSurveys/resources/ext.quicksurveys.init/init.js: SWAT (after touch): Do not show a survey if DNT is enabled [[gerrit:273268]] (duration: 01m 32s)
[17:08:49] Operations, Wikimedia-Logstash: Auto generated Logstash unit file has "Restart=no" - https://phabricator.wikimedia.org/T127677#2050308 (MoritzMuehlenhoff) Ok, I'll turn this into a proper gerrit patch.
[17:09:24] <_joe_> SMalyshev: around?
[17:09:27] <_joe_> or gehel :)
[17:09:39] gehel's here
[17:09:40] bmansurov: I see the new file without the debug now
[17:09:54] let me check
[17:10:01] _joe_: how can I help you?
[17:10:17] thcipriani: i still see the old one
[17:10:27] thcipriani: if you think we should wait, then it's fine too.
[17:10:28] <_joe_> gehel: I'm merging https://gerrit.wikimedia.org/r/#/c/272908/
[17:10:52] <_joe_> it's scheduled for puppetswat and it's simple enough and I know WDQS well enough to go on and merge it
[17:10:58] (PS2) Giuseppe Lavagetto: Always add Access-Control-Allow-Origin for WDQS backend response [puppet] - https://gerrit.wikimedia.org/r/272908 (https://phabricator.wikimedia.org/T115476) (owner: Smalyshev)
[17:11:04] bmansurov: yeah, this is a caching thing at this point. let's let puppet swat go ahead.
[17:11:07] <_joe_> jsut an FYI
[17:11:07] thcipriani: oh yeah, i'ts working now
[17:11:15] thcipriani: thanks a lot
[17:11:22] _joe_: thanks for the head's up!
[17:11:25] bmansurov: thank for checking.
[17:11:31] _joe_: godog SWAT is complete.
[17:11:58] !log installing xerces-c security updates
[17:12:06] (CR) Giuseppe Lavagetto: [C: 2 V: 2] Always add Access-Control-Allow-Origin for WDQS backend response [puppet] - https://gerrit.wikimedia.org/r/272908 (https://phabricator.wikimedia.org/T115476) (owner: Smalyshev)
[17:12:15] _joe_: I do not know enough about WDQS, but the change seemed trivial enough
[17:13:03] thcipriani: ack, I'll merge your patch shortly then
[17:14:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:15:06] (PS6) Filippo Giunchedi: Beta: Move deployment server [puppet] - https://gerrit.wikimedia.org/r/270343 (https://phabricator.wikimedia.org/T126377) (owner: Thcipriani)
[17:15:27] (CR) Filippo Giunchedi: [C: 2 V: 2] Beta: Move deployment server [puppet] - https://gerrit.wikimedia.org/r/270343 (https://phabricator.wikimedia.org/T126377) (owner: Thcipriani)
[17:17:54] thcipriani: {{done}} ^
[17:18:23] godog: awesome! thanks, I'll fix up deployment-prep!
[17:20:47] godog: I also added a patch to update the scap package to puppetswat (if you missed it on the deployments page)
[17:21:10] oh, checking thcipriani
[17:23:10] (PS2) Filippo Giunchedi: Update scap to 3.0.2-1 [puppet] - https://gerrit.wikimedia.org/r/273253 (owner: Thcipriani)
[17:23:20] (CR) Filippo Giunchedi: [C: 2 V: 2] Update scap to 3.0.2-1 [puppet] - https://gerrit.wikimedia.org/r/273253 (owner: Thcipriani)
[17:23:57] thcipriani: {{done}}
[17:24:59] godog: neat. possible to get a puppet run on the masters? tin and mira? Just want to verify everything's still kosher there.
[17:24:59] _joe_ sorry, just got to the office
[17:25:15] _joe_: but I see you already took care of it, thanks!
[17:26:10] <_joe_> SMalyshev: yup
[17:26:14] <_joe_> SMalyshev: yw
[17:26:16] <_joe_> :)
[17:26:54] (CR) Ema: [C: 1] vcl_layers: more layer-conditional redundancy [puppet] - https://gerrit.wikimedia.org/r/273161 (owner: BBlack)
[17:27:35] thcipriani: sure, on tin is just finished in only 89s !
[17:27:39] s/is/has/
[17:27:51] 89.81s to be exact
[17:27:56] PROBLEM - Kafka Broker Replica Max Lag on kafka1020 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [5000000.0]
[17:28:02] heh, nice.
[17:30:29] kk, I'm going to try a sync-file real quick.
[17:32:45] !log thcipriani@tin Synchronized README: test sync-file (duration: 01m 46s)
[17:33:09] (PS3) Elukey: Remove mc1015.eqiad.wmnet from the redis-memcached pool for maintenance. [puppet] - https://gerrit.wikimedia.org/r/273272 (https://phabricator.wikimedia.org/T123711)
[17:35:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:35:17] RECOVERY - Kafka Broker Replica Max Lag on kafka1020 is OK: OK: Less than 50.00% above the threshold [1000000.0]
[17:37:01] !log thcipriani@tin Synchronized README: test sync-file (duration: 01m 42s)
[17:37:56] !log removed mc1015 from the redis/memcached pool for maintenance
[17:38:03] andrewbogott: FYI from last week, merging https://gerrit.wikimedia.org/r/#/c/270343/ on labnet1002 did change /etc/dnsmasq-nova.conf but didn't restart the service and sending sighup doesn't make dnsmasq reload the configuration afaics
[17:38:34] godog: ok...
[17:38:35] thcipriani: I'm assuming all good re: scap?
[17:38:51] I can restart nova-network right now and see if it helps… it’s hard to know what happened in retrospect.
[17:38:53] godog: yup, I think so, thanks for the update
[17:39:07] godog: want me to do that?
[17:39:24] godog: now that I think of it, I think that I’ve seen this before :(
[17:39:58] andrewbogott: thanks! does that have any impact? anyways the diff is this
[17:40:01] -alias=208.80.155.191,10.68.16.58
[17:40:03] +alias=208.80.155.191,10.68.17.240
[17:40:04] I think that nova-network starts dnsmasq but doesn’t manage it properly after the fact.
[17:40:11] and that I have to kill and restart it by hand.
[17:40:14] I’ll be quick~
[17:41:13] godog: did that fix it?
[17:41:46] andrewbogott: yeah I think so, dnsmasq just got restarted
[17:42:18] godog: I did # kill -9 ;; #service nova-network restart
[17:42:23] A big hammer for a small problem
[17:43:26] andrewbogott: haha indeed, thanks for your help!
[17:46:09] (PS3) Filippo Giunchedi: restbase: make statsd metric prefix configurable [puppet] - https://gerrit.wikimedia.org/r/238431 (https://phabricator.wikimedia.org/T112644)
[17:47:33] (CR) Mark Bergsma: [C: 1] "The name of this parameter is really not worth bikeshedding over for 6 months." [puppet] - https://gerrit.wikimedia.org/r/238431 (https://phabricator.wikimedia.org/T112644) (owner: Filippo Giunchedi)
[17:48:52] (CR) Filippo Giunchedi: [C: 2 V: 2] restbase: make statsd metric prefix configurable [puppet] - https://gerrit.wikimedia.org/r/238431 (https://phabricator.wikimedia.org/T112644) (owner: Filippo Giunchedi)
[17:55:41] Operations, Mail, fundraising-tech-ops: donation aliases for moneybookers? - https://phabricator.wikimedia.org/T127489#2064042 (Dzahn)
[17:56:17] (CR) Ema: Maps VCL forward-port to Varnish 4 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/269466 (https://phabricator.wikimedia.org/T124279) (owner: Ema)
[17:56:58] apergos: Thanks for sharing your mail on wikmedia mailinglist.
[17:57:16] payments io thread down?
[17:57:31] Steinsplitter: you're welcome
[17:57:53] Steinsplitter: +!
[17:57:57] !log bounce cassandra instances on restbase2001, cql not listening
[17:57:58] !log ori@tin Synchronized php-1.27.0-wmf.14/includes/session/SessionBackend.php: I43cde3a48: Revert Revert SessionBackend: skip isUserSessionPrevented check for anons (duration: 01m 39s)
[17:58:22] (PS6) Andrew Bogott: Updates to designate/mdns/pdns setup for Labs internal dns [puppet] - https://gerrit.wikimedia.org/r/272771 (https://phabricator.wikimedia.org/T124680)
[17:59:37] !log ori@tin Synchronized php-1.27.0-wmf.13/includes/session/SessionBackend.php: I43cde3a48: Revert Revert SessionBackend: skip isUserSessionPrevented check for anons (duration: 01m 38s)
[18:00:04] yurik gwicke cscott arlolra subbu: Dear anthropoid, the time has come. Please deploy Services – Graphoid / Parsoid / OCG / Citoid (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160225T1800).
[18:00:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:01:26] (CR) Ema: Maps VCL forward-port to Varnish 4 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/269466 (https://phabricator.wikimedia.org/T124279) (owner: Ema)
[18:03:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:05:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:05:31] (PS1) Yurik: Disabled Graph namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/273283
[18:05:47] ori: we gave thcipriani horrible code review. The "fix" for the mtime bug is inverted (missing "not")
[18:05:54] anyone to take a look at the config patch ^^
[18:06:03] thcipriani: that'll teach you
[18:06:24] missed the word "could" :) ^^
[18:06:33] ugh :)
[18:06:43] thcipriani: what were you thinking, trusting us?
[18:07:29] (PS7) Andrew Bogott: Updates to designate/mdns/pdns setup for Labs internal dns [puppet] - https://gerrit.wikimedia.org/r/272771 (https://phabricator.wikimedia.org/T124680)
[18:07:41] !log testing schema change on db2070 (enwiki)
[18:07:42] reagan got this one right. trust but verify.
[18:07:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:13:28] Operations, RESTBase: install restbase1010-restbase1016 - https://phabricator.wikimedia.org/T128107#2064139 (fgiunchedi)
[18:13:51] (CR) Ema: vcl_error: single definition (1 comment) [puppet] - https://gerrit.wikimedia.org/r/273220 (owner: BBlack)
[18:15:28] Operations, RESTBase: install restbase1010-restbase1016 - https://phabricator.wikimedia.org/T128107#2064167 (fgiunchedi) we'll be starting with row 'A', same row as restbase1001 / restbase1002 / restbase1007, at two machines per row
[18:15:32] cmjohnson1: ^ to track rack/install of the restbase hw
[18:17:06] godog: okay, thx lmk when it's okay to start
[18:18:02] cmjohnson1: anytime, existing hw will stay in place for a bit while the new hw is commissioned and the old decommissioned
[18:18:23] oh..okay, that works then
[18:28:48] (CR) Andrew Bogott: [C: 2] Updates to designate/mdns/pdns setup for Labs internal dns [puppet] - https://gerrit.wikimedia.org/r/272771 (https://phabricator.wikimedia.org/T124680) (owner: Andrew Bogott)
[18:31:13] aude, could you review https://gerrit.wikimedia.org/r/#/c/273283/
[18:35:16] PROBLEM - check_mysql on lutetium is CRITICAL: Cant connect to local MySQL server through socket /tmp/mysql.sock (2)
[18:39:39] (PS2) BBlack: wikimedia.vcl: trailing whitespace fixups [puppet] - https://gerrit.wikimedia.org/r/273213
[18:40:16] PROBLEM - check_mysql on lutetium is CRITICAL: Cant connect to local MySQL server through socket /tmp/mysql.sock (2)
[18:40:22] ^Jeff_Green
[18:40:44] (CR) BBlack: [C: 2 V: 2] wikimedia.vcl: trailing whitespace fixups [puppet] - https://gerrit.wikimedia.org/r/273213 (owner: BBlack)
[18:40:45] yup. package update
[18:41:03] (PS3) BBlack: vcl_layers: rewrite_proxy_urls fe-only [puppet] - https://gerrit.wikimedia.org/r/273152
[18:41:10] (CR) BBlack: [C: 2 V: 2] vcl_layers: rewrite_proxy_urls fe-only [puppet] - https://gerrit.wikimedia.org/r/273152 (owner: BBlack)
[18:41:49] (PS2) BBlack: vcl_layers: consolidate fe-only sub blocks [puppet] - https://gerrit.wikimedia.org/r/273214
[18:41:58] (CR) BBlack: [C: 2 V: 2] vcl_layers: consolidate fe-only sub blocks [puppet] - https://gerrit.wikimedia.org/r/273214 (owner: BBlack)
[18:42:25] (PS3) BBlack: vcl_layers: move recv_purge [puppet] - https://gerrit.wikimedia.org/r/273153
[18:42:40] (CR) BBlack: [C: 2 V: 2] vcl_layers: move recv_purge [puppet] - https://gerrit.wikimedia.org/r/273153 (owner: BBlack)
[18:42:53] \o/
[18:43:00] (PS3) BBlack: vcl_layers: consolidate 3x fe-only [puppet] - https://gerrit.wikimedia.org/r/273154
[18:43:08] (CR) BBlack: [C: 2 V: 2] vcl_layers: consolidate 3x fe-only [puppet] - https://gerrit.wikimedia.org/r/273154 (owner: BBlack)
[18:43:53] just one chunk of them for now :)
[18:44:06] PROBLEM - Kafka Broker Replica Max Lag on kafka1012 is CRITICAL: CRITICAL: 57.69% of data above the critical threshold [5000000.0]
[18:44:08] a brave new world in vcl land, excited for it!
[18:45:16] RECOVERY - check_mysql on lutetium is OK: Uptime: 244 Threads: 2 Questions: 29202 Slow queries: 0 Opens: 85 Flush tables: 2 Open tables: 64 Queries per second avg: 119.680 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[18:54:10] apergos, i don't think lila's resignation has impacted VCL that much!
[18:54:19] hahahaha
[18:54:24] no that was all bblack :-)
[18:54:30] yurik: reasoning about varnish side-effects is notoriously difficult
[18:54:34] kickin varnish *ss and takin names
[18:57:11] ori, +100
[18:57:27] i wonder if it beats the random()
[18:58:17] RECOVERY - Kafka Broker Replica Max Lag on kafka1012 is OK: OK: Less than 50.00% above the threshold [1000000.0]
[18:58:47] !log testing schema changes on db2057:testwiki (s3)
[18:58:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:00:12] jynus, crashing the entire wiki during the metrics might be .... telling
[19:00:16] ;)
[19:00:47] I could do that, that is why I am doing it on a host that is not in use
[19:00:56] *but
[19:01:05] not as impactful though :)
[19:01:36] * yurik likes changes with a substantial impact
[19:05:59] jouncebot: next
[19:05:59] In 0 hour(s) and 54 minute(s): MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160225T2000)
[19:07:29] yurik: do you still need review?
[19:07:56] aude, a +1 if you have a sec - i added it to SWAT
[19:08:13] i'll take a look after i eat dinner
[19:11:10] (PS1) Elukey: Add mc1015.eqiad.wmnet back in the redis/memcached pools. [puppet] - https://gerrit.wikimedia.org/r/273299 (https://phabricator.wikimedia.org/T123711)
[19:12:37] (CR) Elukey: [C: 2] Add mc1015.eqiad.wmnet back in the redis/memcached pools. [puppet] - https://gerrit.wikimedia.org/r/273299 (https://phabricator.wikimedia.org/T123711) (owner: Elukey)
[19:13:57] !log added mc1015.eqiad.wmnet back to the redis/memcached pools
[19:14:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:20:13] PROBLEM - cassandra-a CQL 10.64.32.187:9042 on restbase1008 is CRITICAL: Connection refused
[19:27:22] (PS2) Ottomata: Use different my.cnf in labs for analytics-meta instance [puppet] - https://gerrit.wikimedia.org/r/272896
[19:29:23] ostriches: wmf14 is going out to all wikis today, right?
[19:30:13] ori: yeah, ostriches is out the rest of the day, I'll be pushing it out in ~30mins
[19:30:25] thanks thcipriani
[19:36:38] (CR) Ottomata: [C: 2] Use different my.cnf in labs for analytics-meta instance [puppet] - https://gerrit.wikimedia.org/r/272896 (owner: Ottomata)
[19:37:34] (Abandoned) Ottomata: Move hive-server, hive-metastore and oozie from analytics1027 to analytics1015 [puppet] - https://gerrit.wikimedia.org/r/264760 (https://phabricator.wikimedia.org/T110090) (owner: Ottomata)
[19:46:52] Operations, DBA, MediaWiki-Configuration, Release-Engineering-Team, and 3 others: codfw is in read only according to mediawiki - https://phabricator.wikimedia.org/T124795#1966220 (jcrespo) p:Triage>High
[19:47:11] Operations, DBA, MediaWiki-Configuration, Release-Engineering-Team, and 3 others: codfw is in read only according to mediawiki - https://phabricator.wikimedia.org/T124795#1966220 (jcrespo) a:jcrespo
[19:49:52] Operations, ops-eqiad, Labs: disk failure on labsdb1002 - https://phabricator.wikimedia.org/T126946#2027631 (jcrespo) Cmjohnson: See my comment on T118174#2062707
[19:55:06] Operations, ops-codfw, Incident-20150617-LabsNFSOutage: Labstore2001 controller or shelf failure - https://phabricator.wikimedia.org/T102626#2064521 (tom29739)
[20:00:04] ostriches: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160225T2000).
[20:00:50] * thcipriani does train
[20:02:46] (PS1) Thcipriani: all wikis to 1.27.0-wmf.14 [mediawiki-config] - https://gerrit.wikimedia.org/r/273304
[20:04:09] (CR) Thcipriani: [C: 2] all wikis to 1.27.0-wmf.14 [mediawiki-config] - https://gerrit.wikimedia.org/r/273304 (owner: Thcipriani)
[20:04:35] (Merged) jenkins-bot: all wikis to 1.27.0-wmf.14 [mediawiki-config] - https://gerrit.wikimedia.org/r/273304 (owner: Thcipriani)
[20:04:59] !log thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.27.0-wmf.14
[20:05:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:06:33] seeing a lot of Undefined property: CirrusSearch\InterwikiSearcher::$config in /srv/mediawiki/php-1.27.0-wmf.14/extensions/CirrusSearch/includes/InterwikiSearcher.php on line 79
[20:07:45] which is a notice, but it is definitely spiking the logs
[20:12:29] task created https://phabricator.wikimedia.org/T128122
[20:20:35] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/).
[20:23:36] Session "{session}": Metadata merge failed: {exception}
[20:24:30] (PS3) Krinkle: Set $wgResourceBasePath to "/w" for all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/271711 (https://phabricator.wikimedia.org/T99096)
[20:25:29] Session "{session}": Unverified user provided and no metadata to auth it
[20:26:36] alright. I'm going to roll back. Lots of session errors. plus a lot of log spam in fatalmonitor.
[20:28:25] (PS3) BBlack: vcl_layers: re-arrange fe_ip/xff/xcdis-clear [puppet] - https://gerrit.wikimedia.org/r/273155
[20:28:32] (CR) BBlack: [C: 2 V: 2] vcl_layers: re-arrange fe_ip/xff/xcdis-clear [puppet] - https://gerrit.wikimedia.org/r/273155 (owner: BBlack)
[20:28:46] (PS4) BBlack: vcl_layers: https+fe layer-conditional fixups [puppet] - https://gerrit.wikimedia.org/r/273156
[20:28:52] (CR) BBlack: [C: 2 V: 2] vcl_layers: https+fe layer-conditional fixups [puppet] - https://gerrit.wikimedia.org/r/273156 (owner: BBlack)
[20:29:00] (PS1) Thcipriani: Revert "all wikis to 1.27.0-wmf.14" [mediawiki-config] - https://gerrit.wikimedia.org/r/273306
[20:29:04] (PS4) BBlack: vcl_layers: move backend 403 check earlier [puppet] - https://gerrit.wikimedia.org/r/273157
[20:29:11] (CR) BBlack: [C: 2 V: 2] vcl_layers: move backend 403 check earlier [puppet] - https://gerrit.wikimedia.org/r/273157 (owner: BBlack)
[20:29:21] (PS4) BBlack: vcl_layers: beacon should only be frontend [puppet] - https://gerrit.wikimedia.org/r/273158
[20:29:27] (CR) BBlack: [C: 2 V: 2] vcl_layers: beacon should only be frontend [puppet] - https://gerrit.wikimedia.org/r/273158 (owner: BBlack)
[20:29:45] (PS4) BBlack: vcl_layers: varnishcheck can happen earlier [puppet] - https://gerrit.wikimedia.org/r/273159
[20:30:10] (CR) BBlack: [C: 2 V: 2] vcl_layers: varnishcheck can happen earlier [puppet] - https://gerrit.wikimedia.org/r/273159 (owner: BBlack)
[20:30:12] (CR) Thcipriani: [C: 2] Revert "all wikis to 1.27.0-wmf.14" [mediawiki-config] - https://gerrit.wikimedia.org/r/273306 (owner: Thcipriani)
[20:30:21] (PS4) BBlack: vcl_layers: move xcache up a bit [puppet] - https://gerrit.wikimedia.org/r/273160
[20:30:30] (CR) BBlack: [C: 2 V: 2] vcl_layers: move xcache up a bit [puppet] - https://gerrit.wikimedia.org/r/273160 (owner: BBlack)
[20:30:33] ooooohh round two :-)
[20:30:36] (Merged) jenkins-bot: Revert "all wikis to 1.27.0-wmf.14" [mediawiki-config] - https://gerrit.wikimedia.org/r/273306 (owner: Thcipriani)
[20:30:49] (PS4) BBlack: vcl_layers: more layer-conditional redundancy [puppet] - https://gerrit.wikimedia.org/r/273161
[20:30:57] (CR) BBlack: [C: 2 V: 2] vcl_layers: more layer-conditional redundancy [puppet] - https://gerrit.wikimedia.org/r/273161 (owner: BBlack)
[20:31:18] !log thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: rollback wmf.14
[20:31:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:32:27] OK. Back with wikipedias on wmf.13 and all other wikis on wmf.14.
[20:32:59] thcipriani: the session errors are happening on wmf.14
[20:33:09] apparently
[20:33:19] and they are happening on itwiki?
[20:33:28] mostly
[20:33:58] twentyafterfour: itwiki is back on wmf.13 now.
[20:33:59] itwiki and fawiki have the most , but that seems to be CirrusSearch interwiki issues
[20:37:04] "Session "{session}": Metadata merge failed: {exception}" is normal and expected
[20:37:21] the fawiki problem is a single bad bot
[20:37:26] seems fawikipedia had mostly session errors, with itwikipedia having a lot of CirrusSearch interwiki
[20:37:52] that bot has been broken for weeks with no response from the owner on talk pages
[20:37:55] ignore it
[20:38:17] if you ask me, I would rather ban it
[20:40:28] (PS1) Cmjohnson: Removing mgmt entries for old cp/decommissioned servers [dns] - https://gerrit.wikimedia.org/r/273308
[20:40:30] has someone pinged discovery about their undef property notice?
[20:40:57] task created https://phabricator.wikimedia.org/T128122
[20:41:38] bd808: yeah Deskana is pinging Erik
[20:41:48] *nod*
[20:41:57] wondering how it is only on itwiki
[20:42:06] (PS2) Cmjohnson: Removing mgmt entries for old cp/decommissioned servers [dns] - https://gerrit.wikimedia.org/r/273308
[20:42:54] (CR) Cmjohnson: [C: 2] Removing mgmt entries for old cp/decommissioned servers [dns] - https://gerrit.wikimedia.org/r/273308 (owner: Cmjohnson)
[20:43:11] hashar: this is the account that is causing the fawiki error storm -- https://fa.wikipedia.org/wiki/%DA%A9%D8%A7%D8%B1%D8%A8%D8%B1:Rezabot
[20:44:25] bd808, hashar: I'm pinging ALL the people.
[20:44:52] bd808: it is apparently on itwiki as well I have seen that rezabot string
[20:44:53] Deskana: :) PM response action #1
[20:44:58] Deskana: thank you!
[20:45:47] hashar: anomie pinged here -- https://www.wikidata.org/wiki/User_talk:Yamaha5#Please_attend_to_your_bot -- and was told "yeah but not my problem" basically
[20:46:58] !log starting bootstrap of restbase1008-a T119935
[20:46:59] T119935: Upgrade restbase100[7-9] to match restbase100[1-6] hardware - https://phabricator.wikimedia.org/T119935
[20:47:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:49:28] bd808: Looks like we also have Obaid-bot on urwiki being dumb now.
[20:50:12] last week i have seen a bot making an insane amount of requests, forgot to fill it as a task. Some research being conducted at a university
[20:50:35] that spammed the recent changes api feed several time per seconds
[20:50:59] so research in pissing of 'pedians?
[20:51:04] *off
[20:51:40] (PS1) Ottomata: Set up Analytics MySQL Meta backup with backup::mysqlset [puppet] - https://gerrit.wikimedia.org/r/273312 (https://phabricator.wikimedia.org/T127991)
[20:51:44] PROBLEM - Kafka Broker Replica Max Lag on kafka1020 is CRITICAL: CRITICAL: 60.87% of data above the critical threshold [5000000.0]
[20:52:04] (PS2) Ottomata: Set up Analytics MySQL Meta backup with backup::mysqlset [puppet] - https://gerrit.wikimedia.org/r/273312 (https://phabricator.wikimedia.org/T127991)
[20:52:22] * anomie requests a rezabot shutdown in #wikimedia-labs
[20:52:40] I can't find the account that runs it
[20:52:42] anomie: would you mind task filling it for the record
[20:52:51] the account named reza is another bot entirely
[20:53:00] (PS1) Andrew Bogott: Lower max-mthreads for pdns recursor. [puppet] - https://gerrit.wikimedia.org/r/273314 (https://phabricator.wikimedia.org/T124680)
[20:57:30] !log disabling puppet on caches for scarier VCL merges
[20:57:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:59:17] (PS6) BBlack: vcl_layers: split wikimedia.vcl in puppet terms [puppet] - https://gerrit.wikimedia.org/r/273162
[20:59:41] (PS3) Ottomata: Set up Analytics MySQL Meta backup with backup::mysqlset [puppet] - https://gerrit.wikimedia.org/r/273312 (https://phabricator.wikimedia.org/T127991)
[21:00:00] (CR) BBlack: [C: 2 V: 2] vcl_layers: split wikimedia.vcl in puppet terms [puppet] - https://gerrit.wikimedia.org/r/273162 (owner: BBlack)
[21:00:07] (PS4) Ottomata: Set up Analytics MySQL Meta backup with backup::mysqlset [puppet] - https://gerrit.wikimedia.org/r/273312 (https://phabricator.wikimedia.org/T127991)
[21:00:41] unless you are ina hurry, allow me to review that
[21:00:51] (not today, though)
[21:01:46] (PS4) BBlack: vcl_layers: move cluster include to layer files [puppet] - https://gerrit.wikimedia.org/r/273215
[21:02:25] RECOVERY - Kafka Broker Replica Max Lag on kafka1020 is OK: OK: Less than 50.00% above the threshold [1000000.0]
[21:05:00] (PS5) Ottomata: Set up Analytics MySQL Meta backup with backup::mysqlset [puppet] - https://gerrit.wikimedia.org/r/273312 (https://phabricator.wikimedia.org/T127991)
[21:06:00] (CR) BBlack: [C: 2 V: 2] vcl_layers: move cluster include to layer files [puppet] - https://gerrit.wikimedia.org/r/273215 (owner: BBlack)
[21:06:12] (PS6) BBlack: vcl_layers: move basic fe-only things to fe [puppet] - https://gerrit.wikimedia.org/r/273163
[21:06:20] (CR) BBlack: [C: 2 V: 2] vcl_layers: move basic fe-only things to fe [puppet] - https://gerrit.wikimedia.org/r/273163 (owner: BBlack)
[21:06:30] (PS6) BBlack: vcl_layers: split up vcl_recv itself [puppet] - https://gerrit.wikimedia.org/r/273164
[21:06:38] (CR) BBlack: [C: 2 V: 2] vcl_layers: split up vcl_recv itself [puppet] - https://gerrit.wikimedia.org/r/273164 (owner: BBlack)
[21:08:44] (PS6) BBlack: vcl_layers: split deliver|error [puppet] - https://gerrit.wikimedia.org/r/273166
[21:08:46] (CR) BBlack: [C: 2 V: 2] vcl_layers: split deliver|error [puppet] - https://gerrit.wikimedia.org/r/273166 (owner: BBlack)
[21:08:48] (CR) Ottomata: "Whatchyall think? Will it blend? Should I just try it and then fix what goes wrong?" [puppet] - https://gerrit.wikimedia.org/r/273312 (https://phabricator.wikimedia.org/T127991) (owner: Ottomata)
[21:17:37] bd808: so all of the session...whatever errors, expected?
[21:17:51] it looks like the notice spike was "fixed" but not sync'd out.
[21:18:03] (PS1) BBlack: wikimedia-common VCL: split output files [puppet] - https://gerrit.wikimedia.org/r/273333
[21:18:04] I'm going to try syncing that out and rolling forward again.
[21:18:32] thcipriani: yeah I'm pretty sure that the session manager wanrings you were seeing are ok
[21:18:49] kk, then I'm going to go ahead with train...again.
[21:19:29] Operations, fundraising-tech-ops, Patch-For-Review: remove fundraising banner log related cruft from production puppet - https://phabricator.wikimedia.org/T118325#2064781 (Jgreen) >>! In T118325#1919687, @akosiaris wrote: > @JGreen, I think all the blockers are done, should we move with the cleanup ?...
[21:20:19] (CR) BBlack: [C: 2] wikimedia-common VCL: split output files [puppet] - https://gerrit.wikimedia.org/r/273333 (owner: BBlack)
[21:20:23] (PS1) Cmjohnson: adding dns entries for new restabse servers. These are all the intial 1G connections and management [dns] - https://gerrit.wikimedia.org/r/273350
[21:20:45] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge.
[21:21:31] !log ebernhardson@tin Synchronized php-1.27.0-wmf.14/extensions/CirrusSearch/includes/Searcher.php: Update file that wasnt synced properly (duration: 01m 50s)
[21:21:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:22:31] thcipriani: i didn't see your response, but i saw the file was on -staging and not the main so synced it out
[21:22:35] ebernhardson: heh, I was just prepping that message. Thanks :)
[21:22:44] (PS2) Cmjohnson: adding dns entries for new restabse servers. These are all the intial 1G connections and management [dns] - https://gerrit.wikimedia.org/r/273350
[21:23:00] ok, going to roll forward with the train, now.
[21:23:05] (CR) jenkins-bot: [V: -1] adding dns entries for new restabse servers. These are all the intial 1G connections and management [dns] - https://gerrit.wikimedia.org/r/273350 (owner: Cmjohnson)
[21:23:20] (CR) Cmjohnson: [C: 2] adding dns entries for new restabse servers. These are all the intial 1G connections and management [dns] - https://gerrit.wikimedia.org/r/273350 (owner: Cmjohnson)
[21:24:29] (PS1) Thcipriani: Revert "Revert "all wikis to 1.27.0-wmf.14"" [mediawiki-config] - https://gerrit.wikimedia.org/r/273366
[21:27:24] (CR) Thcipriani: [C: 2] Revert "Revert "all wikis to 1.27.0-wmf.14"" [mediawiki-config] - https://gerrit.wikimedia.org/r/273366 (owner: Thcipriani)
[21:27:51] (Merged) jenkins-bot: Revert "Revert "all wikis to 1.27.0-wmf.14"" [mediawiki-config] - https://gerrit.wikimedia.org/r/273366 (owner: Thcipriani)
[21:28:31] oh my
[21:28:48] !log thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.27.0-wmf.14
[21:30:21] ebernhardson: I'm seeing Undefined variable: term in /srv/mediawiki/php-1.27.0-wmf.14/extensions/CirrusSearch/includes/InterwikiSearcher.php on line 83 now
[21:33:12] thcipriani: :S looking
[21:34:04] thcipriani: bug with a change we made to use the proper ObjectCache instead of $wgMemc ... i can have a patch in 2sec
[21:34:12] * ebernhardson really wants static analysis in the CI
[21:35:22] thcipriani: https://gerrit.wikimedia.org/r/#/c/273370/ (or i can get someone like max to +2)
[21:36:40] * thcipriani shakes fist at closures
[21:38:35] (PS1) BBlack: wikimedia-common VCL: small whitespace output fixup [puppet] - https://gerrit.wikimedia.org/r/273371
[21:38:52] ebernhardson: do you have a cherry pick to wmf.14?
[21:38:55] PROBLEM - puppet last run on analytics1027 is CRITICAL: CRITICAL: puppet fail
[21:38:55] (CR) BBlack: [C: 2 V: 2] wikimedia-common VCL: small whitespace output fixup [puppet] - https://gerrit.wikimedia.org/r/273371 (owner: BBlack)
[21:38:59] thcipriani: not yet, but can make one
[21:39:16] thcipriani: well, i accidently submitted to wmf.14 first, then chery picked to master and abandoned
[21:39:38] thcipriani: i suppose should have just left it around: https://gerrit.wikimedia.org/r/273369
[21:40:51] (PS4) BBlack: upload VCL: refactor a bit [puppet] - https://gerrit.wikimedia.org/r/273216
[21:41:02] ebernhardson: heh, yeah, if you can re-cherry-pick, I'll +2 and sync
[21:41:02] (CR) BBlack: [C: 2 V: 2] upload VCL: refactor a bit [puppet] - https://gerrit.wikimedia.org/r/273216 (owner: BBlack)
[21:42:16] thcipriani: it just had to be restored, you can +2 it now
[21:42:52] done.
[21:43:13] Krenair: remember the wikitech page we had for Windows users? how to ssh with putty etc?
[21:43:22] cant find it anymore ??
[21:43:32] no
[21:43:42] i soooo expected the result when searching for "putty"
[21:43:46] who wants to use windows?
[21:43:46] i know we had it
[21:43:50] (PS4) BBlack: VCL: re-arrange subs into standard ordering [puppet] - https://gerrit.wikimedia.org/r/273217
[21:43:57] (CR) BBlack: [C: 2 V: 2] VCL: re-arrange subs into standard ordering [puppet] - https://gerrit.wikimedia.org/r/273217 (owner: BBlack)
[21:43:59] (PS2) Gehel: Expose elasticsearch through HTTP [puppet] - https://gerrit.wikimedia.org/r/273254 (https://phabricator.wikimedia.org/T124444)
[21:44:11] Krenair: pen testers
[21:44:11] (PS2) Andrew Bogott: Lower max-mthreads for pdns recursor. [puppet] - https://gerrit.wikimedia.org/r/273314 (https://phabricator.wikimedia.org/T124680)
[21:44:58] (CR) jenkins-bot: [V: -1] Expose elasticsearch through HTTP [puppet] - https://gerrit.wikimedia.org/r/273254 (https://phabricator.wikimedia.org/T124444) (owner: Gehel)
[21:45:34] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/).
[21:45:58] Krenair: oh, i know why search would not find it.. it's in the "Help:" namespace,, and not searching "Help" by default makes sense
[21:46:08] mutante, ... the pentesters can't figure out ssh?
[21:46:23] thcipriani: Since you were asking about session errors earlier, here are explanations of some of the current top ones: "Unverified user provided and no metadata to auth it" usually means they didn't check "remember me" when logging in and their session is now expired from redis. "Metadata merge failed" means that the session in redis is for CentralAuth but the code thinks it's local-only or vice versa. "User ID mismatch" means the cookies
[21:46:23] somehow got out of sync with the session data in redis. "Failed to create empty session: {exception}" means it tried to reuse an existing session ID, but that session still exists and just isn't valid anymore (the code will then generate a new ID if necessary).
[21:46:44] Krenair: yes, that's the case
[21:46:54] :|
[21:47:56] andrewbogott: fascinating, that pdns fix... I wonder if that's the cause of some recursor query failovers in prod too
[21:48:01] anomie: thanks for the explanation.
[21:49:17] !log Upgraded Grafana to v3.0.0-pre1.
[21:49:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:50:24] ori: hey thank you for handling that! I think it learned ElasticSearch has a backend (yum graphs for logstash)
[21:50:40] hashar: yes! exciting
[21:50:53] andrewbogott: although in practice, our recursors in prod only seem to allocate two long-lived pthreads under the main task
[21:51:29] oh maybe mtasker threads are not pthreads, they're some kind of coroutine/green threads or whatever
[21:51:48] bblack: yeah, I didn’t count threads for the labs case, it just fits the ‘sort of looks like a dos without all that much traffic’ symptom.
[21:52:17] bblack: it seems safe to merge for production, though, since having the limit at >1024 can’t have been doing any good
[21:52:20] yeah I don't think they're OS pthreads, so you can't really tell how many it's using from the outside
[21:52:26] andrewbogott: agreed
[21:52:49] (CR) BBlack: [C: 1] Lower max-mthreads for pdns recursor. [puppet] - https://gerrit.wikimedia.org/r/273314 (https://phabricator.wikimedia.org/T124680) (owner: Andrew Bogott)
[21:54:01] bblack: should I just merge now? Do you have the bandwidth to babysit it in non-labs cases? (I’m looking at all the places it’s applied and it’s all hosts I’ve never touched. nescio? acamar? hydrogen?)
[21:54:21] ori: please announce it somewhere once happy/done ;) I think Bryan/Chad went up with logstash to emit statsd metric so we can get graphs such as https://grafana.wikimedia.org/dashboard/db/production-logging
[21:54:27] "nescio" always makes me feel so socratic
[21:54:42] (CR) Alex Monk: [C: 1] Lower max-mthreads for pdns recursor. [puppet] - https://gerrit.wikimedia.org/r/273314 (https://phabricator.wikimedia.org/T124680) (owner: Andrew Bogott)
[21:55:49] hashar: those logstash derived metrics in graphite will be a lot faster than hitting Elasticsearch directly for things to graph
[21:56:38] Operations, Mail, Wikipedia-Store: why is shop@ -> board@ ? - https://phabricator.wikimedia.org/T127503#2064898 (Dzahn)
[21:57:05] bd808: yeah I can imagine. Still wondering how creative we can be with graphing logstash data
[21:57:14] Operations, Mail, Wikipedia-Store: why is shop@ -> board@ ? - https://phabricator.wikimedia.org/T127503#2045472 (Dzahn) Hi Wikipedia-Store people, do you know anything about the email address shop@wikimedia and why it would forward to the board@? Can it be removed?
[21:57:36] if you get too creative you will just crash our little elasticsearch servers ;)
[21:57:53] ;--((
[21:58:20] ebernhardson: syncing change now
[21:58:35] speaking of that, will reply to your nice mail about ci/logstash tomorrow. Will talk about it during our next week meeting and see what happens from there ;} nice input for sure, thank you
[21:58:46] yw
[21:58:59] (PS4) BBlack: vcl_error: single definition [puppet] - https://gerrit.wikimedia.org/r/273220
[21:59:01] (PS4) BBlack: VCL: move role include up [puppet] - https://gerrit.wikimedia.org/r/273221
[21:59:03] (PS4) BBlack: VCL: rename cluster-common funcs for clarity [puppet] - https://gerrit.wikimedia.org/r/273222
[21:59:05] (PS4) BBlack: text VCL: clean up 1be misspass mangling [puppet] - https://gerrit.wikimedia.org/r/273223
[21:59:07] (PS4) BBlack: vcl_recv: single definition in wikimedia.vcl [puppet] - https://gerrit.wikimedia.org/r/273218
[21:59:09] (PS4) BBlack: vcl_(hash|hit|miss|pass|fetch|deliver): single definition [puppet] - https://gerrit.wikimedia.org/r/273219
[21:59:22] !log thcipriani@tin Synchronized php-1.27.0-wmf.14/extensions/CirrusSearch/includes/InterwikiSearcher.php: Fix undefined variable $term in InterwikiSearcher [[gerrit:273369]] (duration: 01m 08s)
[21:59:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:59:43] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge.
[22:00:25] Operations, Mail: move travel related aliases to OIT - https://phabricator.wikimedia.org/T127549#2064923 (Dzahn) I have opened a Zendesk ticket for this Your request (#10245) has been received, and will be reviewed by our support staff soon.
[22:01:10] Operations, Mail, fundraising-tech-ops: (re)move problemsdonating aliases - https://phabricator.wikimedia.org/T127488#2064928 (Dzahn)
[22:02:28] Operations, Mail, fundraising-tech-ops: (re)move problemsdonating aliases - https://phabricator.wikimedia.org/T127488#2045104 (Dzahn) Hi Fundraising people, can you tell us if the mail aliases above (problemsdonating@ and similar) are still in use? We would just like to move those over to OIT if tha...
[22:05:31] (PS1) BryanDavis: Monolog: Add processor for XFF resolved IP [mediawiki-config] - https://gerrit.wikimedia.org/r/273376 (https://phabricator.wikimedia.org/T114700)
[22:05:33] RECOVERY - puppet last run on analytics1027 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[22:06:18] !log turning puppet back on for cp*, pushing changes through https://gerrit.wikimedia.org/r/273217 to all
[22:06:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:08:43] Operations, Mail: status of studentgroups@ and studentclubs@ mail aliases? - https://phabricator.wikimedia.org/T127550#2046289 (Dzahn) Hi @FloorKoudijs you have been recommend to me as a person who might know more about these. Are these mail addresses still used? Best regards, Daniel
[22:12:08] Operations, Community-Tech, Mail: delete communicationsintern@ mail alias ? - https://phabricator.wikimedia.org/T127546#2064998 (Dzahn)
[22:13:37] Operations, Mail: move travel related aliases to OIT - https://phabricator.wikimedia.org/T127549#2046275 (bbogaert) Hi Daniel, I have created the groups in LDAP to be Google Groups. Thanks, Byron
[22:28:03] Operations, Mail: delete communicationsintern@ mail alias ? - https://phabricator.wikimedia.org/T127546#2046230 (Dzahn) Added some random people in Communications. Hi comms people, do you know about this email address communicationsintern@wikimedia and if it's still in use? Best, Daniel
[22:33:03] !log ori@tin Synchronized php-1.27.0-wmf.14/extensions/CentralAuth: I2cfcbf98f3: Reduce memcache traffic for central session storage (duration: 01m 21s)
[22:33:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:33:28] !log disabling puppet on caches for more scary VCL merges
[22:33:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:34:28] (CR) BBlack: [C: 2 V: 2] vcl_recv: single definition in wikimedia.vcl [puppet] - https://gerrit.wikimedia.org/r/273218 (owner: BBlack)
[22:34:43] (CR) BBlack: [C: 2 V: 2] vcl_(hash|hit|miss|pass|fetch|deliver): single definition [puppet] - https://gerrit.wikimedia.org/r/273219 (owner: BBlack)
[22:34:53] (CR) BBlack: [C: 2 V: 2] vcl_error: single definition [puppet] - https://gerrit.wikimedia.org/r/273220 (owner: BBlack)
[22:35:06] (CR) BBlack: [C: 2 V: 2] VCL: move role include up [puppet] - https://gerrit.wikimedia.org/r/273221 (owner: BBlack)
[22:39:44] (PS5) Krinkle: Raise file upload limit to 2047MB [mediawiki-config] - https://gerrit.wikimedia.org/r/266544 (https://phabricator.wikimedia.org/T116514) (owner: TheDJ)
[22:42:13] Operations, Mail: who should receive fellows@ email? - https://phabricator.wikimedia.org/T127547#2065117 (Dzahn) Asaf told me it was discontinued in 2013 and most likely isn't used. I also mailed Siko about it.
[22:43:04] (PS1) BBlack: Revert "VCL: move role include up" [puppet] - https://gerrit.wikimedia.org/r/273378
[22:43:06] (PS1) BBlack: VCL: move role include up, attempt #2 [puppet] - https://gerrit.wikimedia.org/r/273379
[22:43:08] (PS3) Andrew Bogott: Lower max-mthreads for pdns recursor. [puppet] - https://gerrit.wikimedia.org/r/273314 (https://phabricator.wikimedia.org/T124680)
[22:43:33] (CR) BBlack: [C: 2 V: 2] Revert "VCL: move role include up" [puppet] - https://gerrit.wikimedia.org/r/273378 (owner: BBlack)
[22:43:43] (CR) BBlack: [C: 2 V: 2] VCL: move role include up, attempt #2 [puppet] - https://gerrit.wikimedia.org/r/273379 (owner: BBlack)
[22:45:28] Operations, Mail: status of wikigroup@ alias - https://phabricator.wikimedia.org/T127551#2065119 (Dzahn) mailed Doreen Dunican
[22:45:39] (PS4) Andrew Bogott: Lower max-mthreads for pdns recursor. [puppet] - https://gerrit.wikimedia.org/r/273314 (https://phabricator.wikimedia.org/T124680)
[22:45:43] * andrewbogott chases the tip
[22:45:49] sorry!
[22:46:31] it’s ok, it just makes the merge that much more satisfying
[22:47:29] (CR) Andrew Bogott: [C: 2] Lower max-mthreads for pdns recursor. [puppet] - https://gerrit.wikimedia.org/r/273314 (https://phabricator.wikimedia.org/T124680) (owner: Andrew Bogott)
[22:48:44] Operations, Mail: status of fdcsupport@ ? - https://phabricator.wikimedia.org/T127548#2065125 (Dzahn) mailed Katy Love and Winifred
[22:49:52] (PS5) BBlack: VCL: rename cluster-common funcs for clarity [puppet] - https://gerrit.wikimedia.org/r/273222
[22:50:01] (CR) BBlack: [C: 2 V: 2] VCL: rename cluster-common funcs for clarity [puppet] - https://gerrit.wikimedia.org/r/273222 (owner: BBlack)
[22:50:11] Operations, Labs, Mail, Tool-Labs: remove toolserver mail aliases - https://phabricator.wikimedia.org/T127543#2046186 (Dzahn) Hey @Yuvipanda should we delete that ? ts-admins@wikimedia ?
[22:51:07] (PS1) Andrew Bogott: Crap, this breaks the recursor on startup somehow. Seemed so straightforward... [puppet] - https://gerrit.wikimedia.org/r/273381
[22:51:24] (PS5) BBlack: text VCL: clean up 1be misspass mangling [puppet] - https://gerrit.wikimedia.org/r/273223
[22:51:36] (CR) BBlack: [C: 2 V: 2] text VCL: clean up 1be misspass mangling [puppet] - https://gerrit.wikimedia.org/r/273223 (owner: BBlack)
[22:52:03] (CR) jenkins-bot: [V: -1] Crap, this breaks the recursor on startup somehow. Seemed so straightforward... [puppet] - https://gerrit.wikimedia.org/r/273381 (owner: Andrew Bogott)
[22:52:52] heh
[22:53:20] andrewbogott: you got a jenkins -1 because the tests failed, because they can't connect to gallium, because DNS is broken, because of the change you're trying to revert :)
[22:53:26] (PS1) Andrew Bogott: Stupid typo [puppet] - https://gerrit.wikimedia.org/r/273382
[22:53:35] bblack: yeah, I’m on it, ^
[22:53:44] (Abandoned) Andrew Bogott: Crap, this breaks the recursor on startup somehow. Seemed so straightforward... [puppet] - https://gerrit.wikimedia.org/r/273381 (owner: Andrew Bogott)
[22:53:53] (PS2) Andrew Bogott: Stupid typo [puppet] - https://gerrit.wikimedia.org/r/273382
[22:54:03] (CR) Andrew Bogott: [C: 2 V: 2] Stupid typo [puppet] - https://gerrit.wikimedia.org/r/273382 (owner: Andrew Bogott)
[22:55:14] PROBLEM - puppet last run on labservices1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:55:34] PROBLEM - Recursive DNS on 2620:0:862:1:91:198:174:122 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[22:55:35] PROBLEM - Recursive DNS on 91.198.174.122 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[22:56:50] if only, if only I could log in to that box...
[22:56:55] RECOVERY - puppet last run on labservices1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[22:57:01] (CR) Gergő Tisza: "According to Ori getIP is nontrivially slow (looking at a random xhgui page, something like 5ms), and it doesn't seem like we are currentl" [mediawiki-config] - https://gerrit.wikimedia.org/r/273376 (https://phabricator.wikimedia.org/T114700) (owner: BryanDavis)
[22:58:06] andrewbogott: i'm on labservices1001, can i help?
[22:58:15] mutante: nah, all better
[22:58:20] okay
[22:58:27] it was 91.198.174.122 I was fretting about, I just had to set up a proxycommand
[22:58:41] ah
[22:58:54] RECOVERY - Recursive DNS on 2620:0:862:1:91:198:174:122 is OK: DNS OK: 0.138 seconds response time. www.wikipedia.org returns 91.198.174.192
[22:58:55] RECOVERY - Recursive DNS on 91.198.174.122 is OK: DNS OK: 0.146 seconds response time. www.wikipedia.org returns 91.198.174.192
[22:58:55] it had the bad luck to run puppet during the 5 minutes when my config had a typo
[22:59:07] and there it is
[23:00:22] yep :)
[23:01:14] I bet that 95% of outages are caused by patches that are ‘too simple to test'
[23:02:00] Operations, Mail: move travel related aliases to OIT - https://phabricator.wikimedia.org/T127549#2065149 (Dzahn) Hi Byron, I appreciate the quick response. I have removed these on our side. Thanks, Daniel ``` -## Travel (RT #3552) -travel: ddunican, kzwicker - -## Travel approvals (RT #4255) -travela...
[23:02:13] Operations, Mail: move travel related aliases to OIT - https://phabricator.wikimedia.org/T127549#2065150 (Dzahn) Open>Resolved
[23:02:15] Operations, Mail: Move most (all?) exim personal aliases to OIT - https://phabricator.wikimedia.org/T122144#2065151 (Dzahn)
[23:02:38] Operations, Mail: status of fdcsupport@ ? - https://phabricator.wikimedia.org/T127548#2065153 (Dzahn) Winifred has confirmed this is still used.
[23:03:59] andrewbogott: yes, when the commit message has certain keywords like "trivial" or "one way to find out", that immediately makes it more likely to break stuff :)
[23:04:30] (PS1) BBlack: text VCL: misspass_mangle: fix bitscompat retval... [puppet] - https://gerrit.wikimedia.org/r/273385
[23:04:55] (CR) BBlack: [C: 2 V: 2] text VCL: misspass_mangle: fix bitscompat retval... [puppet] - https://gerrit.wikimedia.org/r/273385 (owner: BBlack)
[23:04:57] it's just Murphy
[23:14:29] (CR) Aude: "the Graph namespace appears unused on mediawiki.org and other places, so at least from that perspective, think the change is ok" [mediawiki-config] - https://gerrit.wikimedia.org/r/273283 (owner: Yurik)
[23:16:10] aude, thx
[23:16:37] !log turning puppet back on for cp*, pushing changes through https://gerrit.wikimedia.org/r/#/c/273385/
[23:16:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:17:44] Operations, Traffic, domains, Patch-For-Review: figure out if we can park wicipediacymraeg.org - https://phabricator.wikimedia.org/T128085#2065197 (Dzahn) contacted Robin Owain via WMUK wiki https://wikimedia.org.uk/wiki/User_talk:Robin_Owain_%28WMUK%29
[23:19:42] (CR) Dzahn: [C: 1] "yes, good idea, more than 1 but right under 2. per T116514#1990803" [mediawiki-config] - https://gerrit.wikimedia.org/r/266544 (https://phabricator.wikimedia.org/T116514) (owner: TheDJ)
[23:20:14] bblack: ^ that addresses your concerns over the 1024 x 1024 x 2048 limit ^
[23:20:34] but is still helping uploaders a lot, many videos are >1GB but under that
[23:21:16] mutante: yeah I'm generally ok with it, I just haven't tried to push it myself. It needs someone to walk it through SWAT
[23:21:24] feel free! :)
[23:21:43] ok!
[23:21:44] yurik: i'm still curious and interested in how we will store data, including for graphs
[23:22:10] but maybe a data namespace or such will need to be something different
[23:22:26] aude, https://meta.wikimedia.org/wiki/User:Yurik/Storing_data
[23:22:35] * aude nods :)
[23:22:40] aude, and see the two links at the top
[23:23:05] yeah
[23:30:05] Operations, Mail: who should receive fellows@ email? - https://phabricator.wikimedia.org/T127547#2065266 (Dzahn)
[23:32:27] Operations, Mail: Move most (all?) exim personal aliases to OIT - https://phabricator.wikimedia.org/T122144#2065271 (Dzahn)
[23:32:29] Operations, Mail: who should receive fellows@ email? - https://phabricator.wikimedia.org/T127547#2065269 (Dzahn) Open>Resolved Siko confirmed as well that this can be removed. done ``` -## Community ## -fellows: sbouterse - ```
[23:37:53] Operations, DNS, Traffic, Wikimedia-Site-Requests, and 3 others: Move oldwikisource on www.wikisource.org to mul.wikisource.org - https://phabricator.wikimedia.org/T64717#2065278 (Dzahn) Oh, i didn't see the revert, you are absolutely right. I did not mean to add confusion.
[23:38:00] jouncebot: refresh
[23:38:01] I refreshed my knowledge about deployments.
[23:40:31] jouncebot: relax
[23:40:42] Operations, Traffic, Wikimedia-Blog, HTTPS: make blog links from wmfwiki front page use HTTPS links - https://phabricator.wikimedia.org/T104728#1425614 (Dzahn) Looks to me like this ticket is either about editing wiki pages or about fixing things on the Blog side. Both of these don't really need Op...
[23:41:32] Operations, Mail: status of wikigroup@ alias - https://phabricator.wikimedia.org/T127551#2065288 (Dzahn)
[23:43:27] Operations, Mail: status of wikigroup@ alias - https://phabricator.wikimedia.org/T127551#2046304 (Dzahn) Doreen has told me this is still used.
[23:46:01] Operations, Research-and-Data, Wikimedia-Mailing-lists: Close / Archive rcom-l - https://phabricator.wikimedia.org/T128141#2065295 (DarTar)
[23:55:49] Operations, Traffic, Wikimedia-Blog, HTTPS: make blog links from wmfwiki front page use HTTPS links - https://phabricator.wikimedia.org/T104728#2065314 (jrbs) >>! In T104728#2065282, @Dzahn wrote: > Looks to me like this ticket is either about editing wiki pages or about fixing things on the Blog s...
[23:57:16] Operations, Traffic, Wikimedia-Blog, HTTPS: make blog links from wmfwiki front page use HTTPS links - https://phabricator.wikimedia.org/T104728#2065328 (Krenair) >>! In T104728#2065282, @Dzahn wrote: > Looks to me like this ticket is either about editing wiki pages or about fixing things on the Blo...
[23:59:40] (PS3) CSteipp: Password policies for advanced permission groups [mediawiki-config] - https://gerrit.wikimedia.org/r/272660 (https://phabricator.wikimedia.org/T119100)