[00:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear deployers, time to do the Evening SWAT (Max 6 patches) deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190110T0000). [00:00:04] RoanKattouw and tgr: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [00:02:49] I'll do the SWAT today [00:03:16] (03PS2) 10Catrope: Enable logging for GrowthExperiments help panel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482743 (https://phabricator.wikimedia.org/T211991) [00:03:21] (03CR) 10Catrope: [C: 03+2] Enable logging for GrowthExperiments help panel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482743 (https://phabricator.wikimedia.org/T211991) (owner: 10Catrope) [00:04:27] (03Merged) 10jenkins-bot: Enable logging for GrowthExperiments help panel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482743 (https://phabricator.wikimedia.org/T211991) (owner: 10Catrope) [00:10:40] 10Operations, 10Cloud-Services, 10Patch-For-Review: host-vmem.erb is doing operations that make no sense - https://phabricator.wikimedia.org/T167412 (10bd808) Notes for my own future sanity: * `h_vmem` in grid engine means "hard virtual memory limit" * A job has an `h_vmem` reservation value that is explicit... 
[00:11:46] (03CR) 10jenkins-bot: Enable logging for GrowthExperiments help panel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482743 (https://phabricator.wikimedia.org/T211991) (owner: 10Catrope) [00:13:02] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable EventLogging for GrowthExperiments help panel (T211991) (duration: 00m 54s) [00:13:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:13:06] T211991: Help Panel: Implement production configuration for Test, Czech and Korean wikis - https://phabricator.wikimedia.org/T211991 [00:15:04] James_F: is the ATOMIC_CANCELABLE bug on testcommonswiki reproducible? [00:15:29] tgr: don’t think so. [00:20:09] !log catrope@deploy1001 Synchronized php-1.33.0-wmf.12/extensions/GrowthExperiments/: Help panel fixes (T212973, T212890, T213186) (duration: 00m 54s) [00:20:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:20:14] T212973: Audit HelpPanelLogger.prototype.getEditor and make any fixes as needed - https://phabricator.wikimedia.org/T212973 [00:20:15] T213186: Help Panel: Posting to monthly help desk archive (kowiki) - https://phabricator.wikimedia.org/T213186 [00:20:15] T212890: Help panel: design - CTA on mobile view should be circular - https://phabricator.wikimedia.org/T212890 [00:22:45] (03PS5) 10Smalyshev: Puppetize blazegraph config for cases where deployed one is not enough [puppet] - 10https://gerrit.wikimedia.org/r/483310 (https://phabricator.wikimedia.org/T213212) [00:25:57] (03PS1) 10BryanDavis: toolforge: hard code h_vmem count for exec queue nodes [puppet] - 10https://gerrit.wikimedia.org/r/483317 (https://phabricator.wikimedia.org/T167412) [00:27:48] (03CR) 10BryanDavis: toolforge: hard code h_vmem count for exec queue nodes (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/483317 (https://phabricator.wikimedia.org/T167412) (owner: 10BryanDavis) [00:28:02] tgr, James_F: The ATOMIC_CANCELABLE patch is 
on mwdebug1002 now, is it at all testable? [00:28:19] not really, I'll do an upload just in case [00:29:53] (03CR) 10Bstorm: "> Patch Set 1:" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/483317 (https://phabricator.wikimedia.org/T167412) (owner: 10BryanDavis) [00:30:24] (03PS5) 10Dzahn: puppetmaster/configmaster: convert from apache to httpd module [puppet] - 10https://gerrit.wikimedia.org/r/451821 [00:32:03] (03CR) 10Bstorm: [C: 03+2] toolforge: hard code h_vmem count for exec queue nodes (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/483317 (https://phabricator.wikimedia.org/T167412) (owner: 10BryanDavis) [00:33:20] RoanKattouw: looks good [00:34:41] !log catrope@deploy1001 Synchronized php-1.33.0-wmf.12/includes/MovePage.php: Fix missing ATOMIC_CANCELABLE in MovePage::move() (T213168) (duration: 00m 53s) [00:34:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:34:43] T213168: Move file page to new name that exists on commons throws Wikimedia\Rdbms\DBTransactionStateError on zhwiki - https://phabricator.wikimedia.org/T213168 [00:34:57] (03CR) 10Dzahn: "it compiles now since i added the change in configmaster as well: https://puppet-compiler.wmflabs.org/compiler1002/14249/" [puppet] - 10https://gerrit.wikimedia.org/r/451821 (owner: 10Dzahn) [00:37:22] (03PS6) 10Dzahn: puppetmaster/configmaster: convert from apache to httpd module [puppet] - 10https://gerrit.wikimedia.org/r/451821 [00:40:09] (03PS1) 10Catrope: Help panel: Set help desk page correctly on kowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483318 (https://phabricator.wikimedia.org/T213186) [00:40:53] (03CR) 10Catrope: [C: 03+2] Help panel: Set help desk page correctly on kowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483318 (https://phabricator.wikimedia.org/T213186) (owner: 10Catrope) [00:41:57] (03Merged) 10jenkins-bot: Help panel: Set help desk page correctly on kowiki [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/483318 (https://phabricator.wikimedia.org/T213186) (owner: 10Catrope) [00:43:50] 10Operations, 10Patch-For-Review: Reallocate former image scalers - https://phabricator.wikimedia.org/T192457 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2151.codfw.wmnet'] ` and were **ALL** successful. [00:45:57] (03CR) 10jenkins-bot: Help panel: Set help desk page correctly on kowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483318 (https://phabricator.wikimedia.org/T213186) (owner: 10Catrope) [00:58:20] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Configure help desk page for help panel correctly on kowiki (T213186) (duration: 00m 53s) [00:58:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:58:23] T213186: Help Panel: Posting to monthly help desk archive (kowiki) - https://phabricator.wikimedia.org/T213186 [01:00:04] twentyafterfour: I, the Bot under the Fountain, allow thee, The Deployer, to do Phabricator update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190110T0100). [01:16:34] did any notable deploys happen at 1pm UTC 9th Jan (my today?) 
[01:16:48] or around that time [01:18:02] (03PS1) 10Catrope: Revert "Help panel: Set help desk page correctly on kowiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483322 [01:18:06] (03CR) 10Catrope: [C: 03+2] Revert "Help panel: Set help desk page correctly on kowiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483322 (owner: 10Catrope) [01:19:11] (03Merged) 10jenkins-bot: Revert "Help panel: Set help desk page correctly on kowiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483322 (owner: 10Catrope) [01:21:39] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Revert latest config patch (caused fatal errors on kowiki) (duration: 00m 52s) [01:21:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:25:43] (03CR) 10jenkins-bot: Revert "Help panel: Set help desk page correctly on kowiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483322 (owner: 10Catrope) [01:27:18] (03PS1) 10Catrope: Revert "Revert "Help panel: Set help desk page correctly on kowiki"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483323 [01:27:28] (03CR) 10Catrope: [C: 04-1] "Do not deploy until wmf.12 is on kowiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483323 (owner: 10Catrope) [01:35:44] (03PS2) 10Catrope: Update GrowthExperiments config for proportion->percentage change [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482372 [01:36:08] (03CR) 10Catrope: [C: 03+2] Update GrowthExperiments config for proportion->percentage change [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482372 (owner: 10Catrope) [01:36:15] (03PS3) 10Catrope: Update GrowthExperiments config for proportion->percentage change [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482372 [01:36:19] (03CR) 10Catrope: Update GrowthExperiments config for proportion->percentage change [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482372 (owner: 10Catrope) [01:36:22] (03CR) 10Catrope: [C: 03+2] Update 
GrowthExperiments config for proportion->percentage change [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482372 (owner: 10Catrope) [01:36:51] (03PS2) 10Catrope: Enable GrowthExperiments help panel on cswiki and kowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482373 (https://phabricator.wikimedia.org/T211993) [01:36:56] (03PS2) 10Catrope: Enable GrowthExperiments help panel for 50% of new users on cswiki and kowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482374 (https://phabricator.wikimedia.org/T211993) [01:37:28] (03Merged) 10jenkins-bot: Update GrowthExperiments config for proportion->percentage change [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482372 (owner: 10Catrope) [01:38:54] (03PS3) 10Catrope: Enable GrowthExperiments help panel for 50% of new users on cswiki and kowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482374 (https://phabricator.wikimedia.org/T211993) [01:38:56] (03CR) 10jenkins-bot: Update GrowthExperiments config for proportion->percentage change [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482372 (owner: 10Catrope) [01:41:20] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Make GrowthExperiments config wmf.12-proof (duration: 00m 52s) [01:41:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:57:40] (03CR) 10MarkAHershberger: [C: 03+1] "I like the new look. Colors are better than these." 
[puppet] - 10https://gerrit.wikimedia.org/r/482379 (owner: 10Paladox) [01:59:10] 10Operations, 10Cloud-Services, 10Patch-For-Review, 10cloud-services-team (Kanban): rebuild tools-grid-master as a large instance - https://phabricator.wikimedia.org/T162955 (10bd808) [02:35:56] 10Operations, 10Proton, 10Reading-Infrastructure-Team-Backlog: Decide on handling system updates for Proton - https://phabricator.wikimedia.org/T213366 (10Tgr) [03:33:53] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 903.51 seconds [03:49:35] 10Operations, 10Proton, 10Reading-Infrastructure-Team-Backlog, 10Traffic: Document and possibly fine-tune how Proton interacts with Varnish - https://phabricator.wikimedia.org/T213371 (10Tgr) [04:06:21] (03PS2) 10Gergő Tisza: Remove AICaptcha settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481987 (https://phabricator.wikimedia.org/T186244) [04:34:18] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 264.91 seconds [05:42:28] (03CR) 10Jdlrobson: Disable reader trust survey (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476370 (https://phabricator.wikimedia.org/T209882) (owner: 10Bmansurov) [06:17:05] (03CR) 10星耀晨曦: [C: 04-1] Modifying configuration about Chinese Wikiversity: (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482261 (https://phabricator.wikimedia.org/T212919) (owner: 10Wangql) [06:22:46] (03CR) 10Krinkle: [C: 03+1] "LGTM, is there a patch for mw-config/scap/prep.py as well?" 
[puppet] - 10https://gerrit.wikimedia.org/r/480695 (owner: 10Tim Starling) [06:50:14] (03PS13) 10Krinkle: tests: Assert that extra namespaces have correspondent talk namespaces [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478348 (https://phabricator.wikimedia.org/T211395) (owner: 10Urbanecm) [06:51:20] (03CR) 10Krinkle: [C: 03+2] tests: Assert that extra namespaces have correspondent talk namespaces [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478348 (https://phabricator.wikimedia.org/T211395) (owner: 10Urbanecm) [06:52:24] (03Merged) 10jenkins-bot: tests: Assert that extra namespaces have correspondent talk namespaces [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478348 (https://phabricator.wikimedia.org/T211395) (owner: 10Urbanecm) [06:57:29] (03PS1) 10Gergő Tisza: Demistify $wmgMonologChannels Logstash debug level behavior [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483339 [06:59:11] 10Operations, 10ops-eqiad, 10User-Elukey: Heating alerts and broken RAM on kafka1014 - https://phabricator.wikimedia.org/T204479 (10elukey) [07:01:48] 10Operations, 10ops-eqiad, 10User-Elukey: Heating alerts and broken RAM on kafka1014 - https://phabricator.wikimedia.org/T204479 (10elukey) It is fine in here Daniel, thanks! In theory kafka1012->23 should be decommissioned when Event Gate (part of Modern Event Platform) will be up and running, since Mediawi... 
[07:04:20] (03CR) 10jenkins-bot: tests: Assert that extra namespaces have correspondent talk namespaces [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478348 (https://phabricator.wikimedia.org/T211395) (owner: 10Urbanecm) [07:06:00] (03CR) 10Krinkle: Demistify $wmgMonologChannels Logstash debug level behavior (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483339 (owner: 10Gergő Tisza) [07:09:42] (03PS4) 10Elukey: systemd::timer: allow more normal forms for datetime type [puppet] - 10https://gerrit.wikimedia.org/r/483085 (https://phabricator.wikimedia.org/T172532) [07:14:29] (03PS5) 10Elukey: systemd::timer: allow more normal forms for datetime type [puppet] - 10https://gerrit.wikimedia.org/r/483085 (https://phabricator.wikimedia.org/T172532) [07:19:29] (adding a bit more tests to --^) [07:23:38] (03CR) 10Gergő Tisza: Demistify $wmgMonologChannels Logstash debug level behavior (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483339 (owner: 10Gergő Tisza) [07:24:05] (03PS6) 10Elukey: systemd::timer: allow more normal forms for datetime type [puppet] - 10https://gerrit.wikimedia.org/r/483085 (https://phabricator.wikimedia.org/T172532) [07:24:17] (03PS2) 10Gergő Tisza: Demistify $wmgMonologChannels Logstash debug level behavior [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483339 [07:27:08] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler1002/14252/ looks good!" [puppet] - 10https://gerrit.wikimedia.org/r/483085 (https://phabricator.wikimedia.org/T172532) (owner: 10Elukey) [07:27:58] _joe_ anything against me merging --^ ? 
[07:28:06] tests + pcc looks good [07:28:24] <_joe_> elukey: one sec [07:29:12] (03CR) 10Giuseppe Lavagetto: [C: 03+1] systemd::timer: allow more normal forms for datetime type [puppet] - 10https://gerrit.wikimedia.org/r/483085 (https://phabricator.wikimedia.org/T172532) (owner: 10Elukey) [07:33:09] thankssss [07:33:21] (03CR) 10Elukey: [C: 03+2] systemd::timer: allow more normal forms for datetime type [puppet] - 10https://gerrit.wikimedia.org/r/483085 (https://phabricator.wikimedia.org/T172532) (owner: 10Elukey) [07:35:50] (03PS3) 10Elukey: profile::refinery::job::camus: conver netflow to systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/483069 (https://phabricator.wikimedia.org/T172532) [07:43:46] (03CR) 10Elukey: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/14253/an-coord1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/483069 (https://phabricator.wikimedia.org/T172532) (owner: 10Elukey) [07:52:01] (03PS1) 10Elukey: camus::job: properly clean up crons when systemd is used [puppet] - 10https://gerrit.wikimedia.org/r/483345 (https://phabricator.wikimedia.org/T172532) [07:53:33] (03CR) 10Elukey: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/14254/an-coord1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/483345 (https://phabricator.wikimedia.org/T172532) (owner: 10Elukey) [07:58:59] (03CR) 10Mathew.onipe: [C: 03+1] wdqs: preliminary work to manage multiple instances (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/483217 (https://phabricator.wikimedia.org/T213234) (owner: 10Gehel) [08:01:33] (03PS3) 10Muehlenhoff: Enable base::service_auto_restart for passive Icinga node [puppet] - 10https://gerrit.wikimedia.org/r/483125 (https://phabricator.wikimedia.org/T135991) [08:03:47] PROBLEM - Check systemd state on an-coord1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[08:06:04] (03CR) 10Muehlenhoff: [C: 03+2] Enable base::service_auto_restart for passive Icinga node [puppet] - 10https://gerrit.wikimedia.org/r/483125 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff) [08:08:18] an-coord is me [08:08:27] RECOVERY - Check systemd state on an-coord1001 is OK: OK - running: The system is fully operational [08:10:39] (03PS4) 10Vgutierrez: certcentral: Allow specifying authorized hosts and regex in the config [software/certcentral] - 10https://gerrit.wikimedia.org/r/483163 (https://phabricator.wikimedia.org/T213301) [08:12:30] (03PS2) 10Muehlenhoff: Update Grafana package source [puppet] - 10https://gerrit.wikimedia.org/r/483159 [08:12:53] (03PS1) 10Elukey: profile::analytics::refinery::job::camus: move more crons to timers [puppet] - 10https://gerrit.wikimedia.org/r/483359 (https://phabricator.wikimedia.org/T172532) [08:33:00] 10Operations, 10LDAP-Access-Requests, 10WMF-NDA-Requests: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T213397 (10noarave) [08:35:56] 10Operations, 10LDAP-Access-Requests, 10WMF-NDA-Requests: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T213397 (10WMDE-leszek) [08:37:01] 10Operations, 10LDAP-Access-Requests, 10WMF-NDA-Requests: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T213397 (10WMDE-leszek) As Noa's manager, I endorse the request. [08:40:43] (03CR) 10Muehlenhoff: [C: 03+2] Update Grafana package source [puppet] - 10https://gerrit.wikimedia.org/r/483159 (owner: 10Muehlenhoff) [08:45:40] (03PS1) 10Elukey: systemd::timer::job: allow to specify SyslogIdentifier [puppet] - 10https://gerrit.wikimedia.org/r/483364 (https://phabricator.wikimedia.org/T172532) [08:46:57] (03CR) 10jerkins-bot: [V: 04-1] systemd::timer::job: allow to specify SyslogIdentifier [puppet] - 10https://gerrit.wikimedia.org/r/483364 (https://phabricator.wikimedia.org/T172532) (owner: 10Elukey) [08:49:30] ah! 
[08:51:36] (03PS2) 10Elukey: systemd::timer::job: allow to specify SyslogIdentifier [puppet] - 10https://gerrit.wikimedia.org/r/483364 (https://phabricator.wikimedia.org/T172532) [08:57:31] (03CR) 10Elukey: "noop: https://puppet-compiler.wmflabs.org/compiler1002/14256/" [puppet] - 10https://gerrit.wikimedia.org/r/483364 (https://phabricator.wikimedia.org/T172532) (owner: 10Elukey) [09:06:32] 10Operations, 10netops: migrate netinsights from rhenium to sulfur - https://phabricator.wikimedia.org/T212011 (10elukey) [09:12:19] (03PS1) 10Gilles: Revert ruwiki navtiming rate [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483369 (https://phabricator.wikimedia.org/T187299) [09:13:15] 10Operations, 10LDAP-Access-Requests, 10WMF-Legal, 10WMF-NDA-Requests: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T213397 (10Peachey88) [09:22:30] (03CR) 10Gilles: [C: 03+2] Revert ruwiki navtiming rate [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483369 (https://phabricator.wikimedia.org/T187299) (owner: 10Gilles) [09:24:08] (03Merged) 10jenkins-bot: Revert ruwiki navtiming rate [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483369 (https://phabricator.wikimedia.org/T187299) (owner: 10Gilles) [09:24:26] (03CR) 10Elukey: [C: 03+2] systemd::timer::job: allow to specify SyslogIdentifier [puppet] - 10https://gerrit.wikimedia.org/r/483364 (https://phabricator.wikimedia.org/T172532) (owner: 10Elukey) [09:27:33] (03CR) 10jenkins-bot: Revert ruwiki navtiming rate [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483369 (https://phabricator.wikimedia.org/T187299) (owner: 10Gilles) [09:27:53] (03PS2) 10Elukey: profile::analytics::refinery::job::camus: move more crons to timers [puppet] - 10https://gerrit.wikimedia.org/r/483359 (https://phabricator.wikimedia.org/T172532) [09:29:17] (03CR) 10Elukey: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/14257/an-coord1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/483359 
(https://phabricator.wikimedia.org/T172532) (owner: 10Elukey) [09:29:22] (03CR) 10Elukey: [C: 03+2] profile::analytics::refinery::job::camus: move more crons to timers [puppet] - 10https://gerrit.wikimedia.org/r/483359 (https://phabricator.wikimedia.org/T172532) (owner: 10Elukey) [09:34:37] !log updated thirdparty/php72 component for stretch-wikimedia to 7.2.13 [09:34:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:34:47] ^_joe_ [09:35:28] <_joe_> did it work? [09:36:14] <_joe_> it did! [09:36:16] <_joe_> thanks a lot [09:36:26] yeah, after I fixed the broken grafana config, imported new grafana key and fixing the docs on wikitech (still WIP), it now works again :-) [09:43:22] 10Operations, 10Wikidata, 10Wikidata-Query-Service: Create a cookbook to copy data between WDQS servers - https://phabricator.wikimedia.org/T213401 (10Gehel) [09:45:23] !log gilles@deploy1001 Synchronized tests/InitialiseSettingsTest.php: T211395 T211529 tests: Assert that extra namespaces have correspondent talk namespaces (duration: 00m 56s) [09:45:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:45:27] T211529: ApiQuerySiteinfo.php: PHP Notice: Undefined index: 103 - https://phabricator.wikimedia.org/T211529 [09:45:27] T211395: Check if all extra namespaces have correspondent talk namespace - https://phabricator.wikimedia.org/T211395 [09:52:48] !log gilles@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T187299 Decrease ruwiki navtiming rate (duration: 00m 52s) [09:52:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:52:51] T187299: User-perceived page load performance study - https://phabricator.wikimedia.org/T187299 [09:54:50] 10Operations, 10Patch-For-Review: Please import php-xdebug to apt.wm.o thirdparty/php72 - https://phabricator.wikimedia.org/T212757 (10Joe) php7.2-xdebug is now available in our repository: ` $ sudo apt show php-xdebug Package: php-xdebug Version: 
2.7.0~beta1+2.6.1+2.5.5-2+0~20181019070242.5+stretch~1.gbp7041... [10:00:54] (03PS1) 10Gilles: Set CPU benchmark sampling factor [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483377 (https://phabricator.wikimedia.org/T209857) [10:04:17] (03PS2) 10Muehlenhoff: Swift: Drop support for older distros [puppet] - 10https://gerrit.wikimedia.org/r/482593 [10:06:04] (03CR) 10Gilles: [C: 03+2] Set CPU benchmark sampling factor [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483377 (https://phabricator.wikimedia.org/T209857) (owner: 10Gilles) [10:07:45] (03Merged) 10jenkins-bot: Set CPU benchmark sampling factor [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483377 (https://phabricator.wikimedia.org/T209857) (owner: 10Gilles) [10:08:07] 10Operations, 10Patch-For-Review: Please import php-xdebug to apt.wm.o thirdparty/php72 - https://phabricator.wikimedia.org/T212757 (10Joe) 05Open→03Resolved [10:10:02] !log gilles@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T209857 Run CPU benchmark for a portion of navtiming pageloads (duration: 00m 53s) [10:10:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:10:05] T209857: Create ISP ranking based on RUM data - https://phabricator.wikimedia.org/T209857 [10:10:51] 10Operations, 10Patch-For-Review: Please import php-xdebug to apt.wm.o thirdparty/php72 - https://phabricator.wikimedia.org/T212757 (10MoritzMuehlenhoff) @Legoktm : FYI, the update also rebased the php72 component to 7.2.13. 
[10:12:00] (03CR) 10Muehlenhoff: [C: 03+2] Swift: Drop support for older distros [puppet] - 10https://gerrit.wikimedia.org/r/482593 (owner: 10Muehlenhoff) [10:12:06] (03PS4) 10Fdans: Changes wording in uniques dump link to reflect project families [puppet] - 10https://gerrit.wikimedia.org/r/482787 (https://phabricator.wikimedia.org/T168477) [10:12:35] (03CR) 10Fdans: [C: 03+1] Changes wording in uniques dump link to reflect project families [puppet] - 10https://gerrit.wikimedia.org/r/482787 (https://phabricator.wikimedia.org/T168477) (owner: 10Fdans) [10:14:12] (03PS5) 10Elukey: Changes wording in uniques dump link to reflect project families [puppet] - 10https://gerrit.wikimedia.org/r/482787 (https://phabricator.wikimedia.org/T168477) (owner: 10Fdans) [10:14:56] (03CR) 10Elukey: [C: 03+2] Changes wording in uniques dump link to reflect project families [puppet] - 10https://gerrit.wikimedia.org/r/482787 (https://phabricator.wikimedia.org/T168477) (owner: 10Fdans) [10:16:21] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good, all uses of profile::conftool::client use jessie or stretch." 
[puppet] - 10https://gerrit.wikimedia.org/r/483226 (owner: 10Dzahn) [10:19:48] (03CR) 10jenkins-bot: Set CPU benchmark sampling factor [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483377 (https://phabricator.wikimedia.org/T209857) (owner: 10Gilles) [10:20:40] (03CR) 10Hashar: Attempt to pull images before building (031 comment) [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/475843 (https://phabricator.wikimedia.org/T200720) (owner: 10Hashar) [10:25:38] (03PS1) 10Elukey: profile::analytics::refinery::job::camus: move all crons to timers [puppet] - 10https://gerrit.wikimedia.org/r/483378 (https://phabricator.wikimedia.org/T172532) [10:25:49] (03CR) 10Alexandros Kosiaris: [C: 03+2] mathoid: Move config.yaml into a template (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/483184 (owner: 10Alexandros Kosiaris) [10:26:07] (03CR) 10jerkins-bot: [V: 04-1] profile::analytics::refinery::job::camus: move all crons to timers [puppet] - 10https://gerrit.wikimedia.org/r/483378 (https://phabricator.wikimedia.org/T172532) (owner: 10Elukey) [10:26:41] !log gilles@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T209857 Run CPU benchmark for a portion of navtiming pageloads (duration: 00m 52s) [10:26:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:26:44] T209857: Create ISP ranking based on RUM data - https://phabricator.wikimedia.org/T209857 [10:27:44] (03PS2) 10Elukey: profile::analytics::refinery::job::camus: move all crons to timers [puppet] - 10https://gerrit.wikimedia.org/r/483378 (https://phabricator.wikimedia.org/T172532) [10:27:51] 10Operations, 10ORES, 10Scoring-platform-team, 10Release Pipeline (Blubber), 10Release-Engineering-Team (Backlog): The continuous release pipeline should support more than one service per repo - https://phabricator.wikimedia.org/T210267 (10Ladsgroup) >>! In T210267#4860670, @Ottomata wrote: > Q: would bl... 
[10:28:33] (03PS1) 10Muehlenhoff: Remove support for older distros in some Apache classes [puppet] - 10https://gerrit.wikimedia.org/r/483380 [10:28:34] (03PS1) 10Muehlenhoff: hhvm: Remove support for trusty/jessie [puppet] - 10https://gerrit.wikimedia.org/r/483381 [10:36:52] PROBLEM - toolschecker: Verify internal DNS from within Tools on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/labs-dns/private - 341 bytes in 0.004 second response time [10:37:01] PROBLEM - toolschecker: NFS read/writeable on labs instances on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/nfs/home - 341 bytes in 0.005 second response time [10:37:26] (03CR) 10Elukey: [C: 03+2] profile::analytics::refinery::job::camus: move all crons to timers [puppet] - 10https://gerrit.wikimedia.org/r/483378 (https://phabricator.wikimedia.org/T172532) (owner: 10Elukey) [10:37:44] ??? 
[10:38:08] ACKNOWLEDGEMENT - toolschecker service itself needs to return OK on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/self - 341 bytes in 0.004 second response time GTirloni T213252 [10:38:09] ACKNOWLEDGEMENT - toolschecker: All Flannel etcd nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/etcd/flannel - 341 bytes in 0.003 second response time GTirloni T213252 [10:38:09] ACKNOWLEDGEMENT - toolschecker: All k8s etcd nodes are healthy on checker.tools.wmflabs.org is CRITICAL: connect to address checker.tools.wmflabs.org and port 80: Connection refused GTirloni T213252 [10:38:09] ACKNOWLEDGEMENT - toolschecker: All k8s worker nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/k8s/nodes/ready - 341 bytes in 0.003 second response time GTirloni T213252 [10:38:10] ACKNOWLEDGEMENT - toolschecker: Make sure enwiki dumps are not empty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/dumps - 341 bytes in 0.002 second response time GTirloni T213252 [10:38:11] ACKNOWLEDGEMENT - toolschecker: NFS read/writeable on labs instances on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/nfs/home - 341 bytes in 0.005 second response time GTirloni T213252 [10:38:11] ACKNOWLEDGEMENT - toolschecker: Redis set/get on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/redis - 341 bytes in 0.002 second response time GTirloni T213252 [10:38:12] ACKNOWLEDGEMENT - toolschecker: Start a job and verify on Trusty 
on checker.tools.wmflabs.org is CRITICAL: connect to address checker.tools.wmflabs.org and port 80: Connection refused GTirloni T213252 [10:38:13] ACKNOWLEDGEMENT - toolschecker: Test LDAP for query on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/ldap - 341 bytes in 0.003 second response time GTirloni T213252 [10:38:14] ACKNOWLEDGEMENT - toolschecker: Verify internal DNS from within Tools on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/labs-dns/private - 341 bytes in 0.004 second response time GTirloni T213252 [10:38:15] ACKNOWLEDGEMENT - toolschecker: check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 341 bytes in 0.002 second response time GTirloni T213252 [10:38:15] ACKNOWLEDGEMENT - toolschecker: showmount succeeds on a labs instance on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/nfs/secondary_cluster_showmount - 341 bytes in 0.003 second response time GTirloni T213252 [10:38:16] RECOVERY - toolschecker: Verify internal DNS from within Tools on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 3.196 second response time [10:38:21] RECOVERY - toolschecker: NFS read/writeable on labs instances on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.014 second response time [10:38:21] uh [10:38:27] ouch [10:38:40] sorry for the noise [10:38:47] gtirloni: something "expected"? 
[10:40:43] (03CR) 10Muehlenhoff: "https://puppet-compiler.wmflabs.org/compiler1002/14259/" [puppet] - 10https://gerrit.wikimedia.org/r/483381 (owner: 10Muehlenhoff) [10:41:23] volans arturo: kind of, some trouble with the `toolschecker` app that icinga connects to (kind of a `node-exporter-blackbox` sort of thing) but it should be fine now. it had trouble starting after i cleaned some unpuppetized checks [10:42:13] ack, so alerting failure but the service is ok [10:42:15] ? [10:42:31] yep [10:42:48] much better than the opposite ;) [10:43:41] :) [10:50:56] (03PS1) 10Gilles: Increase CPU benchmark sampling factor [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483382 (https://phabricator.wikimedia.org/T209857) [10:53:46] (03PS1) 10Elukey: camus: clean up references about crons [puppet] - 10https://gerrit.wikimedia.org/r/483383 (https://phabricator.wikimedia.org/T172532) [10:55:29] (03CR) 10Gilles: [C: 03+2] Increase CPU benchmark sampling factor [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483382 (https://phabricator.wikimedia.org/T209857) (owner: 10Gilles) [10:55:47] (03CR) 10Elukey: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/14260/an-coord1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/483383 (https://phabricator.wikimedia.org/T172532) (owner: 10Elukey) [10:56:55] (03Merged) 10jenkins-bot: Increase CPU benchmark sampling factor [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483382 (https://phabricator.wikimedia.org/T209857) (owner: 10Gilles) [10:58:05] !log uploaded docker-registry_2.7.0~rc0~wmf1-1 debian package to reprepro for stretch-wikimedia (done yesterday at 17:21 UTC forgot about the log) [10:58:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:59:27] (03CR) 10jenkins-bot: Increase CPU benchmark sampling factor [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483382 (https://phabricator.wikimedia.org/T209857) (owner: 10Gilles) [10:59:29] !log gilles@deploy1001 
Synchronized wmf-config/InitialiseSettings.php: T209857 Increase CPU benchmark sampling rate (duration: 00m 53s) [10:59:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:59:32] T209857: Create ISP ranking based on RUM data - https://phabricator.wikimedia.org/T209857 [11:02:08] (03PS2) 10Alexandros Kosiaris: Add an stdout log stanza to config [deployment-charts] - 10https://gerrit.wikimedia.org/r/483227 [11:04:41] (03PS1) 10Filippo Giunchedi: Default production logging to new logging infrastructure [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483384 (https://phabricator.wikimedia.org/T211124) [11:05:36] (03CR) 10Filippo Giunchedi: "To be merged on Mon" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483384 (https://phabricator.wikimedia.org/T211124) (owner: 10Filippo Giunchedi) [11:08:37] (03PS1) 10Alexandros Kosiaris: mathoid: Remove the logging sidecar container [deployment-charts] - 10https://gerrit.wikimedia.org/r/483385 [11:11:19] (03CR) 10Fsero: [C: 03+2] mathoid: Remove the logging sidecar container [deployment-charts] - 10https://gerrit.wikimedia.org/r/483385 (owner: 10Alexandros Kosiaris) [11:17:14] 10Operations, 10Traffic, 10Patch-For-Review: HTTP/2 requests fail with too-long URLs - https://phabricator.wikimedia.org/T209590 (10Vgutierrez) So, after setting http2_max_field_size to 8k, you can properly fetch the same URLs over HTTP 1.1 and HTTP2. If the max length is exceeded over HTTP 1.1, a 414 error... [11:30:41] (03CR) 10Alexandros Kosiaris: [C: 04-2] "The latter, not the former for the "temporary is the new permanent" reason. 
I 'll have a better look into https://gerrit.wikimedia.org/r/#" [puppet] - 10https://gerrit.wikimedia.org/r/481215 (https://phabricator.wikimedia.org/T212327) (owner: 10Bstorm) [11:48:23] (03PS1) 10Jcrespo: mariadb: Depool es2012 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483387 [11:50:53] (03CR) 10Jcrespo: [C: 03+2] mariadb: Depool es2012 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483387 (owner: 10Jcrespo) [11:52:31] (03Merged) 10jenkins-bot: mariadb: Depool es2012 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483387 (owner: 10Jcrespo) [11:53:35] PROBLEM - puppet last run on cumin2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [11:53:46] 10Operations, 10ops-codfw, 10DBA, 10User-Banyek: db2042 (m3) master RAID battery failed - https://phabricator.wikimedia.org/T202051 (10jcrespo) 05Resolved→03Open [11:54:16] !log starting data transfer from wdqs1003 -> wdqs1006 - T213361 [11:54:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:54:19] T213361: reload wdqs1006 from other server - https://phabricator.wikimedia.org/T213361 [11:54:19] 10Operations, 10ops-codfw, 10DBA, 10User-Banyek: db2042 (m3) master RAID battery failed - https://phabricator.wikimedia.org/T202051 (10jcrespo) Leaving it open and acking it on icinga so we don't forget about it. [11:58:49] RECOVERY - puppet last run on cumin2001 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [12:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for European Mid-day SWAT(Max 6 patches) . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190110T1200). [12:00:04] Zoranzoki21: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. 
Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [12:00:15] Oh :D [12:00:52] I can swat today [12:01:00] zeljkof: Normally [12:01:05] Zoranzoki21: looks like it's just the two of us today :D [12:01:17] zeljkof: Looks :D [12:01:25] !log jynus@deploy1001 Synchronized wmf-config/db-codfw.php: Depool es2012 (duration: 00m 52s) [12:01:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:01:36] I get out of the way [12:02:52] (03CR) 10jenkins-bot: mariadb: Depool es2012 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483387 (owner: 10Jcrespo) [12:03:00] jynus: I can start with swat, or should I wait? [12:03:06] go on [12:03:12] I technically started at :59 [12:03:19] so did nothing wrong this time [12:03:27] Zoranzoki21: any patches need scripts to run? if so, please add a comment in gerrit [12:03:29] 0:-) [12:03:31] jynus: :D [12:03:41] zeljkof: Wait [12:04:05] (03CR) 10Zoranzoki21: "Needs namespaceDupes.php to be run" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482508 (https://phabricator.wikimedia.org/T212992) (owner: 10Zoranzoki21) [12:04:07] (03CR) 10Zoranzoki21: "Needs namespaceDupes.php to be run" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483111 (https://phabricator.wikimedia.org/T207626) (owner: 10Zoranzoki21) [12:04:10] (03CR) 10Zoranzoki21: "Needs namespaceDupes.php to be run" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478464 (https://phabricator.wikimedia.org/T207627) (owner: 10Zoranzoki21) [12:04:12] zeljkof: Done [12:04:31] Zoranzoki21: thanks! 
[12:04:58] (03PS2) 10Zoranzoki21: Create Portal namespace on shn.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482508 (https://phabricator.wikimedia.org/T212992) [12:07:23] (03CR) 10Zfilipin: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483111 (https://phabricator.wikimedia.org/T207626) (owner: 10Zoranzoki21) [12:08:57] (03Merged) 10jenkins-bot: Reverted "Revert "Disable unused Flow extension on de.wikiversity"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483111 (https://phabricator.wikimedia.org/T207626) (owner: 10Zoranzoki21) [12:10:08] Zoranzoki21: 483111 is at mwdebug1002 [12:10:15] testing [12:11:05] zeljkof: Looks good [12:11:29] deploying [12:11:38] zeljkof: namespaceDupes? [12:12:04] deploying first, then running the script [12:12:10] zeljkof: Oh ok [12:12:21] that's the usual procedre [12:12:26] procedure [12:12:48] !log zfilipin@deploy1001 Synchronized dblists/flow.dblist: SWAT: [[gerrit:483111|Reverted "Revert "Disable unused Flow extension on de.wikiversity"" (T207626)]] (duration: 00m 53s) [12:12:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:12:51] T207626: Disable unused Flow extension on de.wikiversity - https://phabricator.wikimedia.org/T207626 [12:13:30] zeljkof: do you time for a late-breaking patch during the current swat window? [12:14:22] Zoranzoki21: deployed, script ran T207626#4869167 [12:14:36] phuedx: if it's urgent I can deploy it next [12:14:49] Zoranzoki21: is any of your patches urgent, forgot to ask? 
[12:15:04] zeljkof: No [12:15:18] zeljkof: I have three patches which no need mwdebug [12:16:00] (03CR) 10jenkins-bot: Reverted "Revert "Disable unused Flow extension on de.wikiversity"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483111 (https://phabricator.wikimedia.org/T207626) (owner: 10Zoranzoki21) [12:16:11] Zoranzoki21: 478464 has merge conflict [12:16:48] zeljkof: You can deploy phuedx patch while I resolve it [12:17:01] phuedx: what's your patch? [12:18:00] (03CR) 10GTirloni: [C: 03+1] "Changing to +1 to indicate I agree with the change." [puppet] - 10https://gerrit.wikimedia.org/r/482118 (https://phabricator.wikimedia.org/T194724) (owner: 10Dzahn) [12:19:07] 10Operations, 10Patch-For-Review: Onboarding John Bond - https://phabricator.wikimedia.org/T213079 (10MoritzMuehlenhoff) [12:20:35] (03PS2) 10Jbond42: Small change to test merge permissions to use [puppet] - 10https://gerrit.wikimedia.org/r/483168 (https://phabricator.wikimedia.org/T213079) [12:20:37] (03PS2) 10Zoranzoki21: Disable unused Flow extension on ur.wikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478464 (https://phabricator.wikimedia.org/T207627) [12:20:39] (03PS1) 10Zoranzoki21: Merge branch 'master' of ssh://gerrit.wikimedia.org:29418/operations/mediawiki-config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483392 [12:20:54] !log stop and upgrade es2012 [12:20:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:21:03] zeljkof: Conflict resolved [12:21:13] (03PS1) 10Phuedx: Re-enable QuickSurveys extension on enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483393 (https://phabricator.wikimedia.org/T209882) [12:21:19] ^ zeljkof [12:21:24] (03Abandoned) 10Zoranzoki21: Merge branch 'master' of ssh://gerrit.wikimedia.org:29418/operations/mediawiki-config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483392 (owner: 10Zoranzoki21) [12:22:03] phuedx: ok, deploying it now, please add it to the calendar [12:22:12] 
zeljkof: sure. thanks! [12:22:12] (03CR) 10Bmansurov: Disable reader trust survey (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476370 (https://phabricator.wikimedia.org/T209882) (owner: 10Bmansurov) [12:22:27] Zoranzoki21: a quick break from your patches until I deploy 483393 [12:22:38] zeljkof: Ok, no problemo [12:23:19] (03CR) 10Zfilipin: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483393 (https://phabricator.wikimedia.org/T209882) (owner: 10Phuedx) [12:24:57] (03Merged) 10jenkins-bot: Re-enable QuickSurveys extension on enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483393 (https://phabricator.wikimedia.org/T209882) (owner: 10Phuedx) [12:25:40] phuedx: 483393 is at mwdebug1002, please test and let me know if I can deploy it [12:26:46] zeljkof: just proven that it's disabled in prod and enabled on mwdebug1002. thanks [12:26:51] (i mean: go) [12:26:56] ok [12:27:18] zeljkof: i'll be monitoring https://grafana.wikimedia.org/d/000000566/reading-web-dashboard?orgId=1&panelId=15&fullscreen post-deployment [12:28:05] !log zfilipin@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:483393|Re-enable QuickSurveys extension on enwiki (T209882)]] (duration: 00m 52s) [12:28:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:28:08] T209882: Quicksurvey for reader trust - https://phabricator.wikimedia.org/T209882 [12:28:18] (03CR) 10A2093064: Modifying configuration about Chinese Wikiversity: (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482261 (https://phabricator.wikimedia.org/T212919) (owner: 10Wangql) [12:28:30] phuedx: deployed! 
please test and thanks for deploying with #releng ;) [12:28:44] (03CR) 10jenkins-bot: Re-enable QuickSurveys extension on enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483393 (https://phabricator.wikimedia.org/T209882) (owner: 10Phuedx) [12:28:47] (03CR) 10Arturo Borrero Gonzalez: "This is related to deployment-prep, why don't add this:" [puppet] - 10https://gerrit.wikimedia.org/r/481215 (https://phabricator.wikimedia.org/T212327) (owner: 10Bstorm) [12:28:53] 10Operations, 10Citoid, 10serviceops, 10Wikimedia-Incident: allow zotero container nodejs server to define the amount of heap used instead of the fixed limit of 1.7Gi - https://phabricator.wikimedia.org/T213414 (10jijiki) [12:28:57] zeljkof: You're the hero we need [12:29:02] (03PS1) 10Jcrespo: Revert "mariadb: Depool es2012 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483394 [12:29:13] phuedx: the hero we _deserve_ ;P [12:29:20] :P [12:29:31] (03CR) 10Alexandros Kosiaris: varnish: move $all_networks to $trusted_networks (036 comments) [puppet] - 10https://gerrit.wikimedia.org/r/475714 (owner: 10Alexandros Kosiaris) [12:30:06] :D [12:30:18] my son (11) just saw scap logo in one of the terminal windows and said it's cool [12:30:42] zeljkof: It should be [12:31:32] Btw, I am 15 [12:31:36] :P [12:31:49] (03CR) 10Zfilipin: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478464 (https://phabricator.wikimedia.org/T207627) (owner: 10Zoranzoki21) [12:32:00] At 17th February I will be 16 :D [12:32:14] zeljkof: You working on my patch now? [12:32:16] :) [12:32:28] yes, just +2ed it [12:33:01] 10Operations, 10ops-codfw, 10Traffic: lvs2002: raid battery failure - https://phabricator.wikimedia.org/T213417 (10ema) [12:33:08] 10Operations, 10ops-codfw, 10Traffic: lvs2002: raid battery failure - https://phabricator.wikimedia.org/T213417 (10ema) p:05Triage→03Normal [12:33:36] zeljkof: Can you do it again? 
[12:33:52] On zuul I no see my patch [12:34:33] !log starting data transfer from wdqs1003 -> wdqs1006 - T213361 - aborted (nodes are in different cluster) [12:34:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:34:36] T213361: reload wdqs1006 from other server - https://phabricator.wikimedia.org/T213361 [12:34:42] Zoranzoki21: strange https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/478464 [12:34:46] I don't see it either [12:36:26] (03CR) 10Zfilipin: Disable unused Flow extension on ur.wikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478464 (https://phabricator.wikimedia.org/T207627) (owner: 10Zoranzoki21) [12:36:39] (03CR) 10Zfilipin: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478464 (https://phabricator.wikimedia.org/T207627) (owner: 10Zoranzoki21) [12:37:06] (03PS3) 10Zoranzoki21: Disable unused Flow extension on ur.wikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478464 (https://phabricator.wikimedia.org/T207627) [12:37:48] (03CR) 10Zfilipin: Disable unused Flow extension on ur.wikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478464 (https://phabricator.wikimedia.org/T207627) (owner: 10Zoranzoki21) [12:40:08] (03PS6) 10Alexandros Kosiaris: Introduce $aggregate_networks, deprecate $all_networks [puppet] - 10https://gerrit.wikimedia.org/r/475714 (https://phabricator.wikimedia.org/T212327) [12:41:12] (03CR) 10Zfilipin: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478464 (https://phabricator.wikimedia.org/T207627) (owner: 10Zoranzoki21) [12:42:17] (03Merged) 10jenkins-bot: Disable unused Flow extension on ur.wikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478464 (https://phabricator.wikimedia.org/T207627) (owner: 10Zoranzoki21) [12:42:32] !log starting data transfer from wdqs1004 -> wdqs1006 - T213361 [12:42:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:42:35] 
T213361: reload wdqs1006 from other server - https://phabricator.wikimedia.org/T213361 [12:42:55] Zoranzoki21: 478464 is at mwdebug1002 [12:43:37] (03PS1) 10Fsero: Added defaults of node heap size to match the new one introduced. [deployment-charts] - 10https://gerrit.wikimedia.org/r/483398 (https://phabricator.wikimedia.org/T213414) [12:43:44] zeljkof: Testing [12:44:39] zeljkof: Looks good [12:44:56] deploying [12:46:13] !log zfilipin@deploy1001 Synchronized dblists/flow.dblist: SWAT: [[gerrit:478464|Disable unused Flow extension on ur.wikibooks (T207627)]] (duration: 00m 55s) [12:46:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:46:17] T207627: Disable unused Flow extension on ur.wikibooks - https://phabricator.wikimedia.org/T207627 [12:46:36] (03PS2) 10Zoranzoki21: Turn off main page special casing for svwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482516 (https://phabricator.wikimedia.org/T213018) [12:47:20] Zoranzoki21: deployed, script done T207627#4869297 [12:47:51] zeljkof: Ok [12:49:44] (03CR) 10Zfilipin: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482516 (https://phabricator.wikimedia.org/T213018) (owner: 10Zoranzoki21) [12:51:13] 10Operations, 10ops-eqiad: es1019 IPMI and its management interface are unresponsive (again) - https://phabricator.wikimedia.org/T213422 (10jcrespo) [12:51:25] (03Merged) 10jenkins-bot: Turn off main page special casing for svwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482516 (https://phabricator.wikimedia.org/T213018) (owner: 10Zoranzoki21) [12:51:49] 10Operations, 10ops-codfw, 10DBA, 10User-Banyek: db2042 (m3) master RAID battery failed - https://phabricator.wikimedia.org/T202051 (10Marostegui) Let's merge this then T209261 with this task or the other way around [12:52:33] 10Operations, 10ops-eqiad: es1019 IPMI and its management interface are unresponsive (again) - https://phabricator.wikimedia.org/T213422 (10jcrespo) 
p:05Triage→03Normal a:03jcrespo I will first try remote debugging techniques myself. [12:52:41] (03CR) 10Alexandros Kosiaris: [C: 03+1] Added defaults of node heap size to match the new one introduced. (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/483398 (https://phabricator.wikimedia.org/T213414) (owner: 10Fsero) [12:52:50] (03PS3) 10Zoranzoki21: Create Portal namespace on shn.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482508 (https://phabricator.wikimedia.org/T212992) [12:53:01] !log zfilipin@deploy1001 Synchronized dblists/mobilemainpagelegacy.dblist: SWAT: [[gerrit:482516|Turn off main page special casing for svwiki (T213018)]] (duration: 00m 52s) [12:53:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:53:04] T213018: Turn off main page special casing for svwiki - https://phabricator.wikimedia.org/T213018 [12:53:06] 10Operations: IPMI Audit 2018-04 - https://phabricator.wikimedia.org/T193155 (10jcrespo) [12:53:13] Zoranzoki21: 482516 deployed [12:53:21] zeljkof: OK [12:53:21] 10Operations, 10monitoring, 10Patch-For-Review: Several hosts return "internal IPMI error" in the check_ipmi_temp check - https://phabricator.wikimedia.org/T167121 (10jcrespo) [12:53:25] 10Operations, 10ops-eqiad: es1019 IPMI and its management interface are unresponsive (again) - https://phabricator.wikimedia.org/T213422 (10jcrespo) [12:54:04] (03CR) 10Zfilipin: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483117 (https://phabricator.wikimedia.org/T212849) (owner: 10Zoranzoki21) [12:54:10] (03CR) 10Zfilipin: Remove main page special casing from eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483117 (https://phabricator.wikimedia.org/T212849) (owner: 10Zoranzoki21) [12:54:16] (03PS2) 10Zfilipin: Remove main page special casing from eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483117 (https://phabricator.wikimedia.org/T212849) (owner: 10Zoranzoki21) [12:54:24] 
10Operations, 10ops-codfw, 10DBA, 10User-Banyek: db2042 (m3) master RAID battery failed - https://phabricator.wikimedia.org/T202051 (10jcrespo) Sorry, I searched but I didn't find the other one, as on your above comment you probably meant that but linked to itself by mistake. I am ok with any method, as lo... [12:54:30] (03CR) 10Zfilipin: [C: 03+2] "swat" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483117 (https://phabricator.wikimedia.org/T212849) (owner: 10Zoranzoki21) [12:54:33] (03CR) 10jenkins-bot: Disable unused Flow extension on ur.wikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478464 (https://phabricator.wikimedia.org/T207627) (owner: 10Zoranzoki21) [12:54:36] (03CR) 10jenkins-bot: Turn off main page special casing for svwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482516 (https://phabricator.wikimedia.org/T213018) (owner: 10Zoranzoki21) [12:56:44] (03Merged) 10jenkins-bot: Remove main page special casing from eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483117 (https://phabricator.wikimedia.org/T212849) (owner: 10Zoranzoki21) [12:57:01] (03CR) 10Fsero: "> Patch Set 1: Code-Review+1" (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/483398 (https://phabricator.wikimedia.org/T213414) (owner: 10Fsero) [12:57:39] (03PS2) 10Zoranzoki21: Remove main page special casing from ruwikibooks and ruwikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483118 (https://phabricator.wikimedia.org/T212849) [12:58:01] !log zfilipin@deploy1001 Synchronized dblists/mobilemainpagelegacy.dblist: SWAT: [[gerrit:483117|Remove main page special casing from eswiki (T212849)]] (duration: 00m 53s) [12:58:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:58:04] T212849: Remove main page special casing from eswiki / some ru-projects - https://phabricator.wikimedia.org/T212849 [12:58:09] Zoranzoki21: 483117 deployed [12:58:13] zeljkof: OK [12:58:24] zeljkof: Now 483118 
please [12:58:43] zeljkof: I will move https://gerrit.wikimedia.org/r/c/482508/ for next SWAT [12:58:55] (03CR) 10Zfilipin: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483118 (https://phabricator.wikimedia.org/T212849) (owner: 10Zoranzoki21) [12:59:10] Zoranzoki21: please do, we're just in time for this one [12:59:22] zeljkof: ok [12:59:58] (03Merged) 10jenkins-bot: Remove main page special casing from ruwikibooks and ruwikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483118 (https://phabricator.wikimedia.org/T212849) (owner: 10Zoranzoki21) [13:01:13] !log zfilipin@deploy1001 Synchronized dblists/mobilemainpagelegacy.dblist: SWAT: [[gerrit:483118|Remove main page special casing from ruwikibooks and ruwikiquote (T212849)]] (duration: 00m 52s) [13:01:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:01:18] Zoranzoki21: 483118 deployed! [13:01:25] !log EU SWAT finished [13:01:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:01:27] zeljkof: OK, thanks [13:04:49] I will go on with db maintenance is swap has finished [13:05:21] *swat [13:05:41] (03CR) 10Jcrespo: [C: 03+2] Revert "mariadb: Depool es2012 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483394 (owner: 10Jcrespo) [13:07:20] (03Merged) 10jenkins-bot: Revert "mariadb: Depool es2012 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483394 (owner: 10Jcrespo) [13:08:04] (03CR) 10jenkins-bot: Remove main page special casing from eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483117 (https://phabricator.wikimedia.org/T212849) (owner: 10Zoranzoki21) [13:08:06] (03CR) 10jenkins-bot: Remove main page special casing from ruwikibooks and ruwikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483118 (https://phabricator.wikimedia.org/T212849) (owner: 10Zoranzoki21) [13:08:08] (03CR) 10jenkins-bot: Revert "mariadb: Depool es2012 for maintenance" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/483394 (owner: 10Jcrespo) [13:08:10] (03PS1) 10Jcrespo: mariadb: Depool es1018 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483402 [13:08:57] jynus: SWAT is finished [13:10:15] !log jynus@deploy1001 Synchronized wmf-config/db-codfw.php: Repool es2012 (duration: 00m 52s) [13:10:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:10:41] (03CR) 10Jcrespo: [C: 03+2] mariadb: Depool es1018 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483402 (owner: 10Jcrespo) [13:12:22] (03Merged) 10jenkins-bot: mariadb: Depool es1018 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483402 (owner: 10Jcrespo) [13:12:46] https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/483140/ <--- reviews welcome [13:12:49] cc moritzm [13:12:56] (03PS1) 10Jcrespo: Revert "mariadb: Depool es1018 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483403 [13:14:00] 10Operations, 10ops-eqiad: es1019 IPMI and its management interface are unresponsive (again) - https://phabricator.wikimedia.org/T213422 (10Volans) @jcrespo you could try first any of the known/listed things in https://wikitech.wikimedia.org/wiki/Management_Interfaces (aliased from IPMI) and of course feel fre... [13:14:41] arturo: ack, will have a look [13:17:04] thanks! 
[13:20:50] (03CR) 10jenkins-bot: mariadb: Depool es1018 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483402 (owner: 10Jcrespo) [13:24:02] (03PS1) 10Arturo Borrero Gonzalez: openstack: enable net nodes in the mitaka/stretch combination [puppet] - 10https://gerrit.wikimedia.org/r/483408 (https://phabricator.wikimedia.org/T212302) [13:24:17] !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: depool es1018 (duration: 00m 52s) [13:24:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:24:35] (03CR) 10Alexandros Kosiaris: [C: 03+1] "https://puppet-compiler.wmflabs.org/compiler1002/14261/ looks pretty sane fwiw. The one notable addition is localhost IPv4/IPv6 addresses " [puppet] - 10https://gerrit.wikimedia.org/r/475714 (https://phabricator.wikimedia.org/T212327) (owner: 10Alexandros Kosiaris) [13:25:47] 10Operations, 10ops-eqiad: es1019 IPMI and its management interface are unresponsive (again) - https://phabricator.wikimedia.org/T213422 (10jcrespo) That was the plan :-) [13:28:14] (03PS2) 10Arturo Borrero Gonzalez: openstack: enable net nodes in the mitaka/stretch combination [puppet] - 10https://gerrit.wikimedia.org/r/483408 (https://phabricator.wikimedia.org/T212302) [13:29:29] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] "Catalog compiles as expected: https://puppet-compiler.wmflabs.org/compiler1002/14263/" [puppet] - 10https://gerrit.wikimedia.org/r/483408 (https://phabricator.wikimedia.org/T212302) (owner: 10Arturo Borrero Gonzalez) [13:30:06] (03CR) 10星耀晨曦: [C: 04-1] Modifying configuration about Chinese Wikiversity: (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482261 (https://phabricator.wikimedia.org/T212919) (owner: 10Wangql) [13:31:07] (03PS9) 10Gehel: wdqs: preliminary work to manage multiple instances [puppet] - 10https://gerrit.wikimedia.org/r/483217 (https://phabricator.wikimedia.org/T213234) [13:31:52] 10Operations: Add kchapman@wikimedia.org to 
performance-team@wikimedia.org - https://phabricator.wikimedia.org/T213427 (10kchapman) [13:32:08] (03CR) 10jerkins-bot: [V: 04-1] wdqs: preliminary work to manage multiple instances [puppet] - 10https://gerrit.wikimedia.org/r/483217 (https://phabricator.wikimedia.org/T213234) (owner: 10Gehel) [13:33:52] (03PS10) 10Gehel: wdqs: preliminary work to manage multiple instances [puppet] - 10https://gerrit.wikimedia.org/r/483217 (https://phabricator.wikimedia.org/T213234) [13:36:48] 10Operations, 10ops-eqiad: es1019 IPMI and its management interface are unresponsive (again) - https://phabricator.wikimedia.org/T213422 (10jcrespo) a:05jcrespo→03Cmjohnson @Volans I have no ssh, https or ipmi access, so there is nothing I can do about it. **This needs a power drain.** Please ping us in a... [13:39:27] (03PS1) 10Zoranzoki21: Add new throttle rule for Berklee College of Music library [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483409 (https://phabricator.wikimedia.org/T213311) [13:39:34] (03CR) 10A2093064: Modifying configuration about Chinese Wikiversity: (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482261 (https://phabricator.wikimedia.org/T212919) (owner: 10Wangql) [13:39:47] (03CR) 10jerkins-bot: [V: 04-1] Add new throttle rule for Berklee College of Music library [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483409 (https://phabricator.wikimedia.org/T213311) (owner: 10Zoranzoki21) [13:40:03] (03PS2) 10Zoranzoki21: Add new throttle rule for Berklee College of Music library [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483409 (https://phabricator.wikimedia.org/T213311) [13:42:47] PROBLEM - puppet last run on cloudvirt1030 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [13:43:04] (03PS4) 10Gehel: make blazegraph port configurable [debs/prometheus-blazegraph-exporter] - 10https://gerrit.wikimedia.org/r/483144 (https://phabricator.wikimedia.org/T213289) [13:43:27] PROBLEM - puppet last run on cloudvirt1024 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:45:23] volans: what's wrong with this cumin command? [13:45:26] `sudo cumin 'P{O:wmcs::openstack::eqiad1::virt and F:lsbdistcodename = stretch}'` [13:45:34] Caught InvalidQueryError exception: Mixed endpoints are not supported, use the global grammar to mix them. [13:45:58] arturo: that you're mixing a resource and fact query within the same puppetdb query [13:46:03] you need to rewrite it as: [13:46:22] 'P{O:wmcs::openstack::eqiad1::virt} and P{F:lsbdistcodename = stretch}' [13:46:27] oh I see [13:46:29] thanks volans !! [13:46:34] to use the global grammar to compose them [13:46:40] doing 2 different queries to puppetdb [13:47:02] yw :) [13:50:16] (03PS11) 10Gehel: wdqs: preliminary work to manage multiple instances [puppet] - 10https://gerrit.wikimedia.org/r/483217 (https://phabricator.wikimedia.org/T213234) [13:51:18] !log T212302 icinga downtime for 2h cloudvirt[1013,1024,1026-1030].eqiad.wmnet bc wrong puppet code [13:51:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:51:23] T212302: CloudVPS: upgrade: jessie -> stretch & mitaka -> newton - https://phabricator.wikimedia.org/T212302 [13:52:05] 10Operations, 10Discovery-Search (Current work): Test spicerack elasticsearch module - https://phabricator.wikimedia.org/T207920 (10Mathew.onipe) [13:52:14] (03CR) 10Gehel: "PCC agrees this is still a noop: https://puppet-compiler.wmflabs.org/compiler1002/14266/" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/483217 (https://phabricator.wikimedia.org/T213234) (owner: 10Gehel) [13:59:17] (03CR) 10Gehel: "very minor comments inline, 
otherwise, LGTM" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/483310 (https://phabricator.wikimedia.org/T213212) (owner: 10Smalyshev) [14:01:34] 10Operations: Add kchapman@wikimedia.org to performance-team@wikimedia.org - https://phabricator.wikimedia.org/T213427 (10Reedy) Isn't this a google group and hence needs handling by #office-it (or a current member of #performance-team)? [14:02:14] (03CR) 10Jcrespo: [C: 04-1] "Blocked on maintenance, which is blocked on depool taking place due to ongoing dump process." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483403 (owner: 10Jcrespo) [14:04:06] (03PS1) 10Jcrespo: mariadb: Depool es1019 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483415 (https://phabricator.wikimedia.org/T213422) [14:04:54] (03PS1) 10Arturo Borrero Gonzalez: openstack: remove redundant sqlite3 declaration in cloudvirt hosts [puppet] - 10https://gerrit.wikimedia.org/r/483416 (https://phabricator.wikimedia.org/T212302) [14:07:04] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] "https://integration.wikimedia.org/ci/view/operations/job/operations-puppet-catalog-compiler/14267/console" [puppet] - 10https://gerrit.wikimedia.org/r/483416 (https://phabricator.wikimedia.org/T212302) (owner: 10Arturo Borrero Gonzalez) [14:07:41] (03CR) 10Jcrespo: [C: 03+2] mariadb: Depool es1019 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483415 (https://phabricator.wikimedia.org/T213422) (owner: 10Jcrespo) [14:08:53] (03Merged) 10jenkins-bot: mariadb: Depool es1019 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483415 (https://phabricator.wikimedia.org/T213422) (owner: 10Jcrespo) [14:09:31] RECOVERY - puppet last run on cloudvirt1024 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [14:12:35] (03CR) 10Freephile: [C: 03+1] gerrit: Add colour to PolyGerrit header and update the theme slightly [puppet] - 10https://gerrit.wikimedia.org/r/482379 (owner: 
10Paladox) [14:12:55] 10Operations, 10Traffic, 10Patch-For-Review: HTTP/2 requests fail with too-long URLs - https://phabricator.wikimedia.org/T209590 (10Anomie) Firefox 64 and Chromium 72 also react to the "enhance your calm" as a dropped connection rather than as a 414. Firefox gives me an empty page, while Chromium is a bit m... [14:14:02] !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: depool es1019 (duration: 00m 53s) [14:14:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:14:03] RECOVERY - puppet last run on cloudvirt1030 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:17:41] 10Operations, 10ops-codfw, 10DBA, 10User-Banyek: db2042 (m3) master RAID battery failed - https://phabricator.wikimedia.org/T202051 (10Marostegui) [14:18:18] 10Operations, 10ops-codfw, 10DBA, 10User-Banyek: db2042 (m3) master RAID battery failed - https://phabricator.wikimedia.org/T202051 (10Marostegui) I have merged this into T209261 as that other one has a more "important" title so we don't forget! :) [14:19:49] (03CR) 10jenkins-bot: mariadb: Depool es1019 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483415 (https://phabricator.wikimedia.org/T213422) (owner: 10Jcrespo) [14:21:51] jouncebot: now [14:21:51] No deployments scheduled for the next 0 hour(s) and 38 minute(s) [14:21:54] jouncebot: next [14:21:54] In 0 hour(s) and 38 minute(s): Structured Data on Commons initial deployment - Pre slot (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190110T1500) [14:21:58] I see [14:22:12] I am going to hit merge on my 2 patches now to make sure they are actually merged ready for the slot [14:23:06] oooh, department retro time [14:23:13] oooh, wrong channel .... 
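Note on the cumin exchange at 13:45-13:46 above: the fix was to give the resource query (`O:...`) and the fact query (`F:...`) each their own `P{...}` block and join them with `and` in the global grammar, so that two separate PuppetDB queries are made. A minimal sketch of that rewrite follows, assuming nothing beyond the two query strings quoted in the log. The helper name `to_global_grammar` is hypothetical and is not part of cumin; it only builds the string that would be passed to `sudo cumin`, and it deliberately accepts just `and`/`or` as joiners.

```python
def to_global_grammar(subqueries, operator="and"):
    """Wrap each PuppetDB sub-query in its own P{...} block and join them.

    Hypothetical helper for illustration only; it is not a cumin API.
    It builds a query string, it does not contact PuppetDB.
    """
    if operator not in ("and", "or"):
        raise ValueError("this sketch only handles 'and' / 'or'")
    parts = []
    for sub in subqueries:
        sub = sub.strip()
        if not sub:
            raise ValueError("empty sub-query")
        # One P{...} block per sub-query, so resource (O:) and fact (F:)
        # lookups are never mixed inside a single PuppetDB query.
        parts.append("P{" + sub + "}")
    return (" " + operator + " ").join(parts)


# The two sub-queries from the log at 13:45:26.
query = to_global_grammar(
    ["O:wmcs::openstack::eqiad1::virt", "F:lsbdistcodename = stretch"]
)
print(query)
# prints: P{O:wmcs::openstack::eqiad1::virt} and P{F:lsbdistcodename = stretch}
```

The printed string matches the form volans gave at 13:46:22 and would be quoted as the single query argument to `sudo cumin`.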
[14:27:33] (PS1) Marostegui: db-eqiad.php: Depool db1114 [mediawiki-config] - https://gerrit.wikimedia.org/r/483420 (https://phabricator.wikimedia.org/T86338)
[14:30:13] (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1114 [mediawiki-config] - https://gerrit.wikimedia.org/r/483420 (https://phabricator.wikimedia.org/T86338) (owner: Marostegui)
[14:31:19] (Merged) jenkins-bot: db-eqiad.php: Depool db1114 [mediawiki-config] - https://gerrit.wikimedia.org/r/483420 (https://phabricator.wikimedia.org/T86338) (owner: Marostegui)
[14:33:05] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1114 T86338 T202167 (duration: 00m 53s)
[14:33:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:33:10] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338
[14:33:10] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167
[14:33:12] !log Deploy schema change on db1114 - T86338 T202167
[14:33:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:33:31] !log fsero@deploy1001 scap-helm -h [namespace: -h, clusters: staging]
[14:33:31] !log fsero@deploy1001 scap-helm -h cluster staging completed
[14:33:31] !log fsero@deploy1001 scap-helm -h finished
[14:33:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:33:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:33:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:33:54] damn
[14:33:56] xD
[14:34:12] (CR) jenkins-bot: db-eqiad.php: Depool db1114 [mediawiki-config] - https://gerrit.wikimedia.org/r/483420 (https://phabricator.wikimedia.org/T86338) (owner: Marostegui)
[14:35:34] !log fsero@deploy1001 scap-helm zotero upgrade staging -f /srv/scap-helm/zotero/zotero-values-staging.yaml [namespace: zotero, clusters: staging]
[14:35:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:36:51] !log fsero@deploy1001 scap-helm zotero upgrade staging -f /srv/scap-helm/zotero/zotero-values-staging.yaml /srv/deployment-charts/charts/zotero-0.0.1.tgz [namespace: zotero, clusters: staging]
[14:36:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:36:52] !log fsero@deploy1001 scap-helm zotero cluster staging completed
[14:36:52] !log fsero@deploy1001 scap-helm zotero finished
[14:36:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:36:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:39:25] (CR) 星耀晨曦: [C: +1] Modifying configuration about Chinese Wikiversity: (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/482261 (https://phabricator.wikimedia.org/T212919) (owner: Wangql)
[14:40:26] (CR) jerkins-bot: [V: -1] Modifying configuration about Chinese Wikiversity: [mediawiki-config] - https://gerrit.wikimedia.org/r/482261 (https://phabricator.wikimedia.org/T212919) (owner: Wangql)
[14:40:52] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1114" [mediawiki-config] - https://gerrit.wikimedia.org/r/483425
[14:41:13] (PS1) Elukey: profile::analytics::refinery: move sanitize_eventlogging_analytics to timer [puppet] - https://gerrit.wikimedia.org/r/483426 (https://phabricator.wikimedia.org/T172532)
[14:42:03] !log fsero@deploy1001 scap-helm zotero upgrade staging -f /srv/scap-helm/zotero/zotero-values-staging.yaml /srv/deployment-charts/charts/zotero-0.0.1.tgz [namespace: zotero, clusters: staging]
[14:42:04] !log fsero@deploy1001 scap-helm zotero cluster staging completed
[14:42:04] !log fsero@deploy1001 scap-helm zotero finished
[14:42:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:42:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:42:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:43:28] (PS1) Jbond: Small change to test merge permissions (now with a different account) [puppet] - https://gerrit.wikimedia.org/r/483427 (https://phabricator.wikimedia.org/T213079)
[14:44:01] (CR) 星耀晨曦: [C: +1] "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/482261 (https://phabricator.wikimedia.org/T212919) (owner: Wangql)
[14:44:15] (PS5) Vgutierrez: certcentral: Allow specifying authorized hosts and regex in the config [software/certcentral] - https://gerrit.wikimedia.org/r/483163 (https://phabricator.wikimedia.org/T213301)
[14:44:24] (PS2) Elukey: profile::analytics::refinery: move sanitize_eventlogging_analytics to timer [puppet] - https://gerrit.wikimedia.org/r/483426 (https://phabricator.wikimedia.org/T172532)
[14:45:02] (CR) jerkins-bot: [V: -1] Modifying configuration about Chinese Wikiversity: [mediawiki-config] - https://gerrit.wikimedia.org/r/482261 (https://phabricator.wikimedia.org/T212919) (owner: Wangql)
[14:45:20] (CR) jerkins-bot: [V: -1] profile::analytics::refinery: move sanitize_eventlogging_analytics to timer [puppet] - https://gerrit.wikimedia.org/r/483426 (https://phabricator.wikimedia.org/T172532) (owner: Elukey)
[14:45:55] (PS1) Alexandros Kosiaris: ferm: Remove unused all_networks erb variable [puppet] - https://gerrit.wikimedia.org/r/483429
[14:45:57] (PS1) Alexandros Kosiaris: hhvm: Switch to using domain_networks [puppet] - https://gerrit.wikimedia.org/r/483430
[14:46:00] (PS1) Alexandros Kosiaris: DNM: dnsrecursor: Switch to using aggregate_networks [puppet] - https://gerrit.wikimedia.org/r/483431
[14:46:02] (PS1) Alexandros Kosiaris: mx/otrs/lists: Move spamassasin to aggregate_networks [puppet] - https://gerrit.wikimedia.org/r/483432
[14:46:13] (CR) jerkins-bot: [V: -1] certcentral: Allow specifying authorized hosts and regex in the config [software/certcentral] - https://gerrit.wikimedia.org/r/483163 (https://phabricator.wikimedia.org/T213301) (owner: Vgutierrez)
[14:47:01] (Abandoned) Jbond: Small change to test merge permissions to use [puppet] - https://gerrit.wikimedia.org/r/483168 (https://phabricator.wikimedia.org/T213079) (owner: Jbond42)
[14:47:05] (CR) 星耀晨曦: [C: -1] Modifying configuration about Chinese Wikiversity: (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/482261 (https://phabricator.wikimedia.org/T212919) (owner: Wangql)
[14:48:08] (PS6) Vgutierrez: certcentral: Allow specifying authorized hosts and regex in the config [software/certcentral] - https://gerrit.wikimedia.org/r/483163 (https://phabricator.wikimedia.org/T213301)
[14:50:08] (PS3) Elukey: profile::analytics::refinery: move sanitize_eventlogging_analytics to timer [puppet] - https://gerrit.wikimedia.org/r/483426 (https://phabricator.wikimedia.org/T172532)
[14:51:26] (CR) Alexandros Kosiaris: [C: +1] "There's now a series of patches following this one removing all uses of all_networks." [puppet] - https://gerrit.wikimedia.org/r/475714 (https://phabricator.wikimedia.org/T212327) (owner: Alexandros Kosiaris)
[14:53:36] (CR) Marostegui: [C: +2] Revert "db-eqiad.php: Depool db1114" [mediawiki-config] - https://gerrit.wikimedia.org/r/483425 (owner: Marostegui)
[14:54:31] (PS1) Ema: ATS: use stock request coalescing settings [puppet] - https://gerrit.wikimedia.org/r/483436
[14:54:42] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1114" [mediawiki-config] - https://gerrit.wikimedia.org/r/483425 (owner: Marostegui)
[14:55:42] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1114 T86338 T202167 (duration: 00m 52s)
[14:55:46] (PS1) Marostegui: db-eqiad.php: Depool db1080 [mediawiki-config] - https://gerrit.wikimedia.org/r/483438 (https://phabricator.wikimedia.org/T86338)
[14:55:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:55:46] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338
[14:55:47] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167
[14:55:48] (PS1) Jcrespo: mariadb: Depool es2015 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/483439
[14:56:17] (CR) Filippo Giunchedi: [C: +1] "LGTM (cc performance folks for their information)" [puppet] - https://gerrit.wikimedia.org/r/482894 (https://phabricator.wikimedia.org/T210486) (owner: Cwhite)
[14:56:39] (CR) Filippo Giunchedi: [C: +1] hiera: add cluster definition to poolcounter servers [puppet] - https://gerrit.wikimedia.org/r/483009 (https://phabricator.wikimedia.org/T210486) (owner: Cwhite)
[14:57:04] (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1080 [mediawiki-config] - https://gerrit.wikimedia.org/r/483438 (https://phabricator.wikimedia.org/T86338) (owner: Marostegui)
[14:57:17] (CR) Filippo Giunchedi: [C: +1] hiera: add cluster definition for graphite [puppet] - https://gerrit.wikimedia.org/r/482884 (https://phabricator.wikimedia.org/T210486) (owner: Cwhite)
[14:57:21] (PS2) Jcrespo: mariadb: Depool es2015 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/483439
[14:57:56] (PS2) Ema: ATS: use stock request coalescing settings [puppet] - https://gerrit.wikimedia.org/r/483436
[14:58:09] (Merged) jenkins-bot: db-eqiad.php: Depool db1080 [mediawiki-config] - https://gerrit.wikimedia.org/r/483438 (https://phabricator.wikimedia.org/T86338) (owner: Marostegui)
[14:58:39] (CR) Ema: [C: +2] ATS: use stock request coalescing settings [puppet] - https://gerrit.wikimedia.org/r/483436 (owner: Ema)
[14:58:53] (CR) Jcrespo: [C: +2] mariadb: Depool es2015 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/483439 (owner: Jcrespo)
[14:59:14] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1080 T86338 T202167 (duration: 00m 52s)
[14:59:15] !log Deploy schema change on db1080 - T86338 T202167
[14:59:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:59:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:00:01] (Merged) jenkins-bot: mariadb: Depool es2015 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/483439 (owner: Jcrespo)
[15:00:04] addshore: Time to snap out of that daydream and deploy Structured Data on Commons initial deployment - Pre slot. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190110T1500).
[15:00:09] \o
[15:00:23] I'm going to wait a few more mins for the second of the 2 backports to finish merging
[15:01:06] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1114" [mediawiki-config] - https://gerrit.wikimedia.org/r/483425 (owner: Marostegui)
[15:01:10] (CR) jenkins-bot: mariadb: Depool es2015 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/483439 (owner: Jcrespo)
[15:01:12] (CR) jenkins-bot: db-eqiad.php: Depool db1080 [mediawiki-config] - https://gerrit.wikimedia.org/r/483438 (https://phabricator.wikimedia.org/T86338) (owner: Marostegui)
[15:03:09] (CR) Alexandros Kosiaris: [C: +2] Change frequency of OSM replication on maps1004 [puppet] - https://gerrit.wikimedia.org/r/482860 (owner: MSantos)
[15:03:17] (CR) Alexandros Kosiaris: [C: +2] "LGTM, thanks!" [puppet] - https://gerrit.wikimedia.org/r/482860 (owner: MSantos)
[15:03:25] (PS4) Alexandros Kosiaris: Change frequency of OSM replication on maps1004 [puppet] - https://gerrit.wikimedia.org/r/482860 (owner: MSantos)
[15:06:49] \o
[15:06:55] right, im going to start with my slot now
[15:06:59] (PS4) Elukey: profile::analytics::refinery: move sanitize_eventlogging_analytics to timer [puppet] - https://gerrit.wikimedia.org/r/483426 (https://phabricator.wikimedia.org/T172532)
[15:10:53] (CR) Elukey: "https://puppet-compiler.wmflabs.org/compiler1002/14271/an-coord1001.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/483426 (https://phabricator.wikimedia.org/T172532) (owner: Elukey)
[15:12:28] (PS1) Alexandros Kosiaris: phab::exim: Move to aggregate_networks [puppet] - https://gerrit.wikimedia.org/r/483440
[15:12:30] (PS1) Alexandros Kosiaris: mailmain: Switch to using aggregate_networks [puppet] - https://gerrit.wikimedia.org/r/483441
[15:12:32] (PS1) Alexandros Kosiaris: ntp: Update comment for usage of all_networks [puppet] - https://gerrit.wikimedia.org/r/483442
[15:12:34] (PS1) Alexandros Kosiaris: networks: Remove old and deprecated all_networks var [puppet] - https://gerrit.wikimedia.org/r/483443
[15:12:42] !log addshore@deploy1001 Synchronized php-1.33.0-wmf.9/extensions/Wikibase/repo/includes/Content: [[gerrit:483388|T208330 dont write to wb_terms for mediainfo]] (duration: 00m 55s)
[15:12:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:12:45] T208330: MediaInfo extension should not use the wb_terms table - https://phabricator.wikimedia.org/T208330
[15:13:38] * addshore continues to wait for the other branch to merge....
[15:14:05] woo, it just merged
[15:20:04] syncing
[15:20:55] !log addshore@deploy1001 Synchronized php-1.33.0-wmf.12/extensions/Wikibase/repo/includes/Content: [[gerrit:483388|T208330 dont write to wb_terms for mediainfo]] (duration: 00m 54s)
[15:20:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:20:58] T208330: MediaInfo extension should not use the wb_terms table - https://phabricator.wikimedia.org/T208330
[15:21:35] !log fsero@deploy1001 scap-helm zotero upgrade -f /srv/scap-helm/zotero/zotero-values-codfw.yaml /srv/deployment-charts/charts/zotero-0.0.1.tgz [namespace: zotero, clusters: codfw]
[15:21:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:21:39] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1080" [mediawiki-config] - https://gerrit.wikimedia.org/r/483444
[15:21:58] addshore: LGTM.
[15:22:06] so, thats the 2 backports
[15:22:14] im going to truncate the wb_terms table for testcommonswiki too
[15:22:17] !log fsero@deploy1001 scap-helm zotero upgrade production -f /srv/scap-helm/zotero/zotero-values-codfw.yaml /srv/deployment-charts/charts/zotero-0.0.1.tgz [namespace: zotero, clusters: codfw]
[15:22:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:22:44] addshore: Can I deploy https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/483444/ ?
[15:22:49] marostegui: go for it
[15:22:52] (I can wait for you!)
[15:22:58] i have no more syncs right now
[15:23:01] Ah ok!
[15:23:04] Thanks!
[15:23:06] (CR) Marostegui: [C: +2] Revert "db-eqiad.php: Depool db1080" [mediawiki-config] - https://gerrit.wikimedia.org/r/483444 (owner: Marostegui)
[15:24:43] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1080" [mediawiki-config] - https://gerrit.wikimedia.org/r/483444 (owner: Marostegui)
[15:24:45] !log T208330, MariaDB [testcommonswiki]> TRUNCATE TABLE wb_terms; # Was https://phabricator.wikimedia.org/P7973
[15:24:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:26:05] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1080 T86338 T202167 (duration: 00m 49s)
[15:26:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:26:09] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338
[15:26:11] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167
[15:26:51] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1080" [mediawiki-config] - https://gerrit.wikimedia.org/r/483444 (owner: Marostegui)
[15:27:01] !log Deploy schema change on db1067 (s1 master) - T86338 T202167
[15:27:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:30:25] Operations, ops-eqiad, Patch-For-Review: es1019 IPMI and its management interface are unresponsive (again) - https://phabricator.wikimedia.org/T213422 (Cmjohnson) I need to power this off and unplug it for 10-20 secs. LMK if I can do that today
[15:30:56] 1
[15:31:05] !log rollbacking last zotero codfw deployment
[15:31:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:31:23] Operations, ops-eqiad, RESTBase, RESTBase-Cassandra, and 3 others: Memory error on restbase1016 - https://phabricator.wikimedia.org/T212418 (Cmjohnson) Record: 4 Date/Time: 11/17/2017 19:18:35 Source: system Severity: Non-Critical Description: Correctable memory error rate exceeded...
[15:31:56] Operations, ops-eqiad, RESTBase, RESTBase-Cassandra, and 3 others: Memory error on restbase1016 - https://phabricator.wikimedia.org/T212418 (Cmjohnson) I need to move DIMM around and do standard troubleshooting. Is this server able to be powered off and down in icinga?
[15:32:33] Operations, Patch-For-Review: Onboarding John Bond - https://phabricator.wikimedia.org/T213079 (MoritzMuehlenhoff)
[15:39:00] Operations, ops-eqiad, RESTBase, RESTBase-Cassandra, and 3 others: Memory error on restbase1016 - https://phabricator.wikimedia.org/T212418 (Eevans) >>! In T212418#4869959, @Cmjohnson wrote: > I need to move DIMM around and do standard troubleshooting. Is this server able to be powered off and do...
[15:43:38] Operations, VisualEditor, Wikimedia-Apache-configuration: Visual Editor gets stuck opening article (net::ERR_SPDY_PROTOCOL_ERROR 200) - https://phabricator.wikimedia.org/T213214 (matmarex)
[15:45:26] James_F: 15 mins ;)
[15:45:41] Indeed.
[15:45:53] so the tables are there already?
[15:46:12] Yup.
[15:46:30] so it is just the 2 config patches?
[15:47:12] Yes. We *might* have to deploy the config patches together, not totally sure I traced every side issue with both being on.
[15:47:31] I.e. bits of WBRepo code that try to execute when it's "on" but disabled and would flake.
[15:47:43] On a test wiki it's hard to spot. On Commons the fatals would flood the world.
[15:48:08] James_F: yes, I was going to suggest going very slowly and leaving it on mwdebug1002 for quite some mins for me to poke around :)
[15:48:10] same for the second patch
[15:48:21] Definitely.
[15:48:37] I remember the unexpected things that happened with lexeme
[15:48:59] do you want to push the buttons and I'll just sit here waving my mouse around?
[15:49:13] Sounds good.
[15:49:16] [=
[15:49:30] I'm in the Meet if you want a more real time back-channel. :-)
[15:49:58] Operations, Core Platform Team, MediaWiki-Cache, serviceops, Performance-Team (Radar): Use a multi-dc aware store for ObjectCache's MainStash if needed. - https://phabricator.wikimedia.org/T212129 (Eevans)
[15:50:05] cool, im currently also in another meet though ;)
[15:50:14] should finish part way through our slot
[15:50:25] Excuses. ;-)
[15:51:30] (PS1) Jbond: Add jbond user to icinga groups [puppet] - https://gerrit.wikimedia.org/r/483446 (https://phabricator.wikimedia.org/T213079)
[15:52:03] (PS2) Jbond: Add jbond user to icinga groups [puppet] - https://gerrit.wikimedia.org/r/483446 (https://phabricator.wikimedia.org/T213079)
[15:54:09] (CR) Muehlenhoff: [C: +1] "Looks good." [puppet] - https://gerrit.wikimedia.org/r/483446 (https://phabricator.wikimedia.org/T213079) (owner: Jbond)
[15:56:12] !log Deploy schema change on db1068 (s4 master) - T86338
[15:56:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:56:15] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338
[15:58:18] jouncebot: next
[15:58:18] In 0 hour(s) and 1 minute(s): Structured Data on Commons initial deployment (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190110T1600)
[15:58:21] oooh
[15:59:25] (PS4) Zoranzoki21: Update groupOverrides for srwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/482609 (https://phabricator.wikimedia.org/T213055)
[15:59:42] (PS4) Mathew.onipe: Elasticsearch failed shard allocation check [puppet] - https://gerrit.wikimedia.org/r/482297 (https://phabricator.wikimedia.org/T212850)
[16:00:04] James_F, marktraceur, and addshore: Time to snap out of that daydream and deploy Structured Data on Commons initial deployment. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190110T1600).
[16:00:35] (PS4) Bstorm: network: Add the new cloud region to all_networks [puppet] - https://gerrit.wikimedia.org/r/481215 (https://phabricator.wikimedia.org/T212327)
[16:00:36] * James_F is on it. :-)
[16:01:43] good luck
[16:02:03] (CR) Jbond: [C: +2] Add jbond user to icinga groups [puppet] - https://gerrit.wikimedia.org/r/483446 (https://phabricator.wikimedia.org/T213079) (owner: Jbond)
[16:02:06] (PS7) Jforrester: Install but don't enable the WikibaseMediaInfo extension, part IV [mediawiki-config] - https://gerrit.wikimedia.org/r/446844 (https://phabricator.wikimedia.org/T180981)
[16:02:10] (PS1) Sbisson: Welcome survey: experiment 2: A vs. C [mediawiki-config] - https://gerrit.wikimedia.org/r/483449
[16:02:20] Operations, VisualEditor, Wikimedia-Apache-configuration: Visual Editor gets stuck opening article (net::ERR_SPDY_PROTOCOL_ERROR 200) - https://phabricator.wikimedia.org/T213214 (Elitre) Is T212575 related?
[16:02:24] (CR) Jforrester: [C: +2] Install but don't enable the WikibaseMediaInfo extension, part IV [mediawiki-config] - https://gerrit.wikimedia.org/r/446844 (https://phabricator.wikimedia.org/T180981) (owner: Jforrester)
[16:02:33] Thanks.
[16:02:37] Operations, Office-IT, Performance-Team: Add kchapman@wikimedia.org to performance-team@wikimedia.org - https://phabricator.wikimedia.org/T213427 (herron) p:Triage→Normal Hi @kchapman, I wasn't able to find a mailman list with this name, nor an email server alias. As @reedy suggests we'll n...
[16:02:59] (CR) jerkins-bot: [V: -1] Welcome survey: experiment 2: A vs. C [mediawiki-config] - https://gerrit.wikimedia.org/r/483449 (owner: Sbisson)
[16:03:30] (Merged) jenkins-bot: Install but don't enable the WikibaseMediaInfo extension, part IV [mediawiki-config] - https://gerrit.wikimedia.org/r/446844 (https://phabricator.wikimedia.org/T180981) (owner: Jforrester)
[16:03:46] (CR) Mathew.onipe: Elasticsearch failed shard allocation check (7 comments) [puppet] - https://gerrit.wikimedia.org/r/482297 (https://phabricator.wikimedia.org/T212850) (owner: Mathew.onipe)
[16:03:54] James_F: so mwdebug1002 first? :D
[16:04:01] Very much so.
[16:04:16] (PS2) Sbisson: Welcome survey: experiment 2: A vs. C [mediawiki-config] - https://gerrit.wikimedia.org/r/483449
[16:04:49] !log T180981 Placed patch to install but not enable WBMI on Commons on mwdebug1002
[16:04:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:04:51] T180981: Deploy WikibaseMediaInfo extension to beta - https://phabricator.wikimedia.org/T180981
[16:04:57] right then
[16:05:25] so they are both on special version, lovely
[16:05:41] (CR) jerkins-bot: [V: -1] Welcome survey: experiment 2: A vs. C [mediawiki-config] - https://gerrit.wikimedia.org/r/483449 (owner: Sbisson)
[16:05:55] (CR) Alexandros Kosiaris: [C: -2] "https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/475714/ has been updated and awaits review. There is a long chain of patchsets depe" [puppet] - https://gerrit.wikimedia.org/r/481215 (https://phabricator.wikimedia.org/T212327) (owner: Bstorm)
[16:06:09] (CR) jenkins-bot: Install but don't enable the WikibaseMediaInfo extension, part IV [mediawiki-config] - https://gerrit.wikimedia.org/r/446844 (https://phabricator.wikimedia.org/T180981) (owner: Jforrester)
[16:06:33] (PS3) Sbisson: Welcome survey: experiment 2: A vs. C [mediawiki-config] - https://gerrit.wikimedia.org/r/483449
[16:07:32] addshore: I'm testing and not finding any issues.
[16:07:52] well, i found this, but not worrying... https://commons.wikimedia.org/wiki/Special:ListProperties
[16:08:00] i dont think i'll check the special pages?
[16:08:09] Eurgh. But yeah.
[16:08:11] (CR) Alexandros Kosiaris: [C: +1] "@gtirloni, IMHO your usage of -1 was fine. It's been quite a common practice to do so to stall merging something." [puppet] - https://gerrit.wikimedia.org/r/482118 (https://phabricator.wikimedia.org/T194724) (owner: Dzahn)
[16:08:38] James_F: I think we can proceed
[16:08:53] Roger.
[16:09:43] Operations, ops-eqiad, RESTBase, RESTBase-Cassandra, and 3 others: Memory error on restbase1016 - https://phabricator.wikimedia.org/T212418 (Cmjohnson) @eevans I am going to have to power it back on and let it go for a few days to see if the error returns, will that present an issue for you?
[16:11:17] (PS2) Volans: remote: add workaround for Cumin bug [software/spicerack] - https://gerrit.wikimedia.org/r/483164 (https://phabricator.wikimedia.org/T213296)
[16:11:19] (PS1) Volans: mediawiki: update maintenance host Cumin query [software/spicerack] - https://gerrit.wikimedia.org/r/483453
[16:11:22] (PS1) Volans: puppet: add default batch_size when running puppet [software/spicerack] - https://gerrit.wikimedia.org/r/483454 (https://phabricator.wikimedia.org/T205884)
[16:11:24] (PS1) Volans: puppet: fix regenerate_certificate() [software/spicerack] - https://gerrit.wikimedia.org/r/483455 (https://phabricator.wikimedia.org/T205884)
[16:11:26] (PS1) Volans: ipmi: add support for DRY RUN mode [software/spicerack] - https://gerrit.wikimedia.org/r/483456 (https://phabricator.wikimedia.org/T205884)
[16:11:29] (PS1) Volans: phabricator: remove unneded pylint ignore [software/spicerack] - https://gerrit.wikimedia.org/r/483457 (https://phabricator.wikimedia.org/T205884)
[16:11:31] (PS1) Volans: config: add load_ini_config() function [software/spicerack] - https://gerrit.wikimedia.org/r/483458 (https://phabricator.wikimedia.org/T205884)
[16:11:33] (PS1) Volans: debmonitor: use the existing configuration file [software/spicerack] - https://gerrit.wikimedia.org/r/483459 (https://phabricator.wikimedia.org/T205884)
[16:11:45] !log jforrester@deploy1001 Synchronized dblists/wikidatarepo.dblist: T180981 Add Commons to wikis with WikibaseRepo installed (duration: 00m 54s)
[16:11:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:11:48] T180981: Deploy WikibaseMediaInfo extension to beta - https://phabricator.wikimedia.org/T180981
[16:12:01] (CR) Bstorm: "Cool!" [puppet] - https://gerrit.wikimedia.org/r/483085 (https://phabricator.wikimedia.org/T172532) (owner: Elukey)
[16:13:14] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T180981 Add Commons to wikis with WikibaseMediaInfo installed (duration: 00m 52s)
[16:13:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:14:29] (PS6) Jforrester: Enable WikibaseMediaInfo on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/466955 (https://phabricator.wikimedia.org/T159708)
[16:14:34] Operations, Patch-For-Review: Onboarding John Bond - https://phabricator.wikimedia.org/T213079 (MoritzMuehlenhoff)
[16:15:16] (CR) Jforrester: [C: +2] Enable WikibaseMediaInfo on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/466955 (https://phabricator.wikimedia.org/T159708) (owner: Jforrester)
[16:15:24] PROBLEM - Host restbase1016.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[16:15:39] James_F: time for the next one :)
[16:15:57] addshore: Don't rush him, it's very delicate :P
[16:16:06] I know ;)
[16:16:08] (he's also narrating beautifully)
[16:16:20] wish i had a cat here to support me
[16:16:31] (Merged) jenkins-bot: Enable WikibaseMediaInfo on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/466955 (https://phabricator.wikimedia.org/T159708) (owner: Jforrester)
[16:17:09] !log T180981 Placed patch to enable WBMI on Commons on mwdebug1002
[16:17:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:17:12] T180981: Deploy WikibaseMediaInfo extension to beta - https://phabricator.wikimedia.org/T180981
[16:18:04] Operations, ops-eqiad, Patch-For-Review: es1019 IPMI and its management interface are unresponsive (again) - https://phabricator.wikimedia.org/T213422 (jcrespo) @Cmjohnson Sorry, cannot today for both organizational reasons (@ at meeting today) and technical ones (cannot depool today due to traffic w...
[16:18:23] (PS3) Alexandros Kosiaris: admin: test for absent users [puppet] - https://gerrit.wikimedia.org/r/482611 (owner: Hashar)
[16:18:30] James_F: im pretty happy
[16:19:12] (CR) jenkins-bot: Enable WikibaseMediaInfo on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/466955 (https://phabricator.wikimedia.org/T159708) (owner: Jforrester)
[16:19:44] (CR) Alexandros Kosiaris: [C: +2] admin: test for absent users [puppet] - https://gerrit.wikimedia.org/r/482611 (owner: Hashar)
[16:19:53] Doing some more testing.
[16:20:10] (CR) Jdlrobson: Disable reader trust survey (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/476370 (https://phabricator.wikimedia.org/T209882) (owner: Bmansurov)
[16:20:34] RECOVERY - Host restbase1016.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.03 ms
[16:21:39] Operations, ops-eqiad, Patch-For-Review: es1019 IPMI and its management interface are unresponsive (again) - https://phabricator.wikimedia.org/T213422 (Cmjohnson) @jcrespo Sure...Tuesday works
[16:21:39] addshore: Looks like the caption box CSS is broken by a gadget.
[16:21:58] James_F: also pause, i just want to make sure we didn't somehow break the query service for some unexpected reason
[16:22:06] Yes.
[16:22:50] the query service thing was happening before we touched anything, unrelated
[16:23:07] the caption box looks fine for me
[16:23:12] but i guess you have different gadgets ;)
[16:23:27] Slideshow gadget.
[16:24:38] James_F: oooh yes, i can confirm that
[16:24:47] https://usercontent.irccloud-cdn.com/file/yZqV2olQ/image.png
[16:24:53] Yup.
[16:28:07] addshore: Fixed in https://commons.wikimedia.org/w/index.php?title=MediaWiki:Gadget-GallerySlideshow.css&diff=334287703&oldid=110155904 I think.
[16:28:14] * addshore can check here :)
[16:28:27] looks good to me
[16:28:45] haha, thats a nice little fix
[16:29:51] OK, I think we are good to go.
[16:29:54] ack
[16:29:56] *agrees*
[16:31:39] Syncing now.
[16:31:43] ack
[16:32:11] addshore: https://commons.wikimedia.org/w/index.php?title=File:Wikimedia_Foundation_logo_-_vertical.svg&diff=334287827&oldid=325256766&diffmode=source was the first caption.
[16:32:25] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T159708 Enable Structured Data on Commons, captions-only (duration: 00m 53s)
[16:32:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:32:28] T159708: Deploy WikibaseMediaInfo extension to production - https://phabricator.wikimedia.org/T159708
[16:32:36] James_F: sur eit wasnt https://commons.wikimedia.org/w/index.php?title=File:Chicken_In_Snow.JPG&diff=prev&oldid=334286912 ? ;)
[16:33:04] addshore: Eurgh. We said not to use it in prod.
[16:33:38] * James_F grumps.
[16:33:42] you did? =o I definitely missed that, but also that the only really way to see if it breaks anything!
[16:34:03] addshore: Which is why I was doing so in controlled conditions. :-)
[16:34:30] everything looks green in the logs
[16:35:02] AF being triggered for non-priv users.
[16:35:19] ?
[16:35:36] Detecting removal of the information template when people add a caption.
[16:35:43] hmmmmm
[16:35:46] link?
[16:35:56] addshore: Don't have one.
[16:36:13] https://commons.wikimedia.org/wiki/Special:AbuseLog/4475237
[16:36:27] James_F: is abuse filter mcr reads?
[16:36:29] *ready?
[16:36:37] It was…
[16:36:41] its comparing 2 different slots
[16:36:46] Yeah.
[16:36:50] *sighs*
[16:36:57] I blame Daniel.
[16:37:13] hah
[16:37:48] James_F: turn it off? or?
[16:38:33] addshore: Let's leave it on and fix AF?
[16:39:15] addshore: Do you know AF code much?
[16:39:57] somewhat
[16:40:02] "For now, this returns all the revision's slots, concatenated together."
[16:40:03] let me go and see what has been done with AF so far [16:40:07] im inthe call now too [16:40:24] (CR) Cwhite: [C: +2] hiera: add cluster definition for graphite [puppet] - https://gerrit.wikimedia.org/r/482884 (https://phabricator.wikimedia.org/T210486) (owner: Cwhite) [16:40:31] (PS2) Cwhite: hiera: add cluster definition for graphite [puppet] - https://gerrit.wikimedia.org/r/482884 (https://phabricator.wikimedia.org/T210486) [16:40:45] Operations, ops-eqiad, Traffic, Patch-For-Review: cp1075-90 - bnxt_en transmit hangs - https://phabricator.wikimedia.org/T203194 (BBlack) I suspect our bug is fixed by: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=73f21c653f930f438d53eed29b5e4c65c8a0f906 which i... [16:41:45] !log data transfer from wdqs1004 -> wdqs1006 completed! - T213361 [16:41:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:41:48] T213361: reload wdqs1006 from other server - https://phabricator.wikimedia.org/T213361 [16:45:05] Operations, VisualEditor, Wikimedia-Apache-configuration: Visual Editor gets stuck opening article (net::ERR_SPDY_PROTOCOL_ERROR 200) - https://phabricator.wikimedia.org/T213214 (CDanis) If anyone can manage to reproduce this consistently, even for a few minutes, capturing a Chrome NetLog using chrom... [16:45:11] James_F: where are we chilling while looking at this? which channel? [16:45:32] Operations, Proton, Reading-Infrastructure-Team-Backlog: Decide on handling system updates for Proton - https://phabricator.wikimedia.org/T213366 (Jhernandez) [16:45:34] addshore: commons-sd?
[16:45:41] ack [16:45:43] (CR) Mathew.onipe: [C: +1] wdqs: preliminary work to manage multiple instances [puppet] - https://gerrit.wikimedia.org/r/483217 (https://phabricator.wikimedia.org/T213234) (owner: Gehel) [16:45:50] Operations, Proton, Reading-Infrastructure-Team-Backlog, Traffic: Document and possibly fine-tune how Proton interacts with Varnish - https://phabricator.wikimedia.org/T213371 (Jhernandez) [16:46:02] Operations, Office-IT, Performance-Team: Add kchapman@wikimedia.org to performance-team@wikimedia.org - https://phabricator.wikimedia.org/T213427 (Imarlier) It is a google group -- I've invited @kchapman and made her a manager of the group. [16:46:08] Operations, Office-IT, Performance-Team: Add kchapman@wikimedia.org to performance-team@wikimedia.org - https://phabricator.wikimedia.org/T213427 (Imarlier) Open→Resolved a:Imarlier [16:47:03] (Abandoned) Mathew.onipe: New upstream version [debs/prometheus-elasticsearch-exporter] - https://gerrit.wikimedia.org/r/483143 (https://phabricator.wikimedia.org/T210592) (owner: Mathew.onipe) [16:48:15] Operations, Proton, Reading-Infrastructure-Team-Backlog: Decide on handling system updates for Proton - https://phabricator.wikimedia.org/T213366 (Jhernandez) [16:48:40] (CR) Alexandros Kosiaris: "import is not even a valid keyword anymore. And the fact the validation fails is weird" [puppet] - https://gerrit.wikimedia.org/r/333012 (https://phabricator.wikimedia.org/T154915) (owner: Hashar) [16:48:56] (CR) Gehel: [C: -1] "This is looking pretty good. A few comments inline." (8 comments) [puppet] - https://gerrit.wikimedia.org/r/482297 (https://phabricator.wikimedia.org/T212850) (owner: Mathew.onipe) [16:49:06] Operations, LDAP-Access-Requests, WMF-Legal, WMF-NDA-Requests: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T213397 (RStallman-legalteam) @noarave - acknowledging your request.
I'll prepare your NDA and follow up with you via your WMDE email address. Thanks! [16:49:10] (CR) Alexandros Kosiaris: "This is really weird and peculiar. But the change looks good to me otherwise" [puppet] - https://gerrit.wikimedia.org/r/333012 (https://phabricator.wikimedia.org/T154915) (owner: Hashar) [16:49:32] Operations, ops-eqiad, Traffic, Patch-For-Review: cp1075-90 - bnxt_en transmit hangs - https://phabricator.wikimedia.org/T203194 (MoritzMuehlenhoff) >>! In T203194#4870205, @BBlack wrote: > I suspect our bug is fixed by: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/comm... [16:49:34] (CR) Alexandros Kosiaris: [C: +2] test: puppet-syntax now fails on deprecation notices [puppet] - https://gerrit.wikimedia.org/r/333012 (https://phabricator.wikimedia.org/T154915) (owner: Hashar) [16:49:40] (PS6) Alexandros Kosiaris: test: puppet-syntax now fails on deprecation notices [puppet] - https://gerrit.wikimedia.org/r/333012 (https://phabricator.wikimedia.org/T154915) (owner: Hashar) [16:49:59] akosiaris: I could not make it find a deprecation notice :( [16:50:17] but yeah at least now we have the default [16:50:30] previously that nicely failed due to the 'import "realm.pp"' [16:50:32] but not anymore [16:51:15] hashar: yeah neither did I [16:52:34] !log jynus@deploy1001 Synchronized wmf-config/db-codfw.php: depool es2015 (duration: 00m 52s) [16:52:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:52:40] (CR) Gehel: [C: +1] "LGTM, trivial enough" [software/spicerack] - https://gerrit.wikimedia.org/r/483164 (https://phabricator.wikimedia.org/T213296) (owner: Volans) [16:52:56] (CR) Gehel: [C: +1] "LGTM, trivial enough" [software/spicerack] - https://gerrit.wikimedia.org/r/483453 (owner: Volans) [16:53:22] (CR) Cwhite: [C: +2] hiera: add cluster definition to webperf servers [puppet] - https://gerrit.wikimedia.org/r/482894
(https://phabricator.wikimedia.org/T210486) (owner: Cwhite) [16:53:34] (PS2) Cwhite: hiera: add cluster definition to webperf servers [puppet] - https://gerrit.wikimedia.org/r/482894 (https://phabricator.wikimedia.org/T210486) [16:53:47] (PS1) Jcrespo: Revert "mariadb: Depool es2015 for maintenance" [mediawiki-config] - https://gerrit.wikimedia.org/r/483466 [16:54:02] (CR) Gehel: "LGTM (and good idea to not crash the puppetmasters!)" [software/spicerack] - https://gerrit.wikimedia.org/r/483454 (https://phabricator.wikimedia.org/T205884) (owner: Volans) [16:54:04] (CR) Arturo Borrero Gonzalez: [C: +1] "This is blocking us, I kindly ask you folks to review and merge soon :-)" [puppet] - https://gerrit.wikimedia.org/r/475714 (https://phabricator.wikimedia.org/T212327) (owner: Alexandros Kosiaris) [16:55:03] Operations, ops-eqiad, Traffic, Patch-For-Review: cp1075-90 - bnxt_en transmit hangs - https://phabricator.wikimedia.org/T203194 (BBlack) Actually, it is already in the 4.9.y LTS/stable branch, here: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-4.9.y&id=b2be15b...
[16:56:06] Operations, ops-eqiad, Traffic, Patch-For-Review: cp1075-90 - bnxt_en transmit hangs - https://phabricator.wikimedia.org/T203194 (Vgutierrez) yeah, it's included as part of 4.9.134: https://cdn.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.9.134 [16:56:23] (CR) Volans: [C: +2] remote: add workaround for Cumin bug [software/spicerack] - https://gerrit.wikimedia.org/r/483164 (https://phabricator.wikimedia.org/T213296) (owner: Volans) [16:58:22] Operations, ops-eqiad, Traffic, Patch-For-Review: cp1075-90 - bnxt_en transmit hangs - https://phabricator.wikimedia.org/T203194 (MoritzMuehlenhoff) Even better, then we can simply get the 4.9.144-1 kernel from stretch-proposed-updates and test whether that is the correct fix [16:59:52] !log stop and upgrade es2015 [16:59:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:00:04] godog and _joe_: #bothumor I � Unicode. All rise for Puppet SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190110T1700). [17:00:04] No GERRIT patches in the queue for this window AFAICS. [17:00:19] <_joe_> cool [17:04:29] _joe_: perhaps https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/475714/ ? [17:04:31] Operations, ops-eqiad, Traffic, Patch-For-Review: cp1075-90 - bnxt_en transmit hangs - https://phabricator.wikimedia.org/T203194 (BBlack) Yeah. It's hard to "prove" whether we have this bug fixed other than running a supposed fix on the bnxt_en cp10 fleet for a while as a statistical test, but p...
[17:04:57] (Merged) jenkins-bot: remote: add workaround for Cumin bug [software/spicerack] - https://gerrit.wikimedia.org/r/483164 (https://phabricator.wikimedia.org/T213296) (owner: Volans) [17:05:00] <_joe_> Hauskatze: that's definitely not for puppetswat [17:05:09] <_joe_> both people in that patch have +2 rights :D [17:05:13] _joe_: ack, didn't know [17:06:01] (CR) jenkins-bot: remote: add workaround for Cumin bug [software/spicerack] - https://gerrit.wikimedia.org/r/483164 (https://phabricator.wikimedia.org/T213296) (owner: Volans) [17:06:12] (CR) Volans: [C: +2] mediawiki: update maintenance host Cumin query [software/spicerack] - https://gerrit.wikimedia.org/r/483453 (owner: Volans) [17:09:20] (CR) Gehel: [C: +1] "Looks reasonable to me" [software/spicerack] - https://gerrit.wikimedia.org/r/483455 (https://phabricator.wikimedia.org/T205884) (owner: Volans) [17:11:53] (CR) Gehel: [C: +1] "Is there really any IPMI command that is safe? Joke aside, LGTM" [software/spicerack] - https://gerrit.wikimedia.org/r/483456 (https://phabricator.wikimedia.org/T205884) (owner: Volans) [17:12:56] gehel: lol :) [17:13:43] Operations, ops-eqiad, DBA: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (Cmjohnson) a:Cmjohnson→RobH @robh @Marostegui I went through the very long and painful Dell troubleshooting and it's one of those cases where it actually worked. The server is ready to... [17:14:48] (Merged) jenkins-bot: mediawiki: update maintenance host Cumin query [software/spicerack] - https://gerrit.wikimedia.org/r/483453 (owner: Volans) [17:14:53] Operations, ops-eqiad, DBA: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (Marostegui) Good job!! Thank you! Is the RAID 5 made already?
If it is only OS install pending, I can take it from there [17:15:09] (PS1) BBlack: authdns: listen for local PROXY, min v6 threads [puppet] - https://gerrit.wikimedia.org/r/483470 [17:15:52] Operations, ops-eqiad, DBA: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (jcrespo) @Cmjohnson you are the best, the worse Dell is, the more superb you are to cover for their mess. How many beers do I own you already? XD Thanks again. [17:16:16] (CR) jenkins-bot: mediawiki: update maintenance host Cumin query [software/spicerack] - https://gerrit.wikimedia.org/r/483453 (owner: Volans) [17:17:21] (CR) Gehel: [C: +1] puppet: add default batch_size when running puppet [software/spicerack] - https://gerrit.wikimedia.org/r/483454 (https://phabricator.wikimedia.org/T205884) (owner: Volans) [17:17:56] Operations, ops-eqiad, DBA: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (Marostegui) Ah no, I think all the mgmt entries, vlan and all those steps are pending so I cannot proceed until those are set up. (Just tried to access mgmt, which was not successful). [17:17:58] (CR) Volans: [C: +2] puppet: add default batch_size when running puppet [software/spicerack] - https://gerrit.wikimedia.org/r/483454 (https://phabricator.wikimedia.org/T205884) (owner: Volans) [17:18:11] Operations, ops-eqiad, DBA: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (Cmjohnson) [17:18:57] (CR) Gehel: [C: +1] "LGTM, trivial enough" [software/spicerack] - https://gerrit.wikimedia.org/r/483457 (https://phabricator.wikimedia.org/T205884) (owner: Volans) [17:19:22] Operations, ops-eqiad, DBA: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (Marostegui) mgmt works now :-) I will wait for the green light from @Cmjohnson to proceed with the install Thank you for getting this almost done!
[17:19:43] Operations, ops-eqiad, DBA: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (Cmjohnson) *update not ready for install. I set the wrong raid. I am updating the driver now and will fix to raid 5 once the update is complete. @marostegui odd...may have somethign to do with t... [17:23:16] (Merged) jenkins-bot: puppet: add default batch_size when running puppet [software/spicerack] - https://gerrit.wikimedia.org/r/483454 (https://phabricator.wikimedia.org/T205884) (owner: Volans) [17:24:15] (CR) jenkins-bot: puppet: add default batch_size when running puppet [software/spicerack] - https://gerrit.wikimedia.org/r/483454 (https://phabricator.wikimedia.org/T205884) (owner: Volans) [17:25:23] (PS30) Elukey: admin: allow users to be deployed without ssh keys configured [puppet] - https://gerrit.wikimedia.org/r/482275 (https://phabricator.wikimedia.org/T212949) [17:25:42] (CR) Elukey: "Removed WIP so people interested can chime in and review :)" [puppet] - https://gerrit.wikimedia.org/r/482275 (https://phabricator.wikimedia.org/T212949) (owner: Elukey) [17:27:00] (CR) Gehel: [C: +1] "LGTM" [software/spicerack] - https://gerrit.wikimedia.org/r/483459 (https://phabricator.wikimedia.org/T205884) (owner: Volans) [17:27:27] (CR) Gehel: [C: +1] "LGTM.
It might make sense to squash it with https://gerrit.wikimedia.org/r/#/c/operations/software/spicerack/+/483459/ for clarity (but fe" [software/spicerack] - https://gerrit.wikimedia.org/r/483458 (https://phabricator.wikimedia.org/T205884) (owner: Volans) [17:30:29] (CR) BryanDavis: Demistify $wmgMonologChannels Logstash debug level behavior (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/483339 (owner: Gergő Tisza) [17:33:17] !log Deploy schema change on db2046 - T210713 [17:33:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:33:20] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713 [17:34:05] (CR) Volans: [C: +2] puppet: fix regenerate_certificate() [software/spicerack] - https://gerrit.wikimedia.org/r/483455 (https://phabricator.wikimedia.org/T205884) (owner: Volans) [17:34:07] (PS2) Jcrespo: Revert "mariadb: Depool es2015 for maintenance" [mediawiki-config] - https://gerrit.wikimedia.org/r/483466 [17:39:34] (Merged) jenkins-bot: puppet: fix regenerate_certificate() [software/spicerack] - https://gerrit.wikimedia.org/r/483455 (https://phabricator.wikimedia.org/T205884) (owner: Volans) [17:40:36] (CR) jenkins-bot: puppet: fix regenerate_certificate() [software/spicerack] - https://gerrit.wikimedia.org/r/483455 (https://phabricator.wikimedia.org/T205884) (owner: Volans) [17:40:55] (CR) Volans: [C: +2] ipmi: add support for DRY RUN mode [software/spicerack] - https://gerrit.wikimedia.org/r/483456 (https://phabricator.wikimedia.org/T205884) (owner: Volans) [17:41:25] Operations, Patch-For-Review: Reallocate former image scalers - https://phabricator.wikimedia.org/T192457 (Dzahn) [17:43:55] (CR) Jcrespo: [C: +2] Revert "mariadb: Depool es2015 for maintenance" [mediawiki-config] - https://gerrit.wikimedia.org/r/483466 (owner: Jcrespo) [17:44:59] (Merged) jenkins-bot: Revert "mariadb: Depool es2015
for maintenance" [mediawiki-config] - https://gerrit.wikimedia.org/r/483466 (owner: Jcrespo) [17:46:24] (Merged) jenkins-bot: ipmi: add support for DRY RUN mode [software/spicerack] - https://gerrit.wikimedia.org/r/483456 (https://phabricator.wikimedia.org/T205884) (owner: Volans) [17:47:20] (CR) jenkins-bot: ipmi: add support for DRY RUN mode [software/spicerack] - https://gerrit.wikimedia.org/r/483456 (https://phabricator.wikimedia.org/T205884) (owner: Volans) [17:48:44] (CR) Volans: [C: +2] phabricator: remove unneded pylint ignore [software/spicerack] - https://gerrit.wikimedia.org/r/483457 (https://phabricator.wikimedia.org/T205884) (owner: Volans) [17:49:08] (PS4) Jforrester: [Beta Cluster] Cleanup SDC config, all same as prod now [mediawiki-config] - https://gerrit.wikimedia.org/r/470459 [17:49:21] (CR) Jforrester: [C: +2] "Beta Cluster-only config." [mediawiki-config] - https://gerrit.wikimedia.org/r/470459 (owner: Jforrester) [17:49:23] !log Deploy schema change on db2053 - T210713 [17:49:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:49:26] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713 [17:50:01] (PS1) Dzahn: site: add mw2151 as another jobrunner host [puppet] - https://gerrit.wikimedia.org/r/483476 (https://phabricator.wikimedia.org/T192457) [17:50:21] !log jynus@deploy1001 Synchronized wmf-config/db-codfw.php: repool es2015 (duration: 00m 53s) [17:50:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:50:28] (Merged) jenkins-bot: [Beta Cluster] Cleanup SDC config, all same as prod now [mediawiki-config] - https://gerrit.wikimedia.org/r/470459 (owner: Jforrester) [17:53:33] Operations, Wikimedia-Mailing-lists: lists.wikimedia.org reporting "You must GET the form before submitting it" for all list subscription attempts - https://phabricator.wikimedia.org/T185222 (bd808) I
received an email from a community member who was attempting to subscribe to the https://lists.wikimedia... [17:53:51] Operations, Traffic, VisualEditor, Wikimedia-Apache-configuration: Visual Editor gets stuck opening article (net::ERR_SPDY_PROTOCOL_ERROR 200) - https://phabricator.wikimedia.org/T213214 (herron) p:Triage→Normal [17:54:16] (Merged) jenkins-bot: phabricator: remove unneded pylint ignore [software/spicerack] - https://gerrit.wikimedia.org/r/483457 (https://phabricator.wikimedia.org/T205884) (owner: Volans) [17:55:23] Operations, LDAP-Access-Requests, WMF-Legal, WMF-NDA-Requests: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T213397 (herron) p:Triage→Normal [17:56:14] (CR) jenkins-bot: Revert "mariadb: Depool es2015 for maintenance" [mediawiki-config] - https://gerrit.wikimedia.org/r/483466 (owner: Jcrespo) [17:56:17] (CR) jenkins-bot: [Beta Cluster] Cleanup SDC config, all same as prod now [mediawiki-config] - https://gerrit.wikimedia.org/r/470459 (owner: Jforrester) [17:56:33] (CR) Volans: [C: +2] config: add load_ini_config() function [software/spicerack] - https://gerrit.wikimedia.org/r/483458 (https://phabricator.wikimedia.org/T205884) (owner: Volans) [17:56:48] Operations, Wikimedia-Mailing-lists: lists.wikimedia.org reporting "You must GET the form before submitting it" for all list subscription attempts - https://phabricator.wikimedia.org/T185222 (bd808) Subscription to cloud@ via web form worked for me with Chrome 71.0.3578.98 on macOS 10.13.6.
[17:56:53] Operations, Operations-Software-Development, Wikidata, Wikidata-Query-Service: Create a cookbook to copy data between WDQS servers - https://phabricator.wikimedia.org/T213401 (herron) p:Triage→Normal [17:57:18] (CR) jenkins-bot: phabricator: remove unneded pylint ignore [software/spicerack] - https://gerrit.wikimedia.org/r/483457 (https://phabricator.wikimedia.org/T205884) (owner: Volans) [17:57:28] (CR) Volans: [C: +1] "LGTM" [debs/prometheus-blazegraph-exporter] - https://gerrit.wikimedia.org/r/483144 (https://phabricator.wikimedia.org/T213289) (owner: Gehel) [17:59:41] Operations, Wikimedia-Mailing-lists: lists.wikimedia.org reporting "You must GET the form before submitting it" for all list subscription attempts - https://phabricator.wikimedia.org/T185222 (herron) p:Triage→Normal [17:59:43] (CR) Bmansurov: Disable reader trust survey (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/476370 (https://phabricator.wikimedia.org/T209882) (owner: Bmansurov) [18:00:04] cscott, arlolra, subbu, halfak, and Amir1: Dear deployers, time to do the Services – Graphoid / Parsoid / Citoid / ORES deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190110T1800). [18:00:47] Operations, Wikimedia-Mailing-lists: lists.wikimedia.org reporting "You must GET the form before submitting it" for all list subscription attempts - https://phabricator.wikimedia.org/T185222 (Tomthirteen) I am getting the same error message.
[18:01:04] (Merged) jenkins-bot: config: add load_ini_config() function [software/spicerack] - https://gerrit.wikimedia.org/r/483458 (https://phabricator.wikimedia.org/T205884) (owner: Volans) [18:03:10] nothing for parsoid [18:03:49] Operations, ops-eqiad, RESTBase, RESTBase-Cassandra, and 3 others: Memory error on restbase1016 - https://phabricator.wikimedia.org/T212418 (Cmjohnson) While the server is offline I took this opportunity to update the f/w on the bios and idrac. [18:04:16] (CR) jenkins-bot: config: add load_ini_config() function [software/spicerack] - https://gerrit.wikimedia.org/r/483458 (https://phabricator.wikimedia.org/T205884) (owner: Volans) [18:05:31] (PS2) Dzahn: conftool::client: remove obsolete trusty distribution check [puppet] - https://gerrit.wikimedia.org/r/483226 [18:05:51] (CR) Volans: [C: +2] debmonitor: use the existing configuration file [software/spicerack] - https://gerrit.wikimedia.org/r/483459 (https://phabricator.wikimedia.org/T205884) (owner: Volans) [18:06:01] (PS1) Marostegui: db-codfw.php: Depool db2053, db2060 [mediawiki-config] - https://gerrit.wikimedia.org/r/483480 [18:07:34] (CR) Smalyshev: [C: +1] make blazegraph port configurable [debs/prometheus-blazegraph-exporter] - https://gerrit.wikimedia.org/r/483144 (https://phabricator.wikimedia.org/T213289) (owner: Gehel) [18:08:43] Operations, ops-eqiad, DBA: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (Cmjohnson) return shipping info for parts USPS 9202 3946 5301 2440 4873 91 Fedex 9611918 2393026 77237414 [18:08:46] (CR) Marostegui: [C: +2] db-codfw.php: Depool db2053, db2060 [mediawiki-config] - https://gerrit.wikimedia.org/r/483480 (owner: Marostegui) [18:08:55] (CR) Volans: "Looks mostly ok, see few minor nitpicks/questions inline."
(3 comments) [puppet] - https://gerrit.wikimedia.org/r/481154 (https://phabricator.wikimedia.org/T150264) (owner: Faidon Liambotis) [18:08:58] Operations, ops-eqiad, DBA: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (Cmjohnson) [18:09:56] (Merged) jenkins-bot: db-codfw.php: Depool db2053, db2060 [mediawiki-config] - https://gerrit.wikimedia.org/r/483480 (owner: Marostegui) [18:10:06] PROBLEM - Request latencies on neon is CRITICAL: instance=10.64.0.40:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api [18:10:43] (CR) Smalyshev: Puppetize blazegraph config for cases where deployed one is not enough (1 comment) [puppet] - https://gerrit.wikimedia.org/r/483310 (https://phabricator.wikimedia.org/T213212) (owner: Smalyshev) [18:10:54] (PS6) Smalyshev: Puppetize blazegraph config for cases where deployed one is not enough [puppet] - https://gerrit.wikimedia.org/r/483310 (https://phabricator.wikimedia.org/T213212) [18:11:15] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Depool db2053, db2060 for kernel and mysql upgrade (duration: 00m 53s) [18:11:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:11:23] (Merged) jenkins-bot: debmonitor: use the existing configuration file [software/spicerack] - https://gerrit.wikimedia.org/r/483459 (https://phabricator.wikimedia.org/T205884) (owner: Volans) [18:11:38] !log Stop MySQL on db2053 and db2060 for mysql and kernel upgrade [18:11:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:11:48] (CR) Dzahn: [C: +2] conftool::client: remove obsolete trusty distribution check [puppet] - https://gerrit.wikimedia.org/r/483226 (owner: Dzahn) [18:12:23] (CR) Smalyshev: [C: +1] wdqs: preliminary work to manage multiple instances [puppet] - https://gerrit.wikimedia.org/r/483217 (https://phabricator.wikimedia.org/T213234) (owner: Gehel)
[18:12:55] !log The above change was db2053 and not db2060 [18:12:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:13:16] (CR) jenkins-bot: debmonitor: use the existing configuration file [software/spicerack] - https://gerrit.wikimedia.org/r/483459 (https://phabricator.wikimedia.org/T205884) (owner: Volans) [18:13:17] !log Stop MySQL on db2046 for kernel upgrade [18:13:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:15:59] (CR) Dzahn: [C: -2] "seconding what Alex said, i did not see that -1 as rude at all, it was the right way to indicate it should not be merged yet. actually a r" [puppet] - https://gerrit.wikimedia.org/r/482118 (https://phabricator.wikimedia.org/T194724) (owner: Dzahn) [18:16:17] (PS1) Marostegui: Revert "db-codfw.php: Depool db2053, db2060" [mediawiki-config] - https://gerrit.wikimedia.org/r/483482 [18:19:21] (CR) GTirloni: [C: +1] "Cool, thanks for the feedback folks!" [puppet] - https://gerrit.wikimedia.org/r/482118 (https://phabricator.wikimedia.org/T194724) (owner: Dzahn) [18:19:26] (PS7) Smalyshev: Puppetize blazegraph config for cases where deployed one is not enough [puppet] - https://gerrit.wikimedia.org/r/483310 [18:22:07] (CR) jenkins-bot: db-codfw.php: Depool db2053, db2060 [mediawiki-config] - https://gerrit.wikimedia.org/r/483480 (owner: Marostegui) [18:22:10] RECOVERY - Request latencies on neon is OK: All metrics within thresholds.
https://grafana.wikimedia.org/dashboard/db/kubernetes-api [18:24:53] (PS4) Dzahn: geoip::maxmind: replace deprecated validate_string functions [puppet] - https://gerrit.wikimedia.org/r/483222 [18:28:20] (CR) Marostegui: [C: +2] Revert "db-codfw.php: Depool db2053, db2060" [mediawiki-config] - https://gerrit.wikimedia.org/r/483482 (owner: Marostegui) [18:29:25] (Merged) jenkins-bot: Revert "db-codfw.php: Depool db2053, db2060" [mediawiki-config] - https://gerrit.wikimedia.org/r/483482 (owner: Marostegui) [18:30:33] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Repool db2053, db2060 for kernel and mysql upgrade (duration: 00m 51s) [18:30:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:30:58] !log Upgrade mysql and kernel on db2060 [18:30:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:33:30] (PS8) Smalyshev: Puppetize blazegraph config for cases where deployed one is not enough [puppet] - https://gerrit.wikimedia.org/r/483310 [18:35:22] (CR) jenkins-bot: Revert "db-codfw.php: Depool db2053, db2060" [mediawiki-config] - https://gerrit.wikimedia.org/r/483482 (owner: Marostegui) [18:36:36] PROBLEM - Request latencies on neon is CRITICAL: instance=10.64.0.40:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api [18:37:55] !log Stop replication on s8 codfw master for a schema change - T85757 [18:37:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:37:57] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 [18:42:35] (PS2) Dzahn: Remove support for older distros in some Apache classes [puppet] - https://gerrit.wikimedia.org/r/483380 (owner: Muehlenhoff) [18:42:40] RECOVERY - Request latencies on neon is OK: All metrics within thresholds.
https://grafana.wikimedia.org/dashboard/db/kubernetes-api [18:43:25] (CR) Dzahn: [C: +2] Remove support for older distros in some Apache classes [puppet] - https://gerrit.wikimedia.org/r/483380 (owner: Muehlenhoff) [18:46:03] !log Stop replication on s1 codfw master for a schema change - T85757 [18:46:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:46:06] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 [18:47:41] !log Deploy schema change on s1 codfw master (db2048) with replication, this will generate lag on s1 codfw - T85757 [18:47:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:49:59] (PS1) Volans: CHANGELOG: add changelogs for release v0.0.12 [software/spicerack] - https://gerrit.wikimedia.org/r/483487 [18:52:08] !log anomie@mwmaint1002 Running migrateActors.php on test wikis and mediawikiwiki for T188327. This may cause lag in codfw. [18:52:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:52:11] T188327: Deploy refactored actor storage - https://phabricator.wikimedia.org/T188327 [18:53:55] (PS6) Framawiki: Create Cookbook NS in bnwikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/458870 (https://phabricator.wikimedia.org/T203534) [18:54:23] (PS3) Framawiki: Whitelist *.*.archive.org in wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/468831 (https://phabricator.wikimedia.org/T207581) [18:57:13] !log deleting three files for legal compliance [18:57:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Morning SWAT (Max 6 patches) . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190110T1900).
[19:00:04] Framawiki, tgr, Zoranzoki21, and MatmaRex: A patch you scheduled for Morning SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [19:00:13] hi [19:00:16] o/ [19:00:25] \o [19:00:35] (PS2) Volans: CHANGELOG: add changelogs for release v0.0.12 [software/spicerack] - https://gerrit.wikimedia.org/r/483487 [19:00:39] \o/ [19:01:14] (PS1) Dzahn: create static microsite for RT ticket archive, copy static Bugzilla [puppet] - https://gerrit.wikimedia.org/r/483489 (https://phabricator.wikimedia.org/T180641) [19:01:26] (PS6) Framawiki: Modifying configuration about Chinese Wikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/482261 (https://phabricator.wikimedia.org/T212919) (owner: Wangql) [19:02:45] (CR) jerkins-bot: [V: -1] Modifying configuration about Chinese Wikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/482261 (https://phabricator.wikimedia.org/T212919) (owner: Wangql) [19:04:17] Who will SWAT? :D [19:05:08] (PS1) Mathew.onipe: Updated changelog [debs/prometheus-elasticsearch-exporter] - https://gerrit.wikimedia.org/r/483492 (https://phabricator.wikimedia.org/T210592) [19:06:31] (CR) Volans: [C: +2] CHANGELOG: add changelogs for release v0.0.12 [software/spicerack] - https://gerrit.wikimedia.org/r/483487 (owner: Volans) [19:08:02] (CR) Mathew.onipe: "So I was able to push.
Then I think I have to pull then make a change that passes through the review process, get merged and trigger a bui" [debs/prometheus-elasticsearch-exporter] - https://gerrit.wikimedia.org/r/483492 (https://phabricator.wikimedia.org/T210592) (owner: Mathew.onipe) [19:10:32] (CR) Ayounsi: Monitoring: add VRRP check (2 comments) [puppet] - https://gerrit.wikimedia.org/r/481154 (https://phabricator.wikimedia.org/T150264) (owner: Faidon Liambotis) [19:11:34] PROBLEM - Request latencies on argon is CRITICAL: instance=10.64.32.133:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api [19:12:16] (Merged) jenkins-bot: CHANGELOG: add changelogs for release v0.0.12 [software/spicerack] - https://gerrit.wikimedia.org/r/483487 (owner: Volans) [19:12:46] RECOVERY - Request latencies on argon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [19:13:26] (CR) jenkins-bot: CHANGELOG: add changelogs for release v0.0.12 [software/spicerack] - https://gerrit.wikimedia.org/r/483487 (owner: Volans) [19:13:53] !log Deploy schema change on dbstore1002 - T85757 [19:13:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:13:55] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 [19:14:01] (CR) Mathew.onipe: [C: +2] Updated changelog [debs/prometheus-elasticsearch-exporter] - https://gerrit.wikimedia.org/r/483492 (https://phabricator.wikimedia.org/T210592) (owner: Mathew.onipe) [19:14:12] !log Deploy schema change on dbstore1001 - T85757 [19:14:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:14:53] Anybody here for swatting?
:) [19:15:31] i can do it in 10 minutes or so [19:15:48] 10Operations, 10Wikimedia-Mailing-lists: lists.wikimedia.org reporting "You must GET the form before submitting it" for all list subscription attempts - https://phabricator.wikimedia.org/T185222 (10herron) @Tomthirteen which OS and browser version did this occur on? Also is it reproducible using different bro... [19:16:23] thanks tgr ! [19:17:09] (03PS2) 10Dzahn: create static microsite for RT ticket archive, copy static Bugzilla [puppet] - 10https://gerrit.wikimedia.org/r/483489 (https://phabricator.wikimedia.org/T180641) [19:18:06] SWAT will be one full hour_ [19:19:10] (03PS1) 10Volans: Upstream release v0.0.12 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/483494 [19:21:51] (03PS1) 10Dzahn: webserver_misc_static: move backup::host to role [puppet] - 10https://gerrit.wikimedia.org/r/483496 [19:22:36] (03CR) 10Dzahn: [C: 03+2] webserver_misc_static: move backup::host to role [puppet] - 10https://gerrit.wikimedia.org/r/483496 (owner: 10Dzahn) [19:25:34] (03CR) 10Volans: [C: 03+2] Upstream release v0.0.12 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/483494 (owner: 10Volans) [19:27:42] right [19:28:24] super [19:31:08] tgr: It started? 
[19:31:12] (03CR) 10Gergő Tisza: [C: 03+2] Create Cookbook NS in bnwikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458870 (https://phabricator.wikimedia.org/T203534) (owner: 10Framawiki) [19:32:22] (03Merged) 10jenkins-bot: Create Cookbook NS in bnwikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458870 (https://phabricator.wikimedia.org/T203534) (owner: 10Framawiki) [19:32:36] (03Merged) 10jenkins-bot: Upstream release v0.0.12 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/483494 (owner: 10Volans) [19:33:18] (03PS4) 10Gergő Tisza: Whitelist *.*.archive.org in wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/468831 (https://phabricator.wikimedia.org/T207581) (owner: 10Framawiki) [19:34:31] framawiki: cookbook is on mwdebug1002 [19:34:59] on it [19:36:49] tgr: cookook ok for me [19:38:18] (03PS12) 10Gehel: wdqs: preliminary work to manage multiple instances [puppet] - 10https://gerrit.wikimedia.org/r/483217 (https://phabricator.wikimedia.org/T213234) [19:38:38] PROBLEM - Request latencies on acrux is CRITICAL: instance=10.192.0.93:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api [19:39:24] (03CR) 10Gehel: [C: 03+2] wdqs: preliminary work to manage multiple instances [puppet] - 10https://gerrit.wikimedia.org/r/483217 (https://phabricator.wikimedia.org/T213234) (owner: 10Gehel) [19:39:28] !log tgr@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:458870|Note that namespaceDupes.php maintenance script run will be needed after the deployment. 
(T203534)]] (duration: 00m 53s) [19:39:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:39:31] T203534: Create Cookbook:Namespace in Bengali Wikibooks - https://phabricator.wikimedia.org/T203534 [19:39:57] !log uploaded spicerack_0.0.12-1_amd64.deb to apt.wikimedia.org stretch-wikimedia T205884 [19:39:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:40:00] T205884: Spicerack: split wmf-auto-reimage-lib into Spicerack modules - https://phabricator.wikimedia.org/T205884 [19:40:57] (03CR) 10jenkins-bot: Create Cookbook NS in bnwikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458870 (https://phabricator.wikimedia.org/T203534) (owner: 10Framawiki) [19:41:06] RECOVERY - Request latencies on acrux is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [19:41:11] !log installed spicerack 0.0.12-1 on cumin2001 T205884 [19:41:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:41:35] !log ran mwscript namespaceDupes.php bnwikibooks --fix (238 links fixed) [19:41:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:42:47] (03CR) 10Gergő Tisza: [C: 03+2] Whitelist *.*.archive.org in wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/468831 (https://phabricator.wikimedia.org/T207581) (owner: 10Framawiki) [19:43:52] (03Merged) 10jenkins-bot: Whitelist *.*.archive.org in wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/468831 (https://phabricator.wikimedia.org/T207581) (owner: 10Framawiki) [19:44:48] (03PS3) 10Gergő Tisza: Remove AICaptcha settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481987 (https://phabricator.wikimedia.org/T186244) [19:45:13] framawiki: *.*.archive.org on mwdebug [19:46:00] (03CR) 10Smalyshev: "compiler run: https://puppet-compiler.wmflabs.org/compiler1002/14273/" [puppet] - 
10https://gerrit.wikimedia.org/r/483310 (owner: 10Smalyshev) [19:46:24] tgr: lgtm [19:47:06] (03CR) 10Gergő Tisza: [C: 03+2] Remove AICaptcha settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481987 (https://phabricator.wikimedia.org/T186244) (owner: 10Gergő Tisza) [19:47:08] (03PS9) 10Gehel: Puppetize blazegraph config for cases where deployed one is not enough [puppet] - 10https://gerrit.wikimedia.org/r/483310 (owner: 10Smalyshev) [19:47:25] !log tgr@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:468831|Whitelist *.*.archive.org in wgCopyUploadsDomains (T207581)]] (duration: 00m 53s) [19:47:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:47:29] T207581: Whitelist *.*.archive.org in wgCopyUploadsDomains - https://phabricator.wikimedia.org/T207581 [19:48:15] (03Merged) 10jenkins-bot: Remove AICaptcha settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481987 (https://phabricator.wikimedia.org/T186244) (owner: 10Gergő Tisza) [19:48:23] (03CR) 10Gehel: [C: 03+2] "PCC agrees this is a noop: https://puppet-compiler.wmflabs.org/compiler1002/14274/wdqs1003.eqiad.wmnet/change.wdqs1003.eqiad.wmnet.pson" [puppet] - 10https://gerrit.wikimedia.org/r/483310 (owner: 10Smalyshev) [19:50:07] (03CR) 10Gehel: [C: 03+2] make blazegraph port configurable [debs/prometheus-blazegraph-exporter] - 10https://gerrit.wikimedia.org/r/483144 (https://phabricator.wikimedia.org/T213289) (owner: 10Gehel) [19:50:35] !log tgr@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:481987|Remove AICaptcha settings (T186244)]] (duration: 00m 52s) [19:50:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:50:40] T186244: Deploy AICaptcha data collection - https://phabricator.wikimedia.org/T186244 [19:54:11] (03CR) 10jenkins-bot: Whitelist *.*.archive.org in wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/468831 
(https://phabricator.wikimedia.org/T207581) (owner: 10Framawiki) [19:54:13] (03CR) 10jenkins-bot: Remove AICaptcha settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481987 (https://phabricator.wikimedia.org/T186244) (owner: 10Gergő Tisza) [19:54:21] tgr: How its going? [19:54:33] MatmaRex: patch is on mwdebug1002 [19:55:00] Zoranzoki21: leaving that last as the extension patch is already merged so I need to wrap that up [19:55:07] and we are running out of time [19:55:12] tgr: Ok [19:55:28] I will move my patches for another SWAT [19:56:37] (03PS1) 10Gehel: wdqs: force creation of /etc/wdqs/RWStore.properties [puppet] - 10https://gerrit.wikimedia.org/r/483504 (https://phabricator.wikimedia.org/T213234) [19:57:09] (03CR) 10Smalyshev: [C: 03+1] wdqs: force creation of /etc/wdqs/RWStore.properties [puppet] - 10https://gerrit.wikimedia.org/r/483504 (https://phabricator.wikimedia.org/T213234) (owner: 10Gehel) [19:57:20] (03CR) 10Gehel: [C: 03+2] wdqs: force creation of /etc/wdqs/RWStore.properties [puppet] - 10https://gerrit.wikimedia.org/r/483504 (https://phabricator.wikimedia.org/T213234) (owner: 10Gehel) [19:59:04] MatmaRex: around? [19:59:33] tgr: sorry, yes [19:59:38] looking now. i missed your ping [20:00:00] nw [20:00:04] marxarelli: That opportune time is upon us again. Time for a MediaWiki train - Americas version deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190110T2000). [20:00:40] tgr: looks good [20:00:45] (on test.wp) [20:00:58] tgr: need more time? 
[20:01:03] i can wait [20:01:07] just a minute [20:01:09] np [20:01:48] (03PS1) 10Gehel: wdqs: force creation of /etc/wdqs/RWStore.properties as a file [puppet] - 10https://gerrit.wikimedia.org/r/483505 (https://phabricator.wikimedia.org/T213234) [20:02:10] !log tgr@deploy1001 Synchronized php-1.33.0-wmf.12/extensions/WikimediaEvents/modules/ve-wme/campaigns.js: SWAT: [[gerrit:483485|Remove unnecessary addPlugin wrapper (T213338)]] (duration: 00m 53s) [20:02:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:02:13] T213338: Toggling from VE to source to VE in MobileFrontend breaks - https://phabricator.wikimedia.org/T213338 [20:02:33] MatmaRex: ^ can you double-check? I don't fully trust ResourceLoader + sync-file [20:03:08] on the normal servers now? [20:03:15] or mwdebug still? [20:03:40] MatmaRex: normal, yeah [20:05:19] tgr: seems good [20:05:32] cool, thanks. all yours, marxarelli [20:05:44] danke [20:05:53] thanks [20:07:48] (03CR) 10Gehel: [C: 03+2] wdqs: force creation of /etc/wdqs/RWStore.properties as a file [puppet] - 10https://gerrit.wikimedia.org/r/483505 (https://phabricator.wikimedia.org/T213234) (owner: 10Gehel) [20:09:14] (03PS1) 10Dduvall: all wikis to 1.33.0-wmf.12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483508 [20:09:16] (03CR) 10Dduvall: [C: 03+2] all wikis to 1.33.0-wmf.12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483508 (owner: 10Dduvall) [20:10:22] (03Merged) 10jenkins-bot: all wikis to 1.33.0-wmf.12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483508 (owner: 10Dduvall) [20:11:31] !log restart blazegraph on wdqs1009 to validate new config [20:11:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:11:45] !log dduvall@deploy1001 rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.12 [20:12:49] dduvall@deploy1001: Failed to log message to wiki. Somebody should check the error logs. 
[20:14:04] PROBLEM - Apache HTTP on mw1344 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:14:26] PROBLEM - HHVM rendering on mw1288 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:14:26] PROBLEM - Nginx local proxy to apache on mw1344 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:15:08] RECOVERY - Apache HTTP on mw1344 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.041 second response time [20:15:10] seeing timeouts, but waiting to see if they subside as they did yesterday [20:15:30] RECOVERY - Nginx local proxy to apache on mw1344 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.043 second response time [20:15:30] RECOVERY - HHVM rendering on mw1288 is OK: HTTP OK: HTTP/1.1 200 OK - 75605 bytes in 0.115 second response time [20:19:54] !log seeing increase in "60 second timed out" error rate and rise in 503 rate, as was the case with group1 deployment. continuing to monitor [20:19:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:20:17] (03CR) 10jenkins-bot: all wikis to 1.33.0-wmf.12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483508 (owner: 10Dduvall) [20:21:50] PROBLEM - Apache HTTP on mw1277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:22:18] PROBLEM - HHVM rendering on mw1277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:22:48] PROBLEM - Nginx local proxy to apache on mw1277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:23:54] RECOVERY - Nginx local proxy to apache on mw1277 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 618 bytes in 0.722 second response time [20:24:08] RECOVERY - Apache HTTP on mw1277 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.309 second response time [20:24:36] RECOVERY - HHVM rendering on mw1277 is OK: HTTP OK: HTTP/1.1 200 OK - 75605 bytes in 0.112 second response time [20:28:34] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of 
data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5 [20:34:02] (03PS3) 10Dzahn: create static microsite for RT ticket archive, copy static Bugzilla [puppet] - 10https://gerrit.wikimedia.org/r/483489 (https://phabricator.wikimedia.org/T180641) [20:34:36] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5 [20:34:41] ^is the 5xx reqs/min alert alert actionable? looking at a time period of 30 days the recent spike looks on the low side [20:34:53] and theres the recovery [20:35:01] not sure yet [20:37:58] there was a definite spike following deployment, but it's leveled off [20:38:24] and you're right, it's not high relative to a wide time range [20:39:59] the spike in mw errors has subsided as well [20:41:23] 10Operations, 10Thumbor, 10Wikimedia-Logstash, 10serviceops, 10User-jijiki: Stream Thumbor logs to logstash - https://phabricator.wikimedia.org/T212946 (10herron) Sure, sounds good! 
[20:47:25] !log both mediawiki error rates and 500 response rates have subsided back to pre-deploy levels [20:47:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:15:57] (03PS1) 10Dzahn: cross-validate-accounts: add a diff output to validate_common_ops_group [puppet] - 10https://gerrit.wikimedia.org/r/483520 [21:16:24] (03CR) 10jerkins-bot: [V: 04-1] cross-validate-accounts: add a diff output to validate_common_ops_group [puppet] - 10https://gerrit.wikimedia.org/r/483520 (owner: 10Dzahn) [21:17:48] (03PS2) 10Dzahn: cross-validate-accounts: add a diff output to validate_common_ops_group [puppet] - 10https://gerrit.wikimedia.org/r/483520 [21:17:54] thanks for the deployment tgr|away [21:21:46] (03PS3) 10Dzahn: cross-validate-accounts: add a diff output to validate_common_ops_group [puppet] - 10https://gerrit.wikimedia.org/r/483520 [21:31:16] (03CR) 10Dzahn: [C: 03+2] "everything copied from static-bz and compiles.. just preparing backup and basic site .. no content yet https://puppet-compiler.wmflabs.or" [puppet] - 10https://gerrit.wikimedia.org/r/483489 (https://phabricator.wikimedia.org/T180641) (owner: 10Dzahn) [21:36:46] PROBLEM - LVS HTTP IPv4 on wdqs.svc.codfw.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:37:05] hmm [21:37:10] herron: I'm around for the xhgui patch whenever (4-5 hours) [21:37:22] PROBLEM - LVS HTTP IPv4 on zotero.svc.codfw.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:37:47] ah dammit [21:37:56] 2 different ones? [21:38:00] * gehel is looking at wdqs [21:38:08] looks weird [21:38:27] RECOVERY - LVS HTTP IPv4 on zotero.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 138 bytes in 0.082 second response time [21:38:37] maybe unrelated_ [21:38:42] zotero one our old friend? 
[21:39:07] RECOVERY - LVS HTTP IPv4 on wdqs.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 448 bytes in 0.082 second response time [21:39:17] wdqs hosts are complaining about high lag [21:39:40] where do you see this? [21:39:50] looks like lag is back to zero on wdqs [21:39:50] herron: icinga [21:40:10] not true, just no data [21:40:19] ah it's unknown [21:40:27] high lag is the name of the check [21:40:36] hmm not very aptly named [21:41:03] fsero: looks a bit weird. https://grafana.wikimedia.org/d/000000620/xxxx-zotero-debugging-kubernetes?orgId=1&from=now-1h&to=now [21:41:08] replication lag, we could be more specific [21:41:21] it's the "high" part that confused me btw [21:41:35] I was like "lag is high, this is a problem!" [21:41:38] indeed weird [21:41:51] (03PS1) 10Smalyshev: Prepare for multi-instance Blazegraph [puppet] - 10https://gerrit.wikimedia.org/r/483529 (https://phabricator.wikimedia.org/T213234) [21:42:00] well, that's the case, if lag is high, this is a problem [21:42:01] yeah zotero memory is low this time around on codfw (albeit higher than before) [21:42:29] on eqiad where we actually had a clear case of node heap maximum usage it did not alert [21:42:33] Ok, the lag is probably due to an updated version of the blazegraph exporter. 
I did test it locally on one of the hosts, but I probably missed something, checking [21:42:52] (03CR) 10jerkins-bot: [V: 04-1] Prepare for multi-instance Blazegraph [puppet] - 10https://gerrit.wikimedia.org/r/483529 (https://phabricator.wikimedia.org/T213234) (owner: 10Smalyshev) [21:46:23] zotero codfw error "logs" https://www.irccloud.com/pastebin/f2WAOy4r/ [21:46:36] akosiaris: no ETIMEDOUT either [21:47:20] actually besides the Error: it looks like "normal" errors [21:48:00] PROBLEM - Request latencies on chlorine is CRITICAL: instance=10.64.0.45:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api [21:48:13] (03PS2) 10Smalyshev: Prepare for multi-instance Blazegraph [puppet] - 10https://gerrit.wikimedia.org/r/483529 (https://phabricator.wikimedia.org/T213234) [21:48:14] that's probably us getting the logs ^ ? [21:49:04] (03CR) 10jerkins-bot: [V: 04-1] Prepare for multi-instance Blazegraph [puppet] - 10https://gerrit.wikimedia.org/r/483529 (https://phabricator.wikimedia.org/T213234) (owner: 10Smalyshev) [21:49:14] RECOVERY - Request latencies on chlorine is OK: All metrics within thresholds. 
https://grafana.wikimedia.org/dashboard/db/kubernetes-api [21:49:18] (03PS2) 10Herron: xhgui: Disable deletion features [puppet] - 10https://gerrit.wikimedia.org/r/483048 (https://phabricator.wikimedia.org/T213218) (owner: 10Krinkle) [21:49:45] that is some also about Error: Could not parse CSS stylesheet and then dumping all the CSS [21:50:21] but looks unrelated [21:50:43] (03CR) 10Herron: [C: 03+2] xhgui: Disable deletion features [puppet] - 10https://gerrit.wikimedia.org/r/483048 (https://phabricator.wikimedia.org/T213218) (owner: 10Krinkle) [21:50:59] (03PS3) 10Smalyshev: Prepare for multi-instance Blazegraph [puppet] - 10https://gerrit.wikimedia.org/r/483529 (https://phabricator.wikimedia.org/T213234) [21:51:21] 10Operations, 10ops-eqiad, 10RESTBase, 10RESTBase-Cassandra, and 3 others: Memory error on restbase1016 - https://phabricator.wikimedia.org/T212418 (10Eevans) >>! In T212418#4870100, @Cmjohnson wrote: > @eevans I am going to have to power it back on and let it go for a few days to see if the error returns,... [21:51:30] PROBLEM - Request latencies on acrab is CRITICAL: instance=10.192.16.26:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api [21:52:04] (03CR) 10jerkins-bot: [V: 04-1] Prepare for multi-instance Blazegraph [puppet] - 10https://gerrit.wikimedia.org/r/483529 (https://phabricator.wikimedia.org/T213234) (owner: 10Smalyshev) [21:52:42] RECOVERY - Request latencies on acrab is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [21:54:00] https://www.popmatters.com/future-generations-coast-audio-premiere-2495427253.html [21:54:07] 10Operations, 10ops-eqiad, 10RESTBase, 10RESTBase-Cassandra, and 3 others: Memory error on restbase1016 - https://phabricator.wikimedia.org/T212418 (10Cmjohnson) I ended up leaving the production cables disconnected. [21:54:16] that one was request to multiple pods right before the alert [21:54:24] I wonder... 
[21:56:11] (03PS1) 10Gehel: wdqs: wdqs profile was renamed [puppet] - 10https://gerrit.wikimedia.org/r/483600 [21:56:19] ^ this should fix the wdqs metrics [21:56:23] wow [21:56:25] (03PS4) 10Smalyshev: Prepare for multi-instance Blazegraph [puppet] - 10https://gerrit.wikimedia.org/r/483529 (https://phabricator.wikimedia.org/T213234) [21:56:59] fsero: I may have just reproduced ... [21:57:17] (03PS2) 10Gehel: wdqs: wdqs profile was renamed [puppet] - 10https://gerrit.wikimedia.org/r/483600 [21:57:19] I have managed to sent a zotero pod into a busy loop I think [21:57:41] (03PS1) 10Cwhite: hiera: add cluster definition to snapshot servers [puppet] - 10https://gerrit.wikimedia.org/r/483602 (https://phabricator.wikimedia.org/T210486) [21:57:49] https://www.irccloud.com/pastebin/j9YT66o6/ [21:58:04] (03CR) 10jerkins-bot: [V: 04-1] Prepare for multi-instance Blazegraph [puppet] - 10https://gerrit.wikimedia.org/r/483529 (https://phabricator.wikimedia.org/T213234) (owner: 10Smalyshev) [21:58:05] 10Operations, 10Patch-For-Review: Reallocate former image scalers - https://phabricator.wikimedia.org/T192457 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` ['mw1298.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/201901102157_dzahn... [21:58:10] akosiaris: great definitely happens at 21 look at the error distribution [21:58:20] (03PS1) 10Dzahn: static-rt: add LDAP simple auth [puppet] - 10https://gerrit.wikimedia.org/r/483604 (https://phabricator.wikimedia.org/T180641) [21:58:20] it was that url? 
[21:58:31] possibly [21:58:35] login into kubernetes2004 [21:58:46] and look at the cpu usage of process 29863 [21:58:53] I handpicked it to send traffic to it [21:59:06] (03PS1) 10Krinkle: Revert "xhgui: Disable deletion features" [puppet] - 10https://gerrit.wikimedia.org/r/483605 [21:59:28] (03CR) 10Gehel: [C: 03+2] wdqs: wdqs profile was renamed [puppet] - 10https://gerrit.wikimedia.org/r/483600 (owner: 10Gehel) [21:59:50] it goes from an R state to a D state everynow and then but it does look locked up [22:00:36] it's probably hitting the pod cpu limit [22:01:34] (03PS1) 10BryanDavis: cloud: rewrite spreadcheck.py NPRE check [puppet] - 10https://gerrit.wikimedia.org/r/483606 [22:01:38] ah, no it should not. the limits is 10 vcpus [22:01:53] and it's maxing out at 250% or something [22:02:08] addshore: Around to double-check as I deploy? [22:02:25] jouncebot: next [22:02:25] In 1 hour(s) and 57 minute(s): Evening SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190111T0000) [22:02:49] (03PS1) 10Krinkle: xhgui: Fix typo for run/delete* in LocationMatch [puppet] - 10https://gerrit.wikimedia.org/r/483608 (https://phabricator.wikimedia.org/T213218) [22:02:55] (03Abandoned) 10Krinkle: Revert "xhgui: Disable deletion features" [puppet] - 10https://gerrit.wikimedia.org/r/483605 (owner: 10Krinkle) [22:03:39] (03CR) 10BryanDavis: cloud: rewrite spreadcheck.py NPRE check (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/483606 (owner: 10BryanDavis) [22:03:55] fsero: yeah reproduced. 
I sent in the same spiralling deadlock the second pod on kubernetes2004 [22:04:03] ok we at least have a reproduction for once [22:04:56] I think we can kill both pods and retry to reproduce this locally [22:05:27] (03PS1) 10Cwhite: hiera: add cluster definition to syslog servers [puppet] - 10https://gerrit.wikimedia.org/r/483612 (https://phabricator.wikimedia.org/T210486) [22:05:49] (03CR) 10BryanDavis: cloud: rewrite spreadcheck.py NPRE check (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/483606 (owner: 10BryanDavis) [22:06:13] great [22:06:19] (03PS2) 10Cwhite: hiera: add cluster definition to poolcounter servers [puppet] - 10https://gerrit.wikimedia.org/r/483009 (https://phabricator.wikimedia.org/T210486) [22:06:20] let me try on staging [22:06:30] (03PS5) 10Smalyshev: Prepare for multi-instance Blazegraph [puppet] - 10https://gerrit.wikimedia.org/r/483529 (https://phabricator.wikimedia.org/T213234) [22:06:32] (03CR) 10Cwhite: [C: 03+2] hiera: add cluster definition to poolcounter servers [puppet] - 10https://gerrit.wikimedia.org/r/483009 (https://phabricator.wikimedia.org/T210486) (owner: 10Cwhite) [22:07:01] James_F: yup [22:07:10] oh, James_F in 2 hours? [22:07:37] addshore: No, now now. [22:07:44] James_F: is it somewhere? [22:07:46] * addshore reads up [22:07:51] addshore: Still merging. [22:07:54] ack :) [22:08:05] * addshore goes to find that previous examine link [22:08:28] addshore: Presumably the logged diff won't change after the fact? [22:08:39] (03PS2) 10BryanDavis: cloud: rewrite spreadcheck.py NPRE check [puppet] - 10https://gerrit.wikimedia.org/r/483606 [22:08:45] James_F: thats true, they are stoed [22:08:48] stored.. 
bah [22:09:02] we should be able to examine any diff we want though, so let me find an appropriate link [22:09:05] (03CR) 10jerkins-bot: [V: 04-1] Prepare for multi-instance Blazegraph [puppet] - 10https://gerrit.wikimedia.org/r/483529 (https://phabricator.wikimedia.org/T213234) (owner: 10Smalyshev) [22:09:23] (03PS2) 10Krinkle: xhgui: Fix typo for in LocationMatch and double-slash [puppet] - 10https://gerrit.wikimedia.org/r/483608 (https://phabricator.wikimedia.org/T213218) [22:09:24] James_F: deploy the wikibase patch first :) [22:09:36] addshore: Indeed. [22:09:40] otherwise even more confusing AF diffs will happen [22:09:56] Yeah. [22:10:09] Also I want you to confirm it doesn't break Wikidata.org. :-) [22:10:22] Breaking Commons is bad enough. Let's not break our two biggest wikis. [22:11:07] (03CR) 10Herron: [C: 03+2] xhgui: Fix typo for in LocationMatch and double-slash [puppet] - 10https://gerrit.wikimedia.org/r/483608 (https://phabricator.wikimedia.org/T213218) (owner: 10Krinkle) [22:11:38] (03PS1) 10Volans: remote: fix logging for reboot() [software/spicerack] - 10https://gerrit.wikimedia.org/r/483613 (https://phabricator.wikimedia.org/T213296) [22:11:39] is there a test file / sandbox file page on commons James_F? [22:11:47] (03PS6) 10Smalyshev: Prepare for multi-instance Blazegraph [puppet] - 10https://gerrit.wikimedia.org/r/483529 (https://phabricator.wikimedia.org/T213234) [22:11:52] i guess i can test it on the test commons though actually [22:12:23] Yes. 
[22:12:47] (03CR) 10jerkins-bot: [V: 04-1] Prepare for multi-instance Blazegraph [puppet] - 10https://gerrit.wikimedia.org/r/483529 (https://phabricator.wikimedia.org/T213234) (owner: 10Smalyshev) [22:13:31] (03PS3) 10Cwhite: hiera: add cluster definition to poolcounter servers [puppet] - 10https://gerrit.wikimedia.org/r/483009 (https://phabricator.wikimedia.org/T210486) [22:14:30] (03PS7) 10Smalyshev: Prepare for multi-instance Blazegraph [puppet] - 10https://gerrit.wikimedia.org/r/483529 (https://phabricator.wikimedia.org/T213234) [22:14:54] (03PS3) 10BryanDavis: cloud: rewrite spreadcheck.py NPRE check [puppet] - 10https://gerrit.wikimedia.org/r/483606 [22:15:23] hmm it takes a while but the RES on the first pod is finally reaching node heap maximums [22:15:34] addshore: If you want something to worry about whilst waiting for code merge, https://phabricator.wikimedia.org/T207683 needs your expertise. [22:16:07] James_F: should I just check all of the ones that shouldnt be listed now? [22:16:08] or? [22:16:45] addshore: I was thinking more of your expertise on what code in Wikibase needs to change to disable the pages. [22:16:59] oh, I dont think we want to do anything in Wikibase itself, probably [22:17:09] just unregister the apis and epcial pages in mediawiki-config [22:17:14] or at least that is what I had in mind [22:17:30] but maybe I should think about it a bit more tommorrow [22:18:32] 10Operations, 10Cloud-VPS, 10Traffic, 10serviceops: Difficulties to create offline version of Wikipedia because of HTTP 429 response - https://phabricator.wikimedia.org/T213475 (10Legoktm) HTTP 429 is rate limiting... https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429 Since these are calls to ap... 
[22:18:53] 10Operations, 10Wikidata, 10Wikidata-Query-Service: Some queries causes wdqs-blazegraph on wdqs1006 to crash and restart - https://phabricator.wikimedia.org/T213191 (10herron) p:05Triage→03Normal [22:19:10] (03PS4) 10BryanDavis: cloud: rewrite spreadcheck.py NPRE check [puppet] - 10https://gerrit.wikimedia.org/r/483606 [22:19:59] (03CR) 10BryanDavis: "Only one of the newly added grouping is currently out of balance:" [puppet] - 10https://gerrit.wikimedia.org/r/483606 (owner: 10BryanDavis) [22:21:52] 10Operations, 10Proton, 10Reading-Infrastructure-Team-Backlog, 10Security-Team: Decide on handling system updates for Proton - https://phabricator.wikimedia.org/T213366 (10herron) p:05Triage→03Normal [22:23:39] 10Operations, 10Wikidata, 10Wikidata-Query-Service, 10monitoring: upgrade prometheus-blazegraph-exporter to python3 - https://phabricator.wikimedia.org/T213305 (10herron) p:05Triage→03Normal [22:23:56] trying to run puppet compiler on my change, I get tons of unrelated errors: https://puppet-compiler.wmflabs.org/compiler1002/14276/wdqs1004.eqiad.wmnet/prod.wdqs1004.eqiad.wmnet.err [22:24:21] is that a known issue? [22:24:21] 10Operations, 10Cloud-VPS, 10Traffic, 10serviceops: Difficulties to create offline version of Wikipedia because of HTTP 429 response - https://phabricator.wikimedia.org/T213475 (10herron) p:05Triage→03Normal [22:24:48] 10Operations, 10Proton, 10Reading-Infrastructure-Team-Backlog, 10Traffic: Document and possibly fine-tune how Proton interacts with Varnish - https://phabricator.wikimedia.org/T213371 (10herron) p:05Triage→03Normal [22:25:37] SMalyshev: yes, since jenkins-bot still gave you +2 it's ok and theey are known issues [22:25:53] mutante: ok, thanks! [22:26:04] 10Operations, 10Cloud-VPS, 10Traffic, 10serviceops: Difficulties to create offline version of Wikipedia because of HTTP 429 response - https://phabricator.wikimedia.org/T213475 (10bd808) >>! 
In T213475#4871260, @Chicocvenancio wrote: > On a different note, is this impossible to be done from the dumps? As... [22:26:05] sometimes there is a Error hidden between those warnings but then jenkins-bot would say -1 normally [22:26:06] addshore: Eurgh, CI for Wikibase makes me cry. [22:26:44] (03PS8) 10Smalyshev: Prepare for multi-instance Blazegraph [puppet] - 10https://gerrit.wikimedia.org/r/483529 (https://phabricator.wikimedia.org/T213234) [22:28:20] 10Operations, 10Cloud-VPS, 10Traffic, 10serviceops: Difficulties to create offline version of Wikipedia because of HTTP 429 response - https://phabricator.wikimedia.org/T213475 (10Automactic) Hi, I am the dev of Zimfarm (the system automating the scrape process). I can run the scraper at home successfully... [22:29:50] James_F: yeh, it does that sometimes [22:30:42] 10Operations, 10Cloud-VPS, 10Traffic, 10serviceops: Difficulties to create offline version of Wikipedia because of HTTP 429 response - https://phabricator.wikimedia.org/T213475 (10chasemp) Could a change to coming from a 172 address have effected ratelimit whitelisting? [22:31:39] addshore: OK, Wikibase *but not AbuseFilter* change is live on mwdebug1002. 
[22:31:44] James_F: ack [22:31:52] (03CR) 10Volans: remote: fix logging for reboot() (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/483613 (https://phabricator.wikimedia.org/T213296) (owner: 10Volans) [22:32:56] PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received: /api (Scrapes sample page) timed out before a response was received [22:32:56] PROBLEM - citoid endpoints health on scb2004 is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received [22:32:56] PROBLEM - citoid endpoints health on scb2003 is CRITICAL: /api (Scrapes sample page) timed out before a response was received [22:32:58] James_F: just spotted this an am confued [22:33:00] PROBLEM - citoid endpoints health on scb2002 is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received: /api (Scrapes sample page) timed out before a response was received [22:33:00] PROBLEM - citoid endpoints health on scb2006 is CRITICAL: /api (Scrapes sample page) timed out before a response was received [22:33:02] https://usercontent.irccloud-cdn.com/file/HCYK41fa/image.png [22:33:32] PROBLEM - citoid endpoints health on scb2001 is CRITICAL: /api (Scrapes sample page) timed out before a response was received [22:33:32] 10Operations, 10Patch-For-Review: Reallocate former image scalers - https://phabricator.wikimedia.org/T192457 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1298.eqiad.wmnet'] ` and were **ALL** successful. [22:34:15] PROBLEM - LVS HTTP IPv4 on zotero.svc.codfw.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:34:27] and here we go again [22:34:36] RECOVERY - citoid endpoints health on scb2001 is OK: All endpoints are healthy [22:35:22] akosiaris: anything I can do to help with zotero? 
[22:35:28] PROBLEM - citoid endpoints health on scb2005 is CRITICAL: /api (Scrapes sample page) timed out before a response was received [22:35:31] addshore: Oh, hmm. Where? [22:35:48] James_F: https://test-commons.wikimedia.org/wiki/File:Bluesq2.png [22:35:54] someone is looking at the page? [22:35:59] pages* [22:36:08] I am [22:36:14] me too [22:36:16] but I know finally what it is [22:36:18] arturo: my understanding is that they got a repro case [22:36:19] and can reproduce it [22:37:11] ok [22:37:33] RECOVERY - LVS HTTP IPv4 on zotero.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 138 bytes in 0.081 second response time [22:37:57] addshore: oh! That's the content blob. It's wrapped. Odd. [22:38:08] i did also check out the grafana in staging (still did not hit the max node heap size but the spike is amazing) [22:38:12] PROBLEM - citoid endpoints health on scb2001 is CRITICAL: /api (Scrapes sample page) timed out before a response was received [22:38:35] it takes quite a bit of time to error out in the end [22:39:25] I've killed both pods; the alert was probably because of traffic having hit one of the 2 [22:39:41] or at least I believe so due to https://grafana.wikimedia.org/d/000000620/xxxx-zotero-debugging-kubernetes?panelId=45&fullscreen&orgId=1&from=now-1h&to=now [22:39:57] maybe the new upstream version fares better [22:40:16] PROBLEM - HHVM rendering on mw2209 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:40:34] RECOVERY - citoid endpoints health on scb2001 is OK: All endpoints are healthy [22:41:10] RECOVERY - citoid endpoints health on scb2004 is OK: All endpoints are healthy [22:41:20] RECOVERY - HHVM rendering on mw2209 is OK: HTTP OK: HTTP/1.1 200 OK - 75902 bytes in 0.360 second response time [22:41:31] the new upstream version didn't work this evening, and reading the changelog i don't see any performance improvements [22:41:43] but worth a try [22:42:36] 10Operations, 10Cloud-VPS, 10Traffic, 10serviceops: Difficulties to create offline 
version of Wikipedia because of HTTP 429 response - https://phabricator.wikimedia.org/T213475 (10bd808) The 429 response is definitely a rate limit on the Wikimedia side. It is not obvious to me by looking at the upstream... [22:42:39] 10Operations, 10Cloud-VPS, 10Traffic, 10serviceops: Difficulties to create offline version of Wikipedia because of HTTP 429 response - https://phabricator.wikimedia.org/T213475 (10akosiaris) >>! In T213475#4871480, @chasemp wrote: > Could a change to coming from a 172 address have effected ratelimit whitel... [22:43:51] (03PS1) 10Dzahn: admins: add Greg to phabricator-admins [puppet] - 10https://gerrit.wikimedia.org/r/483623 [22:44:50] RECOVERY - citoid endpoints health on scb2002 is OK: All endpoints are healthy [22:44:57] 10Operations, 10Cloud-VPS, 10Traffic, 10serviceops: Difficulties to create offline version of Wikipedia because of HTTP 429 response - https://phabricator.wikimedia.org/T213475 (10akosiaris) > Yes. The VCL code that performs the rate limiting is in modules/varnish/templates/text-frontend.inc.vcl.erb and in... [22:44:58] RECOVERY - citoid endpoints health on scb2005 is OK: All endpoints are healthy [22:45:41] fsero: yeah let's have a look at it tomorrow [22:45:54] but now it's time to go to sleep. ciaos [22:46:00] RECOVERY - citoid endpoints health on scb2006 is OK: All endpoints are healthy [22:46:02] RECOVERY - citoid endpoints health on scb2003 is OK: All endpoints are healthy [22:46:36] 10Operations, 10Cloud-VPS, 10Traffic, 10serviceops: Difficulties to create offline version of Wikipedia because of HTTP 429 response - https://phabricator.wikimedia.org/T213475 (10BBlack) Right. I'm not up to speed on where all related changes are, but from VCL's point of view its definition of `wikimedia... 
[22:50:09] RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy [22:52:39] (03PS2) 10Dzahn: admins: add Greg to phabricator-admins [puppet] - 10https://gerrit.wikimedia.org/r/483623 [22:54:39] addshore: Think we're good to go? [22:55:01] James_F: see my messages in #wikimedia-commons-sd [22:55:08] Oh! [22:55:31] PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received: /api (Scrapes sample page) timed out before a response was received [22:56:27] RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy [22:57:42] !log jforrester@deploy1001 Synchronized php-1.33.0-wmf.12/extensions/Wikibase/repo/includes/EditEntity/MediawikiEditFilterHookRunner.php: T213453: Pass slotrole into EditFilterMergedContent hook in Wikibase repo (duration: 00m 47s) [22:57:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:57:45] T213453: AbuseFilter MCR diff is comparing old value of one slot with the new value from another, not the old whole page with the new whole page - https://phabricator.wikimedia.org/T213453 [23:03:38] (03CR) 10Smalyshev: "Puppet compiler: https://puppet-compiler.wmflabs.org/compiler1002/14277/" [puppet] - 10https://gerrit.wikimedia.org/r/483529 (https://phabricator.wikimedia.org/T213234) (owner: 10Smalyshev) [23:05:36] 10Operations, 10Cloud-VPS, 10Traffic, 10serviceops: Difficulties to create offline version of Wikipedia because of HTTP 429 response - https://phabricator.wikimedia.org/T213475 (10akosiaris) >>! In T213475#4871518, @BBlack wrote: > Right. I'm not up to speed on where all related changes are, but from VCL'... 
[23:07:43] 10Operations, 10Patch-For-Review: Reallocate former image scalers - https://phabricator.wikimedia.org/T192457 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` ['mw2244.codfw.wmnet', 'mw2245.codfw.wmnet'] ` The log can be found in `/var/log/wmf-auto-reim... [23:21:36] (03PS1) 10Smalyshev: Create second Blazegraph instance for categories [puppet] - 10https://gerrit.wikimedia.org/r/483628 (https://phabricator.wikimedia.org/T213234) [23:22:24] (03PS3) 10Dzahn: tor::relay: make Tor family configurable and move to Hiera [puppet] - 10https://gerrit.wikimedia.org/r/459876 [23:22:34] (03CR) 10jenkins-bot: [V: 04-1] Create second Blazegraph instance for categories [puppet] - 10https://gerrit.wikimedia.org/r/483628 (https://phabricator.wikimedia.org/T213234) (owner: 10Smalyshev) [23:26:32] (03CR) 10Greg Grossmeier: [C: 03+1] "Yes, thank you :)" [puppet] - 10https://gerrit.wikimedia.org/r/483623 (owner: 10Dzahn) [23:30:43] jouncebot: now [23:30:43] No deployments scheduled for the next 0 hour(s) and 29 minute(s) [23:30:46] jouncebot: next [23:30:46] In 0 hour(s) and 29 minute(s): Evening SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190111T0000) [23:30:53] James_F: ^^ [23:31:04] just in case you didn't spot the time! [23:31:13] (03PS1) 10Jforrester: [Commons, TestCommons] Don't use Wikibase entity search [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483630 (https://phabricator.wikimedia.org/T213497) [23:31:21] addshore: Yeah. 
:-( [23:31:54] (03CR) 10Jforrester: [C: 03+2] [Commons, TestCommons] Don't use Wikibase entity search [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483630 (https://phabricator.wikimedia.org/T213497) (owner: 10Jforrester) [23:32:59] (03Merged) 10jenkins-bot: [Commons, TestCommons] Don't use Wikibase entity search [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483630 (https://phabricator.wikimedia.org/T213497) (owner: 10Jforrester) [23:36:18] !log jforrester@deploy1001 Synchronized wmf-config/Wikibase.php: T213497 [Commons, TestCommons] Don't use Wikibase entity search (duration: 00m 46s) [23:36:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:36:21] T213497: Wikibase hijacks search autocomplete frontend entirely; we should configure it off - https://phabricator.wikimedia.org/T213497 [23:41:33] (03CR) 10jenkins-bot: [Commons, TestCommons] Don't use Wikibase entity search [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483630 (https://phabricator.wikimedia.org/T213497) (owner: 10Jforrester) [23:42:35] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5 [23:43:28] 10Operations, 10Patch-For-Review: Reallocate former image scalers - https://phabricator.wikimedia.org/T192457 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2244.codfw.wmnet', 'mw2245.codfw.wmnet'] ` and were **ALL** successful. 
[23:45:09] (03PS2) 10Smalyshev: Create second Blazegraph instance for categories [puppet] - 10https://gerrit.wikimedia.org/r/483628 (https://phabricator.wikimedia.org/T213234) [23:45:09] !log upgraded xhgui to upstream 2965240c91e52 (current upstream master) - T213218 [23:45:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:45:12] T213218: Upstream patches to disable new deletion methods in XHGui 0.9 - https://phabricator.wikimedia.org/T213218 [23:45:47] !log krinkle@tungsten: upgrade xhgui to include upstream f039fb9f99f - T213218 [23:45:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:46:23] (03CR) 10jenkins-bot: [V: 04-1] Create second Blazegraph instance for categories [puppet] - 10https://gerrit.wikimedia.org/r/483628 (https://phabricator.wikimedia.org/T213234) (owner: 10Smalyshev) [23:49:10] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5 [23:49:57] (03PS3) 10Smalyshev: Create second Blazegraph instance for categories [puppet] - 10https://gerrit.wikimedia.org/r/483628 (https://phabricator.wikimedia.org/T213234) [23:50:56] (03CR) 10jenkins-bot: [V: 04-1] Create second Blazegraph instance for categories [puppet] - 10https://gerrit.wikimedia.org/r/483628 (https://phabricator.wikimedia.org/T213234) (owner: 10Smalyshev) [23:55:25] (03PS4) 10Dzahn: tor::relay: make Tor family configurable and move to Hiera [puppet] - 10https://gerrit.wikimedia.org/r/459876 [23:57:28] (03PS4) 10Smalyshev: Create second Blazegraph instance for categories [puppet] - 10https://gerrit.wikimedia.org/r/483628 (https://phabricator.wikimedia.org/T213234) [23:58:23] (03CR) 10jenkins-bot: [V: 04-1] Create second Blazegraph instance for categories [puppet] - 10https://gerrit.wikimedia.org/r/483628 
(https://phabricator.wikimedia.org/T213234) (owner: 10Smalyshev) [23:58:24] I'm crashing SWAT with my UBN fixes, sorry about this. :-( [23:58:58] James_F: how many patches? [23:59:23] the three my team has are small, but the second one we'll need about 15-20 minutes to verify. [23:59:31] kostajh: One already merged, one half-merged. [23:59:54] kostajh: I can deploy them for you if that'd help?