[00:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: That opportune time is upon us again. Time for a Evening SWAT (Max 6 patches) deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181130T0000). [00:00:04] dmaza, MatmaRex, and ebernhardson: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [00:00:16] hi. [00:00:37] hey hey [00:05:36] 10Operations, 10vm-requests: upgrade people.wm.org (rutherfordium) to stretch - https://phabricator.wikimedia.org/T210036 (10Dzahn) [00:06:02] I can SWAT [00:06:31] sweet [00:06:36] thank you thcipriani [00:06:41] (03CR) 10Dzahn: [C: 032] remove rutherfordium.eqiad.wmnet [dns] - 10https://gerrit.wikimedia.org/r/475235 (https://phabricator.wikimedia.org/T210036) (owner: 10Dzahn) [00:06:47] * thcipriani doffs cap [00:06:51] (03CR) 10Dzahn: [C: 032] "VM has been deleted" [dns] - 10https://gerrit.wikimedia.org/r/475235 (https://phabricator.wikimedia.org/T210036) (owner: 10Dzahn) [00:08:41] 10Operations, 10vm-requests: upgrade people.wm.org (rutherfordium) to stretch - https://phabricator.wikimedia.org/T210036 (10Dzahn) [00:09:02] 10Operations, 10vm-requests: upgrade people.wm.org (rutherfordium) to stretch - https://phabricator.wikimedia.org/T210036 (10Dzahn) 05Open>03Resolved all done. 
strike one jessie host off the list [00:10:07] 10Operations, 10vm-requests: upgrade people.wm.org to stretch (replace rutherfordium with people1001) - https://phabricator.wikimedia.org/T210036 (10Dzahn) [00:10:15] MatmaRex: https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/VisualEditor/+/476761/ shouldn't be needed anymore, wmf.6 is everywhere now :) [00:10:37] 10Operations, 10vm-requests, 10Technical-Debt: upgrade people.wm.org to stretch (replace rutherfordium with people1001) - https://phabricator.wikimedia.org/T210036 (10Dzahn) [00:10:39] ebernhar|lunch: if you're about, ping for SWAT [00:10:43] thcipriani: oh, neat. it wasn't like an hour ago [00:11:02] thcipriani: yup [00:11:07] yeah, hashar just finished it up a little bit ago [00:11:14] 10Operations, 10vm-requests, 10Technical-Debt: upgrade people.wikimedia.org to stretch (replace rutherfordium with people1001) - https://phabricator.wikimedia.org/T210036 (10Dzahn) [00:11:38] thcipriani: are we reasonably sure it won't get reverted again? [00:11:54] (03PS7) 10Dzahn: swift: Fix checks on drive/filesystem titles to allow for labs ones [puppet] - 10https://gerrit.wikimedia.org/r/402758 (https://phabricator.wikimedia.org/T184236) (owner: 10Alex Monk) [00:13:09] (03CR) 10Dzahn: [C: 032] swift: Fix checks on drive/filesystem titles to allow for labs ones [puppet] - 10https://gerrit.wikimedia.org/r/402758 (https://phabricator.wikimedia.org/T184236) (owner: 10Alex Monk) [00:13:46] MatmaRex: as sure as we ever are :) I think all the blockers were resolved in an unambiguous way [00:14:15] (03CR) 10Dzahn: [C: 032] "@Krenair one long running beta-pick off the list. might be worth to check if it's identical since i manually amended to get it going" [puppet] - 10https://gerrit.wikimedia.org/r/402758 (https://phabricator.wikimedia.org/T184236) (owner: 10Alex Monk) [00:15:00] (03CR) 10Dzahn: [C: 032] "should unblock swift in cloud vps finally. 
right" [puppet] - 10https://gerrit.wikimedia.org/r/402758 (https://phabricator.wikimedia.org/T184236) (owner: 10Alex Monk) [00:19:43] (03PS2) 10Dzahn: hadoop::ui: migrate from apache to httpd module [puppet] - 10https://gerrit.wikimedia.org/r/474832 [00:19:55] (03CR) 10Dzahn: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/474832 (owner: 10Dzahn) [00:20:42] (03CR) 10jerkins-bot: [V: 04-1] hadoop::ui: migrate from apache to httpd module [puppet] - 10https://gerrit.wikimedia.org/r/474832 (owner: 10Dzahn) [00:23:02] (03CR) 10Dzahn: ""Line 4: Expected 'Change-Id:' to be in footer" eh? i see that in the footer, jenkins" [puppet] - 10https://gerrit.wikimedia.org/r/474832 (owner: 10Dzahn) [00:23:34] (03PS3) 10Dzahn: hadoop::ui: migrate from apache to httpd module [puppet] - 10https://gerrit.wikimedia.org/r/474832 [00:23:45] (03CR) 10Dzahn: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/474832 (owner: 10Dzahn) [00:24:14] Fetching submodule modules/zookeeper [00:24:14] fatal: bad object 0000000000000000000000000000000000000000 [00:24:18] anyone else seeing this consistently? [00:24:52] not really, maybe i was a submodule in he past and then was removed / merged back in.. i think [00:24:55] it [00:24:57] that's bad.... [00:25:22] means a object has been deleted [00:25:25] if it's showing 0000000000000000000000000000000000000000 [00:25:34] or has gone missing [00:25:43] i know that some things have been submodules and then got merged back into main [00:25:54] yes and it's causing a big pain when trying to work with old commits [00:26:22] though [00:26:26] it works for me Krenair [00:26:31] maybe try a new fresh clone? [00:26:34] Submodule path 'modules/zookeeper': checked out '25fae48cedf7f77801dd5c39e52be6a927f7b188' [00:26:52] 10Operations, 10Analytics, 10Security-Team, 10WMF-Legal, 10Software-Licensing: Can exfat be used in WMF production? - https://phabricator.wikimedia.org/T210667 (10Legoktm) >>! 
In T210667#4787049, @Jrogers-WMF wrote: > Hi all, commenting on this from WMF Legal. > > As I understand the question and conte... [00:28:45] PROBLEM - nova-compute proc minimum on labvirt1016 is CRITICAL: NRPE: Command check_ensure_nova_compute_running not defined [00:28:45] PROBLEM - ensure kvm processes are running on labvirt1016 is CRITICAL: NRPE: Command check_ensure_running_kvm_instances not defined [00:28:58] come on...jenkins. we're all rooting for ya. [00:29:09] PROBLEM - kvm ssl cert on labvirt1016 is CRITICAL: NRPE: Command check_kvm_ssl_cert not defined [00:29:20] (03PS4) 10Dzahn: hadoop::ui: migrate from apache to httpd module [puppet] - 10https://gerrit.wikimedia.org/r/474832 [00:29:24] haha [00:29:36] (03CR) 10Dzahn: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/474832 (owner: 10Dzahn) [00:29:39] there you go [00:29:39] meh [00:29:40] worked around it [00:29:47] PROBLEM - nova-compute proc maximum on labvirt1016 is CRITICAL: NRPE: Command check_ensure_single_nova_compute_proc not defined [00:29:55] (03PS11) 10Alex Monk: swift: use implicit /dev/swift prefix for swift devices [puppet] - 10https://gerrit.wikimedia.org/r/361648 (https://phabricator.wikimedia.org/T163673) (owner: 10Filippo Giunchedi) [00:30:12] labvirt1016 looks to me like it's a new host [00:30:36] wait, what.. re-renamed? [00:30:45] (03CR) 10Herron: "https://puppet-compiler.wmflabs.org/compiler1002/13788/" [puppet] - 10https://gerrit.wikimedia.org/r/476592 (https://phabricator.wikimedia.org/T141324) (owner: 10Herron) [00:30:47] maybe to a cloudvirt1016? [00:30:52] (03PS4) 10Herron: rsyslog: input::file add multiline handling & ship gerrit logs to ELK [puppet] - 10https://gerrit.wikimedia.org/r/476592 (https://phabricator.wikimedia.org/T141324) [00:31:05] paladox: https://phabricator.wikimedia.org/T209426 [00:31:09] that is reverse though? [00:31:25] (03CR) 10Paladox: [C: 031] "Before it was reverted, i tested my idea of changing it to if @var and it works." 
[puppet] - 10https://gerrit.wikimedia.org/r/476592 (https://phabricator.wikimedia.org/T141324) (owner: 10Herron) [00:31:43] oh [00:31:57] andrewbogott ^^ [00:32:19] (03CR) 10Herron: "> Before it was reverted, i tested my idea of changing it to if @var" [puppet] - 10https://gerrit.wikimedia.org/r/476592 (https://phabricator.wikimedia.org/T141324) (owner: 10Herron) [00:32:34] maybe it wasn't purged properly and just got out of downtime? [00:32:39] (03CR) 10Herron: [C: 032] rsyslog: input::file add multiline handling & ship gerrit logs to ELK [puppet] - 10https://gerrit.wikimedia.org/r/476592 (https://phabricator.wikimedia.org/T141324) (owner: 10Herron) [00:33:27] something is wrong with puppet in deployment-prep andrewbogott [00:33:32] Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Function Call, Reading data from Deployment-prep failed: NoMethodError: undefined method `[]' for nil:NilClass at /etc/puppet/manifests/realm.pp:63:15 on node deployment-puppetmaster03.deployment-prep.eqiad.wmflabs [00:33:51] Krenair: one problem at a time please [00:33:56] ok [00:34:15] dmaza: your core change is on mwdebug1002, check please [00:34:21] on it [00:34:44] Krenair try running puppet again. [00:34:53] 10Operations, 10Community-Tech, 10MediaWiki-Parser, 10SVG Translate Tool, and 6 others: Show SVGs in page language if available - https://phabricator.wikimedia.org/T205040 (10Niharika) [00:35:01] paladox, it's consistent [00:35:03] andrewbogott: i think it might need the "cert revoke; node clean; node deactivate" on the puppetmaster and then puppet run on icinga1001 [00:35:07] oh [00:35:23] mutante: yes, doing that now [00:35:25] and that will then probably remove it from monitoring..if that's just the old server remnants [00:35:29] *nod*, cool [00:35:38] mutante: what is the new einsteinium? 
[00:35:43] andrewbogott: icinga1001 [00:36:16] everything 1001 now, icinga, cumin and puppetmaster [00:36:30] thcipriani: looks good [00:36:47] dmaza: okie doke, thanks for checking, going live [00:38:14] thank you [00:38:27] Krenair: what host is doing that? [00:38:40] And, I might finish eating dinner before I investigate :) [00:38:47] didn't realise you guys got paged [00:39:12] so far I've observed it on deployment-puppetmaster03 and deployment-ms-be03, no idea what it means [00:39:24] !log thcipriani@deploy1001 Synchronized php-1.33.0-wmf.6/includes/api/ApiBase.php: SWAT: [[gerrit:476409|Add stats for block errors on create/edit actions]] T201717 (duration: 00m 48s) [00:39:40] ^ dmaza one down [00:39:41] the line it references is "$app_routes = hiera('discovery::app_routes')" but I have no idea what problem it could possibly have with that [00:40:08] sweet [00:40:11] that hiera key is set in labs.yaml... [00:40:22] even if it weren't that wouldn't explain the bizarre error [00:40:31] thcipriani@deploy1001: Failed to log message to wiki. Somebody should check the error logs. 
[00:40:34] T201717: Tracking blocks: Log hard edits that fail via API or page rejection - https://phabricator.wikimedia.org/T201717 [00:41:21] andrewbogott it seems that stashbot has been failing quite a lot to log today ^^ (i presume it's a wikitech problem) [00:42:36] (03CR) 10Dzahn: "interesting, the experimental check here says failed but if you follow the link you get to a working catalog at https://puppet-compiler.wm" [puppet] - 10https://gerrit.wikimedia.org/r/474832 (owner: 10Dzahn) [00:43:08] dmaza: MatmaRex your visualeditor changes are both on mwdebug1002, check please [00:43:16] checking [00:43:27] looking [00:44:42] thcipriani: seems good [00:44:45] thcipriani: looks good here [00:45:30] cool, thanks for checking, going live [00:45:42] PROBLEM - Apache HTTP on mw1280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:46:34] RECOVERY - Apache HTTP on mw1280 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.034 second response time [00:49:34] dmaza: hrm, I'm seeing a lot of requests timing out following that first sync [00:50:06] ugg.. can you give me more info about that? [00:51:15] just a lot of noise in the logs along the lines of "error: entire web request took longer than 60 seconds and timed out in /srv/mediawiki/php-1.33.0-wmf.6/includes/api/ApiMain.php on line 309" [00:51:28] (03CR) 10Dzahn: "https://puppet-compiler.wmflabs.org/compiler1002/31/analytics-tool1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/474832 (owner: 10Dzahn) [00:52:00] hrm, seems to have trailed off, it was just right after that sync [00:52:35] was this after the VE change or the core change? 
[00:52:37] (03PS3) 10Dzahn: turnilo: migrate from apache to httpd module [puppet] - 10https://gerrit.wikimedia.org/r/474834 [00:52:44] (03CR) 10Dzahn: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/474834 (owner: 10Dzahn) [00:53:13] dmaza: I've only sync'd the core change so far, I was about to sync ve stuff and noticed the spike [00:53:47] did it go away or is it still happening? [00:54:10] seems to have returned to normal now [00:54:36] ok, I'm going to continue SWAT, I'll keep an eye on logs [00:54:41] odd.. my changes are behind a config flag and nothing `should` execute [00:54:53] thank you.. let me know if you see it again [00:54:59] will do :) [00:55:07] ok, sorry for delay, going live with VE changes [00:55:16] no worries.. better safe than sorry [00:55:22] (03CR) 10Dzahn: "same here, experimental check claims failed but actually links to https://puppet-compiler.wmflabs.org/compiler1002/32/analytics-tool1002.e" [puppet] - 10https://gerrit.wikimedia.org/r/474834 (owner: 10Dzahn) [00:57:21] off [00:57:36] !log thcipriani@deploy1001 Synchronized php-1.33.0-wmf.6/extensions/VisualEditor: SWAT: [[gerrit:476612|Rename configs for tracking block notices on visual editor]] [[gerrit:476762|ve.init.mw.ArticleTarget: Stop when we fail to load metadata]] T209542 (duration: 00m 48s) [00:57:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:57:59] T209542: [Regression wmf.4] "Publish Changes" button stays disabled while trying to re-edit a page - https://phabricator.wikimedia.org/T209542 [00:57:59] ^ dmaza MatmaRex VisualEditor changes deployed [00:58:52] thcipriani: thanks! 
[01:00:04] dmaza: ebernhardson mobilefrontend and wikimediaevents are updated on mwdebug1002, check please [01:00:07] MatmaRex: yw :) [01:00:12] looking [01:00:15] thcipriani: checking [01:01:39] thcipriani: looks good [01:02:00] dmaza: ok, mobilefrontend going live [01:02:08] thanks [01:03:36] thcipriani: looks good [01:04:19] !log thcipriani@deploy1001 Synchronized php-1.33.0-wmf.6/extensions/MobileFrontend: SWAT: [[gerrit:476613|Change config flag for enabling Block Notice stats]] T201719 (duration: 00m 48s) [01:04:29] ^ dmaza live now [01:04:34] ebernhardson: okie doke, going live [01:04:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:04:37] T201719: Tracking blocks: Log when the mobile web editor block notice is displayed - https://phabricator.wikimedia.org/T201719 [01:04:40] awesome.. thank you [01:05:01] have you seen anything else in the logs? are we good? [01:06:11] dmaza: all looks ok to me [01:06:29] maybe a weird hhvm cache state somehow? [01:06:40] !log thcipriani@deploy1001 Synchronized php-1.33.0-wmf.6/extensions/WikimediaEvents: SWAT: [[gerrit:476777|wbsearchentities ab test improvements]] T209402 (duration: 00m 46s) [01:06:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:06:43] T209402: A/B testing plan for wbsearchentities, context=item - https://phabricator.wikimedia.org/T209402 [01:06:46] ^ ebernhardson live now [01:06:48] yeah it could be [01:07:02] thcipriani: thank you very much.. have a good night [01:07:21] dmaza: you're welcome, you too! 
[01:11:15] (03PS1) 10RobH: updating the required/blacklist skus [software] - 10https://gerrit.wikimedia.org/r/476792 [01:31:51] (03PS4) 10Bstorm: sonofgridengine: set up shadow_master profile [puppet] - 10https://gerrit.wikimedia.org/r/476430 (https://phabricator.wikimedia.org/T200557) [01:34:32] (03PS2) 10Faidon Liambotis: Update the required/blacklist SKUs [software] - 10https://gerrit.wikimedia.org/r/476792 (owner: 10RobH) [01:36:28] (03CR) 10Faidon Liambotis: [C: 032] Update the required/blacklist SKUs [software] - 10https://gerrit.wikimedia.org/r/476792 (owner: 10RobH) [01:37:34] (03Merged) 10jenkins-bot: Update the required/blacklist SKUs [software] - 10https://gerrit.wikimedia.org/r/476792 (owner: 10RobH) [01:40:18] (03PS3) 10Huji: Dissallow eliminators to block certain groups on fawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476503 (https://phabricator.wikimedia.org/T210642) [01:51:27] Krenair: I have good news and bad news. And it's the same news: I restarted apache on deployment-puppetmaster03 and everything started working again. 
[02:04:45] (03PS1) 10RobH: fixed network card options [software] - 10https://gerrit.wikimedia.org/r/476796 [02:09:16] (03PS2) 10Faidon Liambotis: quotereviewer: fix NIC options [software] - 10https://gerrit.wikimedia.org/r/476796 (owner: 10RobH) [02:09:40] (03CR) 10Faidon Liambotis: [C: 032] quotereviewer: fix NIC options [software] - 10https://gerrit.wikimedia.org/r/476796 (owner: 10RobH) [02:10:25] (03Merged) 10jenkins-bot: quotereviewer: fix NIC options [software] - 10https://gerrit.wikimedia.org/r/476796 (owner: 10RobH) [02:12:00] going to push a tiny sampling rate change to an ab test [02:12:26] (03CR) 10EBernhardson: [C: 032] Update wbsearchentities ab test configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476772 (https://phabricator.wikimedia.org/T209402) (owner: 10EBernhardson) [02:13:33] (03Merged) 10jenkins-bot: Update wbsearchentities ab test configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476772 (https://phabricator.wikimedia.org/T209402) (owner: 10EBernhardson) [02:17:04] !log ebernhardson@deploy1001 Synchronized wmf-config/InitialiseSettings.php: [[gerrit:476772|update wbsearchentities ab test configuration]] T209402 (duration: 00m 47s) [02:17:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:17:12] T209402: A/B testing plan for wbsearchentities, context=item - https://phabricator.wikimedia.org/T209402 [02:22:41] (03CR) 10jenkins-bot: Update wbsearchentities ab test configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476772 (https://phabricator.wikimedia.org/T209402) (owner: 10EBernhardson) [02:26:52] 10Operations, 10ops-codfw: ms-be2047 spontaneous reboots - https://phabricator.wikimedia.org/T209921 (10Papaul) @colewhite Thank you I saw that [02:52:10] (03CR) 10Krinkle: Labs: display reader trust survey (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476027 (https://phabricator.wikimedia.org/T209882) (owner: 10Bmansurov) [03:29:17] (03PS1) 
10Papaul: DNS: Add mgmt and production DNS entries for elastic2037-elastic2044 [dns] - 10https://gerrit.wikimedia.org/r/476802 (https://phabricator.wikimedia.org/T210450) [03:31:20] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 874.88 seconds [03:33:22] (03PS2) 10Papaul: DNS: Add mgmt and production DNS entries for elastic2037-elastic2044 [dns] - 10https://gerrit.wikimedia.org/r/476802 (https://phabricator.wikimedia.org/T210450) [03:46:46] PROBLEM - puppet last run on analytics1052 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [04:17:58] RECOVERY - puppet last run on analytics1052 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [04:25:08] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 214.00 seconds [04:51:58] (03PS1) 10Herron: kafka_shipper: set format=json on message field in rsyslog template [puppet] - 10https://gerrit.wikimedia.org/r/476803 (https://phabricator.wikimedia.org/T205852) [04:53:48] (03CR) 10Herron: [C: 032] kafka_shipper: set format=json on message field in rsyslog template [puppet] - 10https://gerrit.wikimedia.org/r/476803 (https://phabricator.wikimedia.org/T205852) (owner: 10Herron) [05:00:50] (03PS1) 10Herron: logstash: add mutate for escaped tabs in rsyslog multiline events [puppet] - 10https://gerrit.wikimedia.org/r/476804 (https://phabricator.wikimedia.org/T205852) [05:03:21] (03CR) 10Herron: [C: 032] logstash: add mutate for escaped tabs in rsyslog multiline events [puppet] - 10https://gerrit.wikimedia.org/r/476804 (https://phabricator.wikimedia.org/T205852) (owner: 10Herron) [05:05:24] PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 71, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [05:05:40] PROBLEM - Router interfaces on cr2-eqiad is 
CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 229, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [05:17:42] (03PS1) 10Herron: logstash: update gerrit multiline message start regex [puppet] - 10https://gerrit.wikimedia.org/r/476805 (https://phabricator.wikimedia.org/T205852) [06:12:50] !log Deploy schema change on s2 codfw master with replication (db2035), this will generate lag on codfw - T86338 T202167 [06:12:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:12:55] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [06:12:56] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [06:28:02] PROBLEM - Check systemd state on netmon1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [06:28:08] PROBLEM - netbox HTTPS on netmon1002 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 547 bytes in 0.009 second response time [06:29:38] PROBLEM - puppet last run on ms-be1035 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/ferm/conf.d/00_main] [06:33:14] PROBLEM - puppet last run on mw1300 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/profile.d/mysql-ps1.sh] [06:35:10] 10Operations, 10DBA, 10Patch-For-Review: Audit "misc" cluster hosts - https://phabricator.wikimedia.org/T210486 (10Marostegui) Ah right, the dbstore1003 and dbstore1005 are the new hosts that will replace dbstore1002 {T210478} [06:37:32] RECOVERY - Check systemd state on netmon1002 is OK: OK - running: The system is fully operational [06:42:18] PROBLEM - Check systemd state on netmon1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[06:42:50] (03PS1) 10Marostegui: regex.yaml: Add new pc and new dbstores [puppet] - 10https://gerrit.wikimedia.org/r/476807 (https://phabricator.wikimedia.org/T210486) [06:47:01] (03PS1) 10Marostegui: db-eqiad.php: Depool db1103:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476808 (https://phabricator.wikimedia.org/T86338) [06:48:06] <_joe_> sigh, regex.yaml [06:48:26] I know... [06:48:38] <_joe_> that is an example of really bad usage of it btw :P [06:48:40] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1103:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476808 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [06:48:58] _joe_: I was surprised with that ticket even :( [06:50:11] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1103:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476808 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [06:50:12] <_joe_> I'm not saying you need to fix it now, I'm saying you (plural) should've never used it for such a use [06:50:51] _joe_: Yeah I got that :-) [06:51:22] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1103:3312 T86338 T202167 (duration: 00m 48s) [06:51:24] It is a mess [06:51:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:51:27] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [06:51:28] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [06:51:30] !log Deploy schema change on db1103:3312 T86338 T20216 [06:51:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:51:34] T20216: Bugzilla landing page should point to mediawiki.org - https://phabricator.wikimedia.org/T20216 [06:51:59] ups! 
[06:52:06] !log Deploy schema change on db1103:3312 T86338 T202167 [06:52:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:52:30] 10Operations, 10DBA, 10Patch-For-Review: Audit "misc" cluster hosts - https://phabricator.wikimedia.org/T210486 (10Joe) I would suggest you re-do your query excluding servers with `role::spare::system`, such as most of the cp* boxes and the conf* ones. Actually, it might make sense to put them in a separate... [06:55:05] !log Purge binary logs on pc1005 [06:55:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:59:06] RECOVERY - puppet last run on mw1300 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:59:18] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1103:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476808 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [07:00:38] RECOVERY - puppet last run on ms-be1035 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [07:08:22] RECOVERY - Check systemd state on netmon1002 is OK: OK - running: The system is fully operational [07:08:28] RECOVERY - netbox HTTPS on netmon1002 is OK: HTTP OK: HTTP/1.1 302 Found - 348 bytes in 0.504 second response time [07:19:12] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1103:3312" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476809 [07:25:10] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1103:3312" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476809 (owner: 10Marostegui) [07:26:15] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1103:3312" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476809 (owner: 10Marostegui) [07:26:29] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1103:3312" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476809 (owner: 10Marostegui) [07:27:14] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool 
db1103:3312 T86338 T202167 (duration: 00m 47s) [07:27:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:27:19] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [07:27:19] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [07:28:13] !log Deploy schema change on db1095:3312 T86338 T202167 [07:28:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:48:16] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 231, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [07:48:18] RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 73, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [07:51:58] (03CR) 10Banyek: [C: 031] "LGTM" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/476807 (https://phabricator.wikimedia.org/T210486) (owner: 10Marostegui) [07:52:06] PROBLEM - HTTP availability for Varnish at esams on icinga1001 is CRITICAL: job=varnish-text site=esams https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3fullscreenrefresh=1morgId=1 [07:53:40] PROBLEM - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is CRITICAL: cluster=cache_text site=esams https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4fullscreenrefresh=1morgId=1 [07:54:23] (03CR) 10Marostegui: regex.yaml: Add new pc and new dbstores (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/476807 (https://phabricator.wikimedia.org/T210486) (owner: 10Marostegui) [07:54:52] RECOVERY - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is OK: All metrics within thresholds. 
https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4fullscreenrefresh=1morgId=1 [07:54:58] PROBLEM - Upload HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3fullscreenorgId=1var-site=Allvar-cache_type=uploadvar-status_type=5 [07:55:42] RECOVERY - HTTP availability for Varnish at esams on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3fullscreenrefresh=1morgId=1 [07:55:46] (03PS2) 10Marostegui: regex.yaml: Add new pc and new dbstores [puppet] - 10https://gerrit.wikimedia.org/r/476807 (https://phabricator.wikimedia.org/T210486) [07:57:50] brief 503 spike in esams, affecting both text and upload [08:00:24] (03CR) 10Banyek: [C: 031] regex.yaml: Add new pc and new dbstores [puppet] - 10https://gerrit.wikimedia.org/r/476807 (https://phabricator.wikimedia.org/T210486) (owner: 10Marostegui) [08:00:54] RECOVERY - Upload HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3fullscreenorgId=1var-site=Allvar-cache_type=uploadvar-status_type=5 [08:11:12] (03CR) 10Marostegui: [C: 032] regex.yaml: Add new pc and new dbstores [puppet] - 10https://gerrit.wikimedia.org/r/476807 (https://phabricator.wikimedia.org/T210486) (owner: 10Marostegui) [08:12:37] !log installing perl security updates [08:12:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:13:44] (03PS1) 10Ema: ATS: enable XDebug plugin [puppet] - 10https://gerrit.wikimedia.org/r/476810 (https://phabricator.wikimedia.org/T207048) [08:15:02] (03CR) 10jerkins-bot: [V: 04-1] ATS: enable XDebug plugin [puppet] - 10https://gerrit.wikimedia.org/r/476810 (https://phabricator.wikimedia.org/T207048) (owner: 10Ema) [08:18:08] 
10Operations, 10Patch-For-Review, 10User-Marostegui: Audit "misc" cluster hosts - https://phabricator.wikimedia.org/T210486 (10Marostegui) >>! In T210486#4785954, @jcrespo wrote: > Adding DBA for the few db hosts that shouldn't be there, remove the tag when those are fixed: > > * New pc* hosts > * New dbsto... [08:21:11] 10Operations, 10Analytics, 10Security-Team, 10WMF-Legal, 10Software-Licensing: Can exfat be used in WMF production? - https://phabricator.wikimedia.org/T210667 (10Legoktm) >>! In T210667#4784225, @Joe wrote: >>>! In T210667#4783582, @Legoktm wrote: >> If it's not re-distributable by us, then it doesn't m... [08:21:26] !log Deploy schema change on dbstore1002 T86338 T202167 [08:21:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:21:31] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [08:21:32] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [08:40:24] (03PS5) 10Jcrespo: admin: Add jgleeson access to production cluster [puppet] - 10https://gerrit.wikimedia.org/r/476004 (https://phabricator.wikimedia.org/T208432) [08:44:09] (03PS2) 10Ema: ATS: enable XDebug plugin [puppet] - 10https://gerrit.wikimedia.org/r/476810 (https://phabricator.wikimedia.org/T207048) [08:44:12] 10Operations, 10ops-eqiad, 10media-storage: rack/setup/install ms-be10[44-50].eqiad.wmnet - https://phabricator.wikimedia.org/T209618 (10fgiunchedi) >>! In T209618#4786263, @Cmjohnson wrote: > @fgiunchedi For racking this is the space I have > > I can do at least 3 in A with out a problem, > > I can only... 
[08:44:42] (03PS3) 10Ema: ATS: XDebug plugin support [puppet] - 10https://gerrit.wikimedia.org/r/476810 (https://phabricator.wikimedia.org/T207048) [08:46:38] (03PS4) 10Ema: ATS: XDebug plugin support [puppet] - 10https://gerrit.wikimedia.org/r/476810 (https://phabricator.wikimedia.org/T207048) [08:50:46] 10Operations, 10DBA, 10Patch-For-Review, 10User-Banyek: Implement parsercache service on pc[12]0(07|08|09|10) and replace leased pc[12]00[456] - https://phabricator.wikimedia.org/T208383 (10Marostegui) 05Open>03stalled I am stalling this until pc1007 is fixed as it arrived broken (T207258#4774505). Onc... [08:53:00] (03CR) 10Ema: [C: 032] ATS: XDebug plugin support [puppet] - 10https://gerrit.wikimedia.org/r/476810 (https://phabricator.wikimedia.org/T207048) (owner: 10Ema) [08:55:13] 10Operations, 10monitoring, 10User-CDanis: graph server temperature metrics - https://phabricator.wikimedia.org/T209863 (10fgiunchedi) Thanks @CDanis for looking into this! re: `max()` I have an hunch it might be due to having two prometheus servers backing the `prometheus.svc` endpoint in eqiad and codfw. T... [08:55:20] !log removed rutherfordium from debmonitor DB (T210036) [08:55:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:55:23] T210036: upgrade people.wikimedia.org to stretch (replace rutherfordium with people1001) - https://phabricator.wikimedia.org/T210036 [08:55:32] 10Operations, 10vm-requests, 10Technical-Debt: upgrade people.wikimedia.org to stretch (replace rutherfordium with people1001) - https://phabricator.wikimedia.org/T210036 (10MoritzMuehlenhoff) JFTR, It's better to use the wmf-decommission-host script, it also removes the debmonitor host entry (I fixed that m... 
[08:57:09] (03PS1) 10Marostegui: db-eqiad.php: Depool db1090:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476812 (https://phabricator.wikimedia.org/T86338) [08:58:29] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1090:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476812 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [08:59:33] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1090:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476812 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [09:00:32] (03CR) 10Muehlenhoff: [C: 031] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/476004 (https://phabricator.wikimedia.org/T208432) (owner: 10Jcrespo) [09:00:33] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1090:3312 T86338 T202167 (duration: 00m 47s) [09:00:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:00:39] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [09:00:39] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [09:00:41] !log Deploy schema change on db1090:3312 T86338 T202167 [09:00:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:02:54] 10Operations, 10Gadgets, 10MediaWiki-Cache, 10MW-1.33-notes (1.33.0-wmf.6; 2018-11-27), and 4 others: Mcrouter periodically reports soft TKOs for mc[1,2]035 leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (10elukey) Update after the mediawiki train deployment: * the TKOs are... 
[09:09:56] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1090:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476812 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [09:18:38] (03PS1) 10Ema: varnish: do not allow X-ATS-Debug to be set from the outside [puppet] - 10https://gerrit.wikimedia.org/r/476814 (https://phabricator.wikimedia.org/T207048) [09:20:48] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1090:3312" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476815 [09:23:06] (03CR) 10DCausse: [cirrus] Add temp clusters but still write to the old ones (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475750 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [09:44:29] (03PS1) 10Muehlenhoff: Install Imagemagick policy files for Thumbor [puppet] - 10https://gerrit.wikimedia.org/r/476818 [09:46:38] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1090:3312" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476815 (owner: 10Marostegui) [09:47:41] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1090:3312" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476815 (owner: 10Marostegui) [09:48:47] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1090:3312 T86338 T202167 (duration: 00m 46s) [09:48:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:48:52] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [09:48:53] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [09:49:16] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1090:3312" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476815 (owner: 10Marostegui) [09:53:54] (03PS1) 10Filippo Giunchedi: Add restbase201[3-8] cassandra instances [dns] - 10https://gerrit.wikimedia.org/r/476819 (https://phabricator.wikimedia.org/T209615) [09:55:50] (03PS1) 10Ema: ATS: add SystemTap probe 
for uncacheable responses [puppet] - 10https://gerrit.wikimedia.org/r/476820 (https://phabricator.wikimedia.org/T207048) [09:59:36] (03CR) 10Filippo Giunchedi: [C: 032] Add restbase201[3-8] cassandra instances [dns] - 10https://gerrit.wikimedia.org/r/476819 (https://phabricator.wikimedia.org/T209615) (owner: 10Filippo Giunchedi) [09:59:43] (03PS2) 10Arturo Borrero Gonzalez: openstack: make ::dmz_cidr an array [puppet] - 10https://gerrit.wikimedia.org/r/476567 (https://phabricator.wikimedia.org/T210754) (owner: 10Faidon Liambotis) [09:59:54] (03PS1) 10Elukey: profile::hadoop::firewall::master: add parameter for https ports [puppet] - 10https://gerrit.wikimedia.org/r/476821 [10:00:39] (03PS4) 10Elukey: turnilo: migrate from apache to httpd module [puppet] - 10https://gerrit.wikimedia.org/r/474834 (owner: 10Dzahn) [10:00:42] (03CR) 10jerkins-bot: [V: 04-1] openstack: make ::dmz_cidr an array [puppet] - 10https://gerrit.wikimedia.org/r/476567 (https://phabricator.wikimedia.org/T210754) (owner: 10Faidon Liambotis) [10:01:16] (03CR) 10Elukey: [C: 032] turnilo: migrate from apache to httpd module [puppet] - 10https://gerrit.wikimedia.org/r/474834 (owner: 10Dzahn) [10:04:09] PROBLEM - puppet last run on icinga1001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [10:04:34] (03CR) 10DCausse: [cirrus] prepare multi-instance services (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475749 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [10:04:36] (03CR) 10Elukey: [C: 032] "Only thing that changed:" [puppet] - 10https://gerrit.wikimedia.org/r/474834 (owner: 10Dzahn) [10:05:08] (03PS9) 10DCausse: [cirrus] prepare multi-instance services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475749 (https://phabricator.wikimedia.org/T210381) [10:05:10] (03PS18) 10DCausse: [cirrus] Add temp clusters but still write to the old ones [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475750 (https://phabricator.wikimedia.org/T210381) [10:05:12] (03PS8) 10DCausse: [cirrus] Start writing to psi & omega [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476271 (https://phabricator.wikimedia.org/T210381) [10:05:14] (03PS8) 10DCausse: [cirrus] Start using replica group settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476272 (https://phabricator.wikimedia.org/T210381) [10:05:16] (03PS10) 10DCausse: [cirrus] Cleanup transitional states [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476273 (https://phabricator.wikimedia.org/T210381) [10:05:18] (03CR) 10Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/13792/thumbor1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/476818 (owner: 10Muehlenhoff) [10:05:20] (03PS1) 10Michael Große: Perform more PHP constraint checks before falling back [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476822 (https://phabricator.wikimedia.org/T209504) [10:06:05] (03PS2) 10Ema: ATS: add SystemTap probe for uncacheable responses [puppet] - 10https://gerrit.wikimedia.org/r/476820 (https://phabricator.wikimedia.org/T207048) [10:07:11] (03PS6) 10Jcrespo: admin: Add jgleeson access to production cluster [puppet] - 10https://gerrit.wikimedia.org/r/476004 
(https://phabricator.wikimedia.org/T208432) [10:07:29] (03CR) 10Elukey: "Thanks! Left a couple of nits :)" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/474832 (owner: 10Dzahn) [10:09:31] (03CR) 10Michael Große: "not sure if doubling the number of checks in PHP from 10 to 20 is the right amount. Can one see their distribution somewhere?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476822 (https://phabricator.wikimedia.org/T209504) (owner: 10Michael Große) [10:10:29] (03CR) 10Gilles: [C: 031] Install Imagemagick policy files for Thumbor [puppet] - 10https://gerrit.wikimedia.org/r/476818 (owner: 10Muehlenhoff) [10:19:43] RECOVERY - puppet last run on icinga1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [10:23:01] (03PS1) 10Filippo Giunchedi: restbase: add restbase2013 [puppet] - 10https://gerrit.wikimedia.org/r/476825 (https://phabricator.wikimedia.org/T209615) [10:25:57] (03CR) 10Filippo Giunchedi: [C: 032] restbase: add restbase2013 [puppet] - 10https://gerrit.wikimedia.org/r/476825 (https://phabricator.wikimedia.org/T209615) (owner: 10Filippo Giunchedi) [10:26:05] (03PS2) 10Filippo Giunchedi: restbase: add restbase2013 [puppet] - 10https://gerrit.wikimedia.org/r/476825 (https://phabricator.wikimedia.org/T209615) [10:30:17] 10Operations, 10Wikimedia-Mailing-lists: Post hold because of "invalid headers" in wikimediacz-l - https://phabricator.wikimedia.org/T210223 (10Urbanecm) Thank you a lot! I've removed the filter and the mod bit. We'll try it and let you know. 
[10:32:22] (03PS1) 10Filippo Giunchedi: hieradata: add rack for restbase2013 [puppet] - 10https://gerrit.wikimedia.org/r/476827 (https://phabricator.wikimedia.org/T209615) [10:33:14] (03CR) 10Filippo Giunchedi: [C: 032] hieradata: add rack for restbase2013 [puppet] - 10https://gerrit.wikimedia.org/r/476827 (https://phabricator.wikimedia.org/T209615) (owner: 10Filippo Giunchedi) [10:33:16] (03PS3) 10Arturo Borrero Gonzalez: openstack: make ::dmz_cidr an array [puppet] - 10https://gerrit.wikimedia.org/r/476567 (https://phabricator.wikimedia.org/T210754) (owner: 10Faidon Liambotis) [10:33:23] PROBLEM - puppet last run on restbase2013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:33:47] restbase2013 is me [10:34:21] (03PS2) 10Elukey: profile::hadoop::firewall::master: add parameter for https ports [puppet] - 10https://gerrit.wikimedia.org/r/476821 [10:36:22] (03CR) 10Elukey: [C: 032] profile::hadoop::firewall::master: add parameter for https ports [puppet] - 10https://gerrit.wikimedia.org/r/476821 (owner: 10Elukey) [10:36:32] (03PS2) 10Filippo Giunchedi: hieradata: add rack for restbase2013 [puppet] - 10https://gerrit.wikimedia.org/r/476827 (https://phabricator.wikimedia.org/T209615) [10:37:57] (03PS4) 10Arturo Borrero Gonzalez: openstack: make ::dmz_cidr an array [puppet] - 10https://gerrit.wikimedia.org/r/476567 (https://phabricator.wikimedia.org/T210754) (owner: 10Faidon Liambotis) [10:38:14] (03PS5) 10Arturo Borrero Gonzalez: openstack: make ::dmz_cidr an array [puppet] - 10https://gerrit.wikimedia.org/r/476567 (https://phabricator.wikimedia.org/T210754) (owner: 10Faidon Liambotis) [10:39:06] (03PS6) 10Arturo Borrero Gonzalez: openstack: make ::dmz_cidr an array [puppet] - 10https://gerrit.wikimedia.org/r/476567 (https://phabricator.wikimedia.org/T210754) (owner: 10Faidon Liambotis) [10:39:57] (03CR) 10Jcrespo: [C: 032] admin: Add jgleeson access to production cluster [puppet] - 
10https://gerrit.wikimedia.org/r/476004 (https://phabricator.wikimedia.org/T208432) (owner: 10Jcrespo) [10:40:05] (03PS7) 10Jcrespo: admin: Add jgleeson access to production cluster [puppet] - 10https://gerrit.wikimedia.org/r/476004 (https://phabricator.wikimedia.org/T208432) [10:42:47] (03CR) 10Arturo Borrero Gonzalez: [C: 032] "This compilation test seems to be OK:" [puppet] - 10https://gerrit.wikimedia.org/r/476567 (https://phabricator.wikimedia.org/T210754) (owner: 10Faidon Liambotis) [10:42:56] (03PS7) 10Arturo Borrero Gonzalez: openstack: make ::dmz_cidr an array [puppet] - 10https://gerrit.wikimedia.org/r/476567 (https://phabricator.wikimedia.org/T210754) (owner: 10Faidon Liambotis) [10:48:45] RECOVERY - puppet last run on restbase2013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:50:55] PROBLEM - Restbase root url on restbase2013 is CRITICAL: connect to address 10.192.16.80 and port 7231: Connection refused [10:51:44] (03PS1) 10Ema: ATS: do not add X-Forwarded-For [puppet] - 10https://gerrit.wikimedia.org/r/476828 (https://phabricator.wikimedia.org/T207048) [10:52:45] PROBLEM - cassandra-a CQL 10.192.16.82:9042 on restbase2013 is CRITICAL: connect to address 10.192.16.82 and port 9042: Connection refused [10:52:59] RECOVERY - Restbase root url on restbase2013 is OK: HTTP OK: HTTP/1.1 200 - 16164 bytes in 0.133 second response time [10:54:35] PROBLEM - cassandra-a SSL 10.192.16.82:7001 on restbase2013 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused [10:55:35] (03PS1) 10Marostegui: db-eqiad.php: Depool db1105:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476829 (https://phabricator.wikimedia.org/T86338) [10:56:03] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to Jupyter notebook / analytics-privatedata-users for jgleeson - https://phabricator.wikimedia.org/T208432 (10jcrespo) [10:57:20] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1105:3312 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/476829 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [10:58:21] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1105:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476829 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [10:59:25] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1105:3312 T86338 T202167 (duration: 00m 45s) [10:59:29] !log Deploy schema change on db1105:3312 T86338 T202167 [10:59:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:59:31] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [10:59:31] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [10:59:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:01:47] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to Jupyter notebook / analytics-privatedata-users for jgleeson - https://phabricator.wikimedia.org/T208432 (10jcrespo) 05Open>03Resolved This is now deployed, in around 30 minutes this will be applied to all servers. After that, p... 
[11:03:11] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1105:3312" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476830 [11:06:44] (03PS2) 10Jcrespo: admin: Add addshore to graphite-admins; allow _graphite commands [puppet] - 10https://gerrit.wikimedia.org/r/476558 (https://phabricator.wikimedia.org/T208750) [11:07:12] (03CR) 10Jcrespo: "This seems to do the right thing:" [puppet] - 10https://gerrit.wikimedia.org/r/476558 (https://phabricator.wikimedia.org/T208750) (owner: 10Jcrespo) [11:07:36] ^ godog not urgent but please a review when you can [11:07:48] !log deploy schema change on dbstore1002 - T85757 [11:07:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:07:52] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 [11:08:32] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1105:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476829 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [11:20:15] can somebody invite me to the security channel? I was disconnected/kicked [11:29:31] 10Operations, 10ops-codfw: Degraded RAID on elastic2021 - https://phabricator.wikimedia.org/T209779 (10jcrespo) @Gehel, @Mathew.onipe I see no action taken here, please coordinate with Papaul in case some hw replacement is needed (or close if it is duplicate). 
[11:30:23] 10Operations: Filter potentially harmful PostScript commands in Commons upload/thumbor - https://phabricator.wikimedia.org/T210833 (10MoritzMuehlenhoff) [11:35:34] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1105:3312" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476830 (owner: 10Marostegui) [11:36:12] 10Operations, 10Mail, 10WMF-Legal: Tracking down gary@ and redirecting it to trustandsafety@ - https://phabricator.wikimedia.org/T210464 (10jcrespo) @bcampbell I am trying to move this forward, were you able to follow up with legal (or they can tell us) to know what is desired final state, and we will alter... [11:36:49] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1105:3312" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476830 (owner: 10Marostegui) [11:38:17] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1105:3312 T86338 T202167 (duration: 00m 47s) [11:38:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:38:23] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [11:38:24] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [11:44:02] (03PS22) 10Zoranzoki21: Enable Extension:Newsletter on hewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381537 (https://phabricator.wikimedia.org/T177151) [11:45:02] (03PS4) 10Zoranzoki21: Enable VisualEditor at fiwikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446583 (https://phabricator.wikimedia.org/T192135) [11:45:06] (03PS5) 10Zoranzoki21: Enable VisualEditor at fiwikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446583 (https://phabricator.wikimedia.org/T192135) [11:47:46] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1105:3312" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476830 (owner: 10Marostegui) [11:51:52] 10Operations, 10Patch-For-Review, 10User-Marostegui: Audit "misc" 
cluster hosts - https://phabricator.wikimedia.org/T210486 (10Volans) >>! In T210486#4788234, @Marostegui wrote: > I will leave dbmonitor ones for @volans to decide! Why me? It's not deBmonitor 😜 I guess misc is ok, it seems pointless to me t... [12:03:51] (03CR) 10Lucas Werkmeister (WMDE): "I suspect we could set the limit even higher, somewhere around 100, but it’s probably a good idea to start with smaller steps." (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476822 (https://phabricator.wikimedia.org/T209504) (owner: 10Michael Große) [12:04:16] (03CR) 10Lucas Werkmeister (WMDE): [C: 031] Perform more PHP constraint checks before falling back [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476822 (https://phabricator.wikimedia.org/T209504) (owner: 10Michael Große) [12:06:34] 10Operations, 10SRE-Access-Requests, 10WMDE-Analytics-Engineering, 10Graphite, and 2 others: Requesting access to graphite hosts for addshore - https://phabricator.wikimedia.org/T208750 (10jcrespo) a:05jcrespo>03fgiunchedi Assigning me to prevent this from getting forgotten when I go on vacations, unre... [12:07:37] 10Operations, 10DBA, 10Availability (MediaWiki-MultiDC), 10Performance-Team (Radar): Investigate solutions for MySQL connection pooling - https://phabricator.wikimedia.org/T196378 (10jcrespo) a:05jcrespo>03None We are on discussion to see when #DBAs can move this forward (pending testing proxysql2 pack... [12:13:57] 10Operations, 10Electron-PDFs, 10Proton, 10Epic, and 4 others: [EPIC] New service request: chromium-render/deploy - https://phabricator.wikimedia.org/T186748 (10phuedx) 05Open>03Resolved Being **bold**. As discussed, this epic only tracks the building of the service and not its deployment (see {T181084}). [12:14:31] 10Operations, 10ops-codfw: Degraded RAID on elastic2021 - https://phabricator.wikimedia.org/T209779 (10Gehel) 05Open>03declined @jcrespo Thanks for the ping. 
New servers are being racked, this server will be decommissioned in a few days if all goes well (T210450). Let's not do anything. [12:15:06] 10Operations, 10Electron-PDFs, 10Proton, 10Epic, and 4 others: [EPIC] New service request: chromium-render/deploy - https://phabricator.wikimedia.org/T186748 (10phuedx) [12:17:41] (03PS2) 10Michael Große: Perform more PHP constraint checks before falling back [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476822 (https://phabricator.wikimedia.org/T209504) [12:20:09] (03CR) 10Lucas Werkmeister (WMDE): [C: 031] Perform more PHP constraint checks before falling back [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476822 (https://phabricator.wikimedia.org/T209504) (owner: 10Michael Große) [12:20:11] (03CR) 10Michael Große: Perform more PHP constraint checks before falling back (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476822 (https://phabricator.wikimedia.org/T209504) (owner: 10Michael Große) [12:20:43] 10Operations, 10ops-codfw: Degraded RAID on elastic2021 - https://phabricator.wikimedia.org/T209779 (10jcrespo) Cool! that works, too :-D [12:20:53] 10Operations, 10Citoid, 10Services (watching), 10VisualEditor (Current work): Decreased internationalisation of automatic citations as a result of switch to new translation-server - https://phabricator.wikimedia.org/T210806 (10mobrovac) @Mvolz this sounds like a serious consequence for non-english projects... [12:55:10] !log uploaded nodejs 6.11.0~dfsg-1+wmf3+jessie to apt.wikimedia.org/jessie-wikimedia (backporting the current security fixes) [12:55:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:06:36] 10Operations, 10DBA, 10Patch-For-Review, 10User-Banyek: BBU Fail on dbstore2002 - https://phabricator.wikimedia.org/T208320 (10Banyek) @Marostegui I think we should close this task, as the replication is good, and I doubt if we'll replace that BBU before decommissioning. [13:07:02] jynus: yup I will! 
[13:12:40] (03PS4) 10Takidelfin: InitialiseSettings: Remove redundant namespace talks definitions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/474372 (https://phabricator.wikimedia.org/T206952) [13:22:37] (03PS1) 10Filippo Giunchedi: hieradata: add restbase20[4-8] racks [puppet] - 10https://gerrit.wikimedia.org/r/476849 (https://phabricator.wikimedia.org/T209615) [13:23:55] (03CR) 10Filippo Giunchedi: [C: 032] hieradata: add restbase20[4-8] racks [puppet] - 10https://gerrit.wikimedia.org/r/476849 (https://phabricator.wikimedia.org/T209615) (owner: 10Filippo Giunchedi) [13:24:13] (03PS2) 10Filippo Giunchedi: hieradata: add restbase20[4-8] racks [puppet] - 10https://gerrit.wikimedia.org/r/476849 (https://phabricator.wikimedia.org/T209615) [13:26:30] (03PS1) 10Ladsgroup: ores: Notify ores services when the config changes [puppet] - 10https://gerrit.wikimedia.org/r/476850 (https://phabricator.wikimedia.org/T210719) [13:28:42] 10Operations, 10DBA, 10Patch-For-Review, 10User-Banyek: BBU Fail on dbstore2002 - https://phabricator.wikimedia.org/T208320 (10jcrespo) 05Open>03declined Technically the alerts went away after the restart, lets decline it because we know it is not in a good state and it is likely to reappear, but I agr... 
[13:33:02] 10Operations, 10DBA, 10Patch-For-Review, 10User-Banyek: BBU Fail on dbstore2002 - https://phabricator.wikimedia.org/T208320 (10Marostegui) Agreed with all you guys said Further, we not only not invest in old hardware but these hosts should go away once we've got the final backups hosts in place [13:39:25] (03CR) 10Bmansurov: Labs: display reader trust survey (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476027 (https://phabricator.wikimedia.org/T209882) (owner: 10Bmansurov) [13:43:56] 10Operations, 10ops-codfw, 10Patch-For-Review, 10Services (watching): rack/setup/install restbase201[3-8].codfw.wmnet - https://phabricator.wikimedia.org/T209615 (10fgiunchedi) a:05RobH>03fgiunchedi I'll be preparing these hosts for cassandra to be bootstrapped there [13:44:19] 10Operations, 10ops-codfw, 10Patch-For-Review, 10Services (watching), 10User-fgiunchedi: rack/setup/install restbase201[3-8].codfw.wmnet - https://phabricator.wikimedia.org/T209615 (10fgiunchedi) [13:57:59] (03CR) 10Filippo Giunchedi: [C: 031] "LGTM, I've added Traffic team folks for awareness" [puppet] - 10https://gerrit.wikimedia.org/r/476393 (https://phabricator.wikimedia.org/T210486) (owner: 10Cwhite) [14:03:52] (03CR) 10Filippo Giunchedi: "See inline" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/476431 (https://phabricator.wikimedia.org/T208066) (owner: 10Cwhite) [14:04:08] (03CR) 10BBlack: [C: 031] hiera: add cluster definition to recursor role [puppet] - 10https://gerrit.wikimedia.org/r/476393 (https://phabricator.wikimedia.org/T210486) (owner: 10Cwhite) [14:06:18] (03CR) 10Filippo Giunchedi: "Ditto this will need explicit cluster declarations like you did for recursor" [puppet] - 10https://gerrit.wikimedia.org/r/476396 (https://phabricator.wikimedia.org/T210486) (owner: 10Cwhite) [14:22:02] (03PS1) 10Andrew Bogott: Horizon: move projects to eqiad1-r [puppet] - 10https://gerrit.wikimedia.org/r/476858 (https://phabricator.wikimedia.org/T204745) [14:24:12] 
(03CR) 10Andrew Bogott: [C: 032] Horizon: move projects to eqiad1-r [puppet] - 10https://gerrit.wikimedia.org/r/476858 (https://phabricator.wikimedia.org/T204745) (owner: 10Andrew Bogott) [14:24:27] (03PS1) 10Arturo Borrero Gonzalez: openstack: dnsleaks.py: respect PTR records for .svc.eqiad.wmflabs too [puppet] - 10https://gerrit.wikimedia.org/r/476859 [14:25:01] (03CR) 10jerkins-bot: [V: 04-1] openstack: dnsleaks.py: respect PTR records for .svc.eqiad.wmflabs too [puppet] - 10https://gerrit.wikimedia.org/r/476859 (owner: 10Arturo Borrero Gonzalez) [14:31:13] (03PS2) 10Arturo Borrero Gonzalez: openstack: dnsleaks.py: respect PTR records for .svc.eqiad.wmflabs too [puppet] - 10https://gerrit.wikimedia.org/r/476859 [14:35:00] (03PS3) 10Arturo Borrero Gonzalez: openstack: dnsleaks.py: respect PTR records for .svc.eqiad.wmflabs too [puppet] - 10https://gerrit.wikimedia.org/r/476859 [14:38:14] (03PS1) 10Vgutierrez: lists: Deploy the certcentral managed TLS certificate [puppet] - 10https://gerrit.wikimedia.org/r/476860 (https://phabricator.wikimedia.org/T207050) [14:39:14] (03PS1) 10Elukey: admin: update flemmerich's email contact [puppet] - 10https://gerrit.wikimedia.org/r/476861 [14:42:07] (03CR) 10Vgutierrez: "pcc is happy https://puppet-compiler.wmflabs.org/compiler1002/13796/ and shows the private keys being deployed with the expected permissio" [puppet] - 10https://gerrit.wikimedia.org/r/476860 (https://phabricator.wikimedia.org/T207050) (owner: 10Vgutierrez) [14:42:40] (03CR) 10Elukey: [C: 032] admin: update flemmerich's email contact [puppet] - 10https://gerrit.wikimedia.org/r/476861 (owner: 10Elukey) [14:43:52] (03CR) 10Andrew Bogott: [C: 031] openstack: dnsleaks.py: respect PTR records for .svc.eqiad.wmflabs too [puppet] - 10https://gerrit.wikimedia.org/r/476859 (owner: 10Arturo Borrero Gonzalez) [14:49:55] (03PS1) 10Elukey: profile::druid::turnilo: remove mod_proxy_html (not needed) [puppet] - 10https://gerrit.wikimedia.org/r/476867 [14:51:57] (03PS1) 
10Filippo Giunchedi: cassandra: create hints directory [puppet] - 10https://gerrit.wikimedia.org/r/476868 (https://phabricator.wikimedia.org/T209615) [14:52:37] (03CR) 10jerkins-bot: [V: 04-1] cassandra: create hints directory [puppet] - 10https://gerrit.wikimedia.org/r/476868 (https://phabricator.wikimedia.org/T209615) (owner: 10Filippo Giunchedi) [14:54:53] (03PS2) 10Filippo Giunchedi: cassandra: create hints directory [puppet] - 10https://gerrit.wikimedia.org/r/476868 (https://phabricator.wikimedia.org/T209615) [14:56:26] (03CR) 10Elukey: [C: 032] profile::druid::turnilo: remove mod_proxy_html (not needed) [puppet] - 10https://gerrit.wikimedia.org/r/476867 (owner: 10Elukey) [14:58:44] PROBLEM - Logstash rate of ingestion percent change compared to yesterday on icinga1001 is CRITICAL: 149 ge 130 https://grafana.wikimedia.org/dashboard/db/logstash?orgId=1panelId=2fullscreen [14:59:48] 10Operations, 10ops-codfw, 10Core Platform Team, 10Services (doing), and 2 others: Reshape RESTBase Cassandra cluster for server refresh - https://phabricator.wikimedia.org/T210843 (10Eevans) p:05Triage>03High [15:00:43] hi [15:00:58] Just noted some warnings about more failed login attempts [15:01:17] AndyRussG: o/ - I know that you guys are busy but if you have time today can you check T203669 ? :) [15:01:18] T203669: Return to real time banner impressions in Druid - https://phabricator.wikimedia.org/T203669 [15:01:20] (03CR) 10Eevans: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/476868 (https://phabricator.wikimedia.org/T209615) (owner: 10Filippo Giunchedi) [15:01:29] elukey: hi! [15:01:41] I know you can't reveal too much about ongoing incidents but is there anything you can say? [15:02:57] elukey: I did start looking at it. For some reason the test turnilo link that joal sent didn't seem to have very consistent data, but i didn't get a chance to dig properly... 
In theory the same time period and parameters should give the same result, or at least quite close, for both the new Druid dataset and the old one via the old pipeline, when comparing both on normalized count [15:03:36] elukey: really briefly, the other comment that came to mind is that, in the new version, there are some event properties included that don't make much sense to put into Druid [15:03:56] elukey: thanks so much for your work on this!!!!!!!!! I can try to look into it with more detail and comment better a bit later [15:04:14] Hi, can anyone resubmit this https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/FileImporter/+/476852/ [15:04:24] (03PS1) 10Vgutierrez: lists: Use the certcentral managed TLS certificate [puppet] - 10https://gerrit.wikimedia.org/r/476869 (https://phabricator.wikimedia.org/T207050) [15:04:46] (03CR) 10Filippo Giunchedi: [C: 031] "PCC https://puppet-compiler.wmflabs.org/compiler1002/13798/" [puppet] - 10https://gerrit.wikimedia.org/r/476868 (https://phabricator.wikimedia.org/T209615) (owner: 10Filippo Giunchedi) [15:06:09] (03PS2) 10Vgutierrez: lists: Use the certcentral managed TLS certificate [puppet] - 10https://gerrit.wikimedia.org/r/476869 (https://phabricator.wikimedia.org/T207050) [15:07:56] AndyRussG: ah yes we have put "all" we had, but it can be reduced.. 
bear in mind that we only have the "real time" job for the moment, the batch one still needs some work [15:08:07] but you guys should be able to get good data out of it [15:08:50] (03PS2) 10Herron: logstash: update gerrit multiline message start regex [puppet] - 10https://gerrit.wikimedia.org/r/476805 (https://phabricator.wikimedia.org/T205852) [15:10:10] (03PS1) 10Vgutierrez: lists: Get rid of the old LE puppetization [puppet] - 10https://gerrit.wikimedia.org/r/476872 (https://phabricator.wikimedia.org/T207050) [15:10:12] (03CR) 10Herron: [C: 032] logstash: update gerrit multiline message start regex [puppet] - 10https://gerrit.wikimedia.org/r/476805 (https://phabricator.wikimedia.org/T205852) (owner: 10Herron) [15:11:23] (03CR) 10Filippo Giunchedi: [C: 032] cassandra: create hints directory [puppet] - 10https://gerrit.wikimedia.org/r/476868 (https://phabricator.wikimedia.org/T209615) (owner: 10Filippo Giunchedi) [15:11:32] (03PS3) 10Filippo Giunchedi: cassandra: create hints directory [puppet] - 10https://gerrit.wikimedia.org/r/476868 (https://phabricator.wikimedia.org/T209615) [15:14:44] RECOVERY - cassandra-a SSL 10.192.16.82:7001 on restbase2013 is OK: SSL OK - Certificate restbase2013-a valid until 2020-11-29 09:26:04 +0000 (expires in 729 days) [15:16:55] that's a lie [15:21:07] (03CR) 10Vgutierrez: [C: 032] lists: Deploy the certcentral managed TLS certificate [puppet] - 10https://gerrit.wikimedia.org/r/476860 (https://phabricator.wikimedia.org/T207050) (owner: 10Vgutierrez) [15:21:16] (03PS2) 10Vgutierrez: lists: Deploy the certcentral managed TLS certificate [puppet] - 10https://gerrit.wikimedia.org/r/476860 (https://phabricator.wikimedia.org/T207050) [15:22:55] !log uploaded nodejs 6.11.0~dfsg-1+wmf4+jessie to apt.wikimedia.org/jessie-wikimedia (fixes a dependency compared to the initial jessie update) [15:22:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:22:59] 10Operations, 10Analytics, 10Security-Team, 10WMF-Legal, 
10Software-Licensing: Can exfat be used in WMF production? - https://phabricator.wikimedia.org/T210667 (10faidon) So I think this task raises a few different issues (and @legoktm correct me if I'm wrong): 1. Legal concerns about using this particul... [15:26:32] 10Operations, 10Wikimedia-Logstash: Ship Grafana server logs to ELK - https://phabricator.wikimedia.org/T210846 (10herron) [15:27:00] 10Operations, 10Wikimedia-Logstash: Ship Grafana server logs to ELK - https://phabricator.wikimedia.org/T210846 (10herron) [15:27:03] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review, 10User-herron: Onboard at least 10 new non-sensitive log producers to the logging pipeline - https://phabricator.wikimedia.org/T205852 (10herron) [15:27:10] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review, 10User-herron: Onboard at least 10 new non-sensitive log producers to the logging pipeline - https://phabricator.wikimedia.org/T205852 (10herron) [15:27:11] (03PS3) 10Vgutierrez: lists: Use the certcentral managed TLS certificate [puppet] - 10https://gerrit.wikimedia.org/r/476869 (https://phabricator.wikimedia.org/T207050) [15:27:13] (03PS2) 10Vgutierrez: lists: Get rid of the old LE puppetization [puppet] - 10https://gerrit.wikimedia.org/r/476872 (https://phabricator.wikimedia.org/T207050) [15:27:15] (03PS1) 10Vgutierrez: lists: Fix the group name in certcentral::cert [puppet] - 10https://gerrit.wikimedia.org/r/476874 (https://phabricator.wikimedia.org/T207050) [15:27:30] (03PS1) 10Herron: grafana: ship grafana-server syslogs to ELK [puppet] - 10https://gerrit.wikimedia.org/r/476875 (https://phabricator.wikimedia.org/T205852) [15:27:43] (03CR) 10Vgutierrez: [C: 032] lists: Fix the group name in certcentral::cert [puppet] - 10https://gerrit.wikimedia.org/r/476874 (https://phabricator.wikimedia.org/T207050) (owner: 10Vgutierrez) [15:29:19] !log bootstrapping restbase2013-a -- T210843 [15:29:20] PROBLEM - puppet last run on fermium is CRITICAL: CRITICAL: Puppet has 2 failures. 
Last run 4 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/centralcerts/lists.rsa-2048.key],File[/etc/centralcerts/lists.ec-prime256v1.key] [15:29:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:29:23] T210843: Reshape RESTBase Cassandra cluster for server refresh - https://phabricator.wikimedia.org/T210843 [15:29:34] ^^that's me (fermium crash) [15:29:56] L8 issue setting the group ownership for certcentral::cert [15:29:56] it should be fixed now [15:34:26] RECOVERY - puppet last run on fermium is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [15:36:56] (03PS4) 10Vgutierrez: lists: Use the certcentral managed TLS certificate [puppet] - 10https://gerrit.wikimedia.org/r/476869 (https://phabricator.wikimedia.org/T207050) [15:36:58] (03PS3) 10Vgutierrez: lists: Get rid of the old LE puppetization [puppet] - 10https://gerrit.wikimedia.org/r/476872 (https://phabricator.wikimedia.org/T207050) [15:37:00] (03PS1) 10Vgutierrez: certcentral: Fix /etc/centralcerts permissions [puppet] - 10https://gerrit.wikimedia.org/r/476877 (https://phabricator.wikimedia.org/T207050) [15:37:44] (03CR) 10Alex Monk: [C: 031] certcentral: Fix /etc/centralcerts permissions [puppet] - 10https://gerrit.wikimedia.org/r/476877 (https://phabricator.wikimedia.org/T207050) (owner: 10Vgutierrez) [15:38:36] 10Operations, 10DBA, 10monitoring, 10Patch-For-Review: MySQL metrics monitoring - https://phabricator.wikimedia.org/T143896 (10jcrespo) 05Open>03stalled [15:39:00] (03PS2) 10Herron: grafana: ship grafana-server syslogs to ELK [puppet] - 10https://gerrit.wikimedia.org/r/476875 (https://phabricator.wikimedia.org/T205852) [15:39:22] 10Operations, 10ops-codfw, 10Traffic, 10Patch-For-Review: lvs2006 crashed into (what it seems) an unrecoverable state - https://phabricator.wikimedia.org/T209337 (10Vgutierrez) 05Open>03stalled a:05Papaul>03None lvs2010 replacement is currently blocked by T203194 [15:40:45] (03CR) 
10Herron: "https://puppet-compiler.wmflabs.org/compiler1002/13802/" [puppet] - 10https://gerrit.wikimedia.org/r/476875 (https://phabricator.wikimedia.org/T205852) (owner: 10Herron) [15:40:48] (03CR) 10Herron: [C: 032] grafana: ship grafana-server syslogs to ELK [puppet] - 10https://gerrit.wikimedia.org/r/476875 (https://phabricator.wikimedia.org/T205852) (owner: 10Herron) [15:40:53] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review: Ship Grafana server logs to ELK - https://phabricator.wikimedia.org/T210846 (10jcrespo) a:03herron You seem to be working on this, do you mind if I assign it to you (you can unclaim it if you want, later), that way it is clear someone is actively work... [15:41:33] (03CR) 10Vgutierrez: [C: 032] certcentral: Fix /etc/centralcerts permissions [puppet] - 10https://gerrit.wikimedia.org/r/476877 (https://phabricator.wikimedia.org/T207050) (owner: 10Vgutierrez) [15:41:35] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review: Ship Grafana server logs to ELK - https://phabricator.wikimedia.org/T210846 (10herron) >>! In T210846#4789214, @jcrespo wrote: > You seem to be working on this, do you mind if I assign it to you (you can unclaim it if you want, later), that way it is cl... [15:41:41] (03PS2) 10Vgutierrez: certcentral: Fix /etc/centralcerts permissions [puppet] - 10https://gerrit.wikimedia.org/r/476877 (https://phabricator.wikimedia.org/T207050) [15:42:46] 10Operations, 10Wikimedia-Mailing-lists, 10User-Urbanecm: Post hold because of "invalid headers" in wikimediacz-l - https://phabricator.wikimedia.org/T210223 (10jcrespo) a:03Urbanecm Assigning it to you- unclaim it if it doesn't work and need more help, or close it at a later time if the fix works. [15:48:35] 10Operations, 10ORES, 10Scoring-platform-team: ORES 500s since 2018-11-29 6:25 - https://phabricator.wikimedia.org/T210701 (10jcrespo) Related to T210610 or T210575, or nothing to do? 
CC @Ladsgroup [15:49:56] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review, 10User-herron: Onboard at least 10 new non-sensitive log producers to the logging pipeline - https://phabricator.wikimedia.org/T205852 (10herron) [15:50:00] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review: Ship Grafana server logs to ELK - https://phabricator.wikimedia.org/T210846 (10herron) 05Open>03Resolved Grafana-server syslogs are now flowing into logstash [15:50:13] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review, 10User-herron: Onboard at least 10 new non-sensitive log producers to the logging pipeline - https://phabricator.wikimedia.org/T205852 (10herron) [15:50:21] 10Operations, 10ORES, 10Scoring-platform-team: ORES 500s since 2018-11-29 6:25 - https://phabricator.wikimedia.org/T210701 (10Ladsgroup) That was the incident: https://wikitech.wikimedia.org/wiki/Incident_documentation/20181129-ores It's not related to T210610 or T210575. They are different issues but with... [15:50:42] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review, 10User-herron: Onboard at least 10 new non-sensitive log producers to the logging pipeline - https://phabricator.wikimedia.org/T205852 (10herron) [15:51:19] 10Operations, 10ORES, 10Scoring-platform-team: ORES 500s since 2018-11-29 6:25 - https://phabricator.wikimedia.org/T210701 (10jcrespo) Thank you, then I guess this can be closed as resolved, or I will let you handle it as you prefer. [15:53:21] 10Operations, 10Analytics, 10Security-Team, 10WMF-Legal, 10Software-Licensing: Can exfat be used in WMF production? - https://phabricator.wikimedia.org/T210667 (10Platonides) Well, I don't think it even needs to be treated in a private task. There was a situation about how to interpret the rules / how mu... 
[15:54:55] 10Operations, 10cloud-services-team: WMCS-related dashboards using Diamond metrics - https://phabricator.wikimedia.org/T210850 (10MoritzMuehlenhoff) [15:58:11] 10Operations, 10ORES, 10Scoring-platform-team (Current), 10User-Ladsgroup: ORES 500s since 2018-11-29 6:25 - https://phabricator.wikimedia.org/T210701 (10Ladsgroup) 05Open>03Resolved a:03Ladsgroup [15:59:20] 10Operations, 10cloud-services-team (Kanban): WMCS-related dashboards using Diamond metrics - https://phabricator.wikimedia.org/T210850 (10Bstorm) [15:59:43] 10Operations, 10cloud-services-team (Kanban): WMCS-related dashboards using Diamond metrics - https://phabricator.wikimedia.org/T210850 (10Bstorm) p:05Triage>03Normal [16:04:54] (03PS1) 10Dmaza: Enable Block notice stats on itwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476882 (https://phabricator.wikimedia.org/T210452) [16:06:28] (03PS1) 10Muehlenhoff: Absent Redis Diamond collector on mc* servers [puppet] - 10https://gerrit.wikimedia.org/r/476883 [16:07:14] (03CR) 10Hashar: "I have added both in the same commit since they are closely related: pass some extra options to docker build." [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/475843 (https://phabricator.wikimedia.org/T210438) (owner: 10Hashar) [16:07:20] (03CR) 10jerkins-bot: [V: 04-1] Absent Redis Diamond collector on mc* servers [puppet] - 10https://gerrit.wikimedia.org/r/476883 (owner: 10Muehlenhoff) [16:11:07] (03PS1) 10Dmaza: Enable Partial Blocks on itwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476884 (https://phabricator.wikimedia.org/T210444) [16:12:58] (03CR) 10Hashar: "Thank you! 
Whenever the new keyholder Debian package is published that will make cumin way faster on deployment-prep and integration WMCS " [software/keyholder] - 10https://gerrit.wikimedia.org/r/476424 (https://phabricator.wikimedia.org/T204681) (owner: 10Faidon Liambotis) [16:19:45] (03PS1) 10Mforns: Correct escape chars of EL sanitization in analytics data_purge.pp [puppet] - 10https://gerrit.wikimedia.org/r/476886 (https://phabricator.wikimedia.org/T202429) [16:21:24] (03PS1) 10Papaul: DHCP: Add MAC address entries for elastic2037-elastic2044 [puppet] - 10https://gerrit.wikimedia.org/r/476887 (https://phabricator.wikimedia.org/T210450) [16:22:55] (03PS2) 10Muehlenhoff: Absent Redis Diamond collector on mc* servers [puppet] - 10https://gerrit.wikimedia.org/r/476883 (https://phabricator.wikimedia.org/T183454) [16:23:43] lterm mw [16:23:49] oooops, sorry, disregard [16:24:53] (03PS1) 10Filippo Giunchedi: hieradata: add cassandra jbod device for new restbase hosts [puppet] - 10https://gerrit.wikimedia.org/r/476888 (https://phabricator.wikimedia.org/T209615) [16:26:03] (03CR) 10Filippo Giunchedi: [C: 032] hieradata: add cassandra jbod device for new restbase hosts [puppet] - 10https://gerrit.wikimedia.org/r/476888 (https://phabricator.wikimedia.org/T209615) (owner: 10Filippo Giunchedi) [16:26:28] (03PS1) 10Herron: logstash: ship logstash server logs to ELK [puppet] - 10https://gerrit.wikimedia.org/r/476890 (https://phabricator.wikimedia.org/T205852) [16:26:33] (03PS5) 10Bstorm: sonofgridengine: set up shadow_master profile [puppet] - 10https://gerrit.wikimedia.org/r/476430 (https://phabricator.wikimedia.org/T200557) [16:29:27] (03CR) 10Bstorm: [C: 032] sonofgridengine: set up shadow_master profile [puppet] - 10https://gerrit.wikimedia.org/r/476430 (https://phabricator.wikimedia.org/T200557) (owner: 10Bstorm) [16:29:59] (03CR) 10Bstorm: [C: 032] sonofgridengine: set up shadow_master profile [puppet] - 10https://gerrit.wikimedia.org/r/476430 
(https://phabricator.wikimedia.org/T200557) (owner: 10Bstorm) [16:33:27] (03CR) 10Herron: "PCC looks good https://puppet-compiler.wmflabs.org/compiler1002/13804/" [puppet] - 10https://gerrit.wikimedia.org/r/476890 (https://phabricator.wikimedia.org/T205852) (owner: 10Herron) [16:33:48] (03PS1) 10Ladsgroup: [WIP] redis: Add redis::sentinel class [puppet] - 10https://gerrit.wikimedia.org/r/476891 (https://phabricator.wikimedia.org/T210580) [16:34:22] (03CR) 10jerkins-bot: [V: 04-1] [WIP] redis: Add redis::sentinel class [puppet] - 10https://gerrit.wikimedia.org/r/476891 (https://phabricator.wikimedia.org/T210580) (owner: 10Ladsgroup) [16:37:00] (03PS1) 10Filippo Giunchedi: hieradata: add sde4 to restbase2013 [puppet] - 10https://gerrit.wikimedia.org/r/476892 (https://phabricator.wikimedia.org/T209615) [16:37:27] (03PS1) 10Papaul: Partman: ADD elastic2037-elastic2054 [puppet] - 10https://gerrit.wikimedia.org/r/476893 (https://phabricator.wikimedia.org/T210450) [16:39:29] (03PS2) 10Ladsgroup: [WIP] redis: Add redis::sentinel class [puppet] - 10https://gerrit.wikimedia.org/r/476891 (https://phabricator.wikimedia.org/T210580) [16:39:50] (03CR) 10Eevans: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/476892 (https://phabricator.wikimedia.org/T209615) (owner: 10Filippo Giunchedi) [16:40:03] (03CR) 10jerkins-bot: [V: 04-1] [WIP] redis: Add redis::sentinel class [puppet] - 10https://gerrit.wikimedia.org/r/476891 (https://phabricator.wikimedia.org/T210580) (owner: 10Ladsgroup) [16:40:07] (03CR) 10Filippo Giunchedi: [C: 032] hieradata: add sde4 to restbase2013 [puppet] - 10https://gerrit.wikimedia.org/r/476892 (https://phabricator.wikimedia.org/T209615) (owner: 10Filippo Giunchedi) [16:40:14] (03PS2) 10Filippo Giunchedi: hieradata: add sde4 to restbase2013 [puppet] - 10https://gerrit.wikimedia.org/r/476892 (https://phabricator.wikimedia.org/T209615) [16:40:21] (03PS2) 10Herron: logstash: ship logstash server logs to ELK [puppet] - 
10https://gerrit.wikimedia.org/r/476890 (https://phabricator.wikimedia.org/T205852) [16:41:23] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install elastic203[7-9], elastic204[0-9], elastic205[0-4] - https://phabricator.wikimedia.org/T210450 (10Papaul) [16:42:06] (03PS3) 10Herron: logstash: ship logstash server logs to ELK [puppet] - 10https://gerrit.wikimedia.org/r/476890 (https://phabricator.wikimedia.org/T205852) [16:43:30] (03PS1) 10Filippo Giunchedi: hieradata: fix sde data directory path [puppet] - 10https://gerrit.wikimedia.org/r/476895 (https://phabricator.wikimedia.org/T209615) [16:43:37] (03CR) 10Herron: [C: 032] logstash: ship logstash server logs to ELK [puppet] - 10https://gerrit.wikimedia.org/r/476890 (https://phabricator.wikimedia.org/T205852) (owner: 10Herron) [16:44:17] (03CR) 10Filippo Giunchedi: [C: 032] hieradata: fix sde data directory path [puppet] - 10https://gerrit.wikimedia.org/r/476895 (https://phabricator.wikimedia.org/T209615) (owner: 10Filippo Giunchedi) [16:44:25] (03PS2) 10Filippo Giunchedi: hieradata: fix sde data directory path [puppet] - 10https://gerrit.wikimedia.org/r/476895 (https://phabricator.wikimedia.org/T209615) [16:44:27] (03CR) 10Eevans: [C: 031] "(still )LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/476895 (https://phabricator.wikimedia.org/T209615) (owner: 10Filippo Giunchedi) [16:44:57] (03CR) 10jerkins-bot: [V: 04-1] hieradata: fix sde data directory path [puppet] - 10https://gerrit.wikimedia.org/r/476895 (https://phabricator.wikimedia.org/T209615) (owner: 10Filippo Giunchedi) [16:51:27] (03CR) 10Aezell: [C: 031] Enable Partial Blocks on itwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476884 (https://phabricator.wikimedia.org/T210444) (owner: 10Dmaza) [16:52:33] (03CR) 10Aezell: [C: 031] Enable Block notice stats on itwiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476882 (https://phabricator.wikimedia.org/T210452) (owner: 10Dmaza) [16:52:46] (03PS3) 10Gehel: DNS: 
Add mgmt and production DNS entries for elastic2037-elastic2044 [dns] - 10https://gerrit.wikimedia.org/r/476802 (https://phabricator.wikimedia.org/T210450) (owner: 10Papaul) [16:54:06] 10Operations, 10Mail, 10WMF-Legal: Tracking down gary@ and redirecting it to trustandsafety@ - https://phabricator.wikimedia.org/T210464 (10bcampbell) @jcrespo It looks like @Dzahn added @RStallman-legalteam to the thread for Legal's input, but I can follow-up with legal via email if necessary. This request... [16:56:12] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install elastic203[7-9], elastic204[0-9], elastic205[0-4] - https://phabricator.wikimedia.org/T210450 (10Papaul) [16:57:40] (03PS4) 10Gehel: DNS: Add mgmt and production DNS entries for elastic2037-elastic2044 [dns] - 10https://gerrit.wikimedia.org/r/476802 (https://phabricator.wikimedia.org/T210450) (owner: 10Papaul) [17:01:20] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review, 10User-herron: Onboard at least 10 new non-sensitive log producers to the logging pipeline - https://phabricator.wikimedia.org/T205852 (10herron) [17:10:12] 10Operations, 10ops-eqiad, 10Analytics, 10Analytics-Kanban, and 2 others: rack/setup/install cloudvirtan100[1-5].eqiad.wmnet - https://phabricator.wikimedia.org/T207194 (10Ottomata) [17:14:08] 10Operations, 10cloud-services-team (Kanban): WMCS-related dashboards using Diamond metrics - https://phabricator.wikimedia.org/T210850 (10GTirloni) @MoritzMuehlenhoff thanks for this. Could you give an example of one metric in one dashboard that is using a diamond and what would be the equivalent in prometheu... 
[17:14:10] (03CR) 10Gehel: [C: 032] DNS: Add mgmt and production DNS entries for elastic2037-elastic2044 [dns] - 10https://gerrit.wikimedia.org/r/476802 (https://phabricator.wikimedia.org/T210450) (owner: 10Papaul) [17:17:37] (03PS2) 10Gehel: DHCP: Add MAC address entries for elastic2037-elastic2044 [puppet] - 10https://gerrit.wikimedia.org/r/476887 (https://phabricator.wikimedia.org/T210450) (owner: 10Papaul) [17:18:42] (03CR) 10Gehel: [C: 032] DHCP: Add MAC address entries for elastic2037-elastic2044 [puppet] - 10https://gerrit.wikimedia.org/r/476887 (https://phabricator.wikimedia.org/T210450) (owner: 10Papaul) [17:19:04] (03PS2) 10Gehel: Partman: ADD elastic2037-elastic2054 [puppet] - 10https://gerrit.wikimedia.org/r/476893 (https://phabricator.wikimedia.org/T210450) (owner: 10Papaul) [17:19:54] (03CR) 10Gehel: [C: 032] Partman: ADD elastic2037-elastic2054 [puppet] - 10https://gerrit.wikimedia.org/r/476893 (https://phabricator.wikimedia.org/T210450) (owner: 10Papaul) [17:24:11] (03CR) 10Cwhite: [C: 031] Absent Redis Diamond collector on mc* servers [puppet] - 10https://gerrit.wikimedia.org/r/476883 (https://phabricator.wikimedia.org/T183454) (owner: 10Muehlenhoff) [17:25:52] 10Operations, 10Analytics, 10Security-Team, 10WMF-Legal, 10Software-Licensing: Can exfat be used in WMF production? - https://phabricator.wikimedia.org/T210667 (10chasemp) Thanks @faidon for weighing in, I think you got right to the heart of it. Not responding to you necessarily but I'm going to steal... [17:29:04] 10Operations, 10Commons, 10Multimedia, 10media-storage, 10User-Josve05a: Specific revisions of multiple files missing from Swift - 404 Not Found returned - https://phabricator.wikimedia.org/T124101 (10AlexisJazz) @Incnis_Mrsi at T198177 I also found some missing revisions. But the things that take me hou... [17:36:39] 10Operations, 10Analytics, 10Security-Team, 10WMF-Legal, 10Software-Licensing: Can exfat be used in WMF production? 
- https://phabricator.wikimedia.org/T210667 (10faidon) >>! In T210667#4789588, @chasemp wrote: > In this case specifically, my thinking was that I had agreement and understanding with anoth... [17:40:03] 10Operations, 10Analytics, 10Security-Team, 10WMF-Legal, 10Software-Licensing: Can exfat be used in WMF production? - https://phabricator.wikimedia.org/T210667 (10chasemp) >>! In T210667#4789604, @faidon wrote: >>>! In T210667#4789588, @chasemp wrote: >> In this case specifically, my thinking was that I... [17:50:55] 10Operations, 10ops-eqiad, 10Analytics, 10Analytics-Kanban, and 2 others: rack/setup/install cloudvirtan100[1-5].eqiad.wmnet - https://phabricator.wikimedia.org/T207194 (10Ottomata) Hm. They are cattle, but it would probably be nice if the whole node doesn't go down if we lose a drive, and we can deal wit... [17:51:03] (03PS3) 10Ladsgroup: [WIP] redis: Add redis::sentinel class [puppet] - 10https://gerrit.wikimedia.org/r/476891 (https://phabricator.wikimedia.org/T210580) [17:51:37] (03CR) 10jerkins-bot: [V: 04-1] [WIP] redis: Add redis::sentinel class [puppet] - 10https://gerrit.wikimedia.org/r/476891 (https://phabricator.wikimedia.org/T210580) (owner: 10Ladsgroup) [17:52:53] (03PS4) 10Ladsgroup: [WIP] redis: Add redis::sentinel class [puppet] - 10https://gerrit.wikimedia.org/r/476891 (https://phabricator.wikimedia.org/T210580) [17:57:21] 10Operations, 10ops-codfw, 10Services (watching), 10User-fgiunchedi: Reconfigure hardware and reimage restbase201[3-8].codfw.wmnet - https://phabricator.wikimedia.org/T210863 (10Eevans) p:05Triage>03High [17:57:36] 10Operations, 10ops-codfw, 10Services (watching), 10User-Eevans, 10User-fgiunchedi: Reconfigure hardware and reimage restbase201[3-8].codfw.wmnet - https://phabricator.wikimedia.org/T210863 (10Eevans) [18:00:42] (03PS1) 10Bstorm: sonofgridengine: remove useless comment and fix the shadow profile [puppet] - 10https://gerrit.wikimedia.org/r/476902 (https://phabricator.wikimedia.org/T200557) [18:04:59] 
10Operations, 10Commons, 10Multimedia, 10media-storage, 10User-Josve05a: Specific revisions of multiple files missing from Swift - 404 Not Found returned - https://phabricator.wikimedia.org/T124101 (10Tgr) >>! In T124101#4789594, @AlexisJazz wrote: > But the things that take me hours could be done in sec... [18:14:46] paladox: we talked about this but I can't find it in the interface so I'm wondering if I remember right -- the ability to create/look-at dashboards in the new gerrit, do we already have it in our systems? I know you said it's out in the new gerrit ui, but I don't remember if it's the version we're already using right now in our system [18:15:13] mooeypoo it's in 2.16, we are currently using 2.15. [18:15:41] example: https://gerrit-review.googlesource.com/admin/repos/gerrit,dashboards [18:15:46] oh oh okay [18:15:51] yeah I thought I remember that and wasn't sure [18:16:01] I was going to experiment but I'll wait 'till it's in our systems [18:16:04] 10Operations, 10cloud-services-team (Kanban): Reboot WMCS servers for L1TF - https://phabricator.wikimedia.org/T207377 (10Bstorm) So labstore1007 has a current certificate for dumps.wikimedia.org: ` Validity Not Before: Nov 28 13:59:40 2018 GMT Not After : Feb 26 13:59:40 2019... [18:16:09] ok :) [18:16:20] we will be able to make our own dashboard, right? [18:16:36] the list in that link looks like default ones, but I assume we could also set our own queries/rules [18:19:36] (03PS1) 10Bstorm: dumps distribution: fail over dumps to labstore1007 for upgrades [dns] - 10https://gerrit.wikimedia.org/r/476903 (https://phabricator.wikimedia.org/T207377) [18:19:40] thcipriani, what are your thoughts about us doing a parsoid deploy today to address a crasher from wed's deploy? [18:20:23] about 1 VE edit every 15 mins seem to have been affected. [18:20:35] is that still ongoing? [18:20:43] the 1/15minutes? 
[18:20:44] it is not a big number, so we can do it monday too but if it isn't a big deal for you, we can as well stop it. [18:20:46] yes. [18:21:06] it will till we deploy the fix. [18:21:21] mooeypoo you can already do that in 2.15 :) [18:21:36] (just you can only use the old ui to view the dashboard) [18:22:02] paladox: yeah that I know, I have those already and use them a lot, that's why I'm waiting so badly for the option to use them in the new UI [18:22:15] ah ok :) [18:23:47] (03CR) 10Dzahn: hadoop::ui: migrate from apache to httpd module (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/474832 (owner: 10Dzahn) [18:24:03] 10Operations, 10Analytics, 10Security-Team, 10WMF-Legal, 10Software-Licensing: Can exfat be used in WMF production? - https://phabricator.wikimedia.org/T210667 (10elukey) Reading the backlog only now, this was good learning lesson for me too (I was aware of what Chase did as mentioned, and didn't think t... [18:24:13] 10Operations, 10Commons, 10Multimedia, 10media-storage, 10User-Josve05a: Specific revisions of multiple files missing from Swift - 404 Not Found returned - https://phabricator.wikimedia.org/T124101 (10AlexisJazz) >>! In T124101#4789680, @Tgr wrote: >>>! In T124101#4789594, @AlexisJazz wrote: >> But the t... [18:24:51] subbu: as long as fix is small and has been verified on beta, it's probably a good thing to fix sooner rather than later. [18:26:41] (03CR) 10Bstorm: [C: 032] dumps distribution: fail over dumps to labstore1007 for upgrades [dns] - 10https://gerrit.wikimedia.org/r/476903 (https://phabricator.wikimedia.org/T207377) (owner: 10Bstorm) [18:26:46] ok. thanks. looks like fewer instances today than y'day. 
[18:27:18] (03CR) 10Dzahn: "thank you for merging this and the follow-up about mod_proxy :)" [puppet] - 10https://gerrit.wikimedia.org/r/476867 (owner: 10Elukey) [18:51:22] (03PS1) 10Dzahn: superset: convert from apache to httpd module [puppet] - 10https://gerrit.wikimedia.org/r/476907 [18:51:54] (03CR) 10jerkins-bot: [V: 04-1] superset: convert from apache to httpd module [puppet] - 10https://gerrit.wikimedia.org/r/476907 (owner: 10Dzahn) [18:51:57] (03CR) 10Dzahn: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/476907 (owner: 10Dzahn) [18:52:48] (03PS2) 10Dzahn: superset: convert from apache to httpd module [puppet] - 10https://gerrit.wikimedia.org/r/476907 [18:52:56] (03CR) 10jerkins-bot: [V: 04-1] superset: convert from apache to httpd module [puppet] - 10https://gerrit.wikimedia.org/r/476907 (owner: 10Dzahn) [18:52:58] (03CR) 10Dzahn: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/476907 (owner: 10Dzahn) [18:53:42] ^ check experimental always says it failed.. even when it doesn't [18:55:39] (03CR) 10Dzahn: "https://puppet-compiler.wmflabs.org/compiler1001/33/analytics-tool1003.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/476907 (owner: 10Dzahn) [18:57:58] mutante: I think it's looking for host "," [18:59:05] thcipriani: oh, like .. it expects always multiple hosts ? [18:59:15] next time i will try just appending a , at the end [18:59:16] thanks [18:59:36] i only have a single host in these cases [19:00:42] I think it might be a bug with :https://gerrit.wikimedia.org/r/plugins/gitiles/integration/config/+/master/jjb/operations-puppet-catalog-compiler.yml#10 [19:01:24] aha! 
:) [19:01:25] so you have a single host, it ends in a newline, so nodes is: "analytics-tool1003.eqiad.wmnet, " [19:01:58] then you get: > ERROR: Unable to find facts for host , skipping [19:02:02] (03PS3) 10Dzahn: superset: convert from apache to httpd module [puppet] - 10https://gerrit.wikimedia.org/r/476907 [19:02:09] (03CR) 10Dzahn: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/476907 (owner: 10Dzahn) [19:04:40] it was successful when i use "Hosts: analytics-tool1003.eqiad.wmnet," [19:04:44] with the , [19:05:20] though that makes it ",," with grep 'Hosts: ' | sed 's/Hosts: //g' | tr '\n' ', ' [19:05:47] thanks, it works :) [19:07:55] heh, weird [19:25:31] (03CR) 10Bstorm: [C: 032] sonofgridengine: remove useless comment and fix the shadow profile [puppet] - 10https://gerrit.wikimedia.org/r/476902 (https://phabricator.wikimedia.org/T200557) (owner: 10Bstorm) [19:31:39] (03CR) 10Cwhite: [C: 032] "Changes look good. Proceeding. https://puppet-compiler.wmflabs.org/compiler1002/13805/" [puppet] - 10https://gerrit.wikimedia.org/r/476393 (https://phabricator.wikimedia.org/T210486) (owner: 10Cwhite) [19:31:41] (03PS4) 10Cwhite: hiera: add cluster definition to recursor role [puppet] - 10https://gerrit.wikimedia.org/r/476393 (https://phabricator.wikimedia.org/T210486) [19:34:34] gehel or ebernhardson, do you know if SMalyshev is working today? I'm hoping to get an answer to T210772 and I suspect that only he knows [19:34:34] T210772: wdqs-test.wikidata-query.eqiad.wmflabs is of unknown flavor, needs migrating - https://phabricator.wikimedia.org/T210772 [19:35:09] 10Operations, 10Commons, 10Multimedia, 10media-storage, 10User-Josve05a: Specific revisions of multiple files missing from Swift - 404 Not Found returned - https://phabricator.wikimedia.org/T124101 (10Tgr) Swift has filename prefix search but not substring search AFAIK. 
[19:38:47] andrewbogott: I think you'll have to wait next week [19:38:57] * gehel is having a look, who knows [19:39:17] gehel: ok [19:39:22] thank you for looking! [19:40:37] andrewbogott: I suspect it is now superseded by the larger SSD based instance whose name I can't remember [19:41:06] (03PS1) 10Eevans: Partman: added 3SSD JBOD config for restbase201[3-8] [puppet] - 10https://gerrit.wikimedia.org/r/476912 (https://phabricator.wikimedia.org/T210863) [19:41:09] but Stas might have experimentation going on with it, so if you can wait Monday, that would be great! [19:41:19] yep, I can wait [19:43:43] (03PS2) 10Cwhite: hiera: add cluster definition to spare role [puppet] - 10https://gerrit.wikimedia.org/r/476396 (https://phabricator.wikimedia.org/T210486) [19:46:48] (03PS3) 10Cwhite: hiera: add cluster definition to spare role [puppet] - 10https://gerrit.wikimedia.org/r/476396 (https://phabricator.wikimedia.org/T210486) [19:47:34] thcipriani, ok, deploying parsoid now. good time? [19:48:21] subbu: sure [19:48:26] k [19:50:57] !log ssastry@deploy1001 Started deploy [parsoid/deploy@9981ddf]: Update Parsoid to 310edecd (deploy-20181130 branch) [19:50:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:54:26] 10Operations, 10Analytics, 10Security-Team, 10WMF-Legal, 10Software-Licensing: Can exfat be used in WMF production? - https://phabricator.wikimedia.org/T210667 (10Legoktm) >>! In T210667#4789103, @faidon wrote: > So I think this task raises a few different issues (and @legoktm correct me if I'm wrong): >... 
[19:54:59] (03PS1) 10Bstorm: sonofgridengine: move systemd file to correct location [puppet] - 10https://gerrit.wikimedia.org/r/476914 (https://phabricator.wikimedia.org/T200557) [19:55:27] (03PS1) 10Eevans: hieradata: reconfigure restbase2013 for 3-SSD JBOD [puppet] - 10https://gerrit.wikimedia.org/r/476915 (https://phabricator.wikimedia.org/T210863) [19:58:10] (03PS1) 10Dzahn: kibana: convert from apache to httpd module [puppet] - 10https://gerrit.wikimedia.org/r/476916 [19:58:43] (03CR) 10jerkins-bot: [V: 04-1] kibana: convert from apache to httpd module [puppet] - 10https://gerrit.wikimedia.org/r/476916 (owner: 10Dzahn) [19:59:53] 10Operations, 10ops-codfw: Degraded RAID on restbase2013 - https://phabricator.wikimedia.org/T210877 (10ops-monitoring-bot) [20:01:00] PROBLEM - MD RAID on restbase2014 is CRITICAL: CRITICAL: State: degraded, Active: 10, Working: 10, Failed: 0, Spare: 0 [20:01:05] 10Operations, 10ops-codfw: Degraded RAID on restbase2014 - https://phabricator.wikimedia.org/T210878 (10ops-monitoring-bot) [20:01:24] PROBLEM - MD RAID on restbase2015 is CRITICAL: CRITICAL: State: degraded, Active: 10, Working: 10, Failed: 0, Spare: 0 [20:01:29] 10Operations, 10ops-codfw: Degraded RAID on restbase2015 - https://phabricator.wikimedia.org/T210879 (10ops-monitoring-bot) [20:01:44] (03PS2) 10Dzahn: kibana: convert from apache to httpd module [puppet] - 10https://gerrit.wikimedia.org/r/476916 [20:02:02] PROBLEM - MD RAID on restbase2017 is CRITICAL: CRITICAL: State: degraded, Active: 10, Working: 10, Failed: 0, Spare: 0 [20:02:08] 10Operations, 10ops-codfw: Degraded RAID on restbase2017 - https://phabricator.wikimedia.org/T210880 (10ops-monitoring-bot) [20:02:20] PROBLEM - MD RAID on restbase2018 is CRITICAL: CRITICAL: State: degraded, Active: 10, Working: 10, Failed: 0, Spare: 0 [20:02:25] 10Operations, 10ops-codfw: Degraded RAID on restbase2018 - https://phabricator.wikimedia.org/T210881 (10ops-monitoring-bot) [20:02:28] PROBLEM - MD RAID on restbase2016 is 
CRITICAL: CRITICAL: State: degraded, Active: 10, Working: 10, Failed: 0, Spare: 0 [20:02:34] 10Operations, 10ops-codfw: Degraded RAID on restbase2016 - https://phabricator.wikimedia.org/T210882 (10ops-monitoring-bot) [20:02:35] !log ssastry@deploy1001 Finished deploy [parsoid/deploy@9981ddf]: Update Parsoid to 310edecd (deploy-20181130 branch) (duration: 11m 38s) [20:02:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:02:37] (03CR) 10Bstorm: [C: 032] sonofgridengine: move systemd file to correct location [puppet] - 10https://gerrit.wikimedia.org/r/476914 (https://phabricator.wikimedia.org/T200557) (owner: 10Bstorm) [20:03:52] ACKNOWLEDGEMENT - MD RAID on restbase2014 is CRITICAL: CRITICAL: State: degraded, Active: 10, Working: 10, Failed: 0, Spare: 0 daniel_zahn https://phabricator.wikimedia.org/T209615 [20:03:52] ACKNOWLEDGEMENT - MD RAID on restbase2015 is CRITICAL: CRITICAL: State: degraded, Active: 10, Working: 10, Failed: 0, Spare: 0 daniel_zahn https://phabricator.wikimedia.org/T209615 [20:03:52] ACKNOWLEDGEMENT - MD RAID on restbase2016 is CRITICAL: CRITICAL: State: degraded, Active: 10, Working: 10, Failed: 0, Spare: 0 daniel_zahn https://phabricator.wikimedia.org/T209615 [20:03:52] ACKNOWLEDGEMENT - MD RAID on restbase2017 is CRITICAL: CRITICAL: State: degraded, Active: 10, Working: 10, Failed: 0, Spare: 0 daniel_zahn https://phabricator.wikimedia.org/T209615 [20:03:52] ACKNOWLEDGEMENT - MD RAID on restbase2018 is CRITICAL: CRITICAL: State: degraded, Active: 10, Working: 10, Failed: 0, Spare: 0 daniel_zahn https://phabricator.wikimedia.org/T209615 [20:04:36] 10Operations, 10ops-codfw, 10Patch-For-Review, 10Services (watching), and 2 others: Reconfigure hardware and reimage restbase201[3-8].codfw.wmnet - https://phabricator.wikimedia.org/T210863 (10Papaul) @fgiunchedi @Eevans I removed 2 SSD's from each server. [20:04:57] fwiw.. 
is it weird that those checks say the RAID is degraded but also "10/10 working" [20:05:13] normally it's always "Failed > 0" if it degrades [20:06:45] mutante: see also all the emails we got [20:06:46] should have linked https://phabricator.wikimedia.org/T210863 instead [20:08:15] papaul: ^ [20:08:28] volans: i see. more than one RAID [20:09:26] mutante: yes [20:09:39] mutante: can you please setup downtime [20:09:44] for those servers [20:09:50] thank you [20:12:54] papaul: done. but the default is only 2 hours. should i do until next week? [20:13:03] 10Operations, 10ops-codfw: Degraded RAID on restbase2018 - https://phabricator.wikimedia.org/T210881 (10jijiki) 05Open>03Resolved a:03jijiki @Papaul removed 2 SSDs from each restbase server T210863#4790081 [20:13:28] 10Operations, 10ops-codfw: Degraded RAID on restbase2016 - https://phabricator.wikimedia.org/T210882 (10jijiki) 05Open>03Resolved a:03jijiki @Papaul removed 2 SSDs from each restbase server T210863#4790081 [20:13:37] 10Operations, 10ops-codfw: Degraded RAID on restbase2017 - https://phabricator.wikimedia.org/T210880 (10jijiki) 05Open>03Resolved a:03jijiki @Papaul removed 2 SSDs from each restbase server T210863#4790081 [20:13:52] 10Operations, 10ops-codfw: Degraded RAID on restbase2015 - https://phabricator.wikimedia.org/T210879 (10jijiki) 05Open>03Resolved a:03jijiki @Papaul removed 2 SSDs from each restbase server T210863#4790081 [20:14:01] 10Operations, 10ops-codfw: Degraded RAID on restbase2014 - https://phabricator.wikimedia.org/T210878 (10jijiki) 05Open>03Resolved a:03jijiki @Papaul removed 2 SSDs from each restbase server T210863#4790081 [20:14:23] 10Operations, 10ops-codfw: Degraded RAID on restbase2013 - https://phabricator.wikimedia.org/T210877 (10jijiki) 05Open>03Resolved a:03jijiki @Papaul removed 2 SSDs from each restbase server T210863#4790081 [20:15:42] papaul: sorry for spamming you [20:16:32] papaul: i set it for 4 days [20:17:22] 10Operations, 10ops-codfw, 10Patch-For-Review, 10Services 
(watching), and 2 others: Reconfigure hardware and reimage restbase201[3-8].codfw.wmnet - https://phabricator.wikimedia.org/T210863 (10Dzahn) set downtime in icinga for 4 days: ` [icinga1001:~] $ for server in $(seq 14 18); do echo restbase20${ser... [20:18:24] mutante: thank you you rock [20:23:01] !log temporarily disabled puppet on stat1005 to test rsyncd changes [20:23:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:27:04] PROBLEM - puppet last run on cp3033 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:28:06] (03PS1) 10RobH: adding in hawk 1.92TB mixed use ssd sku [software] - 10https://gerrit.wikimedia.org/r/476918 [20:38:37] (03CR) 10RobH: [C: 032] adding in hawk 1.92TB mixed use ssd sku [software] - 10https://gerrit.wikimedia.org/r/476918 (owner: 10RobH) [20:44:49] Was there major lag on the wiki? [20:45:06] O_O [20:47:16] (03PS1) 10Ottomata: Allow pull based rsync between stat & notebook boxes only [puppet] - 10https://gerrit.wikimedia.org/r/476920 (https://phabricator.wikimedia.org/T205157) [20:48:24] don't know about lag, haven't heard any reports [20:48:35] how did I get nerdswiped into posting here at 11 pm [20:48:40] hi [20:48:44] I'm signing off, folks. have a good one [20:48:57] See ya, apergos :) [20:49:01] (hi, ttyl) [20:54:51] Bsadowski1, is there a problem? [20:57:41] RECOVERY - puppet last run on cp3033 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [20:57:44] I guess not anymore. I don't know why I thought there was one. 
[20:58:59] PROBLEM - Disk space on notebook1004 is CRITICAL: DISK CRITICAL - free space: /srv 5243 MB (3% inode=82%) [21:02:37] (03PS3) 10Dzahn: kibana: convert from apache to httpd module [puppet] - 10https://gerrit.wikimedia.org/r/476916 [21:02:45] (03CR) 10Dzahn: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/476916 (owner: 10Dzahn) [21:08:48] (03CR) 10Dzahn: "https://puppet-compiler.wmflabs.org/compiler1002/36/" [puppet] - 10https://gerrit.wikimedia.org/r/476916 (owner: 10Dzahn) [21:09:27] (03CR) 10Dzahn: "the goal is to stop using the apache module entirely. we are getting close. only a few things still using it" [puppet] - 10https://gerrit.wikimedia.org/r/476916 (owner: 10Dzahn) [21:29:27] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install elastic203[7-9], elastic204[0-9], elastic205[0-4] - https://phabricator.wikimedia.org/T210450 (10Papaul) [21:29:41] (03CR) 10GTirloni: [C: 032] jdk8: Switch base image to Stretch [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/463877 (https://phabricator.wikimedia.org/T205774) (owner: 10BryanDavis) [21:29:48] (03CR) 10jerkins-bot: [V: 04-1] jdk8: Switch base image to Stretch [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/463877 (https://phabricator.wikimedia.org/T205774) (owner: 10BryanDavis) [21:30:14] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install elastic203[7-9], elastic204[0-9], elastic205[0-4] - https://phabricator.wikimedia.org/T210450 (10Papaul) [21:31:06] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review, 10User-herron: Onboard at least 10 new non-sensitive log producers to the logging pipeline - https://phabricator.wikimedia.org/T205852 (10herron) [21:35:25] (03CR) 10Awight: "Very nice!" 
(034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/476891 (https://phabricator.wikimedia.org/T210580) (owner: 10Ladsgroup) [21:36:36] (03PS1) 10Herron: logstash: ship etherpad logs to ELK [puppet] - 10https://gerrit.wikimedia.org/r/476926 (https://phabricator.wikimedia.org/T205852) [21:39:11] (03CR) 10Herron: "PCC looks good https://puppet-compiler.wmflabs.org/compiler1002/13806/" [puppet] - 10https://gerrit.wikimedia.org/r/476926 (https://phabricator.wikimedia.org/T205852) (owner: 10Herron) [21:39:35] (03CR) 10Herron: [C: 032] logstash: ship etherpad logs to ELK [puppet] - 10https://gerrit.wikimedia.org/r/476926 (https://phabricator.wikimedia.org/T205852) (owner: 10Herron) [21:39:53] (03PS3) 10GTirloni: jdk8: Switch base image to Stretch [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/463877 (https://phabricator.wikimedia.org/T205774) (owner: 10BryanDavis) [21:40:31] (03CR) 10GTirloni: [C: 032] jdk8: Switch base image to Stretch [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/463877 (https://phabricator.wikimedia.org/T205774) (owner: 10BryanDavis) [21:43:17] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review, 10User-herron: Onboard at least 10 new non-sensitive log producers to the logging pipeline - https://phabricator.wikimedia.org/T205852 (10herron) [21:44:17] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install elastic203[7-9], elastic204[0-9], elastic205[0-4] - https://phabricator.wikimedia.org/T210450 (10Papaul) [21:49:07] (03PS1) 10GTirloni: jdk8: Switch base image to Stretch [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/476927 (https://phabricator.wikimedia.org/T205774) [21:50:36] (03CR) 10GTirloni: [C: 032] jdk8: Switch base image to Stretch [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/476927 (https://phabricator.wikimedia.org/T205774) (owner: 10GTirloni) [21:53:05] (03PS1) 10GTirloni: jdk8: Try to finally fix the invalid apt parameter 
[docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/476975 (https://phabricator.wikimedia.org/T205774) [21:53:33] (03CR) 10GTirloni: [C: 032] jdk8: Try to finally fix the invalid apt parameter [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/476975 (https://phabricator.wikimedia.org/T205774) (owner: 10GTirloni) [22:16:00] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review, 10User-herron: Onboard at least 10 new non-sensitive log producers to the logging pipeline - https://phabricator.wikimedia.org/T205852 (10herron) [22:16:29] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review, 10User-herron: Onboard at least 10 new non-sensitive log producers to the logging pipeline - https://phabricator.wikimedia.org/T205852 (10herron) [22:16:30] (03PS2) 10Dzahn: Switch rsync::quickdatacopy to auto_ferm [puppet] - 10https://gerrit.wikimedia.org/r/470612 (owner: 10Muehlenhoff) [22:16:42] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review, 10User-herron: Onboard at least 10 new non-sensitive log producers to the logging pipeline - https://phabricator.wikimedia.org/T205852 (10herron) [22:18:02] (03PS1) 10Herron: logstash: ship zookeeper logs to ELK [puppet] - 10https://gerrit.wikimedia.org/r/476977 (https://phabricator.wikimedia.org/T63789) [22:20:33] (03PS1) 10Thcipriani: Beta: add mwmaint01 to mediawiki-installation [puppet] - 10https://gerrit.wikimedia.org/r/476980 (https://phabricator.wikimedia.org/T125976) [22:22:41] (03CR) 10Dzahn: [C: 032] "https://puppet-compiler.wmflabs.org/compiler1002/13808/" [puppet] - 10https://gerrit.wikimedia.org/r/470612 (owner: 10Muehlenhoff) [22:24:07] (03CR) 10Herron: "PCC looks good https://puppet-compiler.wmflabs.org/compiler1001/13809/" [puppet] - 10https://gerrit.wikimedia.org/r/476977 (https://phabricator.wikimedia.org/T63789) (owner: 10Herron) [22:24:39] (03CR) 10Dzahn: [C: 032] Beta: add mwmaint01 to mediawiki-installation [puppet] - 10https://gerrit.wikimedia.org/r/476980 
(https://phabricator.wikimedia.org/T125976) (owner: 10Thcipriani) [22:24:47] (03PS2) 10Dzahn: Beta: add mwmaint01 to mediawiki-installation [puppet] - 10https://gerrit.wikimedia.org/r/476980 (https://phabricator.wikimedia.org/T125976) (owner: 10Thcipriani) [22:24:56] (03CR) 10Dzahn: [C: 032] "beta-picked, labs only" [puppet] - 10https://gerrit.wikimedia.org/r/476980 (https://phabricator.wikimedia.org/T125976) (owner: 10Thcipriani) [22:25:03] thanks mutante ! [22:25:09] yw [22:26:43] (03CR) 10Alex Monk: "After rebasing we're still left with the modules/swift/manifests/mount_filesystem.pp change, I'll follow-up" [puppet] - 10https://gerrit.wikimedia.org/r/402758 (https://phabricator.wikimedia.org/T184236) (owner: 10Alex Monk) [22:28:22] (03CR) 10Alex Monk: "Oh, right, I actually think it's just an old copy of I90632b77, nvm" [puppet] - 10https://gerrit.wikimedia.org/r/402758 (https://phabricator.wikimedia.org/T184236) (owner: 10Alex Monk) [22:29:57] (03CR) 10Dzahn: [C: 032] "confirmed on multiple hosts: phab1002 can still rsync from phab1001, netmon2001 can still rsync from netmon1002, releases1001 can still r" [puppet] - 10https://gerrit.wikimedia.org/r/470612 (owner: 10Muehlenhoff) [22:31:04] (03PS1) 10RobH: updating sku list for quotes [software] - 10https://gerrit.wikimedia.org/r/476981 [22:31:22] (03PS12) 10Alex Monk: swift: use implicit /dev/swift prefix for swift devices [puppet] - 10https://gerrit.wikimedia.org/r/361648 (https://phabricator.wikimedia.org/T163673) (owner: 10Filippo Giunchedi) [22:32:39] (03CR) 10Dzahn: "hashar wrote: "I don't think there is any place in production where we use php-fpm. 
It is probably better to stick to mod_php for consiste" [puppet] - 10https://gerrit.wikimedia.org/r/407958 (https://phabricator.wikimedia.org/T182832) (owner: 10Paladox) [22:33:26] 10Puppet, 10Beta-Cluster-Infrastructure, 10Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259 (10Krenair) [22:33:34] 10Operations, 10Puppet, 10Beta-Cluster-Infrastructure, 10media-storage, 10Patch-For-Review: Puppet broken on deployment-ms-be0[34] with evaluation error in swift module - https://phabricator.wikimedia.org/T184236 (10Krenair) 05Open>03Resolved [22:36:49] (03PS1) 10Herron: logstash: ship kafka server logs to ELK [puppet] - 10https://gerrit.wikimedia.org/r/476982 (https://phabricator.wikimedia.org/T63788) [22:49:33] (03PS1) 10Paladox: phabricator: Add support for php-fpm [puppet] - 10https://gerrit.wikimedia.org/r/476985 [22:53:03] (03PS1) 10Cwhite: update port to 9245 [debs/prometheus-icinga-exporter] - 10https://gerrit.wikimedia.org/r/476986 (https://phabricator.wikimedia.org/T208066) [23:00:34] (03PS1) 10Thcipriani: Beta: Replace deployment-redis servers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476988 [23:08:35] (03PS1) 10Dzahn: phabricator: move httpd setup to a profile [puppet] - 10https://gerrit.wikimedia.org/r/476989 [23:08:39] (03CR) 10BryanDavis: "Largely the same as I2483f90221b6e748f52acac01305a8a93cd34c2d. Either seems fine to me. 
I'm not sure anyone knows why we ended up with 4 m" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476988 (owner: 10Thcipriani) [23:09:05] (03CR) 10Dzahn: "per IRC: let me move the httpd part to a profile first..to make it possible for you to use Hiera lookup in a profile parameter to conditio" [puppet] - 10https://gerrit.wikimedia.org/r/476985 (owner: 10Paladox) [23:09:14] (03CR) 10Dzahn: "https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/476989" [puppet] - 10https://gerrit.wikimedia.org/r/476985 (owner: 10Paladox) [23:09:30] (03CR) 10Cwhite: [C: 032] update port to 9245 [debs/prometheus-icinga-exporter] - 10https://gerrit.wikimedia.org/r/476986 (https://phabricator.wikimedia.org/T208066) (owner: 10Cwhite) [23:09:42] (03CR) 10Paladox: [C: 031] "Thank you!" [puppet] - 10https://gerrit.wikimedia.org/r/476989 (owner: 10Dzahn) [23:10:01] (03Abandoned) 10Thcipriani: Beta: Replace deployment-redis servers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476988 (owner: 10Thcipriani) [23:10:11] (03PS2) 10Dzahn: phabricator: move httpd setup to a profile [puppet] - 10https://gerrit.wikimedia.org/r/476989 [23:10:51] (03PS2) 10Cwhite: role, profile: install, run, and collect icinga exporter metrics [puppet] - 10https://gerrit.wikimedia.org/r/476431 (https://phabricator.wikimedia.org/T208066) [23:13:23] (03CR) 10Dzahn: [C: 032] "no changes except the new class name https://puppet-compiler.wmflabs.org/compiler1002/13810/" [puppet] - 10https://gerrit.wikimedia.org/r/476989 (owner: 10Dzahn) [23:15:52] (03PS2) 10Paladox: phabricator: Add support for php-fpm [puppet] - 10https://gerrit.wikimedia.org/r/476985 [23:17:14] (03PS1) 10Bstorm: sonofgridengine: correct shadow and master init issues [puppet] - 10https://gerrit.wikimedia.org/r/476990 (https://phabricator.wikimedia.org/T200557) [23:18:22] (03CR) 10jerkins-bot: [V: 04-1] sonofgridengine: correct shadow and master init issues [puppet] - 10https://gerrit.wikimedia.org/r/476990 
(https://phabricator.wikimedia.org/T200557) (owner: 10Bstorm) [23:22:13] (03PS2) 10Bstorm: sonofgridengine: correct shadow and master init issues [puppet] - 10https://gerrit.wikimedia.org/r/476990 (https://phabricator.wikimedia.org/T200557) [23:23:42] (03CR) 10Bstorm: [C: 032] sonofgridengine: correct shadow and master init issues [puppet] - 10https://gerrit.wikimedia.org/r/476990 (https://phabricator.wikimedia.org/T200557) (owner: 10Bstorm) [23:31:28] (03PS1) 10Dzahn: phabricator: monitoring to profile, merge mpm with httpd profile [puppet] - 10https://gerrit.wikimedia.org/r/476996 [23:33:24] (03PS2) 10Dzahn: phabricator: monitoring to profile, merge mpm with httpd profile [puppet] - 10https://gerrit.wikimedia.org/r/476996 [23:33:33] (03CR) 10Faidon Liambotis: [C: 04-1] updating sku list for quotes (035 comments) [software] - 10https://gerrit.wikimedia.org/r/476981 (owner: 10RobH) [23:34:00] (03CR) 10Paladox: [C: 031] phabricator: monitoring to profile, merge mpm with httpd profile [puppet] - 10https://gerrit.wikimedia.org/r/476996 (owner: 10Dzahn) [23:40:21] (03PS3) 10Dzahn: phabricator: monitoring to profile, merge mpm with httpd profile [puppet] - 10https://gerrit.wikimedia.org/r/476996 [23:42:51] (03PS2) 10RobH: Quotereviewer: updating SKU list for quotes [software] - 10https://gerrit.wikimedia.org/r/476981 [23:43:38] (03PS3) 10RobH: Quotereviewer: update SKU list for quotes [software] - 10https://gerrit.wikimedia.org/r/476981 [23:43:55] (03CR) 10Dzahn: [C: 032] "again no changes except resource/class names: https://puppet-compiler.wmflabs.org/compiler1002/13811/" [puppet] - 10https://gerrit.wikimedia.org/r/476996 (owner: 10Dzahn) [23:44:15] (03PS4) 10Dzahn: phabricator: monitoring to profile, merge mpm with httpd profile [puppet] - 10https://gerrit.wikimedia.org/r/476996 [23:44:53] (03PS3) 10Paladox: phabricator: Add support for php-fpm [puppet] - 10https://gerrit.wikimedia.org/r/476985 [23:45:08] (03PS4) 10RobH: Quotereviewer: update SKU list for quotes 
[software] - 10https://gerrit.wikimedia.org/r/476981 [23:45:15] (03PS5) 10Dzahn: phabricator: monitoring to profile, merge mpm with httpd profile [puppet] - 10https://gerrit.wikimedia.org/r/476996 [23:46:33] (03CR) 10RobH: [C: 031] "So I cleaned up the commit message, but then missed some double versus single spacing. (I tend to put double spacing, but I copy/pasted a" [software] - 10https://gerrit.wikimedia.org/r/476981 (owner: 10RobH) [23:58:06] (03PS4) 10Paladox: phabricator: Add support for php-fpm [puppet] - 10https://gerrit.wikimedia.org/r/476985 [23:58:36] (03PS5) 10Paladox: phabricator: Add support for php-fpm [puppet] - 10https://gerrit.wikimedia.org/r/476985 [23:59:20] (03CR) 10Dzahn: [C: 032] phabricator: monitoring to profile, merge mpm with httpd profile [puppet] - 10https://gerrit.wikimedia.org/r/476996 (owner: 10Dzahn) [23:59:52] (03PS6) 10Paladox: phabricator: Add support for php-fpm [puppet] - 10https://gerrit.wikimedia.org/r/476985