[00:00:30] RECOVERY - puppet last run on analytics1075 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [00:01:54] (03CR) 10Bstorm: [C: 032] wiki replicas: Fix stale logging_whitelist reference [puppet] - 10https://gerrit.wikimedia.org/r/420937 (owner: 10BryanDavis) [00:03:19] (03PS1) 10Kaldari: admins: Updating ssh key for kaldari [puppet] - 10https://gerrit.wikimedia.org/r/420938 [00:03:25] 10Operations, 10ops-codfw, 10Patch-For-Review, 10User-Elukey: rack/setup/install mw2259-mw2290 - https://phabricator.wikimedia.org/T188301#4067234 (10Papaul) [00:07:28] volans: Now I need to get my puppet patch to update my ssh key merged (^), but I forget what the procedure for verifying my identity is. [00:11:04] 10Operations, 10Security-Team: can we get rid of rsvg security patch? - https://phabricator.wikimedia.org/T104147#4067253 (10Jdforrester-WMF) [00:11:13] kaldari: you have access right now (with the old key), do you? [00:11:14] 10Operations, 10RelEng-Archive-FY201718-Q2, 10Release-Engineering-Team, 10WorkType-NewFunctionality: [Spike] Try out hack ( 10Operations, 10Goal, 10HHVM: Complete the use of HHVM over Zend PHP on the Wikimedia cluster - https://phabricator.wikimedia.org/T86081#4067249 (10Jdforrester-WMF) 05declined>03Resolved Despite being declined, the Release Engineering and Cloud Services teams between them fixed this last week as part of... [00:11:57] kaldari: an acceptable way to verify it is if you just put a file (or the new key itself) in your home dir somewhere [00:12:43] mutante: No, my new computer bricked so I'm actually locked out of the production servers. (I know, I should have backed up my keys.) What's the alternative? [00:13:04] kaldari: Google Hangout? 
[00:13:12] good idea :) [00:13:17] could you invite me [00:16:51] mutante: https://hangouts.google.com/hangouts/_/wikimedia.org/dzahn-rkaldari [00:18:26] confirmed that's you :) [00:18:46] (03PS2) 10Dzahn: admins: Updating ssh key for kaldari [puppet] - 10https://gerrit.wikimedia.org/r/420938 (owner: 10Kaldari) [00:18:49] This time I'm backing up my key! [00:19:00] RECOVERY - puppet last run on restbase1007 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:19:34] (03CR) 10Dzahn: [C: 032] "confirmed in a hangout this was actually kaldari :)" [puppet] - 10https://gerrit.wikimedia.org/r/420938 (owner: 10Kaldari) [00:19:53] i hope the new machine is better. it has a lower inventory number :) [00:20:50] kaldari: btw, which bastion host are you using in your ssh config [00:21:11] bast1001.wikimedia.org [00:21:39] kaldari: please change it to bast1002 while at it , bast1001 is going away because it's old [00:21:47] or bast4001 [00:21:55] depending on your physical location [00:21:59] 4001 is in SF [00:23:00] see map on https://wikitech.wikimedia.org/wiki/Bastion [00:27:44] 10Operations, 10ops-codfw: rack/setup/install ms-be204[0-3] - https://phabricator.wikimedia.org/T189633#4067291 (10Papaul) [00:28:14] it has your new key now on either of them [00:50:11] (03CR) 10Jon Harald Søby: [C: 031] Update logo for Dutch Low Saxon Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420934 (https://phabricator.wikimedia.org/T190051) (owner: 10Odder) [01:05:23] 10Operations, 10ops-codfw, 10Patch-For-Review, 10User-Elukey: rack/setup/install mw2259-mw2290 - https://phabricator.wikimedia.org/T188301#4002973 (10Dzahn) https://etherpad.wikimedia.org/p/mw2259-mw2290_install [01:08:02] 10Operations, 10ops-codfw, 10Patch-For-Review, 10User-Elukey: rack/setup/install mw2259-mw2290 - https://phabricator.wikimedia.org/T188301#4067341 (10Dzahn) mw2259 thru mw2290 have all been installed by Papaul and then i added them to site with role::spare but 
already sectioned by rows, see change above.... [01:13:14] (03CR) 10Jon Harald Søby: [C: 031] Properly setup ProofreadPage namespaces for cywikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394189 (https://phabricator.wikimedia.org/T181406) (owner: 10Tpt) [01:18:49] 10Operations, 10ops-codfw, 10Patch-For-Review, 10User-Elukey: rack/setup/install mw2259-mw2290 - https://phabricator.wikimedia.org/T188301#4067358 (10Dzahn) [01:19:20] 10Operations, 10ops-codfw, 10Patch-For-Review, 10User-Elukey: rack/setup/install mw2259-mw2290 - https://phabricator.wikimedia.org/T188301#4002973 (10Dzahn) [01:36:03] 10Operations, 10Puppet, 10Patch-For-Review: Upgrade PuppetDB to version 4.4 - https://phabricator.wikimedia.org/T177253#4067366 (10herron) The steps in https://phabricator.wikimedia.org/T177253#4056981 have been completed and production is now running puppetdb 4.4 from codfw Here are my notes from the upgra... [01:39:00] 10Operations, 10Puppet, 10Release-Engineering-Team: puppetdb4: use postgres db backend in puppet-compiler - https://phabricator.wikimedia.org/T187258#4067367 (10herron) compiler03 has been enabled and compiler02 disabled (in jenkins) Running a test puppet compiler build works and looks good. Will proceed w... [01:41:51] (03PS1) 10Herron: cumin: point cumin master to puppetdb1001 [puppet] - 10https://gerrit.wikimedia.org/r/420946 (https://phabricator.wikimedia.org/T177253) [01:42:23] Reedy: any chance you can push a new release of AWB? [01:48:03] 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Access request to stat1005 and stat1006 for cooltey - https://phabricator.wikimedia.org/T190150#4067376 (10cooltey) @RobH Hi, I found that I cannot access to the Gerrit, does the previous actions affect anything to my account? [01:51:28] !log codfw puppetdb upgrade complete. 
eqiad puppetmaster remains depooled T177253 [01:51:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:51:35] T177253: Upgrade PuppetDB to version 4.4 - https://phabricator.wikimedia.org/T177253 [01:57:35] 10Operations, 10ops-codfw, 10Patch-For-Review, 10User-Elukey: rack/setup/install mw2259-mw2290 - https://phabricator.wikimedia.org/T188301#4067378 (10Papaul) mw2267 is fix now [01:59:51] 10Operations, 10ops-codfw, 10Patch-For-Review, 10User-Elukey: rack/setup/install mw2259-mw2290 - https://phabricator.wikimedia.org/T188301#4067380 (10Papaul) [02:00:20] 10Operations, 10ops-codfw, 10Patch-For-Review, 10User-Elukey: rack/setup/install mw2259-mw2290 - https://phabricator.wikimedia.org/T188301#4002973 (10Papaul) a:05Papaul>03Joe [02:00:30] PROBLEM - Disk space on furud is CRITICAL: DISK CRITICAL - free space: /mnt/3a 1753981 MB (4% inode=96%): /mnt/3b 1604342 MB (4% inode=97%): /mnt/4a 1862344 MB (4% inode=97%): /mnt/4b 1635708 MB (4% inode=97%): /mnt/5a 1717311 MB (4% inode=97%): /mnt/5b 1490589 MB (3% inode=97%): /mnt/6a 1845598 MB (4% inode=97%) [02:28:42] (03PS1) 10MaxSem: Redeploy GlobalPreferences to test wikis and mw.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420947 (https://phabricator.wikimedia.org/T189806) [02:32:31] !log l10nupdate@tin scap sync-l10n completed (1.31.0-wmf.25) (duration: 06m 18s) [02:32:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:43:30] PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 71, down: 1, dormant: 0, excluded: 0, unused: 0 [02:44:10] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 224, down: 1, dormant: 0, excluded: 0, unused: 0 [02:50:22] (03PS1) 10Niharika29: Deploy GlobalPrefs to all production wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420948 [02:55:32] (03CR) 10Ayounsi: [C: 032] Add DHCP and partman for ping1001 [puppet] - 
10https://gerrit.wikimedia.org/r/420933 (https://phabricator.wikimedia.org/T190090) (owner: 10Ayounsi) [02:55:40] (03PS2) 10Ayounsi: Add DHCP and partman for ping1001 [puppet] - 10https://gerrit.wikimedia.org/r/420933 (https://phabricator.wikimedia.org/T190090) [03:16:49] (03PS1) 10Ayounsi: Add ping1001 to puppet (test) [puppet] - 10https://gerrit.wikimedia.org/r/420949 (https://phabricator.wikimedia.org/T190090) [03:17:48] (03CR) 10Ayounsi: [C: 032] Add ping1001 to puppet (test) [puppet] - 10https://gerrit.wikimedia.org/r/420949 (https://phabricator.wikimedia.org/T190090) (owner: 10Ayounsi) [03:22:19] (03CR) 10Legoktm: "Shouldn't this only be enabled on public SUL wikis?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420948 (owner: 10Niharika29) [03:26:10] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 819.13 seconds [03:27:22] 10Operations, 10Goal, 10HHVM: Complete the use of HHVM over Zend PHP on the Wikimedia cluster - https://phabricator.wikimedia.org/T86081#4067426 (10Krinkle) [03:27:28] 10Operations, 10Deployments, 10Beta-Cluster-reproducible, 10HHVM, and 2 others: Switch mwscript from Zend PHP5 to default php alternative (e.g. HHVM or PHP7) - https://phabricator.wikimedia.org/T146285#4067422 (10Krinkle) 05Resolved>03Open Reverted (again) as of {48b9d506a9bb7386d92b1bcaed109dc776ca041... 
[03:27:59] 10Operations, 10vm-requests: eqiad: 1 VM %request for ping offload - https://phabricator.wikimedia.org/T190243#4067428 (10ayounsi) [03:36:38] (03CR) 10Aaron Schulz: [WIP] Add dynomite module and dynomite_wancache profile (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/415789 (owner: 10Aaron Schulz) [03:38:41] (03PS2) 10BBlack: VCL: fixed 20m grace value regardless of health [puppet] - 10https://gerrit.wikimedia.org/r/420927 (https://phabricator.wikimedia.org/T189892) [03:40:49] (03CR) 10BBlack: [C: 032] VCL: fixed 20m grace value regardless of health [puppet] - 10https://gerrit.wikimedia.org/r/420927 (https://phabricator.wikimedia.org/T189892) (owner: 10BBlack) [03:41:03] (03PS2) 10BBlack: VCL: only keep objects worth keep-ing [puppet] - 10https://gerrit.wikimedia.org/r/420928 (https://phabricator.wikimedia.org/T189892) [03:41:42] (03CR) 10BBlack: [C: 032] VCL: only keep objects worth keep-ing [puppet] - 10https://gerrit.wikimedia.org/r/420928 (https://phabricator.wikimedia.org/T189892) (owner: 10BBlack) [03:45:17] _joe_: where would a new empty dynomite repo go? operations/debs/dynomite? [03:49:39] if it's just tacking on debian/ contents for debianization of an upstream, probably [03:49:50] (or modifying them from another debianized upstream) [03:56:41] PROBLEM - puppet last run on labservices1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [04:06:02] (03CR) 10Niharika29: "Oh right. Let me check which ones should be skipped." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/420948 (owner: 10Niharika29) [04:10:06] (03PS2) 10Niharika29: Deploy GlobalPrefs to all production wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420948 [04:11:40] RECOVERY - puppet last run on labservices1002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [04:12:20] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 254.04 seconds [04:36:00] PROBLEM - Restbase edge esams on text-lb.esams.wikimedia.org is CRITICAL: /api/rest_v1/feed/onthisday/{type}/{mm}/{dd} (Retrieve selected the events for Jan 01) timed out before a response was received [04:38:00] RECOVERY - Restbase edge esams on text-lb.esams.wikimedia.org is OK: All endpoints are healthy [04:43:30] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5 [04:43:31] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5 [04:52:31] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5 [04:52:31] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5 [04:53:31] (03CR) 
10TerraCodes: Deploy GlobalPrefs to all production wikis (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420948 (owner: 10Niharika29) [05:06:15] (03CR) 10Niharika29: "When did they change that to nonglobal? :P" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420948 (owner: 10Niharika29) [05:06:33] (03PS3) 10Niharika29: Deploy GlobalPrefs to all production wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420948 [05:18:42] is anyone else getting a certificate issue when trying to access phab? [05:20:11] https://awau.moe/35f2a9.png [05:21:00] (03CR) 10TerraCodes: [C: 031] Deploy GlobalPrefs to all production wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420948 (owner: 10Niharika29) [05:28:46] I'm also unable to access enwiki? [05:31:13] Cupid: I can access both. [05:31:22] I'm able to access gerrit tho, weird [05:32:26] weird, nothing should be blocking it, I was able to access both phab and enwiki fine a bit ago [05:33:56] Cupid: works for me. Is your clock accurate? 
hmmm [05:35:14] yes, my clock is correct [06:24:00] (03PS1) 10Marostegui: db-eqiad.php: Depool es1019 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420952 (https://phabricator.wikimedia.org/T148507) [06:24:56] (03PS1) 10Marostegui: es1019.yaml: Update socket path [puppet] - 10https://gerrit.wikimedia.org/r/420953 [06:25:31] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool es1019 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420952 (https://phabricator.wikimedia.org/T148507) (owner: 10Marostegui) [06:26:40] (03Merged) 10jenkins-bot: db-eqiad.php: Depool es1019 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420952 (https://phabricator.wikimedia.org/T148507) (owner: 10Marostegui) [06:28:38] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool es1019 - socket location upgrade (duration: 01m 21s) [06:28:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:29:01] !log Stop MySQL on es1019 to upgrade socket path [06:29:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:29:14] (03CR) 10Marostegui: [C: 032] es1019.yaml: Update socket path [puppet] - 10https://gerrit.wikimedia.org/r/420953 (owner: 10Marostegui) [06:29:38] (03CR) 10jenkins-bot: db-eqiad.php: Depool es1019 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420952 (https://phabricator.wikimedia.org/T148507) (owner: 10Marostegui) [06:30:40] PROBLEM - puppet last run on labvirt1017 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. 
Failed resources (up to 3 shown): File[/usr/local/bin/gen_fingerprints] [06:33:30] (03PS1) 10Marostegui: db-eqiad.php: Slowly repool es1019 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420954 [06:40:03] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Slowly repool es1019 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420954 (owner: 10Marostegui) [06:41:12] (03Merged) 10jenkins-bot: db-eqiad.php: Slowly repool es1019 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420954 (owner: 10Marostegui) [06:41:38] (03CR) 10jenkins-bot: db-eqiad.php: Slowly repool es1019 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420954 (owner: 10Marostegui) [06:41:53] (03CR) 10Aklapper: "Yes. https://gerrit.wikimedia.org/r/420131 was deployed instead (as far as I understand)." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408073 (https://phabricator.wikimedia.org/T186463) (owner: 10Zoranzoki21) [06:42:47] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Slowly repool es1019 after socket location upgrade (duration: 01m 14s) [06:42:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:46:37] (03PS1) 10Marostegui: db-eqiad.php: Depool pc1005 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420955 [06:50:51] !log Stop MySQL on db1020 - T189773 [06:50:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:50:57] T189773: Decommission db1020 - https://phabricator.wikimedia.org/T189773 [06:55:09] (03PS1) 10Marostegui: mariadb: Get ready to decommission db1020 [puppet] - 10https://gerrit.wikimedia.org/r/420956 (https://phabricator.wikimedia.org/T189773) [06:55:40] RECOVERY - puppet last run on labvirt1017 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [06:56:57] (03PS1) 10Marostegui: db-eqiad,db-codfw.php: Remove db1020 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420958 (https://phabricator.wikimedia.org/T189773) [06:58:25] (03CR) 10Marostegui: [C: 032] 
db-eqiad,db-codfw.php: Remove db1020 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420958 (https://phabricator.wikimedia.org/T189773) (owner: 10Marostegui) [06:59:35] (03Merged) 10jenkins-bot: db-eqiad,db-codfw.php: Remove db1020 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420958 (https://phabricator.wikimedia.org/T189773) (owner: 10Marostegui) [06:59:50] (03CR) 10jenkins-bot: db-eqiad,db-codfw.php: Remove db1020 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420958 (https://phabricator.wikimedia.org/T189773) (owner: 10Marostegui) [07:01:13] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Remove db1020 from config - T189773 (duration: 01m 13s) [07:01:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:01:18] T189773: Decommission db1020 - https://phabricator.wikimedia.org/T189773 [07:02:55] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Remove db1020 from config - T189773 (duration: 01m 15s) [07:03:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:06:17] (03CR) 10Marostegui: [C: 032] "https://puppet-compiler.wmflabs.org/compiler03/10548/" [puppet] - 10https://gerrit.wikimedia.org/r/420956 (https://phabricator.wikimedia.org/T189773) (owner: 10Marostegui) [07:07:49] !log Remove db1020 from tendril - T189773 [07:07:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:07:56] T189773: Decommission db1020 - https://phabricator.wikimedia.org/T189773 [07:09:34] 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests, 10Patch-For-Review: Decommission db1020 - https://phabricator.wikimedia.org/T189773#4067590 (10Marostegui) a:05Marostegui>03RobH This host is now ready for DC Ops steps. 
Assigning to @RobH [07:22:06] 08Warning Alert for device cr2-esams.wikimedia.org - Inbound interface errors [07:27:56] checking librenms for --^ but it is not that easy to figure out what's wrong [07:28:50] (03PS1) 10Marostegui: m2.hosts: Remove db1020 [software] - 10https://gerrit.wikimedia.org/r/420960 (https://phabricator.wikimedia.org/T189773) [07:29:45] (03CR) 10Marostegui: [C: 032] m2.hosts: Remove db1020 [software] - 10https://gerrit.wikimedia.org/r/420960 (https://phabricator.wikimedia.org/T189773) (owner: 10Marostegui) [07:30:36] (03Merged) 10jenkins-bot: m2.hosts: Remove db1020 [software] - 10https://gerrit.wikimedia.org/r/420960 (https://phabricator.wikimedia.org/T189773) (owner: 10Marostegui) [07:30:51] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5 [07:31:51] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5 [07:33:23] so the cr2-esams link is https://librenms.wikimedia.org/device/device=145/tab=port/port=11521/ (Cc: XioNoX - if possible would be great to be linked in the icinga log itself) [07:33:57] about the 5xx, cp3032 seems to be peaking in failed fetches [07:34:55] but the 503 peak seems gone now [07:34:58] Cc: ema [07:36:03] it would be interesting to know if the cr2-esams warning and this event are somehow correlated [07:38:21] now cp3033 is showing failed fetches [07:44:53] the level of 503s is still ongoing but not that heavy [07:47:20] RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 73, down: 0, dormant: 0, excluded: 0, 
unused: 0 [07:47:51] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 226, down: 0, dormant: 0, excluded: 0, unused: 0 [07:50:49] (03PS7) 10Jcrespo: mariadb: Move default socket for misc services to /run [puppet] - 10https://gerrit.wikimedia.org/r/420331 (https://phabricator.wikimedia.org/T148507) [07:56:00] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5 [07:56:00] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5 [07:59:28] hashar: o/ - contint1001's root partition has only ~4.5GB left, afaics 25G are in /var/lib/docker/overlay2 [08:01:22] elukey: hello. Yeah annoyingly docker is on the root partition :^\ [08:05:00] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool pc1005 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420955 (owner: 10Marostegui) [08:05:03] (03PS2) 10Marostegui: db-eqiad.php: Depool pc1005 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420955 [08:05:33] elukey: that would be https://phabricator.wikimedia.org/T178663 which is to rethink the storage backend. I will complement that task [08:05:35] and do the cleanup [08:06:29] super thanks! 
[08:07:06] 08̶W̶a̶r̶n̶i̶n̶g Device cr2-esams.wikimedia.org recovered from Inbound interface errors [08:07:27] really curious to know how to read those --^ [08:07:43] * elukey asks netops [08:07:57] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool pc1005 for kernel, mariadb and socket location upgrade (duration: 01m 15s) [08:07:58] 10Operations, 10Release Pipeline, 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban): Switch CI Docker Storage Driver to its own partition and to use devicemapper - https://phabricator.wikimedia.org/T178663#4067655 (10hashar) [08:08:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:08:23] its barely able to read yellow with a line through it 👍🏻 [08:08:25] https://awau.moe/70cf8b.png [08:08:31] !log Stop MySQL on pc1005 for kernel, mariadb and socket path upgrade [08:08:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:08:52] Cupid: ah right yellow on white background kills the eye :D [08:09:09] (03CR) 10jenkins-bot: db-eqiad.php: Depool pc1005 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420955 (owner: 10Marostegui) [08:09:19] !log contint1001: docker image prune ; docker container prune [08:09:22] hm... [08:09:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:09:34] !log contint1001: docker image prune ; docker container prune # T178663 [08:09:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:09:40] T178663: Switch CI Docker Storage Driver to its own partition and to use devicemapper - https://phabricator.wikimedia.org/T178663 [08:09:49] is there anything different between gerrit and the rest of wmf's sites, ssl certificate wise? 
[08:10:25] !log contint1001: deleting some old docker images [08:10:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:10:33] elukey: hi :) [08:11:04] ema: buongiorno :) [08:11:40] (03PS1) 10Giuseppe Lavagetto: prometheus: unbreak all jmx-originated metrics collection [puppet] - 10https://gerrit.wikimedia.org/r/420962 (https://phabricator.wikimedia.org/T187259) [08:11:48] so I see that bblack merged some interesting VCL patches during the EU night [08:12:02] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool pc1005" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420963 [08:12:25] they might affect the text@esams morning drama, let's see [08:12:49] (03CR) 10Giuseppe Lavagetto: [C: 032] prometheus: unbreak all jmx-originated metrics collection [puppet] - 10https://gerrit.wikimedia.org/r/420962 (https://phabricator.wikimedia.org/T187259) (owner: 10Giuseppe Lavagetto) [08:12:59] * elukey sends wikilove to _joe_ [08:15:33] weird [08:15:44] I'm unable to access any wmf site, except for gerrit [08:16:21] Cupid: can you give us more info? What do you mean by "unable to access any wmf site"? SSL/TLS errors? [08:17:20] https://awau.moe/35f2a9.png is what I get when I try to go to any wmf site (except gerrit) [08:18:02] can you provide details about the certificate you are seeing? [08:18:14] and can you check the date of your system? 
[08:18:47] it only started happening recently tho [08:18:56] my clock is correct [08:18:59] !log ema@neodymium conftool action : set/pooled=no; selector: name=cp3033,service=varnish-be [08:19:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:19:06] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool pc1005" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420963 (owner: 10Marostegui) [08:19:42] and no, firefox isn't giving me a way to view the certificate [08:20:13] this is what we need to see https://i.imgur.com/Uo9roWv.png [08:20:31] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool pc1005" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420963 (owner: 10Marostegui) [08:20:41] !log ema@neodymium conftool action : set/pooled=no; selector: name=cp3033.esams.wmnet,service=varnish-be [08:20:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:21:00] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool pc1005" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420963 (owner: 10Marostegui) [08:21:04] https://awau.moe/71416c.png but I don't have a view certificate button [08:21:25] interesting that if you're retarded like I am and use the wrong confctl command, it still announces stuff on irc without doing anything [08:21:41] I ran this as non-root: `confctl select name=cp3033,service='varnish-be' set/pooled=no` [08:21:53] and it logged here without doing anything useful [08:22:08] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool pc1005 after kernel, mariadb and socket location upgrade (duration: 01m 15s) [08:22:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:22:45] which version of firefox are you using? which OS? 
[08:22:48] Cupid: it says connection not encrypted [08:22:58] which is weird (compare to mine) [08:23:17] so that would be it, why it happens cannot say [08:23:24] moritzm: firefox 59 on windows 10 [08:23:57] 10Operations, 10DBA, 10Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4067669 (10Marostegui) [08:25:35] (03PS1) 10Marostegui: db-eqiad.php: Depool db1106 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420964 (https://phabricator.wikimedia.org/T183469) [08:26:01] is that a private internet connection or some corporate network (which might have proxies in between)? are you using any security tools which might meddle with your TLS setup locally? [08:26:28] moritzm: I will let you handle, I am needed elsewhere [08:26:33] jynus: sure [08:27:01] (03PS8) 10Jcrespo: mariadb: Move default socket for misc services to /run [puppet] - 10https://gerrit.wikimedia.org/r/420331 (https://phabricator.wikimedia.org/T148507) [08:27:13] Cupid: your browser supports state-of-the-art cryptography (as does our site), so there's something blocking it [08:27:59] (03CR) 10Jcrespo: [C: 032] mariadb: Move default socket for misc services to /run [puppet] - 10https://gerrit.wikimedia.org/r/420331 (https://phabricator.wikimedia.org/T148507) (owner: 10Jcrespo) [08:28:14] private internet connection, might be something blocking it, but it was working fine a few hours ago and it works fine in chrome, so I don't think its being blocked [08:28:38] if it works ok on other browsers, it could be an extension [08:30:09] are you using so-called antivirus tools? 
if so, maybe check whether stopping that makes a difference [08:30:37] it could also be breakage in the latest firefox version; it wouldn't be the first time that firefox "breaks" wikimedia, although we would need more test cases to confirm it [08:31:30] (03PS1) 10Giuseppe Lavagetto: prometheus: unbreak pdus, redis exporters [puppet] - 10https://gerrit.wikimedia.org/r/420965 (https://phabricator.wikimedia.org/T187259) [08:32:13] the only two extensions I have are lastpass and ublock origin, I tried disabling ublock origin, but that didn't work [08:32:32] (03CR) 10Urbanecm: "Wasn't requested." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420122 (https://phabricator.wikimedia.org/T189778) (owner: 10Urbanecm) [08:32:49] and I tried disabling my antivirus [08:32:53] and it worked [08:33:02] strange it didn't affect chrome [08:33:17] <_joe_> Cupid: your antivirus was doing man-in-the-middle for you, probably [08:33:45] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1106 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420964 (https://phabricator.wikimedia.org/T183469) (owner: 10Marostegui) [08:34:17] (03CR) 10Giuseppe Lavagetto: [C: 032] prometheus: unbreak pdus, redis exporters [puppet] - 10https://gerrit.wikimedia.org/r/420965 (https://phabricator.wikimedia.org/T187259) (owner: 10Giuseppe Lavagetto) [08:34:25] if my antivirus downgraded my TLS connections, I would uninstall it immediately [08:34:27] (03PS2) 10Giuseppe Lavagetto: prometheus: unbreak pdus, redis exporters [puppet] - 10https://gerrit.wikimedia.org/r/420965 (https://phabricator.wikimedia.org/T187259) [08:34:58] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1106 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420964 (https://phabricator.wikimedia.org/T183469) (owner: 10Marostegui) [08:35:11] RECOVERY - Check systemd state on labtestweb2001 is OK: OK - running: The system is fully operational [08:35:27] !log ema@neodymium conftool action : set/pooled=yes; selector: 
name=cp3033.esams.wmnet,service=varnish-be [08:35:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:35:38] marostegui: running puppet on db1063 [08:35:39] !log ema@neodymium conftool action : set/pooled=no; selector: name=cp3030.esams.wmnet,service=varnish-be [08:35:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:36:29] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1106 - T183469 (duration: 01m 14s) [08:36:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:36:35] T183469: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469 [08:37:21] I will also kill heartbeat and restart it with puppet [08:37:30] cool [08:37:41] yep, disabling kaspersky's scanning of secure connections seems to have fixed my issue [08:37:44] thanks all [08:38:11] PROBLEM - Check systemd state on labtestweb2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[08:38:42] you are welcome but "scanning of secure connections" sound really sketchy [08:39:09] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1106 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420964 (https://phabricator.wikimedia.org/T183469) (owner: 10Marostegui) [08:43:00] true [08:43:02] https://awau.moe/baa88d.png [08:45:21] PROBLEM - Router interfaces on mr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.199, interfaces up: 35, down: 1, dormant: 0, excluded: 1, unused: 0 [08:45:30] PROBLEM - Host mr1-eqiad.oob IPv6 is DOWN: CRITICAL - Destination Unreachable (2607:f6f0:205::153) [08:45:40] PROBLEM - Host mr1-eqiad.oob is DOWN: PING CRITICAL - Packet loss = 100% [08:46:38] !log ema@neodymium conftool action : set/pooled=yes; selector: name=cp3030.esams.wmnet,service=varnish-be [08:46:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:46:46] !log ema@neodymium conftool action : set/pooled=no; selector: name=cp3033.esams.wmnet,service=varnish-be [08:46:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:48:01] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5 [08:48:01] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5 [08:48:21] RECOVERY - Router interfaces on mr1-eqiad is OK: OK: host 208.80.154.199, interfaces up: 37, down: 0, dormant: 0, excluded: 1, unused: 0 [08:50:40] RECOVERY - Host mr1-eqiad.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 3.21 ms [08:50:50] RECOVERY - Host mr1-eqiad.oob is UP: PING OK - 
Packet loss = 0%, RTA = 1.58 ms [08:51:17] looks like the 5xx are esams, known? [08:51:27] godog: yes [08:51:47] kk, thanks [08:56:26] (03CR) 10Filippo Giunchedi: "LGTM modulo naming (see inline)" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/420832 (https://phabricator.wikimedia.org/T184942) (owner: 10Vgutierrez) [08:56:31] !log Stop mysql on db2037 for new socket config [08:56:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:57:01] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5 [08:59:01] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5 [09:03:07] (03PS2) 10Volans: cumin: point cumin master to puppetdb1001 [puppet] - 10https://gerrit.wikimedia.org/r/420946 (https://phabricator.wikimedia.org/T177253) (owner: 10Herron) [09:05:39] (03PS1) 10Marostegui: misc.pp: Fix typo [puppet] - 10https://gerrit.wikimedia.org/r/420970 [09:06:14] (03CR) 10Marostegui: [C: 032] misc.pp: Fix typo [puppet] - 10https://gerrit.wikimedia.org/r/420970 (owner: 10Marostegui) [09:06:19] (03PS3) 10Volans: cumin: point cumin master to puppetdb1001 [puppet] - 10https://gerrit.wikimedia.org/r/420946 (https://phabricator.wikimedia.org/T177253) (owner: 10Herron) [09:06:51] 10Operations, 10Domains, 10Traffic, 10Wikimedia-Apache-configuration: en-wp.org certificate error - https://phabricator.wikimedia.org/T190244#4067737 (10Peachey88) [09:08:42] (03CR) 10Volans: "Compiler results: https://puppet-compiler.wmflabs.org/compiler03/10550/sarin.codfw.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/420946 
(https://phabricator.wikimedia.org/T177253) (owner: 10Herron) [09:11:16] (03CR) 10Filippo Giunchedi: [C: 031] cumin: point cumin master to puppetdb1001 [puppet] - 10https://gerrit.wikimedia.org/r/420946 (https://phabricator.wikimedia.org/T177253) (owner: 10Herron) [09:11:36] !log Stop mysql on db2078 for new socket config [09:11:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:11:53] !log ema@neodymium conftool action : set/pooled=yes; selector: name=cp3033.esams.wmnet,service=varnish-be [09:11:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:12:23] (03CR) 10Volans: [C: 032] cumin: point cumin master to puppetdb1001 [puppet] - 10https://gerrit.wikimedia.org/r/420946 (https://phabricator.wikimedia.org/T177253) (owner: 10Herron) [09:12:33] (03PS4) 10Volans: cumin: point cumin master to puppetdb1001 [puppet] - 10https://gerrit.wikimedia.org/r/420946 (https://phabricator.wikimedia.org/T177253) (owner: 10Herron) [09:17:50] PROBLEM - Request latencies on chlorine is CRITICAL: CRITICAL - apiserver_request_latencies is 1653402 https://grafana.wikimedia.org/dashboard/db/kubernetes-api [09:18:51] RECOVERY - Request latencies on chlorine is OK: OK - apiserver_request_latencies is 3859 https://grafana.wikimedia.org/dashboard/db/kubernetes-api [09:23:08] !log ema@neodymium conftool action : set/pooled=no; selector: name=cp3032.esams.wmnet,service=varnish-be [09:23:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:24:24] 10Operations, 10ops-codfw, 10Patch-For-Review, 10User-Elukey: rack/setup/install mw2259-mw2290 - https://phabricator.wikimedia.org/T188301#4067766 (10Joe) >>! In T188301#4023842, @elukey wrote: > This is the current layout of our mw codfw servers: > > ||A|B|C|D| > |appserver| 20|25 (20)|37|0| > |api|12|28... 
[09:26:00] 10Operations, 10Puppet: Port puppetdb grafana dashboard to v4 metrics - https://phabricator.wikimedia.org/T190252#4067768 (10fgiunchedi) p:05Triage>03Normal [09:26:59] text@esams issue still ongoing, trying to keep it reasonable [09:27:37] 10Operations, 10ops-codfw, 10Patch-For-Review, 10User-Elukey: rack/setup/install mw2259-mw2290 - https://phabricator.wikimedia.org/T188301#4067780 (10Joe) At a later time, we might also want to redistribute servers a bit more, specifically I'd move some of the row-c appservers to the api cluster. But that... [09:28:20] meanwhile, help is welcome from whoever is good at kibanaing/logstashing: how do you do string replacement? [09:28:23] https://logstash.wikimedia.org/goto/5a66b5c5dfc72c077f492e6f07b11a63 [09:29:02] I have this data and I'd need to remove the vcl-$blah part from the origin_server field (till 'be_' basically) [09:29:40] vcl-2f432183-025a-4afb-a282-7e95a155b6ee.be_cp3030_esams_wmnet [09:29:52] -> be_cp3030_esams_wmnet [09:32:50] PROBLEM - Varnish HTTP text-backend - port 3128 on cp3030 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:33:40] RECOVERY - Varnish HTTP text-backend - port 3128 on cp3030 is OK: HTTP OK: HTTP/1.1 200 OK - 220 bytes in 0.168 second response time [09:40:10] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5 [09:40:42] !log Stop db1065 and db1106 in sync - this will generate lag on labs [09:40:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:41:10] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] 
https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5 [09:47:13] !log installing tiff security updates on trusty [09:47:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:52:07] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1106" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420974 [09:53:25] (03PS2) 10Vgutierrez: prometheus: aggregate varnish_x_cache metrics [puppet] - 10https://gerrit.wikimedia.org/r/420832 (https://phabricator.wikimedia.org/T184942) [09:54:04] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1106" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420974 (owner: 10Marostegui) [09:54:11] (03CR) 10Vgutierrez: "Fixed, thanks for your help Filippo!" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/420832 (https://phabricator.wikimedia.org/T184942) (owner: 10Vgutierrez) [09:54:34] (03CR) 10Filippo Giunchedi: [C: 031] "LGTM!" 
[puppet] - 10https://gerrit.wikimedia.org/r/420832 (https://phabricator.wikimedia.org/T184942) (owner: 10Vgutierrez) [09:54:52] (03CR) 10Vgutierrez: [C: 032] prometheus: aggregate varnish_x_cache metrics [puppet] - 10https://gerrit.wikimedia.org/r/420832 (https://phabricator.wikimedia.org/T184942) (owner: 10Vgutierrez) [09:55:11] (03PS3) 10Vgutierrez: prometheus: aggregate varnish_x_cache metrics [puppet] - 10https://gerrit.wikimedia.org/r/420832 (https://phabricator.wikimedia.org/T184942) [09:55:17] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1106" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420974 (owner: 10Marostegui) [09:56:55] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1106 - T183469 (duration: 01m 15s) [09:57:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:57:01] T183469: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469 [09:57:27] (03PS1) 10Marostegui: db1065.yaml: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/420975 (https://phabricator.wikimedia.org/T183469) [09:57:55] !log installing php5 security updates on trusty (jessie already fixed) [09:58:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:58:02] (03PS2) 10Marostegui: db1065.yaml: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/420975 (https://phabricator.wikimedia.org/T183469) [09:58:11] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5 [09:58:11] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] 
https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5 [09:59:02] (03CR) 10Marostegui: [C: 032] db1065.yaml: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/420975 (https://phabricator.wikimedia.org/T183469) (owner: 10Marostegui) [09:59:10] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1106" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420974 (owner: 10Marostegui) [09:59:30] vgutierrez: can I merge your puppet changes? [10:00:02] err according to gerrit is merged [10:00:13] vgutierrez: but not in puppetmaster :) [10:00:21] oh please, go ahead :) [10:00:21] maybe you forgot to do that? [10:00:28] ok, doing it [10:00:37] thx :* [10:00:41] done :) [10:03:07] (03PS1) 10Marostegui: db-eqiad,db-codfw.php: Remove db1065 from MW [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420976 (https://phabricator.wikimedia.org/T183469) [10:04:40] (03CR) 10Marostegui: [C: 032] db-eqiad,db-codfw.php: Remove db1065 from MW [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420976 (https://phabricator.wikimedia.org/T183469) (owner: 10Marostegui) [10:04:49] (03PS1) 10Ema: varnishospital: distinguish between origin server and vcl id [puppet] - 10https://gerrit.wikimedia.org/r/420977 (https://phabricator.wikimedia.org/T174932) [10:04:53] 10Operations, 10Puppet, 10Patch-For-Review: Upgrade PuppetDB to version 4.4 - https://phabricator.wikimedia.org/T177253#4067901 (10Volans) List of hosts with puppet disabled since before the migration, that are missing in the new puppetdb and would disappear from Icinga upon re-enabling puppet there: ``` la... 
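Note on the origin_server question from 09:28-09:29 (drop everything before 'be_'): the transformation itself can be sketched in a few lines of Python. This only illustrates the string rewrite using the example value pasted in the channel; it is not the content of the varnishospital patch, and the helper name and the assumption that the prefix is always "vcl-" plus a UUID plus a dot are hypothetical.

```python
import re

# Assumed prefix shape: "vcl-" + UUID + "." in front of the backend name.
_VCL_PREFIX = re.compile(r"^vcl-[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}\.")


def strip_vcl_id(origin_server: str) -> str:
    """Return the backend name without a leading 'vcl-<uuid>.' part (hypothetical helper)."""
    return _VCL_PREFIX.sub("", origin_server, count=1)


# Example value taken from the log above.
print(strip_vcl_id("vcl-2f432183-025a-4afb-a282-7e95a155b6ee.be_cp3030_esams_wmnet"))
# be_cp3030_esams_wmnet
```

Anchoring the pattern at the start of the string means a value that is already clean passes through unchanged.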
[10:05:41] (03PS1) 10Jcrespo: mariadb: Cleanup of hosts that are no longer part of production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420978 [10:05:50] (03Merged) 10jenkins-bot: db-eqiad,db-codfw.php: Remove db1065 from MW [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420976 (https://phabricator.wikimedia.org/T183469) (owner: 10Marostegui) [10:06:07] (03PS2) 10Jcrespo: mariadb: Cleanup hosts that are no longer part of production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420978 [10:07:00] (03CR) 10Vgutierrez: [C: 031] varnishospital: distinguish between origin server and vcl id [puppet] - 10https://gerrit.wikimedia.org/r/420977 (https://phabricator.wikimedia.org/T174932) (owner: 10Ema) [10:07:21] (03CR) 10Ema: [C: 032] varnishospital: distinguish between origin server and vcl id [puppet] - 10https://gerrit.wikimedia.org/r/420977 (https://phabricator.wikimedia.org/T174932) (owner: 10Ema) [10:10:37] (03CR) 10jenkins-bot: db-eqiad,db-codfw.php: Remove db1065 from MW [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420976 (https://phabricator.wikimedia.org/T183469) (owner: 10Marostegui) [10:11:00] (03PS1) 10Marostegui: mariadb: Move db1065 to misc [puppet] - 10https://gerrit.wikimedia.org/r/420979 (https://phabricator.wikimedia.org/T183469) [10:12:17] (03PS1) 10Elukey: profile::analytics::cluster::client: remove useless sudo for nrpe check [puppet] - 10https://gerrit.wikimedia.org/r/420980 (https://phabricator.wikimedia.org/T187073) [10:13:43] (03CR) 10Marostegui: "https://puppet-compiler.wmflabs.org/compiler03/10551/" [puppet] - 10https://gerrit.wikimedia.org/r/420979 (https://phabricator.wikimedia.org/T183469) (owner: 10Marostegui) [10:14:28] (03PS3) 10Jcrespo: mariadb: Cleanup hosts that are no longer part of production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420978 [10:14:43] (03PS1) 10Marostegui: m1,s1.hosts: Move db1065 to m1 [software] - 10https://gerrit.wikimedia.org/r/420981 
(https://phabricator.wikimedia.org/T183469) [10:15:22] (03CR) 10Elukey: [C: 032] profile::analytics::cluster::client: remove useless sudo for nrpe check [puppet] - 10https://gerrit.wikimedia.org/r/420980 (https://phabricator.wikimedia.org/T187073) (owner: 10Elukey) [10:15:35] (03PS2) 10Marostegui: mariadb: Move db1065 to misc [puppet] - 10https://gerrit.wikimedia.org/r/420979 (https://phabricator.wikimedia.org/T183469) [10:15:39] (03CR) 10Marostegui: [C: 032] m1,s1.hosts: Move db1065 to m1 [software] - 10https://gerrit.wikimedia.org/r/420981 (https://phabricator.wikimedia.org/T183469) (owner: 10Marostegui) [10:15:46] (03CR) 10Jcrespo: [C: 031] m1,s1.hosts: Move db1065 to m1 [software] - 10https://gerrit.wikimedia.org/r/420981 (https://phabricator.wikimedia.org/T183469) (owner: 10Marostegui) [10:16:28] (03Merged) 10jenkins-bot: m1,s1.hosts: Move db1065 to m1 [software] - 10https://gerrit.wikimedia.org/r/420981 (https://phabricator.wikimedia.org/T183469) (owner: 10Marostegui) [10:16:52] !log re-enabling puppet on einsteinium (icinga host) see T177253#4067901 [10:16:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:16:58] T177253: Upgrade PuppetDB to version 4.4 - https://phabricator.wikimedia.org/T177253 [10:17:05] (03CR) 10Jcrespo: mariadb: Move db1065 to misc (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/420979 (https://phabricator.wikimedia.org/T183469) (owner: 10Marostegui) [10:17:56] marostegui: https://gerrit.wikimedia.org/r/#/c/420981/1/dbtools/m1.hosts it has some typos it looks like "db1063.equad.wnbet" [10:18:44] (03CR) 10Jcrespo: [C: 031] "Other than the comment everthing good. Disclaimer- I normally do not remove hosts from dhcp." [puppet] - 10https://gerrit.wikimedia.org/r/420979 (https://phabricator.wikimedia.org/T183469) (owner: 10Marostegui) [10:20:17] PROBLEM - Check systemd state on einsteinium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
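Note on the "db1063.equad.wnbet" typo reported at 10:17:56: a small lint over a hosts file could flag this kind of slip before review. A minimal sketch, assuming one hostname per line and that valid entries look like <name>.<site>.wmnet; the site list (taken from hostnames seen in this log) and the function name are assumptions, not the actual tooling of the software repository.

```python
import re

# Sites that appear in this log; the real list may be longer (assumption).
KNOWN_SITES = ("eqiad", "codfw", "esams")
_HOST_RE = re.compile(r"^[a-z][a-z0-9-]*\.(?:%s)\.wmnet$" % "|".join(KNOWN_SITES))


def suspicious_hosts(lines):
    """Return non-empty, non-comment entries that do not look like <name>.<site>.wmnet."""
    bad = []
    for raw in lines:
        host = raw.strip()
        if not host or host.startswith("#"):
            continue  # skip blank lines and comments
        if not _HOST_RE.match(host):
            bad.append(host)
    return bad


print(suspicious_hosts(["db1065.eqiad.wmnet", "db1063.equad.wnbet"]))
# ['db1063.equad.wnbet']
```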
[10:20:40] yeah I know, already on it [10:22:03] (03PS3) 10Marostegui: mariadb: Move db1065 to misc [puppet] - 10https://gerrit.wikimedia.org/r/420979 (https://phabricator.wikimedia.org/T183469) [10:22:19] (03PS4) 10Jcrespo: mariadb: Cleanup hosts that are no longer part of production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420978 (https://phabricator.wikimedia.org/T134476) [10:23:05] Wiki13 thank you very much for the ping, it is indeed wrong [10:23:23] no problem :) [10:23:26] Wiki13 thanks a lot [10:23:34] I will fix it [10:23:41] where does that come from? [10:23:43] aaah the failover [10:23:50] 10Operations, 10ops-codfw, 10Patch-For-Review, 10User-Elukey: rack/setup/install mw2259-mw2290 - https://phabricator.wikimedia.org/T188301#4067969 (10Joe) Since I'm still not sure we're going to actually merge videoscalers to jobrunners soon, I think I'll re-add two in row B, in place of 2 appservers. [10:24:03] (03CR) 10Marostegui: mariadb: Move db1065 to misc (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/420979 (https://phabricator.wikimedia.org/T183469) (owner: 10Marostegui) [10:24:22] (03PS4) 10Marostegui: mariadb: Move db1065 to misc [puppet] - 10https://gerrit.wikimedia.org/r/420979 (https://phabricator.wikimedia.org/T183469) [10:25:25] (03PS1) 10Jcrespo: dblists: Fix typo on db1063 [software] - 10https://gerrit.wikimedia.org/r/420982 [10:26:16] (03CR) 10Jcrespo: [C: 031] mariadb: Move db1065 to misc [puppet] - 10https://gerrit.wikimedia.org/r/420979 (https://phabricator.wikimedia.org/T183469) (owner: 10Marostegui) [10:26:34] (03CR) 10Marostegui: [C: 032] mariadb: Move db1065 to misc [puppet] - 10https://gerrit.wikimedia.org/r/420979 (https://phabricator.wikimedia.org/T183469) (owner: 10Marostegui) [10:27:22] (03CR) 10Marostegui: [C: 031] dblists: Fix typo on db1063 [software] - 10https://gerrit.wikimedia.org/r/420982 (owner: 10Jcrespo) [10:27:26] hey, Wiki13 do you have a developer account? 
[10:27:36] no, I dont think so [10:27:50] I was going to add you as reviewer :-( [10:28:10] Wiki13: you can see the fix here https://gerrit.wikimedia.org/r/420982 [10:28:31] when was db1063 changed? was it yesterday? I think I missed it [10:28:40] ah yeah, I see the commit now [10:28:43] I probably rushed the change yesterday [10:28:50] after the failover [10:29:42] 10Operations, 10ORES, 10Scoring-platform-team (Current): Reboot oresrdb - https://phabricator.wikimedia.org/T189781#4067975 (10akosiaris) 05Open>03Resolved a:03akosiaris Indeed. Here it is https://wikitech.wikimedia.org/wiki/Incident_documentation/20180314-ORES. Anyway, let's close this for now and f... [10:29:46] yeah, the time matches [10:29:49] (can i login with wikitech to gerrit?)? cause in that case I have an account on wikitech [10:30:03] yes developper acount == account on wikitech [10:30:12] ah [10:30:33] if you have a nick, I can add you to the list of reviewers [10:30:46] and then you can vote +1 or -1, etc [10:30:52] same one as nick here [10:31:03] let me see [10:31:19] I see you now [10:31:20] 10Operations, 10DBA, 10Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4067978 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts: ``` db1065.eqiad.wmnet ``` The lo... [10:31:39] so you can log in with same account credentials as wikitech and vote there [10:32:17] (03CR) 10Wiki13: [C: 031] dblists: Fix typo on db1063 [software] - 10https://gerrit.wikimedia.org/r/420982 (owner: 10Jcrespo) [10:32:25] thanks! 
[10:32:27] done :) [10:32:28] will merge now [10:32:32] (03PS1) 10Filippo Giunchedi: puppetmaster: blacklist potentially high cardinality mbeans [puppet] - 10https://gerrit.wikimedia.org/r/420983 (https://phabricator.wikimedia.org/T190252) [10:32:45] (03CR) 10Jcrespo: [C: 032] dblists: Fix typo on db1063 [software] - 10https://gerrit.wikimedia.org/r/420982 (owner: 10Jcrespo) [10:33:33] (03Merged) 10jenkins-bot: dblists: Fix typo on db1063 [software] - 10https://gerrit.wikimedia.org/r/420982 (owner: 10Jcrespo) [10:34:08] (03PS3) 10Urbanecm: Initial configuration for gorwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416930 (https://phabricator.wikimedia.org/T189109) [10:34:14] https://phabricator.wikimedia.org/T190260 [10:34:53] hey, yannf did that happen to you? what where you doing/viewing at the time? [10:35:13] oh, sorry, I just read " trying to undelete a file on Commons" [10:35:24] yes [10:35:26] (03PS3) 10Urbanecm: Initial configuration for euwikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/419171 [10:35:36] but it worked on 2nd try [10:36:03] (03PS1) 10Volans: Add deploy1001 AAAA records [dns] - 10https://gerrit.wikimedia.org/r/420985 [10:36:24] we have a problem with some images, that contain a lot of exif information [10:36:38] I have pinged performance team so they can have a look [10:36:50] thanks for the report! 
[10:37:09] (03CR) 10Filippo Giunchedi: [C: 031] Add deploy1001 AAAA records [dns] - 10https://gerrit.wikimedia.org/r/420985 (owner: 10Volans) [10:37:36] (03CR) 10Volans: "Because of Ia6bf8d40c3ee50db6ea33079605e84e09d053885 now ferm is trying to resolve deploy1001 AAAA record too, but is missing" [dns] - 10https://gerrit.wikimedia.org/r/420985 (owner: 10Volans) [10:37:56] !log rolling restart of elasticsearch on relforge to pick up OpenJDK security update [10:38:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:39:16] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Remove db1065 from config - T183469 (duration: 01m 15s) [10:39:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:39:22] T183469: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469 [10:40:43] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Remove db1065 from config - T183469 (duration: 01m 15s) [10:40:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:43:20] (03CR) 10Volans: [C: 032] Add deploy1001 AAAA records [dns] - 10https://gerrit.wikimedia.org/r/420985 (owner: 10Volans) [10:43:34] (03PS5) 10Jcrespo: mariadb: Cleanup hosts that are no longer part of production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420978 (https://phabricator.wikimedia.org/T134476) [10:43:38] (03PS2) 10Volans: Add missing reserved LVS IPs comments [dns] - 10https://gerrit.wikimedia.org/r/419799 [10:44:05] (03CR) 10Volans: [C: 032] Add missing reserved LVS IPs comments [dns] - 10https://gerrit.wikimedia.org/r/419799 (owner: 10Volans) [10:44:28] (03CR) 10Jcrespo: "I will start with this, will do further commits later." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/420978 (https://phabricator.wikimedia.org/T134476) (owner: 10Jcrespo) [10:45:55] (03PS1) 10Ema: 5.1.3-1wm7: apply 'Fix issue #1799 for keep' upstream patch [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/420987 [10:46:44] 10Operations, 10DBA, 10Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4068044 (10Marostegui) [10:47:34] !log rolling restart of elasticsearch on logstash to pick up OpenJDK security update [10:47:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:48:38] (03CR) 10Alexandros Kosiaris: [C: 031] network::constants: add deploy1001 as deployment server [puppet] - 10https://gerrit.wikimedia.org/r/420919 (https://phabricator.wikimedia.org/T175288) (owner: 10Dzahn) [10:48:44] 10Operations, 10DBA, 10Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4026873 (10Marostegui) [10:49:39] (03PS2) 10Arturo Borrero Gonzalez: site.pp: put labmon1002 into work [puppet] - 10https://gerrit.wikimedia.org/r/420019 (https://phabricator.wikimedia.org/T189871) [10:51:20] !log Stop MySQL on db1016 to clone db1065 - T183469 [10:51:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:51:25] T183469: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469 [10:51:28] (03CR) 10Arturo Borrero Gonzalez: [C: 032] site.pp: put labmon1002 into work [puppet] - 10https://gerrit.wikimedia.org/r/420019 (https://phabricator.wikimedia.org/T189871) (owner: 10Arturo Borrero Gonzalez) [10:52:05] 10Operations, 10DBA, 10Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4068065 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` 
['db1065.eqiad.wmnet'] ``` and were **ALL** successful. [10:53:28] RECOVERY - Check systemd state on einsteinium is OK: OK - running: The system is fully operational [10:55:20] elukey: when you get a second, for your eyes https://gerrit.wikimedia.org/r/c/420983/ [10:56:56] and for your eyes only :D [10:58:32] godog: checking! [10:58:45] volans: haha no feel free to chime in too! [11:00:31] PROBLEM - puppet last run on labmon1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[prometheus] [11:01:07] (03CR) 10Ema: [C: 032] 5.1.3-1wm7: apply 'Fix issue #1799 for keep' upstream patch [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/420987 (owner: 10Ema) [11:06:32] (03CR) 10Elukey: [C: 031] puppetmaster: blacklist potentially high cardinality mbeans [puppet] - 10https://gerrit.wikimedia.org/r/420983 (https://phabricator.wikimedia.org/T190252) (owner: 10Filippo Giunchedi) [11:07:11] (03PS2) 10Filippo Giunchedi: puppetmaster: blacklist potentially high cardinality mbeans [puppet] - 10https://gerrit.wikimedia.org/r/420983 (https://phabricator.wikimedia.org/T190252) [11:08:24] (03CR) 10Filippo Giunchedi: [C: 032] puppetmaster: blacklist potentially high cardinality mbeans [puppet] - 10https://gerrit.wikimedia.org/r/420983 (https://phabricator.wikimedia.org/T190252) (owner: 10Filippo Giunchedi) [11:14:38] (03PS10) 10MarcoAurelio: beta: disable abusefilter from collecting user IP addresses [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416346 (https://phabricator.wikimedia.org/T188862) [11:15:15] 10Operations, 10DBA, 10Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4068156 (10Marostegui) db1065 is now replicating in m1. 
I will leave mysql on db1016 stopped [11:17:15] !log varnish 5.1.3-1wm7 uploaded to apt.w.o [11:17:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:19:10] (03CR) 10Jcrespo: [C: 032] mariadb: Cleanup hosts that are no longer part of production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420978 (https://phabricator.wikimedia.org/T134476) (owner: 10Jcrespo) [11:19:47] (03Abandoned) 10Urbanecm: Add several domains of Ukraine government to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/405550 (https://phabricator.wikimedia.org/T185399) (owner: 10Urbanecm) [11:20:05] (03PS1) 10Giuseppe Lavagetto: codfw: assign roles to the new appservers [puppet] - 10https://gerrit.wikimedia.org/r/420990 (https://phabricator.wikimedia.org/T188301) [11:20:07] (03PS7) 10Urbanecm: Initial configuration for romdwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/412902 (https://phabricator.wikimedia.org/T187184) [11:20:19] (03PS2) 10Urbanecm: Initial configuration for hiwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417201 (https://phabricator.wikimedia.org/T188366) [11:20:24] (03PS2) 10Urbanecm: New throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420122 (https://phabricator.wikimedia.org/T189778) [11:20:40] (03Merged) 10jenkins-bot: mariadb: Cleanup hosts that are no longer part of production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420978 (https://phabricator.wikimedia.org/T134476) (owner: 10Jcrespo) [11:21:15] (03Draft1) 10MarcoAurelio: enable profiler and runtime profiler for eswikibooks abusefilter [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420989 (https://phabricator.wikimedia.org/T190264) [11:21:19] (03PS1) 10Marostegui: dbproxy100{1,6}: Change standby host [puppet] - 10https://gerrit.wikimedia.org/r/420991 (https://phabricator.wikimedia.org/T183469) [11:21:21] (03Draft2) 10MarcoAurelio: enable profiler and runtime profiler for eswikibooks 
abusefilter [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420989 (https://phabricator.wikimedia.org/T190264) [11:22:16] (03CR) 10Marostegui: [C: 04-2] "Let's wait till tomorrow to merge this and proceed with db1001 decom" [puppet] - 10https://gerrit.wikimedia.org/r/420991 (https://phabricator.wikimedia.org/T183469) (owner: 10Marostegui) [11:24:50] (03PS3) 10MarcoAurelio: Enable $wgAbuseFilterProfile & $wgAbuseFilterRuntimeProfile on eswikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420989 (https://phabricator.wikimedia.org/T190264) [11:25:12] (03PS4) 10MarcoAurelio: Enable $wgAbuseFilterProfile & $wgAbuseFilterRuntimeProfile on eswikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420989 (https://phabricator.wikimedia.org/T190264) [11:27:35] (03PS6) 10Urbanecm: Initial configuration for lfnwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/400234 (https://phabricator.wikimedia.org/T183561) [11:29:33] !log jynus@tin Synchronized wmf-config/db-codfw.php: Cleanup old hosts (duration: 01m 13s) [11:29:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:29:52] error deploying at mw1276.eqiad.wmnet [11:31:05] will do the other deploy, then check mw1276 [11:32:07] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Cleanup old hosts (duration: 01m 18s) [11:32:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:32:23] !log cache_misc@esams: upgrade varnish to 5.1.3-1wm7 [11:32:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:33:05] (03CR) 10Hoo man: [C: 031] "> how do you want this to be deployed, btw? SWAT?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/420336 (https://phabricator.wikimedia.org/T189776) (owner: 10Lucas Werkmeister (WMDE)) [11:33:20] !log rolling restart of Kibana/Logstash to pick up OpenJDK security update [11:33:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:33:29] see no errors on logs [11:33:53] at least no more than usual [11:39:42] (03CR) 10jenkins-bot: mariadb: Cleanup hosts that are no longer part of production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420978 (https://phabricator.wikimedia.org/T134476) (owner: 10Jcrespo) [11:41:02] (03CR) 10Jcrespo: [C: 031] dbproxy100{1,6}: Change standby host [puppet] - 10https://gerrit.wikimedia.org/r/420991 (https://phabricator.wikimedia.org/T183469) (owner: 10Marostegui) [11:41:29] (03CR) 10Elukey: [C: 031] "Checked:" [puppet] - 10https://gerrit.wikimedia.org/r/420990 (https://phabricator.wikimedia.org/T188301) (owner: 10Giuseppe Lavagetto) [11:44:37] godog: labmon requires prometheus installed, but the package requires some Depends from jessie-backports, so apt-get fails (since it doesn't enable backports by default) any hint? [11:47:42] (03CR) 10Muehlenhoff: [C: 031] "Looks good, also doublechecked against the data in racktables." [puppet] - 10https://gerrit.wikimedia.org/r/420990 (https://phabricator.wikimedia.org/T188301) (owner: 10Giuseppe Lavagetto) [11:47:59] arturo: enabling jessie-backports should do the trick [11:48:26] though I thought we were enabling backports by default these days on >= jessie [11:48:47] shouldn't this be persisted in puppet somehow? 
otherwise the first puppet run on the server fails [11:49:14] yeah, jessie-backports is automatically considered if a binary package isn't available in jessie itself [11:49:21] PROBLEM - kubelet operational latencies on kubernetes1003 is CRITICAL: CRITICAL - kubelet_operational_latencies is 33166 https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [11:50:03] moritzm: that doesn't seem to be the case in labmon1002 [11:50:07] it's different for versioned dependencies, though, so if you need foo 2.0 and jessie has foo 1.0, it will not consider 2.0 unless explicitly hinted [11:50:20] RECOVERY - kubelet operational latencies on kubernetes1003 is OK: OK - kubelet_operational_latencies is 1083 https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [11:50:27] yeah, in this case it is about versioned depends [11:51:02] ok so labmon does have jessie-backports enabled but still fails due to versioned depends? [11:51:19] https://www.irccloud.com/pastebin/OPjPDEtg/ [11:51:28] you need to add an apt::pin in puppet for the packages you want to pull from jessie-backports [11:51:48] however [11:51:49] https://www.irccloud.com/pastebin/w62qD8pd/ [11:52:00] so, yes, probably a pinning or something is required [11:52:18] but it surprises me, is this the first case ever? 
:-) [11:52:54] no, there's plenty of apt::pin use cases in puppet.git [11:53:15] I mean, prometheus in jessie [11:54:08] (03PS1) 10Volans: Fix setup for puppet integration [software/puppetboard/deploy] - 10https://gerrit.wikimedia.org/r/420997 (https://phabricator.wikimedia.org/T184563) [11:54:09] it isn't, though we originally installed prometheus from jessie-backports and then upgraded to a version from jessie-wikimedia [11:54:09] the prometheus hosts in production are based on jessie, but they probably use different packages than what labmon uses [11:54:29] it is likely the first case of a reinstall though [11:54:37] ok, will try to craft a patch [11:55:06] but why set up the host with jessie at this point? better start with stretch at this point [11:55:55] (03PS1) 10Marostegui: db1065.yaml: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/420998 [11:56:25] although role::labmon does more than prometheus alone, so might entail a little more work [11:56:39] (03CR) 10Marostegui: [C: 032] db1065.yaml: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/420998 (owner: 10Marostegui) [11:56:56] we can't simply upgrade OS at will, even if I would like to [12:01:43] (03PS3) 10Mark Bergsma: Fix Attribute.__eq__ and .__ne__ [debs/pybal] - 10https://gerrit.wikimedia.org/r/420119 [12:01:45] (03PS3) 10Mark Bergsma: Fix MPReachNLRIAttribute AFI_INET construction from tuple [debs/pybal] - 10https://gerrit.wikimedia.org/r/420120 [12:01:47] (03PS1) 10Mark Bergsma: Rename test class variable attributes to not start with 'test' [debs/pybal] - 10https://gerrit.wikimedia.org/r/420999 [12:02:01] 10Operations, 10Traffic, 10Goal, 10Patch-For-Review, 10User-fgiunchedi: Deprecate python varnish cachestats - https://phabricator.wikimedia.org/T184942#4068343 (10Vgutierrez) I just updated the [[ https://grafana.wikimedia.org/dashboard/db/prometheus-varnish-caching | Prometheus Varnish caching dashboard... 
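Editor's note on the jessie-backports exchange above (11:44 to 11:54): the behaviour described there, where backports is picked up when a package exists only in backports but not when a versioned dependency needs a newer version than jessie ships, can be sketched with a small Python model. This is a simplified illustration and not apt's actual resolver. The priorities 500 and 100 are the usual apt defaults for a regular suite and a backports suite, the pin value 1001 is only an example, and the package "foo" with versions 1.0 and 2.0 is the hypothetical one from the conversation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Available:
    """One version of a package as offered by one suite."""
    version: tuple   # e.g. (2, 0) for "2.0"; real Debian versions are richer
    suite: str
    priority: int    # pin priority assigned to the suite or package


def candidate(versions):
    """Pick the install candidate: highest priority, then highest version."""
    return max(versions, key=lambda v: (v.priority, v.version))


def dependency_satisfied(versions, minimum):
    """True if the candidate alone satisfies 'Depends: foo (>= minimum)'."""
    return candidate(versions).version >= minimum


def foo_versions(backports_priority):
    """Hypothetical package foo: 1.0 in jessie, 2.0 in jessie-backports."""
    return [
        Available((1, 0), "jessie", 500),
        Available((2, 0), "jessie-backports", backports_priority),
    ]


# Default priorities: jessie's 1.0 is the candidate, so foo (>= 2.0) fails.
print(dependency_satisfied(foo_versions(100), (2, 0)))

# With an explicit pin (example value 1001) the backports version wins.
print(dependency_satisfied(foo_versions(1001), (2, 0)))

# A package that exists only in backports is its own candidate.
print(candidate([Available((2, 0), "jessie-backports", 100)]).suite)
```

In this model the pin plays the role that apt::pin plays in the discussion: it raises the priority of the backports version so that it becomes the candidate.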
[12:03:21] (03CR) 10Volans: [C: 031] "LGTM, one optional nitpick inline" (031 comment) [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/413397 (owner: 10Muehlenhoff) [12:17:59] (03PS4) 10Muehlenhoff: Fix Cumin alias [puppet] - 10https://gerrit.wikimedia.org/r/420771 [12:19:11] (03CR) 10Muehlenhoff: [C: 032] Fix Cumin alias [puppet] - 10https://gerrit.wikimedia.org/r/420771 (owner: 10Muehlenhoff) [12:20:26] 10Operations, 10DBA, 10Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4068432 (10Marostegui) [12:20:57] 10Operations, 10DBA, 10Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4026946 (10Marostegui) [12:44:20] (03CR) 10Muehlenhoff: "Apart of the ordering, require_package has the additional benefit that it avoids duplicated definitions, if you use package in two classes" [puppet] - 10https://gerrit.wikimedia.org/r/420670 (owner: 10Muehlenhoff) [12:46:33] (03CR) 10Muehlenhoff: "This change extends existing sudo permissions for a group, so usually needs to be acknowledged in an ops meeting (despite this change bein" [puppet] - 10https://gerrit.wikimedia.org/r/408555 (owner: 10Hashar) [12:51:37] jouncebot, next [12:51:37] In 0 hour(s) and 8 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180321T1300) [12:52:32] (03PS3) 10Muehlenhoff: Add conftool::scripts to Prometheus servers [puppet] - 10https://gerrit.wikimedia.org/r/415328 [12:53:08] (03PS1) 10Arturo Borrero Gonzalez: prometheus: server: install depends packages from jessie-backports [puppet] - 10https://gerrit.wikimedia.org/r/421006 (https://phabricator.wikimedia.org/T189871) [12:54:06] godog: https://gerrit.wikimedia.org/r/421006 [12:58:37] 10Operations, 10Ops-Access-Requests, 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Watching / External): 
contint-admins sudo for service jenkins - https://phabricator.wikimedia.org/T190277#4068588 (10hashar) [12:58:45] (03PS3) 10Hashar: admin: contint-admins to restart Jenkins via systemd [puppet] - 10https://gerrit.wikimedia.org/r/408555 (https://phabricator.wikimedia.org/T190277) [12:58:52] (03CR) 10Muehlenhoff: [C: 032] Add conftool::scripts to Prometheus servers [puppet] - 10https://gerrit.wikimedia.org/r/415328 (owner: 10Muehlenhoff) [12:58:55] (03CR) 10Hashar: [C: 031] "Ticket filled T190277 Danke :]" [puppet] - 10https://gerrit.wikimedia.org/r/408555 (https://phabricator.wikimedia.org/T190277) (owner: 10Hashar) [13:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: That opportune time is upon us again. Time for a European Mid-day SWAT(Max 8 patches) deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180321T1300). [13:00:04] marlier and Urbanecm: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [13:00:24] (03Abandoned) 10Muehlenhoff: Update db cumin aliases [puppet] - 10https://gerrit.wikimedia.org/r/391530 (owner: 10Muehlenhoff) [13:00:25] oh, it's SWAT time [13:00:39] I can SWAT today [13:00:46] marlier and Urbanecm: around for SWAT? [13:00:51] 10Operations, 10Pybal, 10Traffic, 10Patch-For-Review: Pybal should be able to advertise to multiple routers - https://phabricator.wikimedia.org/T180069#4068603 (10mark) 05Open>03Resolved a:03mark This is now in the latest PyBal releases, so resolving this ticket. [13:00:57] I'm here [13:01:44] zeljkof, I'm here [13:02:12] Urbanecm: your request for deployments is still not resolved? [13:02:22] marlier: do you want to deploy your commits yourself? [13:03:20] zeljkof, which request? You mean for deploy privs? 
[13:04:30] Urbanecm: yes [13:04:48] No, you must deploy, at least today :D [13:04:59] Urbanecm: no problem, just asking :) [13:05:11] zeljkof: I have no idea how to do that... If you don't mind, easier if you do it [13:05:31] marlier: this is all I know ;) https://wikitech.wikimedia.org/wiki/SWAT_deploys/Deployers [13:05:53] 10Operations, 10ops-codfw, 10Patch-For-Review, 10User-Elukey: rack/setup/install mw2259-mw2290 - https://phabricator.wikimedia.org/T188301#4068610 (10Joe) Some of the figures above were wrong, so here it is again, this time correctly counted from site.pp: |role|A|B|C|D |appserver |19|8|37|7| |api | 12|15|... [13:05:58] marlier: I can deploy, but in general (I think) developers should deploy their code, if possible [13:06:21] marlier: reviewing your commit [13:06:39] Urbanecm: I'll merge and deploy your first, since it's just a throttle [13:06:51] ack [13:07:02] zeljkof: I strongly agree! I just haven't had a chance to actually figure out how to do that yet. [13:07:28] marlier: there is just two commits today, I can lead you if you want to try ;) [13:08:05] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420122 (https://phabricator.wikimedia.org/T189778) (owner: 10Urbanecm) [13:09:14] Urbanecm: in the future, please use more descriptive commit messages [13:09:22] (03Merged) 10jenkins-bot: New throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420122 (https://phabricator.wikimedia.org/T189778) (owner: 10Urbanecm) [13:09:38] (03CR) 10jenkins-bot: New throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420122 (https://phabricator.wikimedia.org/T189778) (owner: 10Urbanecm) [13:09:42] (03CR) 10Filippo Giunchedi: [C: 031] prometheus: server: install depends packages from jessie-backports [puppet] - 10https://gerrit.wikimedia.org/r/421006 (https://phabricator.wikimedia.org/T189871) (owner: 10Arturo Borrero Gonzalez) [13:09:42] Urbanecm: it does not have to be fancy, "throttle: 
hackathon at university of texas", something like that [13:09:57] zeljkof, what do not have to be fancy? [13:10:01] I don't understand you right now [13:10:18] Urbanecm: your current commit messages are mostly `New throttle rule` [13:10:34] it does not have to be 10 paragraphs (that is fancy I guess) [13:10:52] but something like `throttle: hackathon at university of texas` would be much better [13:10:58] As you want, will remember [13:11:33] Urbanecm: it's not just me :) there are guidelines for commit messages [13:11:37] let me find them... [13:12:02] Urbanecm: https://www.mediawiki.org/wiki/Gerrit/Commit_message_guidelines [13:12:03] zeljkof: I need to drop offline relatively soon, so better if you handle it for the moment. [13:12:17] marlier: sure, you are next, reviewing your code [13:12:27] I have another patch going out later today, maybe I'll take that one [13:12:35] !log zfilipin@tin Synchronized wmf-config/throttle.php: SWAT: [[gerrit:420122|New throttle rule (T189778)]] (duration: 01m 16s) [13:12:38] feel free to ping me any time during swat (if it's not busy) for a quick intro :) [13:12:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:12:42] T189778: IP cap lift for Texas Library Association Workshop "Learn to Edit" on 2018-04-04. 
- https://phabricator.wikimedia.org/T189778 [13:12:47] Urbanecm: your patch is deployed [13:13:00] thx [13:13:44] (03PS2) 10Arturo Borrero Gonzalez: prometheus: server: install depends packages from jessie-backports [puppet] - 10https://gerrit.wikimedia.org/r/421006 (https://phabricator.wikimedia.org/T189871) [13:14:03] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420909 (https://phabricator.wikimedia.org/T190229) (owner: 10Imarlier) [13:14:53] marlier: I will not be around for the other swat(s) today, but I am sure thcipriani would also be glad to help :) [13:15:17] Sweet [13:15:59] (03Merged) 10jenkins-bot: config: Enable testwiki NavTiming oversample for a bunch more countries [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420909 (https://phabricator.wikimedia.org/T190229) (owner: 10Imarlier) [13:16:40] marlier: your patch is at mwdebug1002, please test and let me know if I can deploy [13:17:07] Sure, 1 minute [13:17:58] Looks good [13:18:04] marlier: ok, deploying [13:18:29] zeljkof: thanks! 
[13:19:22] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:420909|config: Enable testwiki NavTiming oversample for a bunch more countries (T190229)]] (duration: 01m 15s) [13:19:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:19:28] T190229: Enable additional countries for oversampling prior to Singapore - https://phabricator.wikimedia.org/T190229 [13:19:42] marlier: deployed, please check relevant logs [13:20:01] Urbanecm, marlier: looks like that was all for today, thanks for deploying with #releng ;) [13:20:37] !log EU SWAT finished [13:20:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:21:17] (03CR) 10jenkins-bot: config: Enable testwiki NavTiming oversample for a bunch more countries [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420909 (https://phabricator.wikimedia.org/T190229) (owner: 10Imarlier) [13:23:16] !log ema@neodymium conftool action : set/pooled=yes; selector: name=cp3032.esams.wmnet,service=varnish-be [13:23:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:25:38] (03CR) 10Filippo Giunchedi: [C: 031] "Minor nit inline, really a matter of taste in this case" (031 comment) [software/puppetboard/deploy] - 10https://gerrit.wikimedia.org/r/420997 (https://phabricator.wikimedia.org/T184563) (owner: 10Volans) [13:29:46] (03CR) 10Arturo Borrero Gonzalez: [C: 032] prometheus: server: install depends packages from jessie-backports [puppet] - 10https://gerrit.wikimedia.org/r/421006 (https://phabricator.wikimedia.org/T189871) (owner: 10Arturo Borrero Gonzalez) [13:33:32] (03PS1) 10ArielGlenn: command to recombine page content xml files into one [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/421011 (https://phabricator.wikimedia.org/T179059) [13:36:13] (03CR) 10Ottomata: [C: 031] "1 nit but +1" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/420383 (https://phabricator.wikimedia.org/T167790) (owner: 
10Elukey) [13:37:09] 10Operations, 10Beta-Cluster-Infrastructure: Beta cluster Obama page often responds with 503 - https://phabricator.wikimedia.org/T188913#4068684 (10Niedzielski) It's consistently text04: ``` Request from 73.252.38.252 via deployment-cache-text04 deployment-cache-text04, Varnish XID 161469920 Error: 503, Backe... [13:38:10] (03CR) 10Ottomata: [C: 031] "NICE!" [puppet] - 10https://gerrit.wikimedia.org/r/420646 (https://phabricator.wikimedia.org/T167790) (owner: 10Elukey) [13:39:14] !log ema@neodymium conftool action : set/pooled=no; selector: name=cp3032.esams.wmnet,service=varnish-be [13:39:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:40:28] (03PS2) 10Volans: Fix setup for puppet integration [software/puppetboard/deploy] - 10https://gerrit.wikimedia.org/r/420997 (https://phabricator.wikimedia.org/T184563) [13:40:33] RECOVERY - puppet last run on labmon1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:41:03] (03CR) 10Elukey: [C: 032] Refactor stat1005's roles into role/profiles (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/420383 (https://phabricator.wikimedia.org/T167790) (owner: 10Elukey) [13:41:20] (03PS4) 10Elukey: Refactor the last bits of the Analytics code not following role/profile [puppet] - 10https://gerrit.wikimedia.org/r/420646 (https://phabricator.wikimedia.org/T167790) [13:43:05] (03CR) 10Ottomata: coal: Process from Kafka instead of from ZMQ (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/415218 (https://phabricator.wikimedia.org/T110903) (owner: 10Imarlier) [13:43:42] PROBLEM - puppet last run on prometheus2004 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [13:43:47] (03PS3) 10Muehlenhoff: mediawiki: Remove unused python-pygments package [puppet] - 10https://gerrit.wikimedia.org/r/400458 (https://phabricator.wikimedia.org/T182851) (owner: 10Legoktm) [13:44:39] (03CR) 10Muehlenhoff: mediawiki: Remove unused python-pygments package (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/400458 (https://phabricator.wikimedia.org/T182851) (owner: 10Legoktm) [13:45:04] arturo: the puppet failure on prometheus2004 seems related to your change [13:45:32] PROBLEM - puppet last run on prometheus2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:45:52] cc govg [13:46:07] cc godog (sorry go.vg, bad autocomplete) [13:46:20] jouncebot: now [13:46:20] For the next 0 hour(s) and 13 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180321T1300) [13:46:29] is swat still running? [13:46:36] (03CR) 10Elukey: [C: 032] Refactor the last bits of the Analytics code not following role/profile [puppet] - 10https://gerrit.wikimedia.org/r/420646 (https://phabricator.wikimedia.org/T167790) (owner: 10Elukey) [13:46:37] I thought it was one hour later [13:46:50] volans: do you have the error message? [13:47:21] arturo: yes, just look at /var/log/puppet.log in any of those, it's a duplicate declaration [13:47:23] zeljkof: is swat still ongoing? [13:47:39] arturo: you added it to a define [13:47:40] Hauskatze: no [13:47:48] Hauskatze: something urgent? 
[13:47:58] zeljkof: I see we're still on time [13:47:59] that can be called multiple times in the same host [13:48:29] Hauskatze: I have closed swat, but there should be time for a patch if it's urgent [13:48:57] urgency is relative but this one is about user privacy so I'd say yes [13:48:59] oh I see, let me check if there is a quick fix, if not we can just revert [13:49:11] however I'll schedule it for later [13:49:12] PROBLEM - puppet last run on prometheus1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:49:37] Hauskatze: ok [13:50:01] (03PS7) 10Urbanecm: Initial configuration for lfnwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/400234 (https://phabricator.wikimedia.org/T183561) [13:50:05] (03CR) 10Volans: [C: 032] "Fixed comment, thanks for the review." (031 comment) [software/puppetboard/deploy] - 10https://gerrit.wikimedia.org/r/420997 (https://phabricator.wikimedia.org/T184563) (owner: 10Volans) [13:50:12] (03PS8) 10Urbanecm: Initial configuration for romdwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/412902 (https://phabricator.wikimedia.org/T187184) [13:50:16] (03CR) 10Volans: [V: 032 C: 032] Fix setup for puppet integration [software/puppetboard/deploy] - 10https://gerrit.wikimedia.org/r/420997 (https://phabricator.wikimedia.org/T184563) (owner: 10Volans) [13:52:37] (03CR) 10Muehlenhoff: [C: 032] mediawiki: Remove unused python-pygments package [puppet] - 10https://gerrit.wikimedia.org/r/400458 (https://phabricator.wikimedia.org/T182851) (owner: 10Legoktm) [13:52:45] (03PS8) 10Urbanecm: Initial configuration for inhwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/402658 (https://phabricator.wikimedia.org/T184374) [13:53:03] (03PS4) 10Urbanecm: Initial configuration for gorwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416930 (https://phabricator.wikimedia.org/T189109) [13:53:07] (03PS3) 10Urbanecm: Initial configuration for hiwikimedia 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/417201 (https://phabricator.wikimedia.org/T188366) [13:53:15] (03PS4) 10Muehlenhoff: mediawiki: Remove unused python-pygments package [puppet] - 10https://gerrit.wikimedia.org/r/400458 (https://phabricator.wikimedia.org/T182851) (owner: 10Legoktm) [13:53:19] (03PS4) 10Urbanecm: Initial configuration for euwikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/419171 [13:53:42] PROBLEM - puppet last run on analytics1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/logrotate.d/hdfs_balancer] [13:54:02] (03PS1) 10Arturo Borrero Gonzalez: prometheus: server: remove apt pinning declaration [puppet] - 10https://gerrit.wikimedia.org/r/421014 (https://phabricator.wikimedia.org/T189871) [13:54:13] volans: please +1 https://gerrit.wikimedia.org/r/421014 [13:54:24] puppet broken on an1003 is me, fixing in a bit! [13:54:39] (03PS2) 10Arturo Borrero Gonzalez: prometheus: server: remove apt pinning declaration [puppet] - 10https://gerrit.wikimedia.org/r/421014 (https://phabricator.wikimedia.org/T189871) [13:57:53] (03CR) 10Volans: [C: 031] "LGTM, noop for labmon according to the compiler: https://puppet-compiler.wmflabs.org/compiler03/10554/" [puppet] - 10https://gerrit.wikimedia.org/r/421014 (https://phabricator.wikimedia.org/T189871) (owner: 10Arturo Borrero Gonzalez) [13:58:57] 10Operations, 10ORES, 10Scoring-platform-team (Current): Reboot oresrdb - https://phabricator.wikimedia.org/T189781#4068740 (10Halfak) Great! Thank you. 
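Editor's note on the prometheus puppet failures above (13:43 to 13:58): the diagnosis given in the channel was a duplicate declaration, caused by adding a resource to a define that can be instantiated more than once on the same host. The Python sketch below is a toy model of that failure mode, not Puppet's implementation. The resource title "prometheus-backports" and the instance names "ops" and "global" are hypothetical. Declaring the resource once outside the define is shown as one common remedy and is not necessarily what change 421014 did.

```python
class DuplicateDeclaration(Exception):
    """Raised when the same (type, title) pair is declared twice."""


class Catalog:
    """Toy catalog: every (type, title) pair may appear only once."""

    def __init__(self):
        self.resources = set()

    def declare(self, rtype, title):
        key = (rtype, title)
        if key in self.resources:
            raise DuplicateDeclaration(f"{rtype}[{title}] is already declared")
        self.resources.add(key)


def prometheus_server(catalog, instance, pin_inside_define):
    """Stand-in for a define; it runs once per instance on a host."""
    if pin_inside_define:
        # Fixed title: the second instance on a host collides with the first.
        catalog.declare("apt::pin", "prometheus-backports")
    catalog.declare("service", f"prometheus@{instance}")


def compile_host(instances, pin_inside_define):
    """Build the catalog for one host running the given instances."""
    catalog = Catalog()
    if not pin_inside_define:
        # Declared once per host, outside the define.
        catalog.declare("apt::pin", "prometheus-backports")
    for instance in instances:
        prometheus_server(catalog, instance, pin_inside_define)
    return catalog


# One instance per host compiles either way.
compile_host(["ops"], pin_inside_define=True)

# Two instances on the same host trip over the pin inside the define.
try:
    compile_host(["ops", "global"], pin_inside_define=True)
except DuplicateDeclaration as err:
    print("catalog compilation failed:", err)

# Declaring the pin once outside the define avoids the collision.
compile_host(["ops", "global"], pin_inside_define=False)
```

This also matches the pattern in the log: a host with a single instance keeps working, while hosts that instantiate the define several times fail to compile a catalog.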
[14:02:08] (03PS5) 10MarcoAurelio: Enable $wgAbuseFilterProfile & $wgAbuseFilterRuntimeProfile on eswikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420989 (https://phabricator.wikimedia.org/T190264) [14:03:23] (03CR) 10Arturo Borrero Gonzalez: [C: 032] prometheus: server: remove apt pinning declaration [puppet] - 10https://gerrit.wikimedia.org/r/421014 (https://phabricator.wikimedia.org/T189871) (owner: 10Arturo Borrero Gonzalez) [14:03:32] PROBLEM - puppet last run on prometheus1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:03:44] thanks volans! [14:03:44] (03CR) 10Muehlenhoff: [C: 031] Install parallel gzip (pigz) and parallel xz (pxz) on all servers (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/419709 (owner: 10Jcrespo) [14:04:01] np :) [14:06:00] puppet in prometheus2004 is back to normal [14:08:42] RECOVERY - puppet last run on prometheus2004 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:09:15] (03CR) 10Jcrespo: [C: 031] Install parallel gzip (pigz) and parallel xz (pxz) on all servers (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/419709 (owner: 10Jcrespo) [14:10:00] (03CR) 10Jcrespo: [C: 031] "> So seems good from that point of view" [puppet] - 10https://gerrit.wikimedia.org/r/419709 (owner: 10Jcrespo) [14:11:48] (03PS3) 10Alexandros Kosiaris: Update mathoid chart to resemble current production [deployment-charts] - 10https://gerrit.wikimedia.org/r/420305 [14:11:50] (03PS1) 10Alexandros Kosiaris: Expose K8S_NODE_IP to the fluent-bit sidecar [deployment-charts] - 10https://gerrit.wikimedia.org/r/421016 [14:12:27] (03Restored) 10Hashar: Fix nrpe spec for os_version() [puppet] - 10https://gerrit.wikimedia.org/r/419410 (owner: 10Hashar) [14:14:44] 10Operations, 10Commons, 10MediaWiki-File-management, 10Multimedia, and 11 others: Define an official thumb API - https://phabricator.wikimedia.org/T66214#4068806 (10kchapman) Thanks 
@Anomie my information might be old. Moving to TechCom-RFC Inbox for discussion. [14:15:09] (03CR) 10Alexandros Kosiaris: Update mathoid chart to resemble current production (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/420305 (owner: 10Alexandros Kosiaris) [14:15:32] (03CR) 10Alexandros Kosiaris: [C: 032] "I 'll proceed as is and let's reevaluate soon." [deployment-charts] - 10https://gerrit.wikimedia.org/r/420305 (owner: 10Alexandros Kosiaris) [14:15:33] RECOVERY - puppet last run on prometheus2003 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [14:15:34] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Update mathoid chart to resemble current production [deployment-charts] - 10https://gerrit.wikimedia.org/r/420305 (owner: 10Alexandros Kosiaris) [14:15:42] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Expose K8S_NODE_IP to the fluent-bit sidecar [deployment-charts] - 10https://gerrit.wikimedia.org/r/421016 (owner: 10Alexandros Kosiaris) [14:17:03] PROBLEM - Request latencies on argon is CRITICAL: CRITICAL - apiserver_request_latencies is 21693379 https://grafana.wikimedia.org/dashboard/db/kubernetes-api [14:17:39] 10Operations, 10Puppet, 10Patch-For-Review: Upgrade PuppetDB to version 4.4 - https://phabricator.wikimedia.org/T177253#4068821 (10fgiunchedi) [14:17:43] 10Operations, 10Puppet, 10Patch-For-Review: Port puppetdb grafana dashboard to v4 metrics - https://phabricator.wikimedia.org/T190252#4068818 (10fgiunchedi) 05Open>03Resolved a:03fgiunchedi Ported the dashboard to pdb 4, the dashboard could use some more work but Good Enough (tm) for now. 
[14:19:12] RECOVERY - puppet last run on prometheus1004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:21:03] RECOVERY - Request latencies on argon is OK: OK - apiserver_request_latencies is 6719 https://grafana.wikimedia.org/dashboard/db/kubernetes-api [14:22:33] PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: CRITICAL - kubelet_operational_latencies is 15667 https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [14:23:16] (03CR) 10Hashar: Fix nrpe spec for os_version() (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/419410 (owner: 10Hashar) [14:23:32] PROBLEM - kubelet operational latencies on kubernetes1002 is CRITICAL: CRITICAL - kubelet_operational_latencies is 27856 https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [14:23:33] RECOVERY - kubelet operational latencies on kubernetes1001 is OK: OK - kubelet_operational_latencies is 2214 https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [14:24:32] RECOVERY - kubelet operational latencies on kubernetes1002 is OK: OK - kubelet_operational_latencies is 2921 https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [14:24:34] (03PS1) 10Elukey: profile::hadoop:balancer: fix logrotate source [puppet] - 10https://gerrit.wikimedia.org/r/421020 (https://phabricator.wikimedia.org/T167790) [14:24:49] (03CR) 10Volans: [C: 031] "I'm not sure if there is a better/cleaner way to do it within puppet specs, but the code looks reasonable to me given the goal." [puppet] - 10https://gerrit.wikimedia.org/r/419410 (owner: 10Hashar) [14:25:15] (03CR) 10Elukey: [C: 032] profile::hadoop:balancer: fix logrotate source [puppet] - 10https://gerrit.wikimedia.org/r/421020 (https://phabricator.wikimedia.org/T167790) (owner: 10Elukey) [14:27:53] volans: yup that is an ideal spec fix. 
Probably I should dig a bit more and fill an issue to upstream [14:28:01] but as you said it is good enough probably :] [14:28:42] RECOVERY - puppet last run on analytics1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:33:32] RECOVERY - puppet last run on prometheus1003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:36:25] 10Operations, 10Puppet, 10Patch-For-Review: Upgrade PuppetDB to version 4.4 - https://phabricator.wikimedia.org/T177253#4068910 (10herron) [14:40:42] (03PS3) 10Muehlenhoff: Enable base::service_auto_restart for nagios-nrpe-server [puppet] - 10https://gerrit.wikimedia.org/r/419400 (https://phabricator.wikimedia.org/T135991) [14:41:27] (03PS2) 10Daimona Eaytoy: Enable $wgAbuseFilterProfile on itwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420687 (https://phabricator.wikimedia.org/T190137) [14:42:05] (03PS3) 10Daimona Eaytoy: Enable AbuseFilter runtime profile on more Wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420672 (https://phabricator.wikimedia.org/T175954) [14:43:08] 10Operations, 10Puppet, 10Patch-For-Review: Failover puppet ca service from eqiad to codfw - https://phabricator.wikimedia.org/T189891#4068946 (10herron) [14:45:46] (03CR) 10Andrew Bogott: Post-silver cleanups (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420805 (https://phabricator.wikimedia.org/T168470) (owner: 10Andrew Bogott) [14:48:36] James_F: I'm being dense but still don't understand your comment on https://gerrit.wikimedia.org/r/#/c/420805/3/wmf-config/InitialiseSettings.php. I feel like the thing you're suggesting I do is already done in that very patch so I must be misunderstanding. 
[14:53:02] (03PS1) 10Mforns: Add 2 fields to Print schema in EventLogging whitelist [puppet] - 10https://gerrit.wikimedia.org/r/421029 (https://phabricator.wikimedia.org/T190223) [14:54:41] (03CR) 10Mforns: "@elukey, this schema was missing the wiki and the webHost, I didn't think of them when I reviewed the previous related patch." [puppet] - 10https://gerrit.wikimedia.org/r/421029 (https://phabricator.wikimedia.org/T190223) (owner: 10Mforns) [14:54:50] (03PS1) 10Jcrespo: mariadb: Depool db1079 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421030 (https://phabricator.wikimedia.org/T181777) [14:55:50] (03PS2) 10Filippo Giunchedi: cache: depool puppetmaster1001 from config-master.w.o [puppet] - 10https://gerrit.wikimedia.org/r/420744 (https://phabricator.wikimedia.org/T184562) [14:55:52] (03PS1) 10Filippo Giunchedi: install_server: reinstall puppetmaster1001 with stretch [puppet] - 10https://gerrit.wikimedia.org/r/421031 (https://phabricator.wikimedia.org/T184562) [14:56:03] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Depool db1079 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421030 (https://phabricator.wikimedia.org/T181777) (owner: 10Jcrespo) [14:56:27] (03PS2) 10Jcrespo: mariadb: Depool db1079 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421030 (https://phabricator.wikimedia.org/T181777) [14:57:24] (03CR) 10Filippo Giunchedi: [C: 032] install_server: reinstall puppetmaster1001 with stretch [puppet] - 10https://gerrit.wikimedia.org/r/421031 (https://phabricator.wikimedia.org/T184562) (owner: 10Filippo Giunchedi) [14:57:32] (03CR) 10Elukey: [C: 032] Add 2 fields to Print schema in EventLogging whitelist [puppet] - 10https://gerrit.wikimedia.org/r/421029 (https://phabricator.wikimedia.org/T190223) (owner: 10Mforns) [14:57:38] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Depool db1079 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421030 
(https://phabricator.wikimedia.org/T181777) (owner: 10Jcrespo) [14:57:40] (03PS2) 10Elukey: Add 2 fields to Print schema in EventLogging whitelist [puppet] - 10https://gerrit.wikimedia.org/r/421029 (https://phabricator.wikimedia.org/T190223) (owner: 10Mforns) [14:57:49] (03PS3) 10Jcrespo: mariadb: Depool db1079 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421030 (https://phabricator.wikimedia.org/T181777) [14:59:45] (03CR) 10Jcrespo: [C: 032] mariadb: Depool db1079 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421030 (https://phabricator.wikimedia.org/T181777) (owner: 10Jcrespo) [14:59:46] (03PS1) 10Elukey: profile::mariadb::misc::eventlogging: create dirs for maintenance scripts [puppet] - 10https://gerrit.wikimedia.org/r/421033 (https://phabricator.wikimedia.org/T171203) [15:00:57] (03CR) 10Filippo Giunchedi: "LGTM, see minor inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/420731 (https://phabricator.wikimedia.org/T184923) (owner: 10Alexandros Kosiaris) [15:01:08] (03PS4) 10Muehlenhoff: Enable base::service_auto_restart for nagios-nrpe-server [puppet] - 10https://gerrit.wikimedia.org/r/419400 (https://phabricator.wikimedia.org/T135991) [15:01:30] (03Merged) 10jenkins-bot: mariadb: Depool db1079 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421030 (https://phabricator.wikimedia.org/T181777) (owner: 10Jcrespo) [15:01:44] (03CR) 10jenkins-bot: mariadb: Depool db1079 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421030 (https://phabricator.wikimedia.org/T181777) (owner: 10Jcrespo) [15:02:37] (03PS1) 10Volans: Puppetboard: adjust uwsgi config [puppet] - 10https://gerrit.wikimedia.org/r/421034 (https://phabricator.wikimedia.org/T184563) [15:03:30] (03CR) 10Muehlenhoff: [C: 032] Enable base::service_auto_restart for nagios-nrpe-server (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/419400 (https://phabricator.wikimedia.org/T135991) (owner: 
10Muehlenhoff) [15:03:39] (03PS5) 10Muehlenhoff: Enable base::service_auto_restart for nagios-nrpe-server [puppet] - 10https://gerrit.wikimedia.org/r/419400 (https://phabricator.wikimedia.org/T135991) [15:04:55] (03Abandoned) 10Muehlenhoff: Fix nrpe spec for os_version() [puppet] - 10https://gerrit.wikimedia.org/r/419410 (owner: 10Hashar) [15:05:02] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1079 (duration: 01m 15s) [15:05:03] (03PS1) 10BryanDavis: wiki replicas: Update help for maintain-views --table [puppet] - 10https://gerrit.wikimedia.org/r/421035 [15:05:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:05:51] !log stop, upgrade and restart db1079 [15:05:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:06:47] (03CR) 10Filippo Giunchedi: [C: 031] ""LGTM"" [puppet] - 10https://gerrit.wikimedia.org/r/421034 (https://phabricator.wikimedia.org/T184563) (owner: 10Volans) [15:07:26] (03PS2) 10Volans: Puppetboard: adjust uwsgi config [puppet] - 10https://gerrit.wikimedia.org/r/421034 (https://phabricator.wikimedia.org/T184563) [15:08:14] (03CR) 10Volans: [C: 032] Puppetboard: adjust uwsgi config [puppet] - 10https://gerrit.wikimedia.org/r/421034 (https://phabricator.wikimedia.org/T184563) (owner: 10Volans) [15:08:42] (03PS1) 10Jcrespo: Revert "mariadb: Depool db1079 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421037 [15:11:15] !log volans@tin Started deploy [puppetboard/deploy@81cd93a]: Adjust wsgi config - T184563 [15:11:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:11:21] T184563: Investigate landscape of PuppetDB Frontends and Provision One - https://phabricator.wikimedia.org/T184563 [15:11:22] !log volans@tin Finished deploy [puppetboard/deploy@81cd93a]: Adjust wsgi config - T184563 (duration: 00m 06s) [15:11:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:16:56] (03PS2) 10Andrew 
Bogott: Horizon: remove singular horizon_host hiera setting and all uses [puppet] - 10https://gerrit.wikimedia.org/r/420908 (https://phabricator.wikimedia.org/T168470) [15:17:15] (03PS3) 10Andrew Bogott: Horizon: remove singular horizon_host hiera setting and all uses [puppet] - 10https://gerrit.wikimedia.org/r/420908 (https://phabricator.wikimedia.org/T168470) [15:20:01] (03PS1) 10Jcrespo: mariadb: Repool db1079 with low load after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421040 [15:20:40] (03CR) 10Alexandros Kosiaris: prometheus: Add kubernetes pods jobs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/420731 (https://phabricator.wikimedia.org/T184923) (owner: 10Alexandros Kosiaris) [15:20:53] (03PS2) 10Alexandros Kosiaris: prometheus: Add kubernetes pods jobs [puppet] - 10https://gerrit.wikimedia.org/r/420731 (https://phabricator.wikimedia.org/T184923) [15:22:33] (03PS1) 10Marostegui: db-eqiad.php: Restore es1019 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421042 [15:22:35] (03CR) 10Jcrespo: [C: 032] mariadb: Repool db1079 with low load after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421040 (owner: 10Jcrespo) [15:22:39] (03PS2) 10Elukey: profile::mariadb::misc::eventlogging: create dirs for maintenance scripts [puppet] - 10https://gerrit.wikimedia.org/r/421033 (https://phabricator.wikimedia.org/T171203) [15:22:56] (03CR) 10jenkins-bot: mariadb: Repool db1079 with low load after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421040 (owner: 10Jcrespo) [15:23:33] jouncebot: next [15:23:33] In 1 hour(s) and 36 minute(s): Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180321T1700) [15:23:39] (03PS1) 10Papaul: DNS: Add mgmt DNS entries for ms-be204[0-3] [dns] - 10https://gerrit.wikimedia.org/r/421044 (https://phabricator.wikimedia.org/T189633) [15:23:40] (03PS4) 10Andrew Bogott: Horizon: remove singular horizon_host 
hiera setting and all uses [puppet] - 10https://gerrit.wikimedia.org/r/420908 (https://phabricator.wikimedia.org/T168470) [15:23:43] (03PS1) 10Muehlenhoff: Enable base::service_auto_restart for systemd-timesyncd [puppet] - 10https://gerrit.wikimedia.org/r/421045 (https://phabricator.wikimedia.org/T135991) [15:23:48] (03PS2) 10Marostegui: db-eqiad.php: Restore es1019 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421042 [15:24:10] (03CR) 10Filippo Giunchedi: [C: 031] "LGTM! I suspect we'll need to tweak this further but looks like a good start" [puppet] - 10https://gerrit.wikimedia.org/r/420731 (https://phabricator.wikimedia.org/T184923) (owner: 10Alexandros Kosiaris) [15:24:14] (03CR) 10jerkins-bot: [V: 04-1] Enable base::service_auto_restart for systemd-timesyncd [puppet] - 10https://gerrit.wikimedia.org/r/421045 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff) [15:24:22] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1079 with low load (duration: 01m 15s) [15:24:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:25:37] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Restore es1019 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421042 (owner: 10Marostegui) [15:26:10] (03PS1) 10Ladsgroup: Remove editinterface right from templateeditors of fawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421046 (https://phabricator.wikimedia.org/T190297) [15:26:41] (03PS2) 10Jcrespo: Revert "mariadb: Depool db1079 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421037 [15:26:47] (03Merged) 10jenkins-bot: db-eqiad.php: Restore es1019 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421042 (owner: 10Marostegui) [15:27:39] (03PS5) 10Andrew Bogott: Horizon: remove singular horizon_host hiera setting and all uses [puppet] - 10https://gerrit.wikimedia.org/r/420908 (https://phabricator.wikimedia.org/T168470) [15:27:54] /w/win 22 
[15:28:18] (03CR) 10jenkins-bot: db-eqiad.php: Restore es1019 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421042 (owner: 10Marostegui) [15:28:19] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Fully repool es1019 after socket location upgrade (duration: 01m 12s) [15:28:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:28:33] (03CR) 10Andrew Bogott: [C: 032] Horizon: remove singular horizon_host hiera setting and all uses [puppet] - 10https://gerrit.wikimedia.org/r/420908 (https://phabricator.wikimedia.org/T168470) (owner: 10Andrew Bogott) [15:28:42] !log ema@neodymium conftool action : set/pooled=yes; selector: name=cp3032.esams.wmnet,service=varnish-be [15:28:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:29:32] (03PS3) 10Jcrespo: Revert "mariadb: Depool db1079 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421037 [15:32:28] (03PS3) 10Elukey: profile::mariadb::misc::eventlogging: create dirs for maintenance scripts [puppet] - 10https://gerrit.wikimedia.org/r/421033 (https://phabricator.wikimedia.org/T171203) [15:37:24] (03PS2) 10Andrew Bogott: m5: remove grants for Californium [puppet] - 10https://gerrit.wikimedia.org/r/413748 (https://phabricator.wikimedia.org/T168470) [15:38:24] (03CR) 10Alexandros Kosiaris: [C: 032] prometheus: Add kubernetes pods jobs [puppet] - 10https://gerrit.wikimedia.org/r/420731 (https://phabricator.wikimedia.org/T184923) (owner: 10Alexandros Kosiaris) [15:38:35] (03PS3) 10Alexandros Kosiaris: prometheus: Add kubernetes pods jobs [puppet] - 10https://gerrit.wikimedia.org/r/420731 (https://phabricator.wikimedia.org/T184923) [15:38:38] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] prometheus: Add kubernetes pods jobs [puppet] - 10https://gerrit.wikimedia.org/r/420731 (https://phabricator.wikimedia.org/T184923) (owner: 10Alexandros Kosiaris) [15:38:47] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] "Thanks! 
merging" [puppet] - 10https://gerrit.wikimedia.org/r/420731 (https://phabricator.wikimedia.org/T184923) (owner: 10Alexandros Kosiaris) [15:39:18] PROBLEM - Check systemd state on restbase-dev1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [15:42:36] (03PS3) 10Andrew Bogott: m5: remove grants for Californium [puppet] - 10https://gerrit.wikimedia.org/r/413748 (https://phabricator.wikimedia.org/T168470) [15:42:51] (03CR) 10Andrew Bogott: [C: 032] "I also revoked these grants on the db." [puppet] - 10https://gerrit.wikimedia.org/r/413748 (https://phabricator.wikimedia.org/T168470) (owner: 10Andrew Bogott) [15:45:20] (03PS4) 10Andrew Bogott: m5: remove grants for Californium [puppet] - 10https://gerrit.wikimedia.org/r/413748 (https://phabricator.wikimedia.org/T189921) [15:45:22] (03PS1) 10Andrew Bogott: Remove californium from smart_health_wikimedia_labs [puppet] - 10https://gerrit.wikimedia.org/r/421048 (https://phabricator.wikimedia.org/T189921) [15:46:29] (03PS5) 10Andrew Bogott: m5: remove grants for Californium [puppet] - 10https://gerrit.wikimedia.org/r/413748 (https://phabricator.wikimedia.org/T189921) [15:46:43] 10Operations, 10ops-eqiad, 10DBA: db1061 (s6 master) disk with lots of predictive failure errors - https://phabricator.wikimedia.org/T190299#4069201 (10Marostegui) p:05Triage>03Normal [15:46:45] (03PS2) 10Andrew Bogott: Remove californium from smart_health_wikimedia_labs [puppet] - 10https://gerrit.wikimedia.org/r/421048 (https://phabricator.wikimedia.org/T189921) [15:46:59] PROBLEM - Request latencies on chlorine is CRITICAL: CRITICAL - apiserver_request_latencies is 21658636 https://grafana.wikimedia.org/dashboard/db/kubernetes-api [15:47:36] (03CR) 10Andrew Bogott: [C: 032] Remove californium from smart_health_wikimedia_labs [puppet] - 10https://gerrit.wikimedia.org/r/421048 (https://phabricator.wikimedia.org/T189921) (owner: 10Andrew Bogott) [15:47:59] RECOVERY - Request latencies on chlorine is OK: OK - 
apiserver_request_latencies is 4083 https://grafana.wikimedia.org/dashboard/db/kubernetes-api [15:48:18] RECOVERY - Check systemd state on restbase-dev1004 is OK: OK - running: The system is fully operational [15:48:54] !log ppchelko@tin Started deploy [cpjobqueue/deploy@0dcdc82]: Partition the refreshLinks topic by DB shard T189738 [15:48:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:49:00] T189738: Support per-db-shard concurrency in ChangeProp - https://phabricator.wikimedia.org/T189738 [15:49:39] 10Operations, 10ops-eqiad, 10DBA: db1061 (s6 master) disk with lots of predictive failure errors - https://phabricator.wikimedia.org/T190299#4069225 (10Marostegui) [15:50:07] 10Operations, 10Cloud-Services, 10DC-Ops, 10hardware-requests, 10Patch-For-Review: decom californium - https://phabricator.wikimedia.org/T189921#4069227 (10Andrew) [15:51:20] (03Draft1) 10MarcoAurelio: guwiki: fix $wg{Add,Remove}Groups configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421049 [15:51:23] (03PS2) 10MarcoAurelio: guwiki: fix $wg{Add,Remove}Groups configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421049 [15:51:56] !log ppchelko@tin Finished deploy [cpjobqueue/deploy@0dcdc82]: Partition the refreshLinks topic by DB shard T189738 (duration: 03m 03s) [15:52:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:53:16] !log ppchelko@tin Started deploy [cpjobqueue/deploy@b291728]: Partition the refreshLinks topic by DB shard T189738 take 2 [15:53:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:53:56] !log ppchelko@tin Finished deploy [cpjobqueue/deploy@b291728]: Partition the refreshLinks topic by DB shard T189738 take 2 (duration: 00m 40s) [15:54:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:54:02] T189738: Support per-db-shard concurrency in ChangeProp - https://phabricator.wikimedia.org/T189738 [15:55:15] 
10Operations, 10ops-eqiad, 10DBA: db1061 (s6 master) disk with lots of predictive failure errors - https://phabricator.wikimedia.org/T190299#4069261 (10jcrespo) Ok to me, we should wait for chris to be around. [15:55:23] (03PS2) 10Mforns: Modify eventlogging purging script to read from YAML whitelist [puppet] - 10https://gerrit.wikimedia.org/r/420685 (https://phabricator.wikimedia.org/T189692) [15:56:35] (03PS1) 10Mark Bergsma: Fix testRepool test case for previously-down-but-pooled [debs/pybal] - 10https://gerrit.wikimedia.org/r/421051 [15:56:37] (03PS1) 10Mark Bergsma: Fix StubLVSService to use a set instead of a dict for .servers [debs/pybal] - 10https://gerrit.wikimedia.org/r/421052 [15:56:39] (03PS1) 10Mark Bergsma: Introduce server.is_pooled and make server.pooled usage more consistent [debs/pybal] - 10https://gerrit.wikimedia.org/r/421053 [15:57:02] 10Operations, 10Ops-Access-Requests, 10Ops-Access-Reviews, 10Patch-For-Review: Requesting access to terbium.eqiad.wmnet for bmansurov - https://phabricator.wikimedia.org/T189285#4069269 (10RobH) a:05bmansurov>03None Ok, sorry for the delay on this, but it turns out everyone seems to have thought someon... 
[15:57:21] 10Operations, 10ops-eqiad, 10DBA: db1052 (s1 master) disk with lots of predictive failure errors - https://phabricator.wikimedia.org/T190301#4069271 (10Marostegui) p:05Triage>03Normal [15:58:00] (03PS2) 10RobH: admin: Grant bmansurov access to terbium.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/419387 (https://phabricator.wikimedia.org/T189285) (owner: 10Vgutierrez) [16:00:10] (03PS3) 10RobH: admin: Grant bmansurov access to terbium.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/419387 (https://phabricator.wikimedia.org/T189285) (owner: 10Vgutierrez) [16:00:54] 10Operations, 10Ops-Access-Requests, 10Ops-Access-Reviews, 10Patch-For-Review: Requesting access to terbium.eqiad.wmnet for bmansurov - https://phabricator.wikimedia.org/T189285#4069296 (10RobH) I've gone ahead and modified https://gerrit.wikimedia.org/r/#/c/419387/ to have restricted, not deployers. This... [16:00:57] (03PS2) 10Giuseppe Lavagetto: codfw: assign roles to the new appservers [puppet] - 10https://gerrit.wikimedia.org/r/420990 (https://phabricator.wikimedia.org/T188301) [16:01:07] 10Operations, 10Ops-Access-Requests, 10Ops-Access-Reviews, 10Patch-For-Review: Requesting access to terbium.eqiad.wmnet for bmansurov - https://phabricator.wikimedia.org/T189285#4069298 (10RobH) [16:01:40] 10Operations, 10ops-eqiad, 10DBA: db1054 (s2 master) disk with lots of predictive failure errors - https://phabricator.wikimedia.org/T190302#4069301 (10Marostegui) p:05Triage>03Normal [16:02:11] (03PS4) 10Andrew Bogott: Post-silver cleanups [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420805 (https://phabricator.wikimedia.org/T168470) [16:02:13] (03PS1) 10Andrew Bogott: labtestwiki: use swift for images [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421055 [16:02:31] (03CR) 10Giuseppe Lavagetto: [C: 032] codfw: assign roles to the new appservers [puppet] - 10https://gerrit.wikimedia.org/r/420990 (https://phabricator.wikimedia.org/T188301) (owner: 10Giuseppe Lavagetto) 
[16:03:02] <_joe_> please do not do mediawiki-config releases until I say it's ok :) [16:03:32] 10Operations, 10Ops-Access-Requests, 10Ops-Access-Reviews, 10Patch-For-Review: Requesting access to terbium.eqiad.wmnet for bmansurov - https://phabricator.wikimedia.org/T189285#4069323 (10MarcoAurelio) Not an op, but as I said at T189285#4041129 I think `restricted` is better as it does not grant any depl... [16:03:35] 10Operations, 10ops-eqiad, 10DBA: db1062 (s7 master) disk with lots of predictive failure errors - https://phabricator.wikimedia.org/T190303#4069324 (10Marostegui) p:05Triage>03Normal [16:04:29] 10Operations, 10ops-eqiad, 10DBA: db1052 (s1 master) disks with lots of predictive failure errors - https://phabricator.wikimedia.org/T190301#4069340 (10Marostegui) [16:05:16] (03CR) 10Andrew Bogott: [C: 032] labtestwiki: use swift for images [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421055 (owner: 10Andrew Bogott) [16:05:33] (03PS2) 10Mark Bergsma: Fix testRepool test case for previously-down-but-pooled [debs/pybal] - 10https://gerrit.wikimedia.org/r/421051 [16:05:35] 10Operations, 10ops-eqiad, 10DBA: db1016 m1 master: Possibly faulty BBU - https://phabricator.wikimedia.org/T166344#4069343 (10Marostegui) 05Open>03declined This host is no longer the master and will be decommissioned - T190179 [16:05:38] (03PS2) 10Mark Bergsma: Fix StubLVSService to use a set instead of a dict for .servers [debs/pybal] - 10https://gerrit.wikimedia.org/r/421052 [16:05:39] (03PS2) 10Mark Bergsma: Introduce server.is_pooled and make server.pooled usage more consistent [debs/pybal] - 10https://gerrit.wikimedia.org/r/421053 [16:06:37] (03Merged) 10jenkins-bot: labtestwiki: use swift for images [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421055 (owner: 10Andrew Bogott) [16:07:00] (03CR) 10Mark Bergsma: [C: 031] Fix testRepool test case for previously-down-but-pooled [debs/pybal] - 10https://gerrit.wikimedia.org/r/421051 (owner: 10Mark Bergsma) [16:07:22] !log 
oblivian@puppetmaster2001 conftool action : set/pooled=inactive; selector: name=mw22(59|[6-9][0-9])\.codfw\.wmnet [16:07:23] <_joe_> andrewbogott: please do not scap that [16:07:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:07:31] <_joe_> for another couple minutes [16:07:37] _joe_: ok, let me know [16:07:44] <_joe_> andrewbogott: else you might see failures [16:08:10] (03CR) 10Mark Bergsma: [C: 031] Fix StubLVSService to use a set instead of a dict for .servers [debs/pybal] - 10https://gerrit.wikimedia.org/r/421052 (owner: 10Mark Bergsma) [16:08:22] (03PS1) 10Muehlenhoff: Enable base::service_auto_restart for systemd-journald [puppet] - 10https://gerrit.wikimedia.org/r/421058 (https://phabricator.wikimedia.org/T135991) [16:09:16] (03CR) 10jenkins-bot: labtestwiki: use swift for images [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421055 (owner: 10Andrew Bogott) [16:09:28] <_joe_> andrewbogott: go on [16:09:34] <_joe_> all cleared [16:09:37] ok thanks [16:10:14] (03CR) 10jerkins-bot: [V: 04-1] Enable base::service_auto_restart for systemd-journald [puppet] - 10https://gerrit.wikimedia.org/r/421058 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff) [16:10:55] !log andrew@tin Synchronized wmf-config/InitialiseSettings.php: labtestwikitech -> swift (duration: 01m 15s) [16:10:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:11:24] (03PS2) 10Muehlenhoff: Enable base::service_auto_restart for systemd-journald [puppet] - 10https://gerrit.wikimedia.org/r/421058 (https://phabricator.wikimedia.org/T135991) [16:11:46] (03CR) 10MarcoAurelio: "What is holding this? :-) Want some tests on production beforehand? Thanks." 
[puppet] - 10https://gerrit.wikimedia.org/r/382631 (https://phabricator.wikimedia.org/T176754) (owner: 10EddieGP) [16:12:38] !log andrew@tin Synchronized wmf-config/filebackend.php: labtestwikitech -> swift (duration: 01m 14s) [16:12:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:16:17] 10Operations, 10Puppet, 10Patch-For-Review, 10User-fgiunchedi: Upgrade Puppet Master Infrastructure to Debian Stretch - https://phabricator.wikimedia.org/T184562#4069362 (10fgiunchedi) Tomorrow we're going to reinstall puppetmaster1001, puppet traffic is already pointed away from it. After the CA/private f... [16:16:49] (03PS2) 10Arturo Borrero Gonzalez: new cert for *.tools.wmflabs.org [puppet] - 10https://gerrit.wikimedia.org/r/420762 (https://phabricator.wikimedia.org/T190182) (owner: 10RobH) [16:17:03] (03PS2) 10Muehlenhoff: Enable base::service_auto_restart for systemd-timesyncd [puppet] - 10https://gerrit.wikimedia.org/r/421045 (https://phabricator.wikimedia.org/T135991) [16:19:46] (03CR) 10Alexandros Kosiaris: [C: 031] Add switch to allow building images that match a glob pattern [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/420711 (https://phabricator.wikimedia.org/T186416) (owner: 10Giuseppe Lavagetto) [16:24:23] (03Abandoned) 10Filippo Giunchedi: Depool eqiad puppetmaster [dns] - 10https://gerrit.wikimedia.org/r/420733 (https://phabricator.wikimedia.org/T184562) (owner: 10Filippo Giunchedi) [16:24:58] (03PS3) 10Filippo Giunchedi: Move config-master to codfw [dns] - 10https://gerrit.wikimedia.org/r/420734 (https://phabricator.wikimedia.org/T184562) [16:25:00] (03PS1) 10Filippo Giunchedi: wmnet: point esams puppet to eqiad [dns] - 10https://gerrit.wikimedia.org/r/421060 (https://phabricator.wikimedia.org/T184562) [16:25:02] (03PS1) 10Filippo Giunchedi: Point wikimedia.org and eqiad puppet to eqiad [dns] - 10https://gerrit.wikimedia.org/r/421061 (https://phabricator.wikimedia.org/T184562) [16:25:19] PROBLEM - HHVM rendering 
on mw2235 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:26:09] RECOVERY - HHVM rendering on mw2235 is OK: HTTP OK: HTTP/1.1 200 OK - 75096 bytes in 0.294 second response time [16:26:38] (03CR) 10Arturo Borrero Gonzalez: [C: 032] new cert for *.tools.wmflabs.org [puppet] - 10https://gerrit.wikimedia.org/r/420762 (https://phabricator.wikimedia.org/T190182) (owner: 10RobH) [16:28:40] PROBLEM - Check systemd state on mw2259 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:28:41] PROBLEM - Check systemd state on mw2273 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:28:41] PROBLEM - Check systemd state on mw2283 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:28:48] <_joe_> that's me installing those systems [16:28:50] PROBLEM - Check systemd state on mw2284 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:28:50] PROBLEM - Check systemd state on mw2260 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:28:50] PROBLEM - Check systemd state on mw2282 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:28:51] PROBLEM - Check systemd state on mw2269 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:29:00] PROBLEM - puppet last run on mw2259 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[jobrunner/jobrunner] [16:29:01] PROBLEM - Check systemd state on mw2268 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:29:02] <_joe_> I thought I wouldn't need to downtime them, I was wrong [16:29:10] PROBLEM - Check systemd state on mw2276 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[16:29:10] PROBLEM - Check systemd state on mw2278 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:29:10] PROBLEM - puppet last run on mw2279 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[jobrunner/jobrunner] [16:29:11] PROBLEM - Check systemd state on mw2270 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:29:11] PROBLEM - Check systemd state on mw2275 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:29:20] <_joe_> sorry for the spam [16:29:21] PROBLEM - Check systemd state on mw2279 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:30:36] _joe_: one trick that is new-ish, is that you can temporarily disable alerts with a hiera key [16:30:57] we use it for all new hosts and works nicely [16:31:01] <_joe_> jynus: yeah this is a bit of a peculiar situation [16:31:11] <_joe_> machines were installed already, we just changed their role [16:31:25] yes, that works, too [16:31:35] <_joe_> so I just icinga-downtimed them [16:31:39] I see [16:31:55] yeah, the hiera key is more reliable because icinga will know instantly [16:32:09] on puppet run there [16:33:58] andrewbogott: Re. https://gerrit.wikimedia.org/r/#/c/420805/ I'm suggesting also editing wikitech.php to drop the "if" block as it's now always true. [16:34:00] RECOVERY - puppet last run on mw2259 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [16:34:20] James_F: like https://gerrit.wikimedia.org/r/#/c/420805/4/wmf-config/wikitech.php ? [16:34:22] andrewbogott: (Following the comment you're deleting at line 20037 of InitialiseSettings.) [16:34:46] _joe_: unrelated but I guess related- is it safe to do deploys right now? [16:35:01] <_joe_> jynus: yes, as I said to andrewbogott before [16:35:04] andrewbogott: Oh. Yes. 
Did you just add that or am I going mad? [16:35:11] sorry, I didn't see that [16:35:15] andrewbogott: Sorry! [16:35:27] James_F: No problem, I'm glad we agree about that part :) [16:35:43] * James_F grins. [16:37:27] (03CR) 10Andrew Bogott: [C: 032] Post-silver cleanups [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420805 (https://phabricator.wikimedia.org/T168470) (owner: 10Andrew Bogott) [16:38:49] (03Merged) 10jenkins-bot: Post-silver cleanups [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420805 (https://phabricator.wikimedia.org/T168470) (owner: 10Andrew Bogott) [16:39:10] RECOVERY - puppet last run on mw2279 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [16:39:14] (03CR) 10jenkins-bot: Post-silver cleanups [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420805 (https://phabricator.wikimedia.org/T168470) (owner: 10Andrew Bogott) [16:42:57] !log andrew@tin Synchronized wmf-config/wikitech.php: first of many wikitech cleanups (duration: 03m 16s) [16:43:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:43:22] 10Operations, 10Traffic, 10netops, 10Patch-For-Review: Offload pings to dedicated server - https://phabricator.wikimedia.org/T190090#4069474 (10ayounsi) [16:43:50] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [16:44:46] I guess that is not user-facing^ [16:44:57] (03PS1) 10Volans: Copy the wsgi.py script into the venv [software/puppetboard/deploy] - 10https://gerrit.wikimedia.org/r/421063 (https://phabricator.wikimedia.org/T184563) [16:45:06] (03PS1) 10Volans: Puppetboard: allow to connect to puppetdb [puppet] - 10https://gerrit.wikimedia.org/r/421064 (https://phabricator.wikimedia.org/T184563) [16:45:09] (03PS4) 10Jcrespo: Revert "mariadb: Depool db1079 for maintenance" [mediawiki-config] 
- 10https://gerrit.wikimedia.org/r/421037 [16:45:48] (03CR) 10jerkins-bot: [V: 04-1] Puppetboard: allow to connect to puppetdb [puppet] - 10https://gerrit.wikimedia.org/r/421064 (https://phabricator.wikimedia.org/T184563) (owner: 10Volans) [16:46:11] (03PS1) 10Papaul: DNS: Add production DNS entries for ms-be204[0-3] [dns] - 10https://gerrit.wikimedia.org/r/421065 (https://phabricator.wikimedia.org/T189633) [16:46:17] 10Operations, 10Traffic, 10netops, 10Patch-For-Review: Offload pings to dedicated server - https://phabricator.wikimedia.org/T190090#4069488 (10ayounsi) 1st part completed, server's specific configuration (loopback IP) is not puppetized yet. From outside: ``` $ ping ping-test.eqiad.wikimedia.org PING ping... [16:46:23] (03CR) 10Filippo Giunchedi: [C: 031] Copy the wsgi.py script into the venv [software/puppetboard/deploy] - 10https://gerrit.wikimedia.org/r/421063 (https://phabricator.wikimedia.org/T184563) (owner: 10Volans) [16:46:27] !log andrew@tin Synchronized wmf-config/InitialiseSettings.php: one of many wikitech cleanups (duration: 03m 12s) [16:46:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:46:40] RECOVERY - Check systemd state on mw2259 is OK: OK - running: The system is fully operational [16:46:51] (03PS2) 10Volans: Puppetboard: allow to connect to puppetdb [puppet] - 10https://gerrit.wikimedia.org/r/421064 (https://phabricator.wikimedia.org/T184563) [16:47:07] (03CR) 10Jcrespo: [C: 032] Revert "mariadb: Depool db1079 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421037 (owner: 10Jcrespo) [16:47:35] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install ms-be204[0-3] - https://phabricator.wikimedia.org/T189633#4069513 (10Papaul) [16:47:47] (03CR) 10Filippo Giunchedi: [C: 031] Enable base::service_auto_restart for systemd-journald [puppet] - 10https://gerrit.wikimedia.org/r/421058 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff) [16:47:52] (03CR) 
10Filippo Giunchedi: [C: 031] Enable base::service_auto_restart for systemd-timesyncd [puppet] - 10https://gerrit.wikimedia.org/r/421045 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff) [16:48:11] andrewbogott: have you synced eqiad.php yet? [16:48:15] !log andrew@tin Synchronized wmf-config/CommonSettings.php: one of many wikitech cleanups (duration: 01m 38s) [16:48:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:48:20] jynus: not yet [16:48:30] (03Merged) 10jenkins-bot: Revert "mariadb: Depool db1079 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421037 (owner: 10Jcrespo) [16:48:32] can I deploy another change? [16:48:37] jynus: sure [16:48:38] and take care of checking errors [16:48:45] (03CR) 10Volans: [V: 032 C: 032] Copy the wsgi.py script into the venv [software/puppetboard/deploy] - 10https://gerrit.wikimedia.org/r/421063 (https://phabricator.wikimedia.org/T184563) (owner: 10Volans) [16:48:48] I'll just log off of tin and leave this to you :) [16:48:53] As long as you'll do codfw too [16:49:10] what is the msg for codfw? [16:49:15] I can take care [16:49:25] Post-silver cleanups? [16:49:29] this is all the same stuff, just cleaning up old silver refs [16:49:30] yep [16:49:31] 10Operations, 10Traffic, 10netops, 10Patch-For-Review: Offload pings to dedicated server - https://phabricator.wikimedia.org/T190090#4069519 (10ayounsi) [16:49:36] thank you! [16:49:47] only db-{eqiad,codfw} left, right? 
[16:50:21] yep [16:50:46] (03CR) 10jenkins-bot: Revert "mariadb: Depool db1079 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421037 (owner: 10Jcrespo) [16:50:55] doing, thank you [16:51:37] !log jynus@tin Synchronized wmf-config/db-codfw.php: Post-silver cleanup (duration: 01m 03s) [16:51:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:52:18] (03CR) 10Volans: "Compiler results here: https://puppet-compiler.wmflabs.org/compiler03/10561/puppetdb1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/421064 (https://phabricator.wikimedia.org/T184563) (owner: 10Volans) [16:53:00] <_joe_> !log running systemd-tmpfiles --create on the new appservers [16:53:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:53:24] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1079 fully, post-silver cleanup (duration: 01m 14s) [16:53:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:53:34] (03PS3) 10Mark Bergsma: Introduce server.is_pooled and make server.pooled usage more consistent [debs/pybal] - 10https://gerrit.wikimedia.org/r/421053 [16:55:02] there is some noise due to nutcracker on codfw, but it seems all is noise [16:56:31] <_joe_> jynus: it is, see discussion in -security [16:57:08] <_joe_> I can't seem to pull from gerrit ops/puppet at the moment [16:57:13] (03Abandoned) 10Mark Bergsma: Set hhvm.server.request_timeout_seconds to 60s [puppet] - 10https://gerrit.wikimedia.org/r/326144 (https://phabricator.wikimedia.org/T97192) (owner: 10Mark Bergsma) [16:58:07] <_joe_> anyone has any luck with that? [16:58:29] 10Operations, 10Ops-Access-Requests, 10Continuous-Integration-Infrastructure, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External): contint-admins sudo for service jenkins - https://phabricator.wikimedia.org/T190277#4069554 (10RobH) As @MoritzMuehlenhoff pointed out on the patch comment h... 
[16:58:49] _joe_: wfm [16:59:02] <_joe_> uhm, let me check my network then :P [17:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: That opportune time is upon us again. Time for a Morning SWAT (Max 8 patches) deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180321T1700). [17:00:04] subbu, RoanKattouw, James_F, marlier, MaxSem, framawiki, and Amir1: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [17:00:18] o/ [17:00:20] <_joe_> and ofc, as soon as I start using mtr, the issue goes away [17:00:26] Hey. [17:00:28] (03PS1) 10Andrew Bogott: labtestwikitech: remove prometheus monitoring of this db [puppet] - 10https://gerrit.wikimedia.org/r/421068 [17:00:42] Here [17:01:24] (03CR) 10Andrew Bogott: [C: 032] labtestwikitech: remove prometheus monitoring of this db [puppet] - 10https://gerrit.wikimedia.org/r/421068 (owner: 10Andrew Bogott) [17:01:41] (03CR) 10Imarlier: coal: Process from Kafka instead of from ZMQ (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/415218 (https://phabricator.wikimedia.org/T110903) (owner: 10Imarlier) [17:01:50] (03PS1) 10Giuseppe Lavagetto: site.pp: fix typo [puppet] - 10https://gerrit.wikimedia.org/r/421069 [17:02:16] (03PS2) 10Giuseppe Lavagetto: site.pp: fix typo [puppet] - 10https://gerrit.wikimedia.org/r/421069 [17:02:31] (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] site.pp: fix typo [puppet] - 10https://gerrit.wikimedia.org/r/421069 (owner: 10Giuseppe Lavagetto) [17:02:46] (03PS1) 10Bstorm: cloud novaproxy: Set a custom logrotate config for nginx [puppet] - 10https://gerrit.wikimedia.org/r/421070 (https://phabricator.wikimedia.org/T190218) [17:04:25] 10Operations, 10HHVM, 10Patch-For-Review: Long running mediawiki web requests impacts service availability, specially 
databases - https://phabricator.wikimedia.org/T149421#4069592 (10jcrespo) Partially mitigated at T160984#3209072 by setting up a query killer at database side- but that is far from ideal beca... [17:04:33] I'll do the deed [17:04:35] o/ [17:04:38] anyone swatting? [17:04:45] RECOVERY - Check systemd state on mw2279 is OK: OK - running: The system is fully operational [17:04:57] (03PS2) 10MaxSem: Enable RemexHtml on all wikiversity wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/419611 (https://phabricator.wikimedia.org/T188880) (owner: 10Subramanya Sastry) [17:05:02] (03CR) 10MaxSem: [C: 032] Enable RemexHtml on all wikiversity wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/419611 (https://phabricator.wikimedia.org/T188880) (owner: 10Subramanya Sastry) [17:06:07] (03PS3) 10Volans: Puppetboard: allow to connect to puppetdb [puppet] - 10https://gerrit.wikimedia.org/r/421064 (https://phabricator.wikimedia.org/T184563) [17:06:20] (03Merged) 10jenkins-bot: Enable RemexHtml on all wikiversity wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/419611 (https://phabricator.wikimedia.org/T188880) (owner: 10Subramanya Sastry) [17:06:47] !log ppchelko@tin Started deploy [cpjobqueue/deploy@545cb61]: Increase refreshLinks concurrency to 20 per partition [17:06:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:06:56] RECOVERY - Check systemd state on mw2283 is OK: OK - running: The system is fully operational [17:07:13] (03PS1) 10Papaul: DHCP: Add MAC address entries for ms-be204[0-3] [puppet] - 10https://gerrit.wikimedia.org/r/421072 (https://phabricator.wikimedia.org/T189633) [17:07:23] !log ppchelko@tin Finished deploy [cpjobqueue/deploy@545cb61]: Increase refreshLinks concurrency to 20 per partition (duration: 00m 37s) [17:07:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:07:35] subbu, pulled on mwdebug1002, please test [17:07:48] k [17:08:39] (03CR) 10Mark Bergsma: [C: 032] 
Rename test class variable attributes to not start with 'test' [debs/pybal] - 10https://gerrit.wikimedia.org/r/420999 (owner: 10Mark Bergsma) [17:08:41] (03PS2) 10Rduran: Create tests skeleton [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/420746 [17:09:04] (03CR) 10jenkins-bot: Enable RemexHtml on all wikiversity wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/419611 (https://phabricator.wikimedia.org/T188880) (owner: 10Subramanya Sastry) [17:09:11] (03Merged) 10jenkins-bot: Rename test class variable attributes to not start with 'test' [debs/pybal] - 10https://gerrit.wikimedia.org/r/420999 (owner: 10Mark Bergsma) [17:09:20] MaxSem, ok, lgtm as long as there are no errors in logs. [17:09:35] RECOVERY - Check systemd state on mw2275 is OK: OK - running: The system is fully operational [17:09:54] (03PS4) 10Volans: Puppetboard: allow to connect to puppetdb [puppet] - 10https://gerrit.wikimedia.org/r/421064 (https://phabricator.wikimedia.org/T184563) [17:10:16] RECOVERY - Check systemd state on labtestweb2001 is OK: OK - running: The system is fully operational [17:10:39] RoanKattouw: yt? [17:11:15] RECOVERY - Check systemd state on mw2269 is OK: OK - running: The system is fully operational [17:11:18] Yes sorry [17:11:29] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/419611/ (duration: 01m 15s) [17:11:32] (03PS2) 10MaxSem: Disable Flow on tswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420799 (https://phabricator.wikimedia.org/T188815) (owner: 10Catrope) [17:11:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:11:35] RECOVERY - Check systemd state on mw2278 is OK: OK - running: The system is fully operational [17:11:41] subbu: ^ [17:12:04] alright, thanks. 
[17:12:05] RECOVERY - Check systemd state on mw2282 is OK: OK - running: The system is fully operational [17:12:30] (03CR) 10MaxSem: [C: 032] Disable Flow on tswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420799 (https://phabricator.wikimedia.org/T188815) (owner: 10Catrope) [17:12:46] (03CR) 10Volans: "More safe hiera-based approach, compiler results at:" [puppet] - 10https://gerrit.wikimedia.org/r/421064 (https://phabricator.wikimedia.org/T184563) (owner: 10Volans) [17:12:56] RECOVERY - Check systemd state on mw2273 is OK: OK - running: The system is fully operational [17:13:41] (03Merged) 10jenkins-bot: Disable Flow on tswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420799 (https://phabricator.wikimedia.org/T188815) (owner: 10Catrope) [17:13:44] <_joe_> volans: eww, why did you use regex.yaml? [17:14:11] <_joe_> create a separate role instead [17:14:30] (03PS1) 10Catrope: Increase cache size for osm2pgsql import [puppet] - 10https://gerrit.wikimedia.org/r/421074 (https://phabricator.wikimedia.org/T190110) [17:14:38] (03CR) 10jenkins-bot: Disable Flow on tswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420799 (https://phabricator.wikimedia.org/T188815) (owner: 10Catrope) [17:14:46] _joe_: I didn't, it's temporary and will go away [17:15:07] <_joe_> volans: like puppet_major_version? 
[17:15:28] RoanKattouw: pulled on mwdebug1002 [17:15:35] RECOVERY - Check systemd state on mw2270 is OK: OK - running: The system is fully operational [17:15:45] RECOVERY - Check systemd state on mw2268 is OK: OK - running: The system is fully operational [17:16:16] MaxSem: lgtm [17:16:22] Once deployed I'll have to run namespaceDupes [17:16:33] (03CR) 10Filippo Giunchedi: [C: 031] Puppetboard: allow to connect to puppetdb [puppet] - 10https://gerrit.wikimedia.org/r/421064 (https://phabricator.wikimedia.org/T184563) (owner: 10Volans) [17:17:06] RECOVERY - Check systemd state on mw2260 is OK: OK - running: The system is fully operational [17:17:19] (03CR) 10Volans: [C: 032] Puppetboard: allow to connect to puppetdb [puppet] - 10https://gerrit.wikimedia.org/r/421064 (https://phabricator.wikimedia.org/T184563) (owner: 10Volans) [17:17:35] RECOVERY - Check systemd state on mw2276 is OK: OK - running: The system is fully operational [17:17:40] (03CR) 10Dmaza: [C: 031] Enable $wgAbuseFilterProfile & $wgAbuseFilterRuntimeProfile on eswikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420989 (https://phabricator.wikimedia.org/T190264) (owner: 10MarcoAurelio) [17:17:48] !log maxsem@tin Synchronized dblists/flow.dblist: https://gerrit.wikimedia.org/r/#/c/420799/ (duration: 01m 12s) [17:17:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:18:04] RoanKattouw: ^ [17:19:04] 10Operations, 10Toolforge, 10Patch-For-Review, 10cloud-services-team (Kanban): implement renewed *.tools.wmflabs.org cert/key pair - https://phabricator.wikimedia.org/T190182#4069692 (10aborrero) 05Open>03Resolved [17:19:06] RECOVERY - Check systemd state on mw2284 is OK: OK - running: The system is fully operational [17:19:22] (03PS3) 10MaxSem: Enable wgCiteResponsiveReferences on ptwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/419528 (https://phabricator.wikimedia.org/T189658) (owner: 10Esanders) [17:19:24] (03CR) 10MaxSem: [C: 032] Enable 
wgCiteResponsiveReferences on ptwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/419528 (https://phabricator.wikimedia.org/T189658) (owner: 10Esanders) [17:19:26] (03PS1) 10Andrew Bogott: labtestweb: more hiera overrides [puppet] - 10https://gerrit.wikimedia.org/r/421076 [17:19:28] (03PS2) 10Andrew Bogott: labtestweb: more hiera overrides [puppet] - 10https://gerrit.wikimedia.org/r/421076 [17:19:38] (03CR) 10jerkins-bot: [V: 04-1] labtestweb: more hiera overrides [puppet] - 10https://gerrit.wikimedia.org/r/421076 (owner: 10Andrew Bogott) [17:19:44] Thanks MaxSem [17:20:08] (03Merged) 10jenkins-bot: Enable wgCiteResponsiveReferences on ptwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/419528 (https://phabricator.wikimedia.org/T189658) (owner: 10Esanders) [17:20:52] (03CR) 10jenkins-bot: Enable wgCiteResponsiveReferences on ptwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/419528 (https://phabricator.wikimedia.org/T189658) (owner: 10Esanders) [17:21:09] (03PS3) 10Andrew Bogott: labtestweb: more hiera overrides [puppet] - 10https://gerrit.wikimedia.org/r/421076 [17:21:29] (03CR) 10MaxSem: "I'm not confident about the alias removal part. Is it hurting anything? How many links will be affected?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/418070 (https://phabricator.wikimedia.org/T189277) (owner: 10Framawiki) [17:21:54] (03CR) 10Andrew Bogott: [C: 032] labtestweb: more hiera overrides [puppet] - 10https://gerrit.wikimedia.org/r/421076 (owner: 10Andrew Bogott) [17:22:29] !log volans@tin Started deploy [puppetboard/deploy@d6514d6]: Adjust wsgi config - T184563 [17:22:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:22:35] !log volans@tin Finished deploy [puppetboard/deploy@d6514d6]: Adjust wsgi config - T184563 (duration: 00m 06s) [17:22:35] T184563: Investigate landscape of PuppetDB Frontends and Provision One - https://phabricator.wikimedia.org/T184563 [17:22:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:22:57] James_F: pulled on mwdebug1002 [17:23:16] Kk. [17:24:52] MaxSem: Yup, LGTM. [17:25:16] (03PS2) 10MaxSem: config: enable NavTiming oversample in a bunch of countries as default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420910 (https://phabricator.wikimedia.org/T190229) (owner: 10Imarlier) [17:26:14] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/419528/ (duration: 01m 15s) [17:26:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:26:21] James_F: ^ [17:26:26] Thanks MaxSem. [17:26:36] (03CR) 10MaxSem: [C: 032] config: enable NavTiming oversample in a bunch of countries as default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420910 (https://phabricator.wikimedia.org/T190229) (owner: 10Imarlier) [17:27:53] (03Merged) 10jenkins-bot: config: enable NavTiming oversample in a bunch of countries as default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420910 (https://phabricator.wikimedia.org/T190229) (owner: 10Imarlier) [17:28:19] marlier: pulled on mwdebug, please test [17:28:22] Looking [17:28:46] MaxSem: Looks good, go for it. 
[17:29:02] 10Operations, 10Cloud-Services, 10cloud-services-team (Kanban): Rebuild raids on labvirt1019 and 1020 - https://phabricator.wikimedia.org/T187373#4069764 (10Cmjohnson) A case has been opened. Your case was successfully submitted. Please note your Case ID: 5328012773 for future reference. [17:29:30] (03PS2) 10MaxSem: Redeploy GlobalPreferences to test wikis and mw.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420947 (https://phabricator.wikimedia.org/T189806) [17:29:42] <_joe_> MaxSem: can you ping me once you're done with SWAT? [17:29:59] <_joe_> I have 32 new appservers to add to the mix and I'd tend to do it out of deployment windows [17:30:29] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/420910/ (duration: 01m 16s) [17:30:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:31:01] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [17:31:12] _joe_: affirmative [17:31:13] _joe_: ooo, fun [17:31:20] Thanks, MaxSem [17:31:49] (03CR) 10MaxSem: [C: 032] Redeploy GlobalPreferences to test wikis and mw.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420947 (https://phabricator.wikimedia.org/T189806) (owner: 10MaxSem) [17:31:51] <_joe_> greg-g: I just need to issue two commands, but I just want to avoid ending in any inconsistent state by chance [17:32:22] (03PS1) 10MusikAnimal: Enable PageAssessments on trwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421080 (https://phabricator.wikimedia.org/T184969) [17:32:35] (03CR) 10jenkins-bot: config: enable NavTiming oversample in a bunch of countries as default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420910 (https://phabricator.wikimedia.org/T190229) (owner: 10Imarlier) [17:33:00] (03Merged) 10jenkins-bot: Redeploy 
GlobalPreferences to test wikis and mw.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420947 (https://phabricator.wikimedia.org/T189806) (owner: 10MaxSem) [17:33:35] _joe_: yeah, just "moar hardware" fun :) [17:34:05] <_joe_> oh the "hardware" part was addressed by others, I'm just reaping the rewards of their hard work :P [17:35:34] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/420947/ (duration: 01m 15s) [17:35:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:35:52] (03PS1) 10BBlack: Revert "Increase varnish probe interval to 1s" [puppet] - 10https://gerrit.wikimedia.org/r/421081 [17:35:54] (03PS1) 10BBlack: Revert "bump vcl reload delay by 3.6s" [puppet] - 10https://gerrit.wikimedia.org/r/421082 [17:36:01] :) [17:36:03] framawiki: yt? [17:36:10] 10Operations, 10Ops-Access-Requests, 10Ops-Access-Reviews, 10Patch-For-Review: Requesting access to terbium.eqiad.wmnet for bmansurov - https://phabricator.wikimedia.org/T189285#4069778 (10Dzahn) I think the best option is using `maintenance-log-readers`. maintenance-log-readers just give shell on exactly... 
[17:36:15] (03PS19) 10Imarlier: coal: Process from Kafka instead of from ZMQ [puppet] - 10https://gerrit.wikimedia.org/r/415218 (https://phabricator.wikimedia.org/T110903) [17:36:35] (03CR) 10BBlack: [C: 032] Revert "Increase varnish probe interval to 1s" [puppet] - 10https://gerrit.wikimedia.org/r/421081 (owner: 10BBlack) [17:36:41] ok, next [17:36:56] (03PS2) 10MaxSem: Remove editinterface right from templateeditors of fawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421046 (https://phabricator.wikimedia.org/T190297) (owner: 10Ladsgroup) [17:37:00] (03CR) 10MaxSem: [C: 032] Remove editinterface right from templateeditors of fawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421046 (https://phabricator.wikimedia.org/T190297) (owner: 10Ladsgroup) [17:38:21] (03Merged) 10jenkins-bot: Remove editinterface right from templateeditors of fawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421046 (https://phabricator.wikimedia.org/T190297) (owner: 10Ladsgroup) [17:38:23] (03CR) 10jenkins-bot: Redeploy GlobalPreferences to test wikis and mw.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420947 (https://phabricator.wikimedia.org/T189806) (owner: 10MaxSem) [17:38:36] (03CR) 10jenkins-bot: Remove editinterface right from templateeditors of fawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421046 (https://phabricator.wikimedia.org/T190297) (owner: 10Ladsgroup) [17:38:56] Amir1: pulled on mwdebug1002 [17:39:11] MaxSem: testing [17:39:15] o/ [17:39:21] sorry [17:41:51] MaxSem: works, please proceed [17:42:11] MaxSem: the next one is not testable and it's only on test wikidata [17:42:31] (03PS3) 10MaxSem: Disable reading wb_terms search fields on testwikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420336 (https://phabricator.wikimedia.org/T189776) (owner: 10Lucas Werkmeister (WMDE)) [17:42:36] (03PS1) 10Filippo Giunchedi: puppetmaster: blacklist more high cardinality mbeans [puppet] - 
10https://gerrit.wikimedia.org/r/421085 (https://phabricator.wikimedia.org/T190252) [17:43:33] (03PS2) 10Filippo Giunchedi: puppetmaster: blacklist more high cardinality mbeans [puppet] - 10https://gerrit.wikimedia.org/r/421085 (https://phabricator.wikimedia.org/T190252) [17:43:35] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/421046/ (duration: 01m 15s) [17:43:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:43:43] (03CR) 10MaxSem: [C: 032] Disable reading wb_terms search fields on testwikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420336 (https://phabricator.wikimedia.org/T189776) (owner: 10Lucas Werkmeister (WMDE)) [17:44:00] (03CR) 10Framawiki: "Nothing says that something need to be done with alias." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/418070 (https://phabricator.wikimedia.org/T189277) (owner: 10Framawiki) [17:44:30] (03CR) 10Filippo Giunchedi: [C: 032] puppetmaster: blacklist more high cardinality mbeans [puppet] - 10https://gerrit.wikimedia.org/r/421085 (https://phabricator.wikimedia.org/T190252) (owner: 10Filippo Giunchedi) [17:44:58] (03Merged) 10jenkins-bot: Disable reading wb_terms search fields on testwikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420336 (https://phabricator.wikimedia.org/T189776) (owner: 10Lucas Werkmeister (WMDE)) [17:45:12] (03CR) 10jenkins-bot: Disable reading wb_terms search fields on testwikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420336 (https://phabricator.wikimedia.org/T189776) (owner: 10Lucas Werkmeister (WMDE)) [17:46:44] !log maxsem@tin Synchronized wmf-config/Wikibase.php: https://gerrit.wikimedia.org/r/#/c/420336/ (duration: 01m 15s) [17:46:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:46:56] Amir1: ^ [17:47:06] Thank you! 
[17:47:25] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler03/10560/" [puppet] - 10https://gerrit.wikimedia.org/r/421033 (https://phabricator.wikimedia.org/T171203) (owner: 10Elukey) [17:47:36] framawiki: https://gerrit.wikimedia.org/r/#/c/420397/ needs a rebase [17:47:59] (03PS2) 10BBlack: Revert "bump vcl reload delay by 3.6s" [puppet] - 10https://gerrit.wikimedia.org/r/421082 [17:51:24] (03PS1) 10Filippo Giunchedi: puppetmaster: fix jmx_exporter mbeans blacklist [puppet] - 10https://gerrit.wikimedia.org/r/421086 (https://phabricator.wikimedia.org/T190252) [17:51:44] (03CR) 10Filippo Giunchedi: [C: 032] puppetmaster: fix jmx_exporter mbeans blacklist [puppet] - 10https://gerrit.wikimedia.org/r/421086 (https://phabricator.wikimedia.org/T190252) (owner: 10Filippo Giunchedi) [17:51:53] (03PS2) 10Filippo Giunchedi: puppetmaster: fix jmx_exporter mbeans blacklist [puppet] - 10https://gerrit.wikimedia.org/r/421086 (https://phabricator.wikimedia.org/T190252) [17:51:56] argh missed it in the review :( [17:52:31] ups, a little merge skirmish going on there [17:52:59] (03PS3) 10MaxSem: Change NS aliases on ruwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/418070 (https://phabricator.wikimedia.org/T189277) (owner: 10Framawiki) [17:53:02] 10Operations, 10netops: Update BGP_sanitize_in filter - https://phabricator.wikimedia.org/T190317#4069851 (10ayounsi) p:05Triage>03Normal [17:53:48] 10Operations, 10Puppet: remove puppet_major_version and puppetdb_major_version variables - https://phabricator.wikimedia.org/T190318#4069868 (10herron) p:05Triage>03Normal [17:53:51] (03PS2) 10Framawiki: New throttling exception for 2018-03-22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420397 (https://phabricator.wikimedia.org/T189796) [17:54:11] (03PS4) 10Elukey: profile::mariadb::misc::eventlogging: create dirs for maintenance scripts [puppet] - 10https://gerrit.wikimedia.org/r/421033 (https://phabricator.wikimedia.org/T171203) [17:54:43] MaxSem: 
https://gerrit.wikimedia.org/r/420397 rebased [17:54:51] (03CR) 10jerkins-bot: [V: 04-1] New throttling exception for 2018-03-22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420397 (https://phabricator.wikimedia.org/T189796) (owner: 10Framawiki) [17:54:54] (03CR) 10Elukey: [C: 032] profile::mariadb::misc::eventlogging: create dirs for maintenance scripts [puppet] - 10https://gerrit.wikimedia.org/r/421033 (https://phabricator.wikimedia.org/T171203) (owner: 10Elukey) [17:55:05] (03CR) 10MaxSem: "Yeah, but deficient docs doesn't mean that common sense needn't be used:)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/418070 (https://phabricator.wikimedia.org/T189277) (owner: 10Framawiki) [17:56:06] (03PS3) 10BBlack: Revert "bump vcl reload delay by 3.6s" [puppet] - 10https://gerrit.wikimedia.org/r/421082 [17:56:10] (03CR) 10BBlack: [V: 032 C: 032] Revert "bump vcl reload delay by 3.6s" [puppet] - 10https://gerrit.wikimedia.org/r/421082 (owner: 10BBlack) [17:56:11] framawiki: now it has a syntax error [17:57:09] (03PS3) 10Framawiki: New throttling exception for 2018-03-22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420397 (https://phabricator.wikimedia.org/T189796) [17:58:06] yeah, with a closing bracket is better :) [17:58:46] MaxSem: feel free to refuse the problematic patch, it is not urgent [17:58:53] (03CR) 10MaxSem: [C: 032] New throttling exception for 2018-03-22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420397 (https://phabricator.wikimedia.org/T189796) (owner: 10Framawiki) [17:59:25] it's not necessarily bad, I just wanted to discuss this [17:59:47] and we don't have time for it during today's pretty packed window anyway [17:59:55] (03PS1) 10BBlack: eqsin+zero fallback [puppet] - 10https://gerrit.wikimedia.org/r/421088 (https://phabricator.wikimedia.org/T189250) [18:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180321T1800) [18:00:05] No GERRIT 
patches in the queue for this window AFAICS. [18:00:33] (03Merged) 10jenkins-bot: New throttling exception for 2018-03-22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420397 (https://phabricator.wikimedia.org/T189796) (owner: 10Framawiki) [18:00:48] (03CR) 10jenkins-bot: New throttling exception for 2018-03-22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420397 (https://phabricator.wikimedia.org/T189796) (owner: 10Framawiki) [18:01:02] (03PS1) 10BBlack: geo-maps: mark eqsin+zero issues, split out OC [dns] - 10https://gerrit.wikimedia.org/r/421089 (https://phabricator.wikimedia.org/T189250) [18:01:04] framawiki: pulled on mwdebug1002 [18:01:58] (03CR) 10BBlack: [C: 032] geo-maps: mark eqsin+zero issues, split out OC [dns] - 10https://gerrit.wikimedia.org/r/421089 (https://phabricator.wikimedia.org/T189250) (owner: 10BBlack) [18:02:30] !log delete obsolete metrics from prometheus following https://gerrit.wikimedia.org/r/c/421086 [18:02:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:03:11] <_joe_> MaxSem: still not done? [18:03:21] 1 sec [18:03:22] MaxSem: i can't verify throttle changes at my side :) [18:03:43] yeah, but I care only that the wikis don't explode :P [18:03:51] which they don't [18:04:06] good check of course [18:05:31] !log maxsem@tin Synchronized wmf-config/throttle.php: https://gerrit.wikimedia.org/r/#/c/420397/ (duration: 01m 15s) [18:05:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:05:41] framawiki: ^ [18:05:47] thanks ! 
[18:06:00] 10Operations, 10ops-codfw, 10netops: Switch port configuration for ms-be204[0-3] - https://phabricator.wikimedia.org/T190322#4069950 (10Papaul) p:05Triage>03Normal [18:06:21] aaaand, we're done [18:06:26] _joe_: all yours [18:07:19] <_joe_> MaxSem: thanks [18:08:55] <_joe_> !log pooling all the new codfw appservers [18:09:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:09:43] !log oblivian@puppetmaster2001 conftool action : set/pooled=yes; selector: name=mw22(59|[6-9][0-9])\.codfw\.wmnet [18:09:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:09:54] MaxSem: ah, https://noc.wikimedia.org/conf/highlight.php?file=throttle.php was not yet updated ? It's not instantaneous ? [18:10:34] (03PS2) 10BBlack: eqsin+zero fallback [puppet] - 10https://gerrit.wikimedia.org/r/421088 (https://phabricator.wikimedia.org/T189250) [18:11:17] dunno how it works. poking on a random appserver confirms the patch is live [18:11:40] ok, thanks for the check [18:12:34] (03CR) 10BBlack: [C: 032] eqsin+zero fallback [puppet] - 10https://gerrit.wikimedia.org/r/421088 (https://phabricator.wikimedia.org/T189250) (owner: 10BBlack) [18:15:13] (03PS1) 10Urbanecm: Change bewikibooks logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421093 (https://phabricator.wikimedia.org/T189218) [18:17:25] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install ms-be204[0-3] - https://phabricator.wikimedia.org/T189633#4069996 (10RobH) [18:17:28] 10Operations, 10ops-codfw, 10netops: Switch port configuration for ms-be204[0-3] - https://phabricator.wikimedia.org/T190322#4069994 (10RobH) 05Open>03Resolved all four systems have had their switch port descriptions updated, enabled, and placed into the private vlan for each row. 
[18:21:06] (03PS4) 10RobH: admin: Grant bmansurov access to terbium.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/419387 (https://phabricator.wikimedia.org/T189285) (owner: 10Vgutierrez) [18:21:40] 10Operations, 10Ops-Access-Requests, 10Ops-Access-Reviews, 10Patch-For-Review: Requesting access to terbium.eqiad.wmnet for bmansurov - https://phabricator.wikimedia.org/T189285#4070011 (10RobH) >>! In T189285#4069778, @Dzahn wrote: > I think the best option is using `maintenance-log-readers`. maintenance-... [18:23:05] (03CR) 10Dzahn: [C: 031] admin: Grant bmansurov access to terbium.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/419387 (https://phabricator.wikimedia.org/T189285) (owner: 10Vgutierrez) [18:23:32] (03PS2) 10Chad: Gerrit: Swap git auth to HTTP_LDAP [puppet] - 10https://gerrit.wikimedia.org/r/410474 [18:24:26] 10Operations, 10Ops-Access-Requests, 10Ops-Access-Reviews, 10Patch-For-Review: Requesting access to terbium.eqiad.wmnet for bmansurov - https://phabricator.wikimedia.org/T189285#4070018 (10Dzahn) ACK, +1 on the updated patch [18:24:51] 10Operations, 10Analytics, 10DBA, 10EventBus, and 7 others: High (2-3x) write and connection load on enwiki databases - https://phabricator.wikimedia.org/T189204#4070023 (10Pchelolo) [18:24:54] (03PS3) 10Chad: Gerrit: Allow enabling of tls encryption for SMTP [puppet] - 10https://gerrit.wikimedia.org/r/406145 [18:26:26] (03CR) 10Paladox: [C: 031] Gerrit: Allow enabling of tls encryption for SMTP [puppet] - 10https://gerrit.wikimedia.org/r/406145 (owner: 10Chad) [18:26:42] (03PS2) 10Chad: Run initSiteStats twice a month [puppet] - 10https://gerrit.wikimedia.org/r/415066 (https://phabricator.wikimedia.org/T59788) [18:27:13] (03CR) 10Paladox: [C: 031] Gerrit: Swap git auth to HTTP_LDAP [puppet] - 10https://gerrit.wikimedia.org/r/410474 (owner: 10Chad) [18:27:21] (03PS1) 10Rush: openstack: ml2 settings and allow disabling l3 agent [puppet] - 10https://gerrit.wikimedia.org/r/421094 
(https://phabricator.wikimedia.org/T188266) [18:27:49] 10Operations, 10netops: Implement BGP graceful shutdown - https://phabricator.wikimedia.org/T190323#4070028 (10ayounsi) [18:28:25] (03PS2) 10Dzahn: network::constants: add deploy1001 as deployment server [puppet] - 10https://gerrit.wikimedia.org/r/420919 (https://phabricator.wikimedia.org/T175288) [18:28:45] (03PS2) 10Rush: openstack: ml2 settings and allow disabling l3 agent [puppet] - 10https://gerrit.wikimedia.org/r/421094 (https://phabricator.wikimedia.org/T188266) [18:28:47] (03CR) 10Paladox: [C: 031] Abstract Gerrit's public key out of gerrit::jetty [puppet] - 10https://gerrit.wikimedia.org/r/416201 (owner: 10Chad) [18:29:41] (03CR) 10Dzahn: [C: 032] "host deploy1001.eqiad.wmnet" [puppet] - 10https://gerrit.wikimedia.org/r/420919 (https://phabricator.wikimedia.org/T175288) (owner: 10Dzahn) [18:30:26] (03CR) 10Rush: [C: 032] openstack: ml2 settings and allow disabling l3 agent [puppet] - 10https://gerrit.wikimedia.org/r/421094 (https://phabricator.wikimedia.org/T188266) (owner: 10Rush) [18:30:35] (03PS3) 10Rush: openstack: ml2 settings and allow disabling l3 agent [puppet] - 10https://gerrit.wikimedia.org/r/421094 (https://phabricator.wikimedia.org/T188266) [18:30:43] (03CR) 10Chad: [C: 032] scap prep: Scap-ify the creation of beta's StartProfiler.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416334 (https://phabricator.wikimedia.org/T180766) (owner: 10Krinkle) [18:31:24] 10Puppet, 10cloud-services-team (Kanban): role::puppet::self referenced in puppet_ssldir.rb - https://phabricator.wikimedia.org/T187622#4070049 (10Joe) the reference in `puppet_ssldir.rb` is to a hiera lookup, and is harmless. You can safely remove the class; we can then cleanup this function once that class i... 
[18:32:05] (03Merged) 10jenkins-bot: scap prep: Scap-ify the creation of beta's StartProfiler.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416334 (https://phabricator.wikimedia.org/T180766) (owner: 10Krinkle) [18:32:20] (03CR) 10jenkins-bot: scap prep: Scap-ify the creation of beta's StartProfiler.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416334 (https://phabricator.wikimedia.org/T180766) (owner: 10Krinkle) [18:33:17] (03CR) 10Chad: [C: 032] beta: disable abusefilter from collecting user IP addresses [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416346 (https://phabricator.wikimedia.org/T188862) (owner: 10MarcoAurelio) [18:33:59] 10Operations, 10Analytics, 10DBA, 10EventBus, and 7 others: High (2-3x) write and connection load on enwiki databases - https://phabricator.wikimedia.org/T189204#4070065 (10Pchelolo) The change that partitioned the `refreshLinks` topic in line with MySQL sharding has been deployed. Now we just need to wait... [18:34:21] !log demon@tin Synchronized scap/plugins/prep.py: consistency (duration: 01m 17s) [18:34:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:34:26] (03PS9) 10Chad: hieradata: add redis stretch deployment-prep instances [puppet] - 10https://gerrit.wikimedia.org/r/386869 (https://phabricator.wikimedia.org/T179371) (owner: 10Filippo Giunchedi) [18:34:28] (03Merged) 10jenkins-bot: beta: disable abusefilter from collecting user IP addresses [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416346 (https://phabricator.wikimedia.org/T188862) (owner: 10MarcoAurelio) [18:36:45] !log demon@tin Synchronized wmf-config/CommonSettings-labs.php: beta-only (duration: 01m 16s) [18:36:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:36:53] (03PS2) 10Andrew Bogott: remove role::puppet::self [puppet] - 10https://gerrit.wikimedia.org/r/411615 (https://phabricator.wikimedia.org/T182810) [18:37:07] (03PS2) 10Andrew Bogott: remove 'puppet' 
module [puppet] - 10https://gerrit.wikimedia.org/r/411616 (https://phabricator.wikimedia.org/T182810) [18:37:38] (03CR) 10Andrew Bogott: [C: 032] remove role::puppet::self [puppet] - 10https://gerrit.wikimedia.org/r/411615 (https://phabricator.wikimedia.org/T182810) (owner: 10Andrew Bogott) [18:37:48] (03CR) 10jenkins-bot: beta: disable abusefilter from collecting user IP addresses [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416346 (https://phabricator.wikimedia.org/T188862) (owner: 10MarcoAurelio) [18:37:53] 10Operations, 10Analytics, 10DBA, 10EventBus, and 7 others: High (2-3x) write and connection load on enwiki databases - https://phabricator.wikimedia.org/T189204#4070079 (10mobrovac) Relevant dashboards to monitor (for posterity): - [MySQL open connections](https://grafana-admin.wikimedia.org/dashboard/db/... [18:38:41] (03CR) 10Andrew Bogott: [C: 032] remove 'puppet' module [puppet] - 10https://gerrit.wikimedia.org/r/411616 (https://phabricator.wikimedia.org/T182810) (owner: 10Andrew Bogott) [18:41:44] _joe_: I see some fatals from mw22[67].* - is that you? [18:42:40] <_joe_> MaxSem: uhm let me check [18:42:41] (03PS1) 10Rush: openstack: stop managing l3-agent external ip manually [puppet] - 10https://gerrit.wikimedia.org/r/421102 (https://phabricator.wikimedia.org/T188266) [18:43:01] (03PS2) 10Rush: openstack: stop managing l3-agent external ip manually [puppet] - 10https://gerrit.wikimedia.org/r/421102 (https://phabricator.wikimedia.org/T188266) [18:43:13] e.g. 
mw2262 require_once(/srv/mediawiki/docroot/m.wikipedia.org/w/../multiversion/MWMultiVersion.php): File not found in /srv/mediawiki/docroot/m.wikipedia.org/w/mobilelanding.php on line 2 [18:43:37] <_joe_> yes, that would be me, but that should happen on all codfw hosts [18:43:39] <_joe_> uhm [18:44:02] (03CR) 10Rush: [C: 032] openstack: stop managing l3-agent external ip manually [puppet] - 10https://gerrit.wikimedia.org/r/421102 (https://phabricator.wikimedia.org/T188266) (owner: 10Rush) [18:44:11] <_joe_> that's requesting https://zero.wikipedia.org [18:44:24] monitoring, prolly [18:44:45] <_joe_> which we broke since I created the url list I use for validation of new servers [18:45:15] (03PS1) 10Chad: group1 to wmf.26 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421103 [18:45:57] (03CR) 10MarcoAurelio: "PHP Notice: Undefined variable: wmgUseAbuseFilter in /srv/mediawiki/wmf-config/CommonSettings-labs.php on line 63" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416346 (https://phabricator.wikimedia.org/T188862) (owner: 10MarcoAurelio) [18:49:04] (03PS1) 10Elukey: role::eventlogging::analytics::server: allow deployment-prep usage [puppet] - 10https://gerrit.wikimedia.org/r/421104 [18:51:45] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler03/10566/ - no op in prod" [puppet] - 10https://gerrit.wikimedia.org/r/421104 (owner: 10Elukey) [18:58:20] (03PS2) 10Elukey: role::eventlogging::analytics::server: allow deployment-prep usage [puppet] - 10https://gerrit.wikimedia.org/r/421104 [19:00:04] no_justification: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for MediaWiki train . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180321T1900). [19:00:04] No GERRIT patches in the queue for this window AFAICS. 
[19:00:44] (03CR) 10Elukey: "no op again https://puppet-compiler.wmflabs.org/compiler03/10567/" [puppet] - 10https://gerrit.wikimedia.org/r/421104 (owner: 10Elukey) [19:02:20] (03PS3) 10Elukey: role::eventlogging::analytics::server: allow deployment-prep usage [puppet] - 10https://gerrit.wikimedia.org/r/421104 [19:03:13] !log Deleted some 12-year-old open proxy blocks to resolve T189840. [19:03:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:03:19] T189840: Cannot create a user with no name, no ID, and no actor ID - https://phabricator.wikimedia.org/T189840 [19:03:36] (03CR) 10Ottomata: [C: 031] "+1 to patchset 2 :p" [puppet] - 10https://gerrit.wikimedia.org/r/421104 (owner: 10Elukey) [19:03:57] (03PS1) 10Chad: Revert "beta: disable abusefilter from collecting user IP addresses" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421105 [19:04:06] (03PS4) 10Elukey: role::eventlogging::analytics::server: allow deployment-prep usage [puppet] - 10https://gerrit.wikimedia.org/r/421104 [19:04:11] (03CR) 10Chad: [V: 032 C: 032] Revert "beta: disable abusefilter from collecting user IP addresses" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421105 (owner: 10Chad) [19:04:34] (03CR) 10Chad: [C: 032] group1 to wmf.26 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421103 (owner: 10Chad) [19:05:45] !log demon@tin Synchronized wmf-config/CommonSettings-labs.php: rvv (duration: 01m 15s) [19:05:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:06:00] (03Merged) 10jenkins-bot: group1 to wmf.26 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421103 (owner: 10Chad) [19:06:05] (03CR) 10Elukey: [C: 032] role::eventlogging::analytics::server: allow deployment-prep usage [puppet] - 10https://gerrit.wikimedia.org/r/421104 (owner: 10Elukey) [19:08:48] (03CR) 10jenkins-bot: Revert "beta: disable abusefilter from collecting user IP addresses" [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/421105 (owner: 10Chad) [19:09:14] !log demon@tin Synchronized php: symlink bump (duration: 01m 15s) [19:09:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:12:05] !log demon@tin rebuilt and synchronized wikiversions files: group1 to wmf.26 [19:12:23] (03PS1) 10Thcipriani: Add blubber to docker integration agents [puppet] - 10https://gerrit.wikimedia.org/r/421108 (https://phabricator.wikimedia.org/T186548) [19:12:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:18:20] (03PS1) 10Elukey: role::eventlogging::analytics::mysql: allow a labs config [puppet] - 10https://gerrit.wikimedia.org/r/421110 [19:19:21] 10Operations, 10Analytics, 10ChangeProp, 10EventBus, and 4 others: Select candidate jobs for transferring to the new infrastucture - https://phabricator.wikimedia.org/T175210#4070187 (10mobrovac) [19:19:41] (03PS1) 10Volans: Puppetboard: use Puppet CA to verify SSL cert [puppet] - 10https://gerrit.wikimedia.org/r/421111 (https://phabricator.wikimedia.org/T184563) [19:19:44] 10Operations, 10Deployments, 10Beta-Cluster-reproducible, 10HHVM, and 2 others: Switch mwscript from Zend PHP5 to default php alternative (e.g. 
HHVM or PHP7) - https://phabricator.wikimedia.org/T146285#4070188 (10demon) a:05demon>03None [19:20:14] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler03/10569/ - no op in prod" [puppet] - 10https://gerrit.wikimedia.org/r/421110 (owner: 10Elukey) [19:20:23] (03CR) 10Elukey: [C: 032] role::eventlogging::analytics::mysql: allow a labs config [puppet] - 10https://gerrit.wikimedia.org/r/421110 (owner: 10Elukey) [19:22:59] (03CR) 10Elukey: [C: 032] role::eventlogging::analytics::mysql: allow a labs config (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/421110 (owner: 10Elukey) [19:23:43] (03PS1) 10Elukey: role::eventlogging::analytics::mysql: follow up previous commit [puppet] - 10https://gerrit.wikimedia.org/r/421113 [19:25:22] (03PS2) 10Elukey: role::eventlogging::analytics::mysql: follow up previous commit [puppet] - 10https://gerrit.wikimedia.org/r/421113 [19:27:17] PROBLEM - Disk space on restbase-dev1004 is CRITICAL: DISK CRITICAL - free space: / 824 MB (3% inode=94%) [19:27:19] (03CR) 10Herron: [C: 031] Puppetboard: use Puppet CA to verify SSL cert [puppet] - 10https://gerrit.wikimedia.org/r/421111 (https://phabricator.wikimedia.org/T184563) (owner: 10Volans) [19:27:21] (03CR) 10Volans: "compiler results: https://puppet-compiler.wmflabs.org/compiler03/10571/puppetboard1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/421111 (https://phabricator.wikimedia.org/T184563) (owner: 10Volans) [19:27:30] 10Puppet, 10Patch-For-Review, 10cloud-services-team (Kanban): Remove role::puppet::self and related support code - https://phabricator.wikimedia.org/T182810#4070229 (10Andrew) [19:27:49] 10Puppet, 10Patch-For-Review, 10cloud-services-team (Kanban): Remove role::puppet::self and related support code - https://phabricator.wikimedia.org/T182810#3835105 (10Andrew) 05Open>03Resolved a:03Andrew [19:27:53] 10Operations, 10Cloud-Services, 10Patch-For-Review: Phase out the 'puppet' module with fire, make self hosted puppetmasters use the 
puppetmaster module - https://phabricator.wikimedia.org/T120159#4070232 (10Andrew) [19:28:18] (03CR) 10Elukey: [C: 032] role::eventlogging::analytics::mysql: follow up previous commit [puppet] - 10https://gerrit.wikimedia.org/r/421113 (owner: 10Elukey) [19:28:21] 10Puppet, 10Beta-Cluster-Infrastructure, 10Patch-For-Review: Set up puppet exported resources to collect ssh host keys for beta - https://phabricator.wikimedia.org/T72792#4070238 (10Andrew) [19:28:28] 10Operations, 10Cloud-Services, 10Patch-For-Review: Phase out the 'puppet' module with fire, make self hosted puppetmasters use the puppetmaster module - https://phabricator.wikimedia.org/T120159#1846901 (10Andrew) 05Open>03Resolved a:03Andrew [19:28:59] (03PS2) 10Volans: Puppetboard: use Puppet CA to verify SSL cert [puppet] - 10https://gerrit.wikimedia.org/r/421111 (https://phabricator.wikimedia.org/T184563) [19:29:51] (03CR) 10Volans: [C: 032] Puppetboard: use Puppet CA to verify SSL cert [puppet] - 10https://gerrit.wikimedia.org/r/421111 (https://phabricator.wikimedia.org/T184563) (owner: 10Volans) [19:31:49] 10Puppet, 10Cloud-Services: Make changing puppetmasters for Labs instances more easy - https://phabricator.wikimedia.org/T152941#4070253 (10Andrew) 05Open>03declined The new arrangement with puppetmaster::standalone is quite a bit better. Changing the puppetmaster is a single hiera setting, and the subseq... [19:31:58] PROBLEM - Check systemd state on restbase-dev1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[19:32:17] RECOVERY - Disk space on restbase-dev1004 is OK: DISK OK [19:32:54] known ^ [19:33:00] (03CR) 10Andrew Bogott: [C: 031] cloud novaproxy: Set a custom logrotate config for nginx [puppet] - 10https://gerrit.wikimedia.org/r/421070 (https://phabricator.wikimedia.org/T190218) (owner: 10Bstorm) [19:33:54] 10Operations, 10Cloud-Services, 10hardware-requests, 10Patch-For-Review, 10cloud-services-team (Kanban): decom silver (was silver has trouble rebooting) - https://phabricator.wikimedia.org/T168559#4070261 (10Andrew) [19:40:07] RECOVERY - Check systemd state on restbase-dev1004 is OK: OK - running: The system is fully operational [19:49:37] (03PS1) 10Andrew Bogott: wikitech/labtestwikitech: override etcdConfig's $wgReadOnly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421117 (https://phabricator.wikimedia.org/T190001) [19:50:31] (03PS2) 10Andrew Bogott: wikitech/labtestwikitech: override etcdConfig's $wgReadOnly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421117 (https://phabricator.wikimedia.org/T190001) [19:51:46] (03CR) 10jerkins-bot: [V: 04-1] wikitech/labtestwikitech: override etcdConfig's $wgReadOnly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421117 (https://phabricator.wikimedia.org/T190001) (owner: 10Andrew Bogott) [19:53:15] (03PS2) 10Dzahn: add deployment_server role to deploy1001 [puppet] - 10https://gerrit.wikimedia.org/r/420912 (https://phabricator.wikimedia.org/T175288) [19:54:47] (03PS3) 10Andrew Bogott: wikitech/labtestwikitech: override etcdConfig's $wgReadOnly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421117 (https://phabricator.wikimedia.org/T190001) [19:55:40] (03CR) 10Dzahn: "yep, thank you!. 
i was planning to do the same and just didn't get to it yet" [dns] - 10https://gerrit.wikimedia.org/r/420985 (owner: 10Volans) [19:56:05] (03CR) 10jerkins-bot: [V: 04-1] wikitech/labtestwikitech: override etcdConfig's $wgReadOnly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421117 (https://phabricator.wikimedia.org/T190001) (owner: 10Andrew Bogott) [19:56:10] (03Abandoned) 10Jforrester: Show 'Publish' not 'Save' on final Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/337531 (https://phabricator.wikimedia.org/T131132) (owner: 10Jforrester) [19:56:17] (03PS4) 10Andrew Bogott: wikitech/labtestwikitech: override etcdConfig's $wgReadOnly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421117 (https://phabricator.wikimedia.org/T190001) [19:56:20] (03CR) 10Dzahn: [C: 032] add deployment_server role to deploy1001 [puppet] - 10https://gerrit.wikimedia.org/r/420912 (https://phabricator.wikimedia.org/T175288) (owner: 10Dzahn) [19:56:28] mutante: yw, it was required, ferm was complaining ;) [19:57:14] (03CR) 10Kaldari: [C: 031] Enable PageAssessments on trwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421080 (https://phabricator.wikimedia.org/T184969) (owner: 10MusikAnimal) [19:57:16] volans: oh! makes sense. i didn't get the order right then because the "mapped" line was already on the node [19:57:23] (03CR) 10jerkins-bot: [V: 04-1] wikitech/labtestwikitech: override etcdConfig's $wgReadOnly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421117 (https://phabricator.wikimedia.org/T190001) (owner: 10Andrew Bogott) [19:57:44] not ferm on the host, ferm on einsteinium to allow the connection [19:57:54] it does a resolve A and AAAA of the list of hosts in hiera [19:58:06] *nod*.. 
i see, yea [19:58:28] ok, thanks [19:58:34] just FYI :) [19:58:52] (PS5) Andrew Bogott: wikitech/labtestwikitech: override etcdConfig's $wgReadOnly [mediawiki-config] - https://gerrit.wikimedia.org/r/421117 (https://phabricator.wikimedia.org/T190001) [19:58:57] is always fun to add IPv6 PTRs :-p [19:59:01] i'm adding the deployment_server role to it now.. doesnt mean i am switching what the current deployment_server is though [19:59:08] that is another change [19:59:31] ack, I guess you'll send an email too beforehand :D [19:59:32] hehe, yea, i usually copy an existing one in the same subnet and then just edit the first (last) digits [19:59:51] yes, i will and also releng [20:00:00] i gave a little warning that it is coming [20:00:05] cscott, arlolra, subbu, bearND, halfak, and Amir1: How many deployers does it take to do Services – Parsoid / Citoid / Mobileapps / ORES / … deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180321T2000). [20:00:05] No GERRIT patches in the queue for this window AFAICS. [20:01:39] (PS1) Smalyshev: Revert "Don't use Cirrus for wbsearchentities on beta Wikidata" [mediawiki-config] - https://gerrit.wikimedia.org/r/421119 [20:01:41] there might be some icinga checks coming up for deploy1001.. 
because it is getting all the things now [20:01:50] i'm watching it [20:01:52] (03CR) 10jerkins-bot: [V: 04-1] Revert "Don't use Cirrus for wbsearchentities on beta Wikidata" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421119 (owner: 10Smalyshev) [20:02:21] deployers can ignore that..it's still tin right now [20:03:21] (03PS2) 10Smalyshev: Revert "Don't use Cirrus for wbsearchentities on beta Wikidata" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421119 [20:06:45] (03PS1) 10Gergő Tisza: Add jseditor to privileged groups [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421122 (https://phabricator.wikimedia.org/T190015) [20:06:47] (03PS1) 10Gergő Tisza: Temporarily preserve sysops' JS editing ability [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421123 (https://phabricator.wikimedia.org/T190015) [20:06:49] (03PS1) 10Gergő Tisza: Remove edituserjs from existing groups [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421124 (https://phabricator.wikimedia.org/T190015) [20:06:51] (03PS1) 10Gergő Tisza: Enforce that jseditor is the only group that can edit sitewide JS [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421125 (https://phabricator.wikimedia.org/T190015) [20:07:59] (03CR) 10jerkins-bot: [V: 04-1] Enforce that jseditor is the only group that can edit sitewide JS [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421125 (https://phabricator.wikimedia.org/T190015) (owner: 10Gergő Tisza) [20:08:25] (03PS2) 10Dzahn: DNS: Add mgmt DNS entries for ms-be204[0-3] [dns] - 10https://gerrit.wikimedia.org/r/421044 (https://phabricator.wikimedia.org/T189633) (owner: 10Papaul) [20:09:00] swift codfw new servers on the way ^ [20:09:22] (03PS1) 10Andrew Bogott: labtestweb: add striker config [puppet] - 10https://gerrit.wikimedia.org/r/421126 (https://phabricator.wikimedia.org/T190001) [20:09:24] (03CR) 10Dzahn: [C: 032] DNS: Add mgmt DNS entries for ms-be204[0-3] [dns] - 10https://gerrit.wikimedia.org/r/421044 
(https://phabricator.wikimedia.org/T189633) (owner: 10Papaul) [20:10:01] (03CR) 10Andrew Bogott: [C: 032] labtestweb: add striker config [puppet] - 10https://gerrit.wikimedia.org/r/421126 (https://phabricator.wikimedia.org/T190001) (owner: 10Andrew Bogott) [20:10:41] (03PS2) 10Dzahn: DHCP: Add MAC address entries for ms-be204[0-3] [puppet] - 10https://gerrit.wikimedia.org/r/421072 (https://phabricator.wikimedia.org/T189633) (owner: 10Papaul) [20:11:29] (03CR) 10Dzahn: [C: 032] DHCP: Add MAC address entries for ms-be204[0-3] [puppet] - 10https://gerrit.wikimedia.org/r/421072 (https://phabricator.wikimedia.org/T189633) (owner: 10Papaul) [20:12:07] (03PS2) 10Dzahn: DNS: Add production DNS entries for ms-be204[0-3] [dns] - 10https://gerrit.wikimedia.org/r/421065 (https://phabricator.wikimedia.org/T189633) (owner: 10Papaul) [20:17:37] !log mholloway-shell@tin Started deploy [mobileapps/deploy@675837f]: Update mobileapps to e6b50a0 [20:17:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:18:45] (03PS1) 10Andrew Bogott: labtestwikitech db config: enable innodb_large_prefix [puppet] - 10https://gerrit.wikimedia.org/r/421130 (https://phabricator.wikimedia.org/T190001) [20:19:52] (03CR) 10Zoranzoki21: "> CR-1 pending a reply to https://phabricator.wikimedia.org/T190206#4067118" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420807 (https://phabricator.wikimedia.org/T190206) (owner: 10Zoranzoki21) [20:19:57] (03PS2) 10Zoranzoki21: Add new throttle rule and add task for one in comment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420807 (https://phabricator.wikimedia.org/T190206) [20:20:27] (03CR) 10Andrew Bogott: [C: 032] labtestwikitech db config: enable innodb_large_prefix [puppet] - 10https://gerrit.wikimedia.org/r/421130 (https://phabricator.wikimedia.org/T190001) (owner: 10Andrew Bogott) [20:20:57] PROBLEM - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is CRITICAL: /{domain}/v1/page/references/{title}{/revision}{/tid} 
(retrieve structured reference data for the Cat article on English Wikipedia) is WARNING: Test retrieve structured reference data for the Cat article on English Wikipedia responds with unexpected value at path /reference_lists[0] = Missing keys: [uorder, usection_heading] [20:21:01] (03PS2) 10Gergő Tisza: Enforce that jseditor is the only group that can edit sitewide JS [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421125 (https://phabricator.wikimedia.org/T190015) [20:21:25] (03CR) 10Dzahn: DNS: Add production DNS entries for ms-be204[0-3] (032 comments) [dns] - 10https://gerrit.wikimedia.org/r/421065 (https://phabricator.wikimedia.org/T189633) (owner: 10Papaul) [20:21:30] ^ saw this /cc bearND [20:21:34] (03CR) 10Dzahn: [C: 04-1] DNS: Add production DNS entries for ms-be204[0-3] [dns] - 10https://gerrit.wikimedia.org/r/421065 (https://phabricator.wikimedia.org/T189633) (owner: 10Papaul) [20:21:38] (03PS3) 10Zoranzoki21: Add new throttle rule and add task for one in comment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420807 (https://phabricator.wikimedia.org/T190206) [20:21:57] RECOVERY - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is OK: All endpoints are healthy [20:23:10] !log mholloway-shell@tin Finished deploy [mobileapps/deploy@675837f]: Update mobileapps to e6b50a0 (duration: 05m 33s) [20:23:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:27:53] (03PS3) 10Papaul: DNS: Add production DNS entries for ms-be204[0-3] [dns] - 10https://gerrit.wikimedia.org/r/421065 (https://phabricator.wikimedia.org/T189633) [20:29:01] PROBLEM - Check systemd state on deploy1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[20:29:49] ^ ACK, new install [20:30:29] (PS4) Papaul: DNS: Add production DNS entries for ms-be204[0-3] [dns] - https://gerrit.wikimedia.org/r/421065 (https://phabricator.wikimedia.org/T189633) [20:31:22] !log mlitn@tin Started deploy [3d2png/deploy@812a68a]: Updating 3d2png [20:31:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:34:20] !log mlitn@tin Finished deploy [3d2png/deploy@812a68a]: Updating 3d2png (duration: 02m 57s) [20:34:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:37:55] Notice: Applied catalog in 1833.82 seconds [20:37:57] ;) [20:38:50] failed dependencies..one run isnt enough, but let's see [20:39:59] first issue trying to use stretch: Error: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install mysql-client-5.5' returned 100: [20:40:20] hardcoded 5.5 version somewhere it looks [20:42:07] yea, deployment_server is using the old mysql class ... [20:42:17] it shouldn't [20:52:22] (PS5) Dzahn: DNS: Add production DNS entries for ms-be204[0-3] [dns] - https://gerrit.wikimedia.org/r/421065 (https://phabricator.wikimedia.org/T189633) (owner: Papaul) [20:52:29] Operations, Puppet, Patch-For-Review: compile/diff catalogs between puppetdb v2 (production) and puppetdb v4 - https://phabricator.wikimedia.org/T188544#4070615 (herron) The puppet agent on rhodium has been re-enabled, so we should be in a good place to revert https://gerrit.wikimedia.org/r/420667 an... 
[20:59:58] (03CR) 10Dzahn: [C: 032] DNS: Add production DNS entries for ms-be204[0-3] [dns] - 10https://gerrit.wikimedia.org/r/421065 (https://phabricator.wikimedia.org/T189633) (owner: 10Papaul) [21:00:07] (03PS6) 10Dzahn: DNS: Add production DNS entries for ms-be204[0-3] [dns] - 10https://gerrit.wikimedia.org/r/421065 (https://phabricator.wikimedia.org/T189633) (owner: 10Papaul) [21:04:20] 10Operations, 10Commons, 10MediaWiki-File-management, 10Multimedia, and 11 others: RFC: Use content hash based image / thumb URLs - https://phabricator.wikimedia.org/T149847#4070660 (10Krinkle) [21:06:52] (03CR) 10Dzahn: "i like how we keep adding new stuff to "remnant.conf" all the time :)" [puppet] - 10https://gerrit.wikimedia.org/r/420386 (https://phabricator.wikimedia.org/T189181) (owner: 10Reedy) [21:07:12] (03PS2) 10Dzahn: Add advisorswiki to apache config [puppet] - 10https://gerrit.wikimedia.org/r/420386 (https://phabricator.wikimedia.org/T189181) (owner: 10Reedy) [21:09:19] 10Operations, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External), 10Scoring-platform-team (Current), 10Wikimedia-Incident: [Blocked] Cache ORES virtualenv within versioned source - https://phabricator.wikimedia.org/T181071#4070684 (10mmodell) @awight: Scap 3.7.7 is deployed now, are you... [21:10:07] 10Operations, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External), 10Scoring-platform-team (Current), 10Wikimedia-Incident: [Blocked] Cache ORES virtualenv within versioned source - https://phabricator.wikimedia.org/T181071#4070691 (10awight) Cool! Yes it was 3.7.7 I was waiting for, wi... 
[21:10:29] 10Operations, 10Release-Engineering-Team (Watching / External), 10Scoring-platform-team (Current), 10Wikimedia-Incident: Cache ORES virtualenv within versioned source - https://phabricator.wikimedia.org/T181071#4070707 (10awight) [21:10:43] 10Operations, 10Release-Engineering-Team (Watching / External), 10Scoring-platform-team (Current), 10Wikimedia-Incident: Cache ORES virtualenv within versioned source - https://phabricator.wikimedia.org/T181071#3778593 (10awight) 05stalled>03Open [21:11:06] 10Operations, 10Release-Engineering-Team (Watching / External), 10Scoring-platform-team (Current), 10Wikimedia-Incident: Cache ORES virtualenv within versioned source - https://phabricator.wikimedia.org/T181071#4070712 (10mmodell) I probably _should_ have called it 3.8 ;) [21:13:53] (03PS1) 10Jdlrobson: Enable VirtualPageViews on s6 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421134 (https://phabricator.wikimedia.org/T189906) [21:14:43] (03CR) 10Dzahn: [C: 032] "tested on mwdebug1001" [puppet] - 10https://gerrit.wikimedia.org/r/420386 (https://phabricator.wikimedia.org/T189181) (owner: 10Reedy) [21:26:22] (03CR) 10Pnorman: [C: 031] Increase cache size for osm2pgsql import [puppet] - 10https://gerrit.wikimedia.org/r/421074 (https://phabricator.wikimedia.org/T190110) (owner: 10Catrope) [21:36:52] (03PS2) 10Bstorm: cloud novaproxy: Set a custom logrotate config for nginx [puppet] - 10https://gerrit.wikimedia.org/r/421070 (https://phabricator.wikimedia.org/T190218) [21:38:34] (03CR) 10Bstorm: [C: 032] cloud novaproxy: Set a custom logrotate config for nginx [puppet] - 10https://gerrit.wikimedia.org/r/421070 (https://phabricator.wikimedia.org/T190218) (owner: 10Bstorm) [21:39:34] _joe_: can you ping me when you're around? 
[21:39:51] (03PS1) 10Dzahn: removebast1003 again, was just for hw testing [dns] - 10https://gerrit.wikimedia.org/r/421181 (https://phabricator.wikimedia.org/T190093) [21:40:41] (03PS2) 10Dzahn: remove bast1003 again, was just for hw testing [dns] - 10https://gerrit.wikimedia.org/r/421181 (https://phabricator.wikimedia.org/T190093) [21:40:48] (03PS1) 10Bstorm: Revert "cloud novaproxy: Set a custom logrotate config for nginx" [puppet] - 10https://gerrit.wikimedia.org/r/421182 [21:41:39] (03CR) 10Bstorm: [C: 032] Revert "cloud novaproxy: Set a custom logrotate config for nginx" [puppet] - 10https://gerrit.wikimedia.org/r/421182 (owner: 10Bstorm) [21:42:19] (03PS3) 10Dzahn: remove bast1003 again, was just for hw testing [dns] - 10https://gerrit.wikimedia.org/r/421181 (https://phabricator.wikimedia.org/T190093) [21:42:22] 10Operations, 10Release-Engineering-Team (Watching / External), 10Scoring-platform-team (Current), 10Wikimedia-Incident: Cache ORES virtualenv within versioned source - https://phabricator.wikimedia.org/T181071#4070861 (10awight) [21:45:40] (03CR) 10Dzahn: [C: 032] "@Robh this is for the "reclaim" step, just reverting what i did" [dns] - 10https://gerrit.wikimedia.org/r/421181 (https://phabricator.wikimedia.org/T190093) (owner: 10Dzahn) [21:46:21] (03CR) 10RobH: [C: 031] remove bast1003 again, was just for hw testing [dns] - 10https://gerrit.wikimedia.org/r/421181 (https://phabricator.wikimedia.org/T190093) (owner: 10Dzahn) [21:47:46] 10Operations, 10ops-eqiad: WMF4727 hardware issue - disks dont detect in installer - https://phabricator.wikimedia.org/T189804#4070882 (10Dzahn) per IRC talk, this already worked when Rob live-hacked for testing. it just needs to be reclaimed. i just reverted my DNS change in https://gerrit.wikimedia.org/r/#/... 
[21:47:57] (03PS1) 10Andrew Bogott: labtest striker: use tls for ldap connection [puppet] - 10https://gerrit.wikimedia.org/r/421183 (https://phabricator.wikimedia.org/T190001) [21:48:43] (03CR) 10Andrew Bogott: [C: 032] labtest striker: use tls for ldap connection [puppet] - 10https://gerrit.wikimedia.org/r/421183 (https://phabricator.wikimedia.org/T190001) (owner: 10Andrew Bogott) [21:49:10] 10Operations, 10Ops-Access-Requests: Requesting deployment access for samwilson - https://phabricator.wikimedia.org/T189414#4070885 (10RobH) a:05RobH>03None [21:51:38] 10Operations, 10ops-eqiad, 10DC-Ops, 10hardware-requests, and 3 others: Decommission restbase-test environment - https://phabricator.wikimedia.org/T186755#3954137 (10RobH) So I don't see any hosts in this task, only sub-tasks. Is there any reason this task needs to stay open? [21:52:13] (03CR) 10Andrew Bogott: [C: 032] wikitech/labtestwikitech: override etcdConfig's $wgReadOnly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421117 (https://phabricator.wikimedia.org/T190001) (owner: 10Andrew Bogott) [21:52:49] 10Operations, 10ops-eqiad, 10hardware-requests: Decommission mw1259-mw1260 - https://phabricator.wikimedia.org/T187466#4070902 (10RobH) [21:53:34] (03Merged) 10jenkins-bot: wikitech/labtestwikitech: override etcdConfig's $wgReadOnly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421117 (https://phabricator.wikimedia.org/T190001) (owner: 10Andrew Bogott) [21:55:41] !log andrew@tin Synchronized wmf-config/CommonSettings.php: turning off wgReadOnly on labtestwikitech (duration: 01m 16s) [21:55:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:56:20] 10Operations, 10ops-eqiad, 10hardware-requests: Decommission mw1259-mw1260 - https://phabricator.wikimedia.org/T187466#4070912 (10RobH) [21:56:32] 10Operations, 10ops-eqiad, 10hardware-requests: Decommission mw1259-mw1260 - https://phabricator.wikimedia.org/T187466#3976137 (10RobH) a:03RobH [21:58:06] I'd like to get 
https://gerrit.wikimedia.org/r/#/c/421074/ merged. Gehel said to do this through https://wikitech.wikimedia.org/wiki/PuppetSWAT, which says to add it to the PuppetSWAT window on https://wikitech.wikimedia.org/wiki/Deployments. When I try to edit that page and click on the table, it just selects all of it, and in the top-right it says its a template, but pressing edit there [21:58:12] doesn't bring up anything I can understand how to edit. [22:00:04] Niharika and MaxSem: #bothumor I � Unicode. All rise for Deploy GlobalPrefs everywhere in production deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180321T2200). [22:00:04] No GERRIT patches in the queue for this window AFAICS. [22:00:57] MaxSem: https://gerrit.wikimedia.org/r/#/c/420948/ [22:01:14] Bah, rebasing. [22:02:59] (PS4) Niharika29: Deploy GlobalPrefs to all production wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/420948 [22:03:29] (PS1) Jdlrobson: Enable MFMobileMainPageCss on Hindi Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/421184 (https://phabricator.wikimedia.org/T190101) [22:03:29] :O [22:03:59] Hauskatze: Shh. [22:04:23] Niharika: it's 'ssh', not 'shh' #noobhumor [22:05:06] :P [22:05:37] MaxSem: Wanna +2 that? [22:05:52] nah, I'm preparing my own [22:06:47] https://tools.wmflabs.org/bash/quip/AWJKmUDge0jClLAOQJQy :P [22:07:19] How do I go about getting the patch above deployed? [22:07:48] pnorman: https://wikitech.wikimedia.org/wiki/SWAT_deploys [22:08:02] Sagan: how is that page suposed to work? It just picks random stuff? [22:08:07] (PS1) MaxSem: Deploy GlobalPreferences everywhere but Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/421185 (https://phabricator.wikimedia.org/T189806) [22:08:27] Hauskatze: you can store funny irc quotes there, and then read it, and laugh [22:08:38] Hauskatze: https://tools.wmflabs.org/bash/random for some fun [22:08:48] and how the bot stores them there? 
[22:09:00] is it a bot or you have to add them manually? [22:09:07] Hauskatze: how to store them: login via oauth, add quiq [22:09:10] so manually [22:09:20] that explains [22:09:29] MaxSem: It's step 4 of how to submit that's not working for me. [22:09:53] (CR) TerraCodes: Deploy GlobalPreferences everywhere but Wikipedia (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/421185 (https://phabricator.wikimedia.org/T189806) (owner: MaxSem) [22:10:07] pnorman: what are you trying to do? [22:10:21] (PS4) Zoranzoki21: Add new throttle rule and add task for one in comment [mediawiki-config] - https://gerrit.wikimedia.org/r/420807 (https://phabricator.wikimedia.org/T190206) [22:10:26] MaxSem: edit the page [22:10:42] Operations, ops-eqiad, hardware-requests: Decommission mw1259-mw1260 - https://phabricator.wikimedia.org/T187466#4070971 (RobH) So these are somehow showing in the decom rack in racktables, but I JUST logged into them to disable puppet and power them down. @cmjohnson: Please be aware these were prem... [22:11:02] (CR) MaxSem: [C: 2] Deploy GlobalPreferences everywhere but Wikipedia (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/421185 (https://phabricator.wikimedia.org/T189806) (owner: MaxSem) [22:11:19] (CR) jerkins-bot: [V: -1] Add new throttle rule and add task for one in comment [mediawiki-config] - https://gerrit.wikimedia.org/r/420807 (https://phabricator.wikimedia.org/T190206) (owner: Zoranzoki21) [22:12:17] (Merged) jenkins-bot: Deploy GlobalPreferences everywhere but Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/421185 (https://phabricator.wikimedia.org/T189806) (owner: MaxSem) [22:12:18] what's the problem?are you logged in? 
[22:12:40] Yes, the problem is clicking on the table doesn't let me add text to it like other tables [22:13:09] Operations, ops-eqiad, hardware-requests: Decommission mw1259-mw1260 - https://phabricator.wikimedia.org/T187466#4070974 (RobH) a:RobH>Cmjohnson [22:13:11] switch to wikitext mode (upper right corner of edit area) [22:13:51] Deskana: VE just needs to be disabled on some pages :P ^ [22:14:06] Yes, it needs disabling, or the documentation needs fixing [22:14:31] who uses VE, anyway? XD [22:14:36] It's default? [22:14:40] I do not [22:14:46] Operations, ops-eqiad, hardware-requests: Decommission mw1259-mw1260 - https://phabricator.wikimedia.org/T187466#3976137 (RobH) p:Normal>High So now we have unwiped systems with live switch ports that are not in DNS. This is bad, and what leads to duplicate IP issues. I'm raising the priori... [22:14:50] monobook and wikitext ftw [22:15:54] Hauskatze: Nostalgia or it's not hardcore! [22:16:18] I started with monobook :) [22:16:30] There's also a lack of documentation on what to do to add an entry in source edit mode, but I'll just cargo-cult someone elses [22:16:47] (PS5) Zoranzoki21: Add new throttle rule and add task for one in comment [mediawiki-config] - https://gerrit.wikimedia.org/r/420807 (https://phabricator.wikimedia.org/T190206) [22:18:35] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/421185/ (duration: 01m 15s) [22:18:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:19:51] pnorman: there's an example in the docs: https://wikitech.wikimedia.org/wiki/SWAT_deploys#How_to_submit_a_patch_for_SWAT [22:19:57] (PS6) Zoranzoki21: Add new throttle rule and add task for one in comment [mediawiki-config] - https://gerrit.wikimedia.org/r/420807 (https://phabricator.wikimedia.org/T190206) [22:21:51] Niharika: would this be intentional? 
https://awau.moe/25b003.png (where the check boxes don't line up) [22:22:22] (Abandoned) Zoranzoki21: Disable Flow extension on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/408073 (https://phabricator.wikimedia.org/T186463) (owner: Zoranzoki21) [22:23:22] greg-g: Ah, is that linked from somewhere? I'm coming into /wiki/Deployments from /wiki/PuppetSWAT [22:23:23] MaxSem: Special:GlobalPreferences@metawiki HTTP ERROR 500 on save. [22:24:34] pnorman: the deployments page, from every swat window, each window is linked to that [22:25:23] Hauskatze: looking [22:25:50] Request from [redacted] via cp3031 cp3031, Varnish XID 671902023 [22:25:51] Error: 500, Internal Server Error at Wed, 21 Mar 2018 22:24:51 GMT [22:29:37] (PS1) MaxSem: Revert "Deploy GlobalPreferences everywhere but Wikipedia" [mediawiki-config] - https://gerrit.wikimedia.org/r/421189 [22:29:43] (CR) MaxSem: [C: 2] Revert "Deploy GlobalPreferences everywhere but Wikipedia" [mediawiki-config] - https://gerrit.wikimedia.org/r/421189 (owner: MaxSem) [22:29:56] Operations, ops-eqiad: WMF4727 hardware issue - disks dont detect in installer - https://phabricator.wikimedia.org/T189804#4070999 (RobH) a:RobH>Cmjohnson Added back to spares. The only thing that has to happen is a disk wipe, since these got installed with the new install key. System is powere... [22:30:45] Mostly unrelated to that patch, I need a schema created on the maps db on maps-test2004, but don't have access. 
I'm not sure if this needs to be done through puppet, or by someone with access rights just running the CREATE SCHEMA command in the db [22:30:56] huh, I wonder if that 500 error could have been me, since I just went through and added all my global preferences on metawiki [22:31:04] (03Merged) 10jenkins-bot: Revert "Deploy GlobalPreferences everywhere but Wikipedia" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421189 (owner: 10MaxSem) [22:32:53] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: Revert global prefs (duration: 01m 15s) [22:32:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:33:50] sigh [22:34:13] MaxSem: Where's the patch you synced? [22:34:31] in the revert pile :( [22:34:56] Why? [22:36:21] submitting bugs... [22:42:47] (03CR) 10jenkins-bot: Revert "Deploy GlobalPreferences everywhere but Wikipedia" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421189 (owner: 10MaxSem) [22:43:45] (03CR) 10jenkins-bot: group1 to wmf.26 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421103 (owner: 10Chad) [22:44:02] (03CR) 10jenkins-bot: wikitech/labtestwikitech: override etcdConfig's $wgReadOnly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421117 (https://phabricator.wikimedia.org/T190001) (owner: 10Andrew Bogott) [22:44:16] (03CR) 10jenkins-bot: Deploy GlobalPreferences everywhere but Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421185 (https://phabricator.wikimedia.org/T189806) (owner: 10MaxSem) [22:48:03] mutante: i think a lot of data/research folks use bast1001 too, for access to the analytics servers (see e.g. https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Access or https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Access ) ... [22:48:19] ... so it may be worth sending the "bast1001 -> bast1002 " heads-up to the Analytics-l mailing list too [22:49:19] MaxSem: Or some tables need to be simplified.
;-) [22:49:26] (sorry, second link was meant to be https://wikitech.wikimedia.org/wiki/Analytics/Data_access#Production_access ) [23:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor I � Unicode. All rise for Evening SWAT (Max 8 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180321T2300). [23:00:04] twkozlowski, Hauskatze, Smalyshev, and Zoranzoki21: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [23:00:15] o/ [23:00:19] hey [23:00:23] o/ [23:00:45] hi [23:04:21] who will swat now? [23:08:52] if there is no one, I can do it [23:09:04] let's wait until 12:10 [23:11:18] (03PS6) 10Ladsgroup: Enable $wgAbuseFilterProfile & $wgAbuseFilterRuntimeProfile on eswikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420989 (https://phabricator.wikimedia.org/T190264) (owner: 10MarcoAurelio) [23:11:40] (03PS1) 10Ladsgroup: Migrate $wgOresModels to the new config system [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421191 (https://phabricator.wikimedia.org/T189948) [23:11:52] noone will not swat? [23:13:25] (03CR) 10Ladsgroup: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420989 (https://phabricator.wikimedia.org/T190264) (owner: 10MarcoAurelio) [23:13:34] Zoranzoki21: I'm doing swat [23:13:42] Amir1: Super. Thank you [23:13:48] Hauskatze: which file should be syned first? [23:14:02] Amir1: good question [23:14:10] I was never told [23:14:44] I can sync both at the same time [23:14:47] (03Merged) 10jenkins-bot: Enable $wgAbuseFilterProfile & $wgAbuseFilterRuntimeProfile on eswikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420989 (https://phabricator.wikimedia.org/T190264) (owner: 10MarcoAurelio) [23:15:20] Can I reply on question? 
:) I think first abusefilter.php [23:15:32] Ok, sorry [23:15:37] Zoranzoki21: yeah, sure [23:15:56] Hauskatze: your patch is live in mwdebug1002, please test [23:16:03] I think abusefilter.php goes first as well, but I don't know for sure [23:16:06] Amir1: ack, checking [23:17:06] Amir1: looks good to me [23:17:35] ack [23:17:52] dmaza said it takes two minutes to add the grafana dashboards for abusefilterruntime but I have no idea how to do/add that :) [23:18:35] overall, profiling data appears on the abusefilter page and the wiki isn't broken so we should be good to go imho [23:19:48] !log ladsgroup@tin Synchronized wmf-config/abusefilter.php: [[gerrit:420989|Enable $wgAbuseFilterProfile & $wgAbuseFilterRuntimeProfile on eswikibooks, part I (T190264)]] (duration: 01m 15s) [23:19:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:19:54] T190264: AbuseFilter stats & profiler for es.wikibooks - https://phabricator.wikimedia.org/T190264 [23:20:22] (03CR) 10jenkins-bot: Enable $wgAbuseFilterProfile & $wgAbuseFilterRuntimeProfile on eswikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420989 (https://phabricator.wikimedia.org/T190264) (owner: 10MarcoAurelio) [23:20:50] (03PS3) 10Ladsgroup: guwiki: fix $wg{Add,Remove}Groups configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421049 (owner: 10MarcoAurelio) [23:21:28] !log ladsgroup@tin Synchronized wmf-config/InitialiseSettings.php: [[gerrit:420989|Enable $wgAbuseFilterProfile & $wgAbuseFilterRuntimeProfile on eswikibooks, part II (T190264)]] (duration: 01m 15s) [23:21:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:21:50] Hauskatze: it's live now, please test [23:22:00] Amir1: already on it, still looks good [23:22:05] cool [23:22:29] (and we did good on enabling this, d'oh, we've got very bad inefficient filters there wasting a lot of conditions) [23:22:38] 10Operations, 10netops: eqiad 10G ports needs - 
https://phabricator.wikimedia.org/T190364#4071230 (10ayounsi) [23:23:11] Great. Let me know when to move forward [23:24:48] Amir1: go ahead with the guwiki one, the abusefilter config is working fine [23:24:50] 10Operations, 10netops: eqiad 10G ports needs - https://phabricator.wikimedia.org/T190364#4071230 (10EBernhardson) For elasticsearch we are expecting to switch all servers to 10G on their standard server refresh schedule. I'm not sure what exactly that schedule is though. [23:25:10] (03CR) 10Ladsgroup: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421049 (owner: 10MarcoAurelio) [23:25:33] Great, on it [23:26:25] (03Merged) 10jenkins-bot: guwiki: fix $wg{Add,Remove}Groups configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421049 (owner: 10MarcoAurelio) [23:26:39] (03CR) 10jenkins-bot: guwiki: fix $wg{Add,Remove}Groups configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421049 (owner: 10MarcoAurelio) [23:26:54] 10Operations, 10netops: eqiad 10G ports needs - https://phabricator.wikimedia.org/T190364#4071259 (10ayounsi) [23:27:08] Hauskatze: it's live in mwdebug1002 [23:27:18] Hauskatze: In what way does https://gerrit.wikimedia.org/r/#/c/421049/ "fix" the config? (I agree it's cleaner, just not seeing the bug) [23:27:56] Krinkle: today guwiki admin came to srp asking that we flag the user because they weren't able to do so [23:28:10] check special:listgrouprights, the admins cannot add the permission [23:28:33] Hauskatze: What I mean is, it looks to me like it changed the order and moved comments. Great changes, but what is the functional difference? Is the order significant? was there a typo? [23:28:54] Amir1: Can you please deploy my patch now. I feel bad [23:30:12] Amir1: is it on mwdebug, I cannot see the change [23:30:45] Zoranzoki21: the Hauskatze patch needs to be finished first, there is no way around that [23:30:57] Amir1: Ok. 
You can after his [23:31:25] Hauskatze: I rebased and everything looks fine [23:31:30] Can you double check? [23:31:41] rechecking [23:33:08] Amir1: it looks like something unrelated to the patch is preventing the sysops adding the group. I'd leave the patch as it's mostly a noop as it stands, but I shall investigate why sysops are not allowed to do this when the config says they should be able to [23:34:08] Hauskatze: Please file a bug, this is only cleanup [23:34:23] I sync it as it's pretty harmless and cleaner [23:34:24] Amir1: yep, I'll do [23:34:41] Krinkle: so after all there's a bug in there :) [23:34:51] (03CR) 10Krinkle: [C: 04-1] "Two things I noticed I'm not entirely sure about:" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/415218 (https://phabricator.wikimedia.org/T110903) (owner: 10Imarlier) [23:35:37] Hauskatze: Right, so the patch was intended as clean up, not as an effective change. [23:35:58] Krinkle: I thought the syntax was the issue, halas, no explanation :) [23:36:25] but, congratulations! we still have a bug, but with cleaner syntax lol [23:36:38] !log ladsgroup@tin Synchronized wmf-config/InitialiseSettings.php: [[gerrit:421049|guwiki: clean up $wg{Add,Remove}Groups configuration]] (duration: 01m 16s) [23:36:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:37:56] (03CR) 10Ladsgroup: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420807 (https://phabricator.wikimedia.org/T190206) (owner: 10Zoranzoki21) [23:38:14] Zoranzoki21: Is there a way to test your patch in mwdebug1002? 
[23:38:21] Amir1: No [23:38:31] Amir1: It is throttle rule patch [23:39:05] ack [23:39:19] (03Merged) 10jenkins-bot: Add new throttle rule and add task for one in comment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420807 (https://phabricator.wikimedia.org/T190206) (owner: 10Zoranzoki21) [23:39:33] (03CR) 10jenkins-bot: Add new throttle rule and add task for one in comment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420807 (https://phabricator.wikimedia.org/T190206) (owner: 10Zoranzoki21) [23:41:36] !log ladsgroup@tin Synchronized wmf-config/throttle.php: [[gerrit:420807|Add new throttle rule and add task for one in comment]] (duration: 01m 16s) [23:41:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:41:48] Zoranzoki21: ^ done [23:42:00] Amir1: ok. thank you [23:42:09] (03CR) 10Ladsgroup: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421119 (owner: 10Smalyshev) [23:42:46] good night [23:42:51] SMalyshev: Your patch doesn't need to be in SWAT. labs is okay [23:42:56] Zoranzoki21: You too :) [23:43:24] (03Merged) 10jenkins-bot: Revert "Don't use Cirrus for wbsearchentities on beta Wikidata" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421119 (owner: 10Smalyshev) [23:43:26] SMalyshev: I merge it and rebase tin. in half an hour, you will have it automatically [23:43:29] Amir1: what you mean labs is okay? [23:43:37] Issue filed as T190370 [23:43:38] T190370: guwiki sysops cannot add or remove 'rollback' but they're allowed on AddGroups/RemoveGroups - https://phabricator.wikimedia.org/T190370 [23:43:39] Amir1: ah, ok [23:44:03] Amir1: does it need any deployment to make it to labs hosts? 
[23:44:18] (03CR) 10jenkins-bot: Revert "Don't use Cirrus for wbsearchentities on beta Wikidata" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421119 (owner: 10Smalyshev) [23:44:28] nope, it happens automatically unless it gets broken for whatever reason [23:44:40] ok, great [23:44:51] Amir1: ping me pls then when it's done so I could check it works ok [23:45:38] aaahhh, found the issue [23:45:39] SMalyshev: It should be done in less than an hour, can't tell for sure when exactly, please check half an hour from now [23:45:53] *half an hour [23:45:53] the group is incorrectly named 'rollback' when it should be 'rollbacker' [23:45:57] fixing that [23:46:29] Amir1: ok, thanks [23:46:37] (03PS2) 10Ladsgroup: Migrate $wgOresModels to the new config system [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421191 (https://phabricator.wikimedia.org/T189948) [23:46:45] (03CR) 10Ladsgroup: [C: 032] Migrate $wgOresModels to the new config system [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421191 (https://phabricator.wikimedia.org/T189948) (owner: 10Ladsgroup) [23:47:17] SMalyshev: next time, just ping me and I'll get it done at whatever time it is [23:47:27] ok, thanks [23:47:58] (03Merged) 10jenkins-bot: Migrate $wgOresModels to the new config system [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421191 (https://phabricator.wikimedia.org/T189948) (owner: 10Ladsgroup) [23:49:50] (03CR) 10jenkins-bot: Migrate $wgOresModels to the new config system [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421191 (https://phabricator.wikimedia.org/T189948) (owner: 10Ladsgroup) [23:49:52] (03Draft2) 10MarcoAurelio: guwiki: fix rollback -> rollbacker (group) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421194 (https://phabricator.wikimedia.org/T190370) [23:49:55] (03Draft1) 10MarcoAurelio: guwiki: fix rollback -> rollbacker (group) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421194 (https://phabricator.wikimedia.org/T190370) [23:50:13] 
Amir1 & Krinkle: https://gerrit.wikimedia.org/r/#/c/421194/ fixes the issue [23:51:21] (03CR) 10Platonides: [C: 031] guwiki: fix rollback -> rollbacker (group) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421194 (https://phabricator.wikimedia.org/T190370) (owner: 10MarcoAurelio) [23:53:04] !log ladsgroup@tin Synchronized wmf-config/InitialiseSettings.php: [[gerrit:421191|Migrate $wgOresModels to the new config system (T189948)]] (duration: 01m 16s) [23:53:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:53:10] T189948: Clean up old backward compatibility settings of $wgOresModels - https://phabricator.wikimedia.org/T189948 [23:53:57] Hauskatze: do you want to add it to SWAT? [23:54:15] Amir1: if that's okay, I can add it to wikitech [23:54:27] yup we have some time [23:54:37] okay so I'm adding it to the wiki [23:54:46] * Hauskatze logs in again [23:55:15] (03PS3) 10Ladsgroup: guwiki: fix rollback -> rollbacker (group) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421194 (https://phabricator.wikimedia.org/T190370) (owner: 10MarcoAurelio) [23:55:17] (03CR) 10Ladsgroup: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421194 (https://phabricator.wikimedia.org/T190370) (owner: 10MarcoAurelio) [23:56:13] 10Operations, 10Cassandra, 10Services (doing), 10User-Eevans, 10User-Elukey: Test/upload new cassandra 2.2.6 package (wmf3) - https://phabricator.wikimedia.org/T189529#4071349 (10Eevans) [23:56:20] done [23:56:49] (03Merged) 10jenkins-bot: guwiki: fix rollback -> rollbacker (group) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421194 (https://phabricator.wikimedia.org/T190370) (owner: 10MarcoAurelio) [23:57:03] 10Operations, 10Cassandra, 10Services (doing), 10User-Eevans, 10User-Elukey: Test/upload new cassandra 2.2.6 package (wmf3) - https://phabricator.wikimedia.org/T189529#4044780 (10Eevans) A new package that actually applies the patch within has been uploaded; Sorry for the wasted 
cycles! [23:57:27] Hauskatze: live in mwdebug1002 [23:57:49] now the 'rollbacker' group is there [23:57:56] lgtm [23:58:00] cool [23:58:09] so it was not the /* */ things but the spelling [23:58:10] 10Operations, 10Cassandra, 10Services (doing), 10User-Eevans, 10User-Elukey: Test/upload new cassandra 2.2.6 package (wmf3) - https://phabricator.wikimedia.org/T189529#4071354 (10Eevans) [23:58:12] damn [23:58:23] I could have fixed everything in one patch [23:59:11] Hauskatze: It's okay, you fixed it, that's what matters [23:59:57] (03CR) 10jenkins-bot: guwiki: fix rollback -> rollbacker (group) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421194 (https://phabricator.wikimedia.org/T190370) (owner: 10MarcoAurelio)