[00:00:04] addshore, hashar, anomie, no_justification, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Evening SWAT (Max 8 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180103T0000).
[00:00:04] Smalyshev, mooeypoo, and MatmaRex: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[00:00:15] here
[00:00:19] hi. a lot of patches.
[00:00:21] \o
[00:00:39] * mooeypoo has a typo fix to a swat patch
[00:01:27] Who's deploying today?
[00:03:04] *crickets*
[00:03:27] cricket deploy
[00:03:48] everybody is still vacationing?
[00:03:52] Is that the british cricket? I hear that could take days
[00:03:54] (PS5) Reedy: Lower ElasticSearch index refresh interval for Wikidata to 5s [mediawiki-config] - https://gerrit.wikimedia.org/r/399466 (https://phabricator.wikimedia.org/T183053) (owner: Smalyshev)
[00:03:58] (CR) Reedy: [C: 2] Lower ElasticSearch index refresh interval for Wikidata to 5s [mediawiki-config] - https://gerrit.wikimedia.org/r/399466 (https://phabricator.wikimedia.org/T183053) (owner: Smalyshev)
[00:04:15] thanks Reedy :)
[00:04:23] Reedy to the rescue \o/
[00:04:47] (PS1) Andrew Bogott: bootstrap_vz: remove lots of resolve.conf magic from firstboot script [puppet] - https://gerrit.wikimedia.org/r/401636 (https://phabricator.wikimedia.org/T181375)
[00:04:51] this one doesn't need to be tested, it'd just work when reindexing next time, so can be deployed as is probably
[00:05:06] yeah
[00:05:28] (Merged) jenkins-bot: Lower ElasticSearch index refresh interval for Wikidata to 5s [mediawiki-config] - https://gerrit.wikimedia.org/r/399466 (https://phabricator.wikimedia.org/T183053) (owner: Smalyshev)
[00:05:56] Reedy, I have 2 pairs; you can
merge both together and then I'll test -- they're both two parts of the same fix, deployed twice to wmf12 and wmf15.
[00:06:21] Let me know if that's too terrible, I can test and retest each time if that makes more sense.
[00:06:59] !log reedy@tin Synchronized wmf-config/InitialiseSettings.php: Add wmgCirrusSearchRefreshInterval (duration: 01m 02s)
[00:07:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:07:34] (CR) jenkins-bot: Lower ElasticSearch index refresh interval for Wikidata to 5s [mediawiki-config] - https://gerrit.wikimedia.org/r/399466 (https://phabricator.wikimedia.org/T183053) (owner: Smalyshev)
[00:08:25] all of the log spam
[00:08:30] (from scap)
[00:09:09] !log reedy@tin Synchronized wmf-config/CirrusSearch-common.php: Lower ElasticSearch index refresh interval for Wikidata to 5s (duration: 01m 02s)
[00:09:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:14:48] (PS2) Smalyshev: [WIP] Add loading DCAT-AP data into dcatap namespace on WDQS [puppet] - https://gerrit.wikimedia.org/r/399954 (https://phabricator.wikimedia.org/T178978)
[00:20:23] c'mon jerkins
[00:24:45] i need to go away for a minute but i'll probably be back before it's my turn, eh
[00:25:06] It's taking nearly 10 minutes to merge a patch
[00:25:09] So probably :P
[00:25:13] Meh
[00:25:37] The typo one?? ... it's... one ... character...
[00:25:46] jerkins runs all the tests
[00:25:53] Even for a RELEASE-NOTES change
[00:25:59] It has no idea of context or anything else
[00:26:10] Yeah
[00:26:55] If it was urgent, could just force it of course
[00:27:01] He's nearly done wasting our time
[00:27:14] I'm staring at Zuul Status too
[00:28:09] done
[00:28:25] Yay, k, testing on both wmf12 and wmf15
[00:28:30] no, jerkins is done
[00:28:31] hang on
[00:28:32] :P
[00:28:33] oh
[00:28:40] yeah I forgot I need to know the server too
[00:28:40] haha
[00:28:43] I'm so eager!
[00:30:56] jenkins doesn't deploy.....
yet
[00:30:58] ;)
[00:31:10] mooeypoo: should be on mwdebug1002 now
[00:31:26] awesome, testing
[00:32:16] Operations, Ops-Access-Requests, Patch-For-Review, Release-Engineering-Team (Watching / External): Allow "releasers-mediawiki" sudo rights to manage Jenkins - https://phabricator.wikimedia.org/T183972#3870174 (greg)
[00:32:28] Operations, LDAP, Release-Engineering-Team (Watching / External): Create 'releng' LDAP group - https://phabricator.wikimedia.org/T183507#3870178 (greg)
[00:33:02] mediawiki (wmf15) tested and works; testing on wmf12 now
[00:35:17] wmf12 approved as well
[00:35:27] all works \o/
[00:36:34] !log reedy@tin Synchronized php-1.31.0-wmf.15/resources/src/mediawiki.rcfilters/dm/: RCFilters (duration: 01m 02s)
[00:36:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:38:07] !log reedy@tin Synchronized php-1.31.0-wmf.12/resources/src/mediawiki.rcfilters/dm/: RCFilters (duration: 01m 02s)
[00:38:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:50:02] Reedy: i think you can safely just sync 'em
[00:51:52] k
[00:53:53] Operations, Analytics, Research, Traffic, and 6 others: Referrer policy for browsers which only support the old spec - https://phabricator.wikimedia.org/T180921#3870196 (Nuria) >Flipping these Edges/Safaris to origin is going to deny us information on internal referrers that we currently get from...
[00:58:21] !log reedy@tin Synchronized php-1.31.0-wmf.15/extensions/CodeMirror: T182320 (duration: 00m 59s)
[00:58:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:58:32] T182320: Fix textarea editor's edit font size - https://phabricator.wikimedia.org/T182320
[00:59:58] !log reedy@tin Synchronized php-1.31.0-wmf.15/extensions/Flow: T182320 (duration: 01m 18s)
[01:00:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:01:15] !log reedy@tin Synchronized php-1.31.0-wmf.15/resources/src/mediawiki/mediawiki.editfont.css: T182320 (duration: 01m 01s)
[01:01:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:02:06] thanks Reedy. you're the best
[01:19:49] (PS2) Legoktm: mediawiki: Remove unused python-pygments package [puppet] - https://gerrit.wikimedia.org/r/400458 (https://phabricator.wikimedia.org/T182851)
[02:36:20] !log l10nupdate@tin scap sync-l10n completed (1.31.0-wmf.12) (duration: 06m 56s)
[02:36:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:43:50] (CR) Dzahn: icinga: script to send custom SMS to Icinga contacts (7 comments) [puppet] - https://gerrit.wikimedia.org/r/400615 (https://phabricator.wikimedia.org/T82937) (owner: Dzahn)
[04:44:06] (PS6) Dzahn: icinga: script to send custom SMS to Icinga contacts [puppet] - https://gerrit.wikimedia.org/r/400615 (https://phabricator.wikimedia.org/T82937)
[06:10:38] (PS1) Legoktm: contint: Install php-xdebug (disabled by default) on jessie [puppet] - https://gerrit.wikimedia.org/r/401677
[06:11:22] mutante: still around? interested in reviewing https://gerrit.wikimedia.org/r/401677 ? :)
[06:12:00] yea, but not in real-time now :) just add me and it will be in my queue
[06:17:38] ok :)
[06:34:06] Operations, ops-eqiad, DBA: Degraded RAID on db1001 - https://phabricator.wikimedia.org/T183708#3870596 (Marostegui) Open>Resolved All good - thank you!
``` root@db1001:~# megacli -LDPDInfo -aAll Adapter #0 Number of Virtual Disks: 1 Virtual Drive: 0 (Target Id: 0) Name : RAI...
[06:37:12] !log Deploy schema change on s1 master db1052 - T174569
[06:37:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:37:24] T174569: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569
[06:43:09] !log kartik@tin Started deploy [cxserver/deploy@66e384e]: Update cxserver to cc01477
[06:43:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:43:22] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1098:3317" [mediawiki-config] - https://gerrit.wikimedia.org/r/401681
[06:43:39] (PS2) Marostegui: Revert "db-eqiad.php: Depool db1098:3317" [mediawiki-config] - https://gerrit.wikimedia.org/r/401681
[06:47:50] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1098:3317" [mediawiki-config] - https://gerrit.wikimedia.org/r/401681 (owner: Marostegui)
[06:47:58] !log kartik@tin Finished deploy [cxserver/deploy@66e384e]: Update cxserver to cc01477 (duration: 04m 49s)
[06:48:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:49:25] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1098:3317" [mediawiki-config] - https://gerrit.wikimedia.org/r/401681 (owner: Marostegui)
[06:49:36] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1098:3317" [mediawiki-config] - https://gerrit.wikimedia.org/r/401681 (owner: Marostegui)
[06:50:49] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1098:3317 - T174569 (duration: 01m 10s)
[06:50:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:50:58] T174569: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569
[06:52:13] (PS1) Marostegui: db-eqiad.php: Depool db1101:3317 [mediawiki-config] - https://gerrit.wikimedia.org/r/401683
(https://phabricator.wikimedia.org/T174569)
[06:54:26] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1101:3317 [mediawiki-config] - https://gerrit.wikimedia.org/r/401683 (https://phabricator.wikimedia.org/T174569) (owner: Marostegui)
[06:55:56] (Merged) jenkins-bot: db-eqiad.php: Depool db1101:3317 [mediawiki-config] - https://gerrit.wikimedia.org/r/401683 (https://phabricator.wikimedia.org/T174569) (owner: Marostegui)
[06:57:10] (CR) jenkins-bot: db-eqiad.php: Depool db1101:3317 [mediawiki-config] - https://gerrit.wikimedia.org/r/401683 (https://phabricator.wikimedia.org/T174569) (owner: Marostegui)
[06:57:13] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1101:3317 - T174569 (duration: 01m 01s)
[06:57:15] !log Deploy schema change on db1101:3317 - T174569
[06:57:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:57:25] T174569: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569
[06:57:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:53:01] (PS1) Chad: Add hooks plugin @ 2.13.9 [software/gerrit] - https://gerrit.wikimedia.org/r/401697 (https://phabricator.wikimedia.org/T183792)
[08:14:57] (PS4) Alexandros Kosiaris: tcpircbot: convert role to profile [puppet] - https://gerrit.wikimedia.org/r/400250 (owner: Dzahn)
[08:15:02] (CR) Alexandros Kosiaris: [C: 2] tcpircbot: convert role to profile [puppet] - https://gerrit.wikimedia.org/r/400250 (owner: Dzahn)
[08:19:55] Operations, ops-eqiad, Patch-For-Review, User-Elukey, User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3870731 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['mw1336.eqiad.wmnet', 'mw1337.eqiad.wmnet...
[08:21:26] (PS1) Alexandros Kosiaris: admin: Use posix_name to make ci docker group unique [puppet] - https://gerrit.wikimedia.org/r/401698
[08:24:31] Operations, Performance-Team, HHVM: HHVM hangs on the API cluster - https://phabricator.wikimedia.org/T184048#3870738 (Joe) p:Triage>High
[08:25:07] (PS2) Legoktm: contint: Install php-xdebug (disabled by default) for PHP 7 [puppet] - https://gerrit.wikimedia.org/r/401677
[08:28:32] (CR) Alexandros Kosiaris: [C: 2] "@hashar: FYI. This solves the multitenancy problem of the docker group. PCC is at https://puppet-compiler.wmflabs.org/compiler02/9514/cont" [puppet] - https://gerrit.wikimedia.org/r/401698 (owner: Alexandros Kosiaris)
[08:28:37] (PS2) Alexandros Kosiaris: admin: Use posix_name to make ci docker group unique [puppet] - https://gerrit.wikimedia.org/r/401698
[08:28:39] (CR) Alexandros Kosiaris: [V: 2 C: 2] admin: Use posix_name to make ci docker group unique [puppet] - https://gerrit.wikimedia.org/r/401698 (owner: Alexandros Kosiaris)
[08:28:47] (CR) Paladox: [C: 1] Add hooks plugin @ 2.13.9 [software/gerrit] - https://gerrit.wikimedia.org/r/401697 (https://phabricator.wikimedia.org/T183792) (owner: Chad)
[08:39:09] (CR) Gehel: [C: 1] "LGTM" [software/cumin] - https://gerrit.wikimedia.org/r/399829 (owner: Volans)
[08:46:02] (CR) Gehel: [C: 1] "LGTM for the elastic*, logstash* and maps* servers. Feel free to add wdqs1003 as well if you want."
[puppet] - https://gerrit.wikimedia.org/r/401491 (https://phabricator.wikimedia.org/T86552) (owner: Filippo Giunchedi)
[08:49:38] (CR) Elukey: [C: 1] hieradata: partial eqiad SMART metrics rollout [puppet] - https://gerrit.wikimedia.org/r/401491 (https://phabricator.wikimedia.org/T86552) (owner: Filippo Giunchedi)
[08:51:36] (CR) Gehel: [C: -1] "minor issues, see comments inline" (3 comments) [puppet] - https://gerrit.wikimedia.org/r/399954 (https://phabricator.wikimedia.org/T178978) (owner: Smalyshev)
[08:55:12] (PS3) Gehel: thumbor: use the canonical definition of logstash host [puppet] - https://gerrit.wikimedia.org/r/399652 (https://phabricator.wikimedia.org/T182304)
[08:56:04] (PS35) TerraCodes: $wmf* -> $wmg* [mediawiki-config] - https://gerrit.wikimedia.org/r/392184 (https://phabricator.wikimedia.org/T45956)
[08:59:13] !log stop eventlogging mysql insertion on eventlog1001 to allow db1107 maintenance - T168414
[08:59:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:59:25] T168414: Purge all old data from EventLogging master - https://phabricator.wikimedia.org/T168414
[09:11:20] (CR) Gehel: [C: 2] thumbor: use the canonical definition of logstash host [puppet] - https://gerrit.wikimedia.org/r/399652 (https://phabricator.wikimedia.org/T182304) (owner: Gehel)
[09:22:47] (PS1) Alexandros Kosiaris: openldap: Amend to rename docker to contint-docker [puppet] - https://gerrit.wikimedia.org/r/401700
[09:22:49] (PS1) Alexandros Kosiaris: admin: Flatten 2 levels of arrays in unique_users [puppet] - https://gerrit.wikimedia.org/r/401701
[09:23:38] (PS1) Filippo Giunchedi: admin: add yubikey for ema [puppet] - https://gerrit.wikimedia.org/r/401702
[09:23:47] (CR) Smalyshev: [WIP] Add loading DCAT-AP data into dcatap namespace on WDQS (2 comments) [puppet] - https://gerrit.wikimedia.org/r/399954 (https://phabricator.wikimedia.org/T178978) (owner: Smalyshev)
[09:23:51] ema: ^
[09:25:17] (CR) Ema: [V: 2 C: 2] "Thanks Filippo!" [puppet] - https://gerrit.wikimedia.org/r/401702 (owner: Filippo Giunchedi)
[09:25:26] (PS2) Ema: admin: add yubikey for ema [puppet] - https://gerrit.wikimedia.org/r/401702 (owner: Filippo Giunchedi)
[09:25:30] (CR) Ema: [V: 2 C: 2] admin: add yubikey for ema [puppet] - https://gerrit.wikimedia.org/r/401702 (owner: Filippo Giunchedi)
[09:25:46] godog: merged, thanks!
[09:26:03] godog: oh, could you puppet-merge too? :)
[09:26:32] heheh indeed, done
[09:26:37] cheers
[09:28:21] Operations, Discovery, Wikidata, Wikidata-Query-Service, and 2 others: Icinga check for WDQS should do an actual query - https://phabricator.wikimedia.org/T181989#3870823 (Gehel) Yep, this is actually done and deployed.
[09:28:56] Operations, Discovery, Wikidata, Wikidata-Query-Service, and 2 others: Icinga check for WDQS should do an actual query - https://phabricator.wikimedia.org/T181989#3870825 (Smalyshev) Open>Resolved
[09:32:13] (PS3) Filippo Giunchedi: hieradata: partial eqiad SMART metrics rollout [puppet] - https://gerrit.wikimedia.org/r/401491 (https://phabricator.wikimedia.org/T86552)
[09:32:51] (CR) Filippo Giunchedi: [C: 2] hieradata: partial eqiad SMART metrics rollout [puppet] - https://gerrit.wikimedia.org/r/401491 (https://phabricator.wikimedia.org/T86552) (owner: Filippo Giunchedi)
[09:37:49] (CR) Muehlenhoff: [C: 1] openldap: Amend to rename docker to contint-docker [puppet] - https://gerrit.wikimedia.org/r/401700 (owner: Alexandros Kosiaris)
[09:42:21] (CR) Gehel: [C: -1] [WIP] Add loading DCAT-AP data into dcatap namespace on WDQS (2 comments) [puppet] - https://gerrit.wikimedia.org/r/399954 (https://phabricator.wikimedia.org/T178978) (owner: Smalyshev)
[09:49:38] Operations, ops-eqiad, Patch-For-Review, User-Elukey, User-Joe: rack and setup mw1307-1348 -
https://phabricator.wikimedia.org/T165519#3870843 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['mw1336.eqiad.wmnet'] ``` The log can be...
[09:54:57] Operations, ops-eqiad, Patch-For-Review, User-Elukey, User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3870851 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['mw1336.eqiad.wmnet'] ``` Of which those **FAILED**: ``` ['mw1336.eqiad.wmnet'] ```
[09:57:03] !log elukey@puppetmaster1001 conftool action : set/pooled=yes; selector: name=mw1337.*.eqiad.wmnet
[09:57:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:03:23] godog: apparently the smart monitoring doesn't work on ubuntu?
[10:04:23] jynus: likely, I'm taking a look
[10:06:05] (PS2) Jcrespo: mariadb: Decommission db1029, former x1 replica [puppet] - https://gerrit.wikimedia.org/r/401529 (https://phabricator.wikimedia.org/T183469)
[10:06:07] (PS1) Jcrespo: mariadb: Promote db2039 to be the new master of s6-codfw [puppet] - https://gerrit.wikimedia.org/r/401706 (https://phabricator.wikimedia.org/T176243)
[10:07:09] (CR) Jcrespo: [C: 2] mariadb: Decommission db1029, former x1 replica [puppet] - https://gerrit.wikimedia.org/r/401529 (https://phabricator.wikimedia.org/T183469) (owner: Jcrespo)
[10:07:39] godog: I am ok with silently ignoring the check on ubuntu
[10:08:13] we should get rid of ubuntus by the end of the quarter
[10:08:42] jynus: nice!
yeah unfortunately we'll still have to cater for ubuntu cases for a while anyways
[10:08:56] (CR) Marostegui: [C: 1] mariadb: Promote db2039 to be the new master of s6-codfw [puppet] - https://gerrit.wikimedia.org/r/401706 (https://phabricator.wikimedia.org/T176243) (owner: Jcrespo)
[10:10:35] Operations, Puppet, Patch-For-Review, User-Joe: puppet4: The following unknown setting(s) are being ignored: parser - https://phabricator.wikimedia.org/T179721#3734109 (Joe) Open>Resolved a:Joe
[10:10:37] Operations, Puppet, Patch-For-Review, User-Joe, cloud-services-team (FY2017-18): Upgrade to puppet 4 (4.8 or newer) - https://phabricator.wikimedia.org/T177254#3870875 (Joe)
[10:12:18] (PS2) Jcrespo: mariadb: Promote db2039 to be the new master of s6-codfw [puppet] - https://gerrit.wikimedia.org/r/401706 (https://phabricator.wikimedia.org/T176243)
[10:12:36] Operations, ops-eqiad: Degraded RAID on ms-be1013 - https://phabricator.wikimedia.org/T184053#3870877 (ops-monitoring-bot)
[10:17:02] (PS1) Filippo Giunchedi: smart: special case for backports on Ubuntu [puppet] - https://gerrit.wikimedia.org/r/401707 (https://phabricator.wikimedia.org/T86552)
[10:20:31] (CR) Hashar: contint: Install php-xdebug (disabled by default) for PHP 7 (3 comments) [puppet] - https://gerrit.wikimedia.org/r/401677 (owner: Legoktm)
[10:22:23] (CR) Legoktm: contint: Install php-xdebug (disabled by default) for PHP 7 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/401677 (owner: Legoktm)
[10:22:25] (PS1) Jcrespo: mariadb: Set db1029 as a spare and ready to decommission [puppet] - https://gerrit.wikimedia.org/r/401708 (https://phabricator.wikimedia.org/T184054)
[10:22:46] (PS3) Hashar: contint: Install php-xdebug (disabled by default) for PHP 7 [puppet] - https://gerrit.wikimedia.org/r/401677 (owner: Legoktm)
[10:24:22] (CR) Volans: [C: 2] ClusterShell backend: fix execute() return code
[software/cumin] - https://gerrit.wikimedia.org/r/399829 (owner: Volans)
[10:26:49] (Merged) jenkins-bot: ClusterShell backend: fix execute() return code [software/cumin] - https://gerrit.wikimedia.org/r/399829 (owner: Volans)
[10:27:36] (CR) jenkins-bot: ClusterShell backend: fix execute() return code [software/cumin] - https://gerrit.wikimedia.org/r/399829 (owner: Volans)
[10:28:44] (CR) Hashar: [C: 1] "Cherry picked on CI puppet master. The result can be tested on eg integration-slave-jessie1001.integration.eqiad.wmflabs" [puppet] - https://gerrit.wikimedia.org/r/401677 (owner: Legoktm)
[10:29:29] (PS1) Jcrespo: dblist: Reorder db2028 & db2039 because master switchover [software] - https://gerrit.wikimedia.org/r/401710 (https://phabricator.wikimedia.org/T176243)
[10:29:38] (CR) Legoktm: "Thanks :)" [puppet] - https://gerrit.wikimedia.org/r/401677 (owner: Legoktm)
[10:33:17] (PS1) Jcrespo: mariadb: Point x1-slave.eqiad.wmnet to db1056 [dns] - https://gerrit.wikimedia.org/r/401711 (https://phabricator.wikimedia.org/T184054)
[10:33:19] (PS1) Jcrespo: mariadb: Point x1-master.eqiad.wmnet to db1055 [dns] - https://gerrit.wikimedia.org/r/401712 (https://phabricator.wikimedia.org/T184054)
[10:34:19] (CR) Hashar: contint: Install php-xdebug (disabled by default) for PHP 7 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/401677 (owner: Legoktm)
[10:35:03] (CR) Jcrespo: [C: 1] mariadb: Point x1-slave.eqiad.wmnet to db1056 [dns] - https://gerrit.wikimedia.org/r/401711 (https://phabricator.wikimedia.org/T184054) (owner: Jcrespo)
[10:35:26] (CR) Marostegui: [C: 1] mariadb: Point x1-slave.eqiad.wmnet to db1056 [dns] - https://gerrit.wikimedia.org/r/401711 (https://phabricator.wikimedia.org/T184054) (owner: Jcrespo)
[10:35:34] (CR) Jcrespo: [C: -1] "Not until switchover."
[dns] - https://gerrit.wikimedia.org/r/401712 (https://phabricator.wikimedia.org/T184054) (owner: Jcrespo)
[10:35:58] (CR) Jcrespo: [C: 2] mariadb: Set db1029 as a spare and ready to decommission [puppet] - https://gerrit.wikimedia.org/r/401708 (https://phabricator.wikimedia.org/T184054) (owner: Jcrespo)
[10:36:07] (PS2) Jcrespo: mariadb: Set db1029 as a spare and ready to decommission [puppet] - https://gerrit.wikimedia.org/r/401708 (https://phabricator.wikimedia.org/T184054)
[10:37:45] (CR) Muehlenhoff: smart: special case for backports on Ubuntu (1 comment) [puppet] - https://gerrit.wikimedia.org/r/401707 (https://phabricator.wikimedia.org/T86552) (owner: Filippo Giunchedi)
[10:38:09] (PS3) Jcrespo: mariadb: Set db1029 as a spare and ready to decommission [puppet] - https://gerrit.wikimedia.org/r/401708 (https://phabricator.wikimedia.org/T184054)
[10:39:10] (CR) Jcrespo: [C: 2] mariadb: Set db1029 as a spare and ready to decommission [puppet] - https://gerrit.wikimedia.org/r/401708 (https://phabricator.wikimedia.org/T184054) (owner: Jcrespo)
[10:41:21] Operations, ops-eqiad, Analytics-Kanban: dbstore1002 possibly MEMORY issues - https://phabricator.wikimedia.org/T183771#3870926 (Marostegui) I would consider fixing mgmt the first thing address here. If the server breaks, even with OOM, we would need to wait for Chris to reboot it for instance.
[10:43:00] Operations, ops-eqiad, Analytics-Kanban: dbstore1002 possibly MEMORY issues - https://phabricator.wikimedia.org/T183771#3870928 (elukey) >>! In T183771#3870926, @Marostegui wrote: > I would consider fixing mgmt the first thing address here. If the server breaks, even with OOM, we would need to wait f...
[10:44:41] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1101:3317" [mediawiki-config] - https://gerrit.wikimedia.org/r/401713
[10:45:25] !log oblivian@puppetmaster1001 conftool action : set/pooled=true; selector: dnsdisc=restbase,name=codfw
[10:45:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:45:38] <_joe_> mobrovac: ^^
[10:45:39] <_joe_> done
[10:46:49] working! thnx _joe_
[10:48:27] (CR) Jcrespo: [C: 2] mariadb: Point x1-slave.eqiad.wmnet to db1056 [dns] - https://gerrit.wikimedia.org/r/401711 (https://phabricator.wikimedia.org/T184054) (owner: Jcrespo)
[10:48:49] !log mobrovac@tin Started restart [mobileapps/deploy@bf85a55]: Pick up the new RESTBase DNS
[10:48:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:49:55] (CR) Jcrespo: [C: 2] mariadb: Remove db1029 from dblists [software] - https://gerrit.wikimedia.org/r/401527 (https://phabricator.wikimedia.org/T183469) (owner: Jcrespo)
[10:52:04] (PS4) Arturo Borrero Gonzalez: role::puppetmaster::standalone: add ferm rules to allow connecting to tcp/8140 [puppet] - https://gerrit.wikimedia.org/r/394101 (https://phabricator.wikimedia.org/T154150)
[10:52:29] (CR) jerkins-bot: [V: -1] role::puppetmaster::standalone: add ferm rules to allow connecting to tcp/8140 [puppet] - https://gerrit.wikimedia.org/r/394101 (https://phabricator.wikimedia.org/T154150) (owner: Arturo Borrero Gonzalez)
[10:53:11] (PS1) Giuseppe Lavagetto: mediawiki::appserver::api: add load monitoring [puppet] - https://gerrit.wikimedia.org/r/401714 (https://phabricator.wikimedia.org/T182568)
[10:53:22] <_joe_> elukey: ^^
[10:53:25] <_joe_> also moritzm ^^
[10:53:37] (CR) jerkins-bot: [V: -1] mediawiki::appserver::api: add load monitoring [puppet] - https://gerrit.wikimedia.org/r/401714 (https://phabricator.wikimedia.org/T182568) (owner: Giuseppe Lavagetto)
[10:53:49] <_joe_> meh
[10:54:01] <_joe_> bitten by my own rules
[10:54:12] having a look in a bit, currently on HHVM 3.18.6
[10:54:18] <_joe_> you're a prickly snob, _joe_
[10:54:28] <_joe_> moritzm: ack, thanks
[10:57:22] _joe_ could you please review https://gerrit.wikimedia.org/r/#/c/394101/ ? :-)
[10:58:18] Operations, ops-eqiad, Patch-For-Review, User-Elukey, User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3870942 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['mw1336.eqiad.wmnet'] ``` and were **ALL** successful.
[10:59:01] <_joe_> arturo: sure, gimme ~ 10 seconds
[10:59:58] (PS2) Giuseppe Lavagetto: mediawiki::appserver::api: add load monitoring [puppet] - https://gerrit.wikimedia.org/r/401714 (https://phabricator.wikimedia.org/T182568)
[11:00:43] !log mobrovac@tin Started restart [changeprop/deploy@3c4f51d]: Pick up the new RESTBase DNS
[11:00:47] (CR) Muehlenhoff: role::puppetmaster::standalone: add ferm rules to allow connecting to tcp/8140 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/394101 (https://phabricator.wikimedia.org/T154150) (owner: Arturo Borrero Gonzalez)
[11:00:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:03:04] !log elukey@puppetmaster1001 conftool action : set/pooled=yes; selector: name=mw1336.*.eqiad.wmnet
[11:03:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:03:15] last jobrunner in service!
[11:03:19] thanks moritzm
[11:04:18] (PS5) Arturo Borrero Gonzalez: role::puppetmaster::standalone: add ferm rules to allow connecting to tcp/8140 [puppet] - https://gerrit.wikimedia.org/r/394101 (https://phabricator.wikimedia.org/T154150)
[11:04:24] <_joe_> arturo: so, your problem is that in our coding style guide, that is more appropriately a profile than a role.
so either you override jenkins-bot and merge, or you convert the class to role/profile
[11:04:39] (CR) jerkins-bot: [V: -1] role::puppetmaster::standalone: add ferm rules to allow connecting to tcp/8140 [puppet] - https://gerrit.wikimedia.org/r/394101 (https://phabricator.wikimedia.org/T154150) (owner: Arturo Borrero Gonzalez)
[11:04:58] _joe_: how to override jenkins-bot ? is there a simple way to do it?
[11:05:21] I mean, is this simply ignoring it?
[11:05:28] <_joe_> arturo: remove jenkins-bot from the reviers and give R+2 AND V+2
[11:05:32] Operations, ops-eqiad, Patch-For-Review, User-Elukey, User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3870959 (elukey) >>! In T165519#3847554, @elukey wrote: > Next steps: > > 1) image all the hosts in https://gerrit.wikimedia.org/r/397749 and put them in pro...
[11:05:35] <_joe_> *reviewers
[11:05:40] ok thanks
[11:05:56] <_joe_> arturo: that not something you should do normally, btw
[11:06:18] _joe_ yeah, I guess
[11:06:20] <_joe_> the check is there in order to annoy people into converting our codebase after all :P
[11:06:28] :-)
[11:06:51] (CR) Arturo Borrero Gonzalez: [V: 2 C: 2] role::puppetmaster::standalone: add ferm rules to allow connecting to tcp/8140 [puppet] - https://gerrit.wikimedia.org/r/394101 (https://phabricator.wikimedia.org/T154150) (owner: Arturo Borrero Gonzalez)
[11:08:18] Operations, ops-eqiad: Degraded RAID on ms-be1013 - https://phabricator.wikimedia.org/T184053#3870962 (fgiunchedi) a:Cmjohnson `sdf` failed, @Cmjohnson please replace, thanks! ``` Jan 3 09:44:52 ms-be1013 kernel: [17084211.661701] sd 0:2:5:0: [sdf] tag#30 FAILED Result: hostbyte=DID_BAD_TARGET driv...
[11:08:24] (CR) Muehlenhoff: role::puppetmaster::standalone: add ferm rules to allow connecting to tcp/8140 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/394101 (https://phabricator.wikimedia.org/T154150) (owner: Arturo Borrero Gonzalez)
[11:17:07] (CR) Filippo Giunchedi: smart: special case for backports on Ubuntu (1 comment) [puppet] - https://gerrit.wikimedia.org/r/401707 (https://phabricator.wikimedia.org/T86552) (owner: Filippo Giunchedi)
[11:17:10] (CR) Elukey: mediawiki::appserver::api: add load monitoring (1 comment) [puppet] - https://gerrit.wikimedia.org/r/401714 (https://phabricator.wikimedia.org/T182568) (owner: Giuseppe Lavagetto)
[11:24:56] (PS1) Arturo Borrero Gonzalez: role::puppetmaster::standalone: rename ferm rule to be more explicit [puppet] - https://gerrit.wikimedia.org/r/401716
[11:25:50] moritzm: ^^^
[11:26:33] <_joe_> elukey: yeah, meh
[11:27:04] (CR) Muehlenhoff: [C: 1] "Thanks!" [puppet] - https://gerrit.wikimedia.org/r/401716 (owner: Arturo Borrero Gonzalez)
[11:27:38] (CR) Arturo Borrero Gonzalez: [C: 2] role::puppetmaster::standalone: rename ferm rule to be more explicit [puppet] - https://gerrit.wikimedia.org/r/401716 (owner: Arturo Borrero Gonzalez)
[11:28:08] (PS2) Alexandros Kosiaris: openldap: Amend to rename docker to contint-docker [puppet] - https://gerrit.wikimedia.org/r/401700
[11:28:10] (PS2) Alexandros Kosiaris: admin: Flatten 2 levels of arrays in unique_users [puppet] - https://gerrit.wikimedia.org/r/401701
[11:28:12] (PS3) Alexandros Kosiaris: Add all ops members to docker group [puppet] - https://gerrit.wikimedia.org/r/401492
[11:28:16] !log mobrovac@tin Started deploy [mathoid/deploy@91648aa]: (no justification provided)
[11:28:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:28:47] (CR) jerkins-bot: [V: -1] Add all ops members to docker group [puppet] - https://gerrit.wikimedia.org/r/401492
(owner: Alexandros Kosiaris)
[11:28:56] !log mobrovac@tin Finished deploy [mathoid/deploy@91648aa]: (no justification provided) (duration: 00m 40s)
[11:29:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:29:33] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1101:3317" [mediawiki-config] - https://gerrit.wikimedia.org/r/401713 (owner: Marostegui)
[11:29:48] !log mobrovac@tin Started deploy [mathoid/deploy@91648aa]: Update to Mathoid v0.7.0 in codfw only for T183557
[11:29:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:29:59] T183557: Mathoid v0.7.0 not accepting chem formula - https://phabricator.wikimedia.org/T183557
[11:30:53] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1101:3317" [mediawiki-config] - https://gerrit.wikimedia.org/r/401713 (owner: Marostegui)
[11:31:05] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1101:3317" [mediawiki-config] - https://gerrit.wikimedia.org/r/401713 (owner: Marostegui)
[11:31:07] (CR) Alexandros Kosiaris: [C: 2] openldap: Amend to rename docker to contint-docker [puppet] - https://gerrit.wikimedia.org/r/401700 (owner: Alexandros Kosiaris)
[11:32:03] !log mobrovac@tin Finished deploy [mathoid/deploy@91648aa]: Update to Mathoid v0.7.0 in codfw only for T183557 (duration: 02m 15s)
[11:32:11] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1101:3317 - T174569 (duration: 01m 02s)
[11:32:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:32:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:32:23] T174569: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569
[11:33:00] (PS3) Giuseppe Lavagetto: mediawiki::appserver::api: add load monitoring [puppet] - https://gerrit.wikimedia.org/r/401714 (https://phabricator.wikimedia.org/T182568)
[11:33:44] (PS1) Marostegui: db-eqiad.php: Depool
db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401717 (https://phabricator.wikimedia.org/T174569) [11:35:45] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401717 (https://phabricator.wikimedia.org/T174569) (owner: 10Marostegui) [11:36:37] 10Puppet, 10Cloud-VPS, 10Patch-For-Review, 10cloud-services-team (Kanban): role::puppetmaster::standalone has no firewall rule for port 8140 - https://phabricator.wikimedia.org/T154150#3870989 (10aborrero) 05Open>03Resolved Merged both: https://gerrit.wikimedia.org/r/394101 https://gerrit.wikimedia.or... [11:37:14] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401717 (https://phabricator.wikimedia.org/T174569) (owner: 10Marostegui) [11:37:36] (03PS4) 10Giuseppe Lavagetto: mediawiki::appserver::api: add load monitoring [puppet] - 10https://gerrit.wikimedia.org/r/401714 (https://phabricator.wikimedia.org/T182568) [11:38:15] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401717 (https://phabricator.wikimedia.org/T174569) (owner: 10Marostegui) [11:38:37] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1086 - T174569 (duration: 01m 01s) [11:38:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:38:50] T174569: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569 [11:39:15] !log Deploy schema change on db1086 - T174569 [11:39:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:40:37] (03PS4) 10Alexandros Kosiaris: Add all ops members to docker group [puppet] - 10https://gerrit.wikimedia.org/r/401492 [11:41:32] (03CR) 10Alexandros Kosiaris: [C: 031] "Meh, but ok as an intermediate step while we make all these roles profiles." 
[puppet] - 10https://gerrit.wikimedia.org/r/401553 (owner: 10Giuseppe Lavagetto) [11:42:29] (03CR) 10Alexandros Kosiaris: [C: 031] rancid: convert role to profile [puppet] - 10https://gerrit.wikimedia.org/r/399968 (owner: 10Dzahn) [11:43:17] (03CR) 10Elukey: [C: 031] "https://puppet-compiler.wmflabs.org/compiler03/9523/mw1189.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/401714 (https://phabricator.wikimedia.org/T182568) (owner: 10Giuseppe Lavagetto) [11:44:01] <_joe_> elukey: it's still wrong [11:44:03] <_joe_> :P [11:44:10] (03PS5) 10Giuseppe Lavagetto: mediawiki::appserver::api: add load monitoring [puppet] - 10https://gerrit.wikimedia.org/r/401714 (https://phabricator.wikimedia.org/T182568) [11:47:11] !log boot ganeti1006. It exhibited page allocation stalls on Jan 1. T181121 [11:47:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:47:21] T181121: Hardware errors on ganeti1005- ganeti1008 - https://phabricator.wikimedia.org/T181121 [11:47:36] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki::appserver::api: add load monitoring [puppet] - 10https://gerrit.wikimedia.org/r/401714 (https://phabricator.wikimedia.org/T182568) (owner: 10Giuseppe Lavagetto) [11:48:13] _joe_ if it was I didn't see it :D [11:48:21] (03PS3) 10Jcrespo: mariadb: Promote db2039 to be the new master of s6-codfw [puppet] - 10https://gerrit.wikimedia.org/r/401706 (https://phabricator.wikimedia.org/T176243) [11:49:23] !log disabling puppet on db2039 and db2028 in preparation for gerrit:401706 deployment [11:49:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:49:47] (03CR) 10Jcrespo: [C: 032] mariadb: Promote db2039 to be the new master of s6-codfw [puppet] - 10https://gerrit.wikimedia.org/r/401706 (https://phabricator.wikimedia.org/T176243) (owner: 10Jcrespo) [11:52:06] !log upgrade and restart db2039 [11:52:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:52:47] (03PS14) 
10MarcoAurelio: Extension:Translate default permissions for Wikimedia wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/385953 (https://phabricator.wikimedia.org/T178793) [11:53:07] (03PS6) 10MarcoAurelio: Close wikimania2017.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/396581 (https://phabricator.wikimedia.org/T182493) [11:53:18] (03PS3) 10MarcoAurelio: Setup some namespace aliases for eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/399982 (https://phabricator.wikimedia.org/T183612) [11:53:20] (03PS3) 10MarcoAurelio: Set category collation to uca-es-u-kn for eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401081 (https://phabricator.wikimedia.org/T183802) [11:57:08] (03PS2) 10Giuseppe Lavagetto: appservers: move mw1180-1188 to role::spare::system [puppet] - 10https://gerrit.wikimedia.org/r/401479 (https://phabricator.wikimedia.org/T183895) [11:57:30] !log upgrading app servers in deployment-prep to hhvm 3.18.5+dfsg-1+wmf2 (which contains the patches from 3.18.6) [11:57:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:57:45] (03CR) 10Giuseppe Lavagetto: [C: 032] appservers: move mw1180-1188 to role::spare::system [puppet] - 10https://gerrit.wikimedia.org/r/401479 (https://phabricator.wikimedia.org/T183895) (owner: 10Giuseppe Lavagetto) [11:59:44] 10Operations, 10Goal, 10User-Elukey, 10User-fgiunchedi: Add Prometheus exporter to Jenkins instances - https://phabricator.wikimedia.org/T182759#3871043 (10fgiunchedi) @hashar on what url are the metrics available? I tried localhost:8080/monitoring on contint1001 but yields 404: ``` contint1001:~$ curl lo... 
[12:00:37] 10Operations, 10Goal, 10User-Elukey, 10User-fgiunchedi: Add Prometheus exporter to Jenkins instances - https://phabricator.wikimedia.org/T182759#3871045 (10fgiunchedi) re: jmx-exporter we can deploy that in addition to jenkins' one, so we get standard/uniform jvm metrics as per parent task [12:02:47] (03PS2) 10Giuseppe Lavagetto: hieradata: remove old leftovers [puppet] - 10https://gerrit.wikimedia.org/r/401480 [12:03:20] (03CR) 10Giuseppe Lavagetto: [C: 032] hieradata: remove old leftovers [puppet] - 10https://gerrit.wikimedia.org/r/401480 (owner: 10Giuseppe Lavagetto) [12:12:20] !log mobrovac@tin Started deploy [mathoid/deploy@63b2ddc]: Bring back codfw in sync with eqiad - T183557 [12:12:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:12:32] T183557: Mathoid v0.7.0 not accepting chem formula - https://phabricator.wikimedia.org/T183557 [12:13:09] Reedy: could you please run a dry-run script for me? mwscript namespaceDupes.php --wiki=eswiki and Phabricator paste its contents? I've got a change to merge for eswiki namespaces today and I want to know the namespace status first, just in case. Thanks. [12:14:30] !log mobrovac@tin Finished deploy [mathoid/deploy@63b2ddc]: Bring back codfw in sync with eqiad - T183557 (duration: 02m 10s) [12:14:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:17:16] (03CR) 10Alexandros Kosiaris: "https://puppet-compiler.wmflabs.org/compiler02/9519/ says ok" [puppet] - 10https://gerrit.wikimedia.org/r/401701 (owner: 10Alexandros Kosiaris) [12:17:46] Hauskatze: Nothing to pastebin [12:22:41] Reedy: well I guess that's good [12:22:44] thanks [12:22:47] Makes a change ;P [12:26:09] (03CR) 10Bmansurov: "https://phabricator.wikimedia.org/T183982 has been resolved." 
[puppet] - 10https://gerrit.wikimedia.org/r/401597 (https://phabricator.wikimedia.org/T183916) (owner: 10Dzahn) [12:27:34] 10Operations, 10Continuous-Integration-Config: tox 2.5.0 on phabricator-jessie-diffs fails with ERROR: Commands not specified - https://phabricator.wikimedia.org/T184060#3871088 (10fgiunchedi) [12:53:22] !log importing linux 4.9.65-3+deb9u1~bpo8+1 for jessie-wikimedia to apt.wikimedia.org [12:53:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:54:50] (03PS1) 10Gilles: Upgrade to 1.8 [debs/python-thumbor-wikimedia] - 10https://gerrit.wikimedia.org/r/401725 (https://phabricator.wikimedia.org/T183907) [12:59:49] 10Operations, 10ops-esams, 10Epic: SRE 2017-18 Q3 goal Cleanup esams and refresh servers and infrastructure (tracking) - https://phabricator.wikimedia.org/T184061#3871139 (10mark) p:05Triage>03Normal [13:02:25] 10Operations, 10ops-esams, 10Epic: Remove all decommissioned hardware - https://phabricator.wikimedia.org/T184063#3871169 (10mark) p:05Triage>03Normal [13:04:34] 10Operations, 10ops-esams, 10hardware-requests: Decommission bast3001 - https://phabricator.wikimedia.org/T159480#3871188 (10mark) [13:04:35] 10Operations, 10ops-esams, 10Patch-For-Review, 10User-fgiunchedi: Decommission esams ms-fe / ms-be - https://phabricator.wikimedia.org/T169518#3871187 (10mark) [13:04:39] 10Operations, 10ops-esams, 10hardware-requests: decom cp3011-22 (12 machines) - https://phabricator.wikimedia.org/T130883#3871189 (10mark) [13:04:41] 10Operations, 10ops-esams, 10DC-Ops, 10hardware-requests: Decomission amssq31-62 (32 hosts) - https://phabricator.wikimedia.org/T95742#3871190 (10mark) [13:04:43] 10Operations, 10ops-esams, 10Epic: Remove all decommissioned hardware - https://phabricator.wikimedia.org/T184063#3871186 (10mark) [13:04:45] 10Operations, 10ops-esams, 10DC-Ops: decom amslvs1-4 (dc work) - https://phabricator.wikimedia.org/T87790#3871192 (10mark) [13:04:47] 10Operations, 10ops-esams, 10DC-Ops, 
10Patch-For-Review: decommission cp3001 & cp3002 - https://phabricator.wikimedia.org/T94215#3871191 (10mark) [13:06:09] 10Operations, 10ops-esams, 10Epic: Remove all decommissioned hardware - https://phabricator.wikimedia.org/T184063#3871169 (10mark) [13:06:11] 10Operations, 10ops-esams, 10hardware-requests: Decommission cp300[3456] - https://phabricator.wikimedia.org/T167376#3871198 (10mark) [13:07:00] 10Operations, 10ops-esams, 10Epic: Remove all decommissioned hardware - https://phabricator.wikimedia.org/T184063#3871201 (10mark) [13:07:12] !log uploaded hhvm 3.18.5+dfsg-1+wmf2 (including the fixes from 3.18.6) for jessie-wikimedia to apt.wikimedia.org [13:07:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:09:30] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1086" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401726 [13:09:33] 10Operations, 10ops-esams: Prepare racks OE14, OE15 and OE16 with new infrastructure - https://phabricator.wikimedia.org/T184064#3871205 (10mark) p:05Triage>03Normal [13:11:19] 10Operations, 10ops-esams: Setup new access switches - https://phabricator.wikimedia.org/T184065#3871216 (10mark) p:05Triage>03Normal [13:13:10] 10Operations, 10ops-esams: Procure and install new PDUs - https://phabricator.wikimedia.org/T184066#3871227 (10mark) p:05Triage>03Normal [13:15:43] 10Operations, 10ops-esams, 10netops: set up cr3-esams - https://phabricator.wikimedia.org/T174616#3871242 (10mark) [13:15:55] 10Operations, 10ops-esams, 10Epic: SRE 2017-18 Q3 goal Cleanup esams and refresh servers and infrastructure (tracking) - https://phabricator.wikimedia.org/T184061#3871244 (10mark) [13:15:58] 10Operations, 10ops-esams, 10netops: set up cr3-esams - https://phabricator.wikimedia.org/T174616#3567954 (10mark) [13:17:16] 10Operations, 10ops-esams, 10netops: Complete router migration from cr1-esams to cr3-esams - https://phabricator.wikimedia.org/T184067#3871245 (10mark) p:05Triage>03Normal [13:17:32] !log 
upgrading mw1261-mw1265 to HHVM 3.18.5+dfsg-1+wmf2 [13:17:35] 10Operations, 10ops-esams, 10netops: set up cr3-esams - https://phabricator.wikimedia.org/T174616#3567954 (10mark) [13:17:38] 10Operations, 10ops-esams, 10netops: Complete router migration from cr1-esams to cr3-esams - https://phabricator.wikimedia.org/T184067#3871245 (10mark) [13:17:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:17:52] 10Operations, 10ops-esams, 10Epic: SRE 2017-18 Q3 goal Cleanup esams and refresh servers and infrastructure (tracking) - https://phabricator.wikimedia.org/T184061#3871139 (10mark) [13:17:55] 10Operations, 10ops-esams, 10netops: set up cr3-esams - https://phabricator.wikimedia.org/T174616#3871260 (10mark) [13:18:59] 10Operations, 10ops-esams, 10Epic: SRE 2017-18 Q3 goal Cleanup esams and refresh servers and infrastructure (tracking) - https://phabricator.wikimedia.org/T184061#3871262 (10mark) [13:19:01] 10Operations, 10ops-esams, 10netops: Setup esams atlas anchor - https://phabricator.wikimedia.org/T174637#3871261 (10mark) [13:22:46] 10Operations, 10ops-esams, 10hardware-requests: Procure and install LVS and miscellaneous servers - https://phabricator.wikimedia.org/T184068#3871263 (10mark) p:05Triage>03Normal [13:28:43] 10Operations, 10ops-esams, 10DC-Ops, 10netops: cr2-esams temperature warning - https://phabricator.wikimedia.org/T176816#3871292 (10mark) p:05Normal>03High [13:37:52] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1086" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401726 (owner: 10Marostegui) [13:39:18] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1086" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401726 (owner: 10Marostegui) [13:39:29] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1086" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401726 (owner: 10Marostegui) [13:40:30] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1086 - T174569 (duration: 
01m 02s) [13:40:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:40:41] T174569: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569 [13:41:31] 10Operations, 10ops-esams, 10netops: Setup esams atlas anchor - https://phabricator.wikimedia.org/T174637#3871317 (10mark) Have we acquired a new image for AS14907 yet? [14:00:04] addshore, hashar, anomie, no_justification, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: My dear minions, it's time we take the moon! Just kidding. Time for European Mid-day SWAT(Max 8 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180103T1400). [14:00:05] Urbanecm and revi: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [14:00:20] hoi [14:00:57] (03PS3) 10Revi: Add patrol to Image-reviewer on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401160 (https://phabricator.wikimedia.org/T183835) [14:01:05] (03PS2) 10Giuseppe Lavagetto: api: move mw1189-1200 to role::spare::system [puppet] - 10https://gerrit.wikimedia.org/r/401482 (https://phabricator.wikimedia.org/T183895) [14:01:37] (03PS2) 10Revi: Add Translation NS for kowikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401174 (https://phabricator.wikimedia.org/T183836) [14:01:55] I can SWAT today [14:02:12] great [14:02:28] Urbanecm: around for swat? 
[14:05:37] revi: looks like Urbanecm is not around yet, I will start with your patches, I will let you know when each of them is at mwdebug1002, ready for testing [14:05:44] okok [14:06:29] (03PS1) 10Elukey: role::graphite::alerts: add Druid realtime ingestion monitoring [puppet] - 10https://gerrit.wikimedia.org/r/401730 [14:06:52] (03CR) 10jerkins-bot: [V: 04-1] role::graphite::alerts: add Druid realtime ingestion monitoring [puppet] - 10https://gerrit.wikimedia.org/r/401730 (owner: 10Elukey) [14:07:04] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401160 (https://phabricator.wikimedia.org/T183835) (owner: 10Revi) [14:08:03] (03PS2) 10Elukey: role::graphite::alerts: add Druid realtime ingestion monitoring [puppet] - 10https://gerrit.wikimedia.org/r/401730 [14:08:29] (03Merged) 10jenkins-bot: Add patrol to Image-reviewer on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401160 (https://phabricator.wikimedia.org/T183835) (owner: 10Revi) [14:08:41] (03CR) 10jenkins-bot: Add patrol to Image-reviewer on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401160 (https://phabricator.wikimedia.org/T183835) (owner: 10Revi) [14:11:06] revi: 401160 is at mwdebug1002, please test and let me know if I can deploy it [14:12:04] ok, wait a min... [14:13:39] hmmm weird I don't see patrol in Image-reviewer inhttps://commons.wikimedia.org/wiki/Special:ListGroupRights even when I'm on mwdebug1002.eqiad.wmnet [14:13:57] revi: oh, wait, I made a mistake, just a minute [14:14:05] (03PS3) 10Elukey: role::graphite::alerts: add Druid realtime ingestion monitoring [puppet] - 10https://gerrit.wikimedia.org/r/401730 [14:14:29] ok then meanwhile... [14:14:32] revi: sorry, try now, should be there [14:14:50] Mark others' edits as patrolled (patrol) yup. GTG [14:15:10] revi: ok to deploy? 
[14:15:14] zeljkof: yes [14:15:24] deploying [14:16:31] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:401160|Add patrol to Image-reviewer on Commons (T183835)]] (duration: 01m 02s) [14:16:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:16:43] T183835: Let Image-reviewers on Commons patrol changes - https://phabricator.wikimedia.org/T183835 [14:17:02] revi: deployed, please check on production [14:17:04] (03PS1) 10Ottomata: Add klog alias to otto's bash aliases for tailing kafka logs [puppet] - 10https://gerrit.wikimedia.org/r/401732 [14:17:14] (03PS3) 10Zfilipin: Add Translation NS for kowikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401174 (https://phabricator.wikimedia.org/T183836) (owner: 10Revi) [14:17:41] (03CR) 10Elukey: Add klog alias to otto's bash aliases for tailing kafka logs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/401732 (owner: 10Ottomata) [14:17:50] zeljkof: 401160 (Commons one) confirmed [14:18:04] revi: thanks [14:18:46] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401174 (https://phabricator.wikimedia.org/T183836) (owner: 10Revi) [14:19:49] (03PS2) 10Ottomata: Add klog alias to otto's bash aliases for tailing kafka logs [puppet] - 10https://gerrit.wikimedia.org/r/401732 [14:19:53] (03PS3) 10Giuseppe Lavagetto: api: move mw1189-1200 to role::spare::system [puppet] - 10https://gerrit.wikimedia.org/r/401482 (https://phabricator.wikimedia.org/T183895) [14:20:11] (03Merged) 10jenkins-bot: Add Translation NS for kowikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401174 (https://phabricator.wikimedia.org/T183836) (owner: 10Revi) [14:20:22] (03CR) 10jenkins-bot: Add Translation NS for kowikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401174 (https://phabricator.wikimedia.org/T183836) (owner: 10Revi) [14:20:49] (03CR) 10Ottomata: [C: 032] Add klog alias to otto's bash aliases 
for tailing kafka logs [puppet] - 10https://gerrit.wikimedia.org/r/401732 (owner: 10Ottomata) [14:21:31] revi: 401174 is at mwdebug1002, please test and let me know if I can deploy it [14:21:52] Urbanecm: around for EU SWAT? [14:24:25] zeljkof: I don't see the new NS despite being in mwdebug1002 [14:24:48] (03PS3) 10Ottomata: Parameterize kafka.ssl.cipher.suites [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/399689 (https://phabricator.wikimedia.org/T177225) [14:24:50] revi: hm... [14:24:59] https://ko.wikisource.org/w/index.php?title=%ED%8A%B9%EC%88%98:%EA%B2%80%EC%83%89&profile=advanced&search=&fulltext=1 there should be "번역" and "번역토론" [14:25:27] https://tppr.me/ftamm [14:26:09] (03CR) 10Ottomata: [C: 032] Parameterize kafka.ssl.cipher.suites [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/399689 (https://phabricator.wikimedia.org/T177225) (owner: 10Ottomata) [14:27:21] revi: try again, I can see them now [14:27:31] yup [14:27:34] zeljkof: good to deploy [14:27:44] revi: ok, deploying [14:29:02] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:401174|Add Translation NS for kowikisource (T183836)]] (duration: 01m 00s) [14:29:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:29:14] T183836: Create a namespace for Korean Wikisource - https://phabricator.wikimedia.org/T183836 [14:29:20] revi: 401174 is deployed, please check on production [14:29:35] zeljkof: confirmed, thansk! [14:29:37] thanks* [14:29:52] and that's all for me today, maybe Urbanecm is not around for today? :P [14:30:06] revi: thanks for deploying with #releng! 
:) [14:30:15] yes, looks like he is not around [14:30:19] (03CR) 10Elukey: [C: 031] Add ssl_array and ssl_string entries to kafka_config [puppet] - 10https://gerrit.wikimedia.org/r/398863 (owner: 10Ottomata) [14:30:21] !log EU SWAT finished [14:30:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:30:35] (03CR) 10Filippo Giunchedi: [C: 032] Upgrade to 1.8 [debs/python-thumbor-wikimedia] - 10https://gerrit.wikimedia.org/r/401725 (https://phabricator.wikimedia.org/T183907) (owner: 10Gilles) [14:31:57] (03PS2) 10Ottomata: Set ssl.cipher.suites and ssl.enabled.protocols for Kafka jumbo and varnishkafka (canary) [puppet] - 10https://gerrit.wikimedia.org/r/399700 (https://phabricator.wikimedia.org/T167304) [14:32:28] (03CR) 10jerkins-bot: [V: 04-1] Set ssl.cipher.suites and ssl.enabled.protocols for Kafka jumbo and varnishkafka (canary) [puppet] - 10https://gerrit.wikimedia.org/r/399700 (https://phabricator.wikimedia.org/T167304) (owner: 10Ottomata) [14:35:24] (03PS3) 10Ottomata: Set cipher.suites and ssl.enabled.protocols for jumbo and varnishkafka (canary) [puppet] - 10https://gerrit.wikimedia.org/r/399700 (https://phabricator.wikimedia.org/T167304) [14:37:11] (03CR) 10Ottomata: "PCC is happy https://puppet-compiler.wmflabs.org/compiler02/9525/" [puppet] - 10https://gerrit.wikimedia.org/r/398863 (owner: 10Ottomata) [14:39:02] (03PS1) 10Jcrespo: mariadb: Switchover s6-codfw master to db2039 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401734 (https://phabricator.wikimedia.org/T176243) [14:39:27] !log rollout python-thumbor-wikimedia 1.8 - T183907 [14:39:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:39:40] T183907: Thumbor 500 while thumbnailing some webm files - https://phabricator.wikimedia.org/T183907 [14:40:15] (03CR) 10Giuseppe Lavagetto: [C: 032] api: move mw1189-1200 to role::spare::system [puppet] - 10https://gerrit.wikimedia.org/r/401482 (https://phabricator.wikimedia.org/T183895) 
(owner: 10Giuseppe Lavagetto) [14:40:21] (03PS4) 10Giuseppe Lavagetto: api: move mw1189-1200 to role::spare::system [puppet] - 10https://gerrit.wikimedia.org/r/401482 (https://phabricator.wikimedia.org/T183895) [14:41:17] (03PS1) 10Rush: labstore: add comment to interval options for nfs-exportd [puppet] - 10https://gerrit.wikimedia.org/r/401735 [14:44:32] (03CR) 10Ottomata: [C: 032] Add ssl_array and ssl_string entries to kafka_config [puppet] - 10https://gerrit.wikimedia.org/r/398863 (owner: 10Ottomata) [14:44:36] (03PS3) 10Ottomata: Add ssl_array and ssl_string entries to kafka_config [puppet] - 10https://gerrit.wikimedia.org/r/398863 [14:44:38] (03CR) 10Ottomata: [V: 032 C: 032] Add ssl_array and ssl_string entries to kafka_config [puppet] - 10https://gerrit.wikimedia.org/r/398863 (owner: 10Ottomata) [14:46:57] (03PS4) 10Ottomata: Set cipher.suites and ssl.enabled.protocols for jumbo and varnishkafka (canary) [puppet] - 10https://gerrit.wikimedia.org/r/399700 (https://phabricator.wikimedia.org/T167304) [14:47:39] wow https://gerrit.wikimedia.org/r/#/c/401160 (just deployed in latest swat) reduced the 404s on thumbor by 9/10 cc gilles [14:47:46] (03CR) 10Elukey: [C: 04-1] "Since it is a prometheus base check, might be good elsewhere?" [puppet] - 10https://gerrit.wikimedia.org/r/401730 (owner: 10Elukey) [14:48:29] initially I thought something was broken but doesn't look like it [14:48:39] godog: by having this new set of users clean things up? 
[14:49:40] seems a bit odd, I don't see the direct link with that deployment [14:49:42] (03PS1) 10Ottomata: Bump specified kafka version to 1.0.0-1 [puppet] - 10https://gerrit.wikimedia.org/r/401737 [14:49:46] I thought most of our 404s were from hotlinked stuff [14:50:14] (03CR) 10Ottomata: [C: 032] Bump specified kafka version to 1.0.0-1 [puppet] - 10https://gerrit.wikimedia.org/r/401737 (owner: 10Ottomata) [14:51:34] gilles: fair enough, I'll take a closer look [14:52:30] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=upload&var-status_type=4&from=1514904740927&to=1514991140927 [14:52:57] It seems there was an increase at 11:17 and a decrease at 14:20 [14:54:04] (03PS5) 10Ottomata: Set cipher.suites and ssl.enabled.protocols for jumbo and varnishkafka (canary) [puppet] - 10https://gerrit.wikimedia.org/r/399700 (https://phabricator.wikimedia.org/T167304) [14:54:05] indeed [14:55:24] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey, 10User-Joe: Decommission mw1180-1200 - https://phabricator.wikimedia.org/T183895#3871527 (10Joe) [14:55:35] !log switchover db2028 to db2039 as codfw-s6-master [14:55:37] 10Operations, 10ops-eqiad, 10User-Elukey, 10User-Joe: Decommission mw1180-1200 - https://phabricator.wikimedia.org/T183895#3866486 (10Joe) p:05Normal>03High [14:55:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:56:19] 10Operations, 10ops-eqiad, 10User-Elukey, 10User-Joe: Decommission mw1180-1200 - https://phabricator.wikimedia.org/T183895#3866486 (10Joe) @Cmjohnson whenever you're able to complete the decommission of these systems, we will have room in row C for the remaining new appservers. 
[14:56:43] (03CR) 10Ottomata: [C: 032] "Seems good, the canary fail is due to missing fake secret https://puppet-compiler.wmflabs.org/compiler02/9528/" [puppet] - 10https://gerrit.wikimedia.org/r/399700 (https://phabricator.wikimedia.org/T167304) (owner: 10Ottomata) [14:56:52] (03Abandoned) 10Elukey: role::graphite::alerts::reqstat: render correctly the dashboard qs [puppet] - 10https://gerrit.wikimedia.org/r/397350 (owner: 10Elukey) [14:57:11] (03Abandoned) 10Elukey: profile::hive::server: set hive port [puppet] - 10https://gerrit.wikimedia.org/r/395980 (owner: 10Elukey) [14:59:49] yeah that was all curl [15:00:06] ECDHE-ECDSA-AES256-GCM-SHA384 [15:00:07] oops [15:00:15] !log restarting kafka-jumbo brokers to enable tls version and cipher suite restrictions [15:00:16] :) [15:00:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:01:07] !log roll-restart thumbor in eqiad after upgrade - T183907 [15:01:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:01:17] T183907: Thumbor 500 while thumbnailing some webm files - https://phabricator.wikimedia.org/T183907 [15:01:56] (03PS1) 10Rush: nova: add compute profiles for labtestn virt role [puppet] - 10https://gerrit.wikimedia.org/r/401740 [15:03:02] (03CR) 10Elukey: "I like the idea and the code looks good but I have not enough experience with the code to do a valid review. 
I volunteer to test it during" [puppet] - 10https://gerrit.wikimedia.org/r/399161 (https://phabricator.wikimedia.org/T182702) (owner: 10Volans) [15:03:09] 10Operations, 10ops-eqiad, 10User-Elukey, 10User-Joe: Decommission mw1180-1200 - https://phabricator.wikimedia.org/T183895#3871542 (10Joe) p:05High>03Normal [15:03:38] 10Operations, 10ops-eqiad, 10User-Elukey, 10User-Joe: Decommission mw1180-1200 - https://phabricator.wikimedia.org/T183895#3866486 (10Joe) [15:04:15] (03PS2) 10Rush: labstore: add comment to interval options for nfs-exportd [puppet] - 10https://gerrit.wikimedia.org/r/401735 [15:05:00] 10Operations, 10Analytics-Cluster, 10Analytics-Kanban, 10Traffic, 10User-Elukey: TLS security review of the Kafka stack - https://phabricator.wikimedia.org/T182993#3871545 (10Ottomata) > The sigalgs lists being negotiated for mutual certificate-based auth seem to include some weak options Ah I just real... [15:05:57] (03CR) 10Rush: [C: 032] labstore: add comment to interval options for nfs-exportd [puppet] - 10https://gerrit.wikimedia.org/r/401735 (owner: 10Rush) [15:06:46] (03CR) 10Rush: [C: 032] nova: add compute profiles for labtestn virt role [puppet] - 10https://gerrit.wikimedia.org/r/401740 (owner: 10Rush) [15:06:54] (03PS2) 10Rush: nova: add compute profiles for labtestn virt role [puppet] - 10https://gerrit.wikimedia.org/r/401740 [15:08:35] !log stopping db2028's mysql to apply new config [15:08:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:10:52] (03CR) 10Jcrespo: [C: 032] mariadb: Switchover s6-codfw master to db2039 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401734 (https://phabricator.wikimedia.org/T176243) (owner: 10Jcrespo) [15:12:18] (03Merged) 10jenkins-bot: mariadb: Switchover s6-codfw master to db2039 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401734 (https://phabricator.wikimedia.org/T176243) (owner: 10Jcrespo) [15:12:33] (03CR) 10jenkins-bot: mariadb: Switchover s6-codfw master to db2039 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/401734 (https://phabricator.wikimedia.org/T176243) (owner: 10Jcrespo) [15:14:53] !log jynus@tin Synchronized wmf-config/db-codfw.php: Switchover s6-master db2028 to db2039 (duration: 01m 01s) [15:15:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:20:20] (03PS1) 10Rush: nova: fix dependency order on dir and mount for instances [puppet] - 10https://gerrit.wikimedia.org/r/401742 (https://phabricator.wikimedia.org/T171494) [15:20:47] (03CR) 10jerkins-bot: [V: 04-1] nova: fix dependency order on dir and mount for instances [puppet] - 10https://gerrit.wikimedia.org/r/401742 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush) [15:22:19] (03PS2) 10Rush: nova: fix dependency order on dir and mount for instances [puppet] - 10https://gerrit.wikimedia.org/r/401742 (https://phabricator.wikimedia.org/T171494) [15:25:30] (03CR) 10Muehlenhoff: [C: 031] smart: special case for backports on Ubuntu (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/401707 (https://phabricator.wikimedia.org/T86552) (owner: 10Filippo Giunchedi) [15:28:22] !log niharika29@tin Started deploy [scholarships/scholarships@ec05ae7]: Update i18n files [15:28:24] !log niharika29@tin Finished deploy [scholarships/scholarships@ec05ae7]: Update i18n files (duration: 00m 02s) [15:28:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:28:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:33:09] (03PS2) 10Arturo Borrero Gonzalez: aptly: Update published repo distributions [puppet] - 10https://gerrit.wikimedia.org/r/399667 (https://phabricator.wikimedia.org/T183235) (owner: 10BryanDavis) [15:34:10] * Hauskatze .. 
González :P [15:35:52] (03PS3) 10Rush: nova: fix dependency order on dir and mount for instances [puppet] - 10https://gerrit.wikimedia.org/r/401742 (https://phabricator.wikimedia.org/T171494) [15:36:10] (03PS1) 10Jcrespo: mariadb: "Depool" pc2005 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401744 (https://phabricator.wikimedia.org/T183750) [15:36:56] (03CR) 10Marostegui: [C: 031] mariadb: "Depool" pc2005 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401744 (https://phabricator.wikimedia.org/T183750) (owner: 10Jcrespo) [15:37:30] (03CR) 10jerkins-bot: [V: 04-1] mariadb: "Depool" pc2005 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401744 (https://phabricator.wikimedia.org/T183750) (owner: 10Jcrespo) [15:38:11] !log elukey@puppetmaster1001 conftool action : set/pooled=inactive; selector: name=mw2251.*.wmnet [15:38:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:38:44] (03PS2) 10Jcrespo: mariadb: "Depool" pc2005 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401744 (https://phabricator.wikimedia.org/T183750) [15:40:09] (03CR) 10jerkins-bot: [V: 04-1] mariadb: "Depool" pc2005 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401744 (https://phabricator.wikimedia.org/T183750) (owner: 10Jcrespo) [15:40:17] (03PS4) 10Rush: nova: fix dependency order on dir and mount for instances [puppet] - 10https://gerrit.wikimedia.org/r/401742 (https://phabricator.wikimedia.org/T171494) [15:40:32] (03PS2) 10Filippo Giunchedi: smart: pin smartmontools to backports on Debian [puppet] - 10https://gerrit.wikimedia.org/r/401707 (https://phabricator.wikimedia.org/T86552) [15:40:53] (03PS3) 10Jcrespo: mariadb: "Depool" pc2005 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401744 (https://phabricator.wikimedia.org/T183750) [15:40:55] (03CR) 10Rush: [C: 032] nova: fix dependency order on dir and mount for instances 
[puppet] - 10https://gerrit.wikimedia.org/r/401742 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush) [15:42:42] (03CR) 10Jcrespo: [C: 032] mariadb: "Depool" pc2005 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401744 (https://phabricator.wikimedia.org/T183750) (owner: 10Jcrespo) [15:44:18] (03Merged) 10jenkins-bot: mariadb: "Depool" pc2005 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401744 (https://phabricator.wikimedia.org/T183750) (owner: 10Jcrespo) [15:45:42] (03PS3) 10Marostegui: db-eqiad.php: Set s5 on read_only [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401434 (https://phabricator.wikimedia.org/T177208) [15:46:19] !log jynus@tin Synchronized wmf-config/db-codfw.php: "Depool" pc2005 (duration: 01m 02s) [15:46:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:47:09] herron: I think that 'Error: /Stage[main]/Apparmor/Service[apparmor]: Provider init is not functional on this host' popped up on labvirts with the puppet client package upgrade, seen anything like that before? [15:47:13] (03CR) 10jenkins-bot: mariadb: "Depool" pc2005 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401744 (https://phabricator.wikimedia.org/T183750) (owner: 10Jcrespo) [15:47:40] herron: actually it may not be limited to virts, I see it on labcontrol* too, taking a look [15:48:30] !log stop pc2005's database for maintenance T183750 [15:48:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:48:41] T183750: pc2005 crashed: CPU2 internal error - https://phabricator.wikimedia.org/T183750 [15:49:42] maybe we could take the time to upgrade pc2005 to stretch...
[15:51:19] !log niharika29@tin Started deploy [scholarships/scholarships@ec05ae7]: Remove outdated i18n files [15:51:20] !log niharika29@tin Finished deploy [scholarships/scholarships@ec05ae7]: Remove outdated i18n files (duration: 00m 02s) [15:51:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:51:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:53:02] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: pc2005 crashed: CPU2 internal error - https://phabricator.wikimedia.org/T183750#3871669 (10jcrespo) @papaul pc2005 server is up, but mysql is depooled and down, downtime'd for a day on icinga and can be brought down at any time now [15:59:05] (03CR) 10Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/compiler02/9529/" [puppet] - 10https://gerrit.wikimedia.org/r/401707 (https://phabricator.wikimedia.org/T86552) (owner: 10Filippo Giunchedi) [15:59:39] (03CR) 10Filippo Giunchedi: smart: pin smartmontools to backports on Debian (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/401707 (https://phabricator.wikimedia.org/T86552) (owner: 10Filippo Giunchedi) [16:02:05] !log drop unused keyspaces in legacy restbase cluster - T183745 [16:02:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:02:16] T183745: FY17/18 Q3 Program 7 Services Goal: Full migration to Cassandra 3 - https://phabricator.wikimedia.org/T183745 [16:02:35] ottomata[m]: you may have the wrong ticket linked? [16:02:48] (just noticed update on ganglia ticket but it's about TLS ) [16:04:26] 10Operations, 10Puppet: Puppet - Error: /Stage[main]/Apparmor/Service[apparmor]: Provider init is not functional on this host - https://phabricator.wikimedia.org/T184017#3869845 (10chasemp) I noticed this on the labvirts and labcontrol servers this morning too. I believe there is no reason for the init specif...
[16:14:04] !log powering down mw2251 for memory replacement and firmware upgrade [16:14:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:15:13] 10Operations, 10Puppet: Puppet - Error: /Stage[main]/Apparmor/Service[apparmor]: Provider init is not functional on this host - https://phabricator.wikimedia.org/T184017#3871751 (10herron) Ok, I'll prep a patch to remove the provider attribute [16:16:13] (03PS3) 10Filippo Giunchedi: smart: pin smartmontools to backports on Debian [puppet] - 10https://gerrit.wikimedia.org/r/401707 (https://phabricator.wikimedia.org/T86552) [16:17:10] (03CR) 10Filippo Giunchedi: [C: 032] smart: pin smartmontools to backports on Debian [puppet] - 10https://gerrit.wikimedia.org/r/401707 (https://phabricator.wikimedia.org/T86552) (owner: 10Filippo Giunchedi) [16:18:08] !log otto@tin Started deploy [statsv/statsv@c390cdf]: no-op deployment of statsv with support for multiple topics [16:18:11] !log otto@tin Finished deploy [statsv/statsv@c390cdf]: no-op deployment of statsv with support for multiple topics (duration: 00m 03s) [16:18:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:18:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:18:39] !log otto@tin Started deploy [statsv/statsv@0a86be8]: revert [16:18:41] !log otto@tin Finished deploy [statsv/statsv@0a86be8]: revert (duration: 00m 02s) [16:18:45] 10Operations, 10Puppet, 10cloud-services-team: Puppet - Error: /Stage[main]/Apparmor/Service[apparmor]: Provider init is not functional on this host - https://phabricator.wikimedia.org/T184017#3871768 (10chasemp) [16:18:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:18:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:19:41] (03PS1) 10Herron: apparmor: remove service provider attribute [puppet] - 10https://gerrit.wikimedia.org/r/401748 (https://phabricator.wikimedia.org/T184017)
[16:20:25] (03PS4) 10Elukey: role::icinga: add prometheus alerts profile [puppet] - 10https://gerrit.wikimedia.org/r/401730 [16:25:35] (03CR) 10Elukey: "From the raw pcc point of view (https://puppet-compiler.wmflabs.org/compiler02/9530/einsteinium.wikimedia.org/) the change looks good, but" [puppet] - 10https://gerrit.wikimedia.org/r/401730 (owner: 10Elukey) [16:26:04] (03PS3) 10Dzahn: rancid: convert role to profile [puppet] - 10https://gerrit.wikimedia.org/r/399968 [16:30:41] (03CR) 10Herron: [C: 032] apparmor: remove service provider attribute [puppet] - 10https://gerrit.wikimedia.org/r/401748 (https://phabricator.wikimedia.org/T184017) (owner: 10Herron) [16:30:57] (03CR) 10Jonas Kress (WMDE): [C: 031] Don’t check constraints on example properties [mediawiki-config] - 10https://gerrit.wikimedia.org/r/399825 (https://phabricator.wikimedia.org/T183267) (owner: 10Lucas Werkmeister (WMDE)) [16:31:58] (03CR) 10Filippo Giunchedi: role::icinga: add prometheus alerts profile (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/401730 (owner: 10Elukey) [16:33:48] (03CR) 10Elukey: role::icinga: add prometheus alerts profile (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/401730 (owner: 10Elukey) [16:36:46] 10Operations, 10Puppet: Trusty puppet 4 approach - https://phabricator.wikimedia.org/T182894#3871824 (10herron) [16:36:50] 10Operations, 10Puppet, 10cloud-services-team, 10Patch-For-Review: Puppet - Error: /Stage[main]/Apparmor/Service[apparmor]: Provider init is not functional on this host - https://phabricator.wikimedia.org/T184017#3871821 (10herron) 05Open>03Resolved a:03herron Looking better now! ``` Info: Using con... 
[16:37:02] (03PS5) 10Elukey: role::icinga: add prometheus alerts profile [puppet] - 10https://gerrit.wikimedia.org/r/401730 [16:42:03] (03PS4) 10Dzahn: rancid: convert role to profile [puppet] - 10https://gerrit.wikimedia.org/r/399968 [16:45:41] (03CR) 10Elukey: "new pcc: https://puppet-compiler.wmflabs.org/compiler02/9531/einsteinium.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/401730 (owner: 10Elukey) [16:51:47] !log powering down pc2005 for maintenance [16:51:52] 10Operations, 10Performance-Team, 10HHVM, 10Patch-For-Review: HHVM hangs on the API cluster - https://phabricator.wikimedia.org/T184048#3871895 (10Imarlier) @Joe Anything that you need from us on this? Or are you all set as far as further tracing and getting a bug reported? [16:51:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:57:24] 10Operations, 10ops-codfw: mw2251 failed memory dimm - https://phabricator.wikimedia.org/T181263#3871908 (10Papaul) a:05Papaul>03elukey Memory replacement complete Upgrade IDRAC from version 2.41 to 2.50 Upgrade BIOS from version 2.3.4 to 2.6.0 Server is back up @elukey [17:02:03] !log performing schema change on db2039 (s6) [17:02:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:02:38] (03CR) 10Filippo Giunchedi: [C: 031] role::icinga: add prometheus alerts profile [puppet] - 10https://gerrit.wikimedia.org/r/401730 (owner: 10Elukey) [17:07:59] !log elukey@puppetmaster1001 conftool action : set/pooled=yes; selector: name=mw2251.*.wmnet [17:08:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:09:04] 10Operations, 10ops-codfw: mw2251 failed memory dimm - https://phabricator.wikimedia.org/T181263#3871954 (10elukey) 05Open>03Resolved Did a scap pull, set the host to pooled=yes and checked apache metrics. Everything looks good! Closing the task, let's re-open if it gives problems again. 
[17:22:22] (03CR) 10Volans: "small typo" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/401730 (owner: 10Elukey) [17:22:45] ahhaha [17:23:18] fixing thanks! [17:23:28] ;) [17:23:40] sooner or later I'll get a +1 from you without comments [17:23:46] rotfl [17:23:50] it is in my achievement list for 2018 [17:24:03] *wishilist [17:24:11] uff can't write today [17:24:39] ahahah [17:25:00] (03PS6) 10Elukey: role::icinga: add prometheus alerts profile [puppet] - 10https://gerrit.wikimedia.org/r/401730 [17:25:10] volans: --^ [17:26:31] elukey: eqiad only? [17:27:19] yep [17:27:29] (03CR) 10Volans: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/401730 (owner: 10Elukey) [17:27:34] \o/ [17:27:41] (03CR) 10Elukey: [C: 032] role::icinga: add prometheus alerts profile [puppet] - 10https://gerrit.wikimedia.org/r/401730 (owner: 10Elukey) [17:28:00] (03PS1) 10Filippo Giunchedi: debian: force maven repo directory [debs/prometheus-jmx-exporter] - 10https://gerrit.wikimedia.org/r/401758 [17:28:02] (03PS1) 10Filippo Giunchedi: debian: tweak gbp config [debs/prometheus-jmx-exporter] - 10https://gerrit.wikimedia.org/r/401759 [17:28:16] elukey: thanks for the review of the reimage stuff, if you ping me before reimaging the next server we can merge it and test it [17:28:34] (or feel free to merge it yourself ;) ) [17:29:21] volans: I'll do some appservers next week probably, will ping you beforehand! [17:29:30] great! thanks a lot [17:29:53] I tried this week the --no-pxe option but had to use install-console to generate the ssl cert request, it would be super good to get rid of that [17:30:02] so I am super happy to test your code :) [17:30:33] eheheh [17:33:05] 10Operations, 10Ops-Access-Requests, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External): Allow "releasers-mediawiki" sudo rights to manage Jenkins - https://phabricator.wikimedia.org/T183972#3868664 (10RobH) So this technically is sudo, and thus requires ops meeting approval. 
We'll make... [17:33:27] gotta love when someone requests access change and then provides their own patchset so it's perfectly clear what they want ;] [17:33:42] no_justification: thx ;D [17:34:01] Well, patch set came before access request :p [17:34:04] But you're welcome, heh [17:34:25] so yeah, it's sudo so chris is on duty this week and I'm next week [17:34:33] so between the two of us one of us will ensure it's listed in meeting [17:35:02] !log restart and upgrade db2046 [17:35:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:35:40] !log demon@tin Synchronized php-1.31.0-wmf.15/extensions/Wikibase: I9da46c36 (duration: 02m 00s) [17:35:44] !log upload prometheus-jmx-exporter 0.10-3 to jessie/stretch [17:35:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:36:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:36:01] Amir1: Backport sync'd ^ [17:36:22] coolio, Thanks [17:36:42] godog: Sooooo, I just came across a 2.14.x feature of gerrit: we can export some JVM stats via JMX. Would that be useful post-upgrade? :) [17:36:46] I'm around to babysit the deployment if anything related to wikibase happens [17:36:50] or ORES [17:37:02] 10Operations, 10Ops-Access-Requests: Requesting extended access to stat1005 for jdcc - https://phabricator.wikimedia.org/T184085#3872028 (10Jdcc-berkman) [17:37:12] both of them heavily changed in the past month [17:38:14] no_justification: heheh yeah definitely useful! speaking of which, I updated T182759 just today [17:38:16] T182759: Add Prometheus exporter to Jenkins instances - https://phabricator.wikimedia.org/T182759 [17:38:27] Amir1: Awesome thx, I'll ping you before I start [17:39:23] godog: Coolio.
I'll file a similar task for Gerrit [17:41:47] 10Operations, 10Gerrit, 10Release-Engineering-Team: Add prometheus exporter to Gerrit - https://phabricator.wikimedia.org/T184086#3872047 (10demon) p:05Triage>03Normal [17:42:06] 10Operations, 10Gerrit, 10Release-Engineering-Team: Add prometheus exporter to Gerrit - https://phabricator.wikimedia.org/T184086#3872061 (10demon) Right after posting, I notice there's a stable-2.13 branch, so maybe I don't need to wait for the upgrade? [17:42:45] Ahhh, it is a 2.13.x feature! [17:42:49] We could use it now if we wanted [17:43:22] (except the whole building-2.13-is-a-pain-because-Buck-sucks-as-a-build-tool) [17:43:31] godog: is your 'screen' session on labcontrol1001 anything? [17:44:18] andrewbogott: no, I killed it now tho [17:44:28] 'k thanks :) [17:44:44] 10Operations, 10Goal, 10User-Elukey, 10User-fgiunchedi: Export Prometheus-compatible JVM metrics from JVMs in production - https://phabricator.wikimedia.org/T177197#3872076 (10demon) [17:44:50] 10Operations, 10Gerrit, 10Release-Engineering-Team: Add prometheus exporter to Gerrit - https://phabricator.wikimedia.org/T184086#3872075 (10demon) [17:47:04] no_justification: ack, yeah shouldn't be too hard to get sth going once gerrit is built, I'll reply on the task [17:47:34] (03PS2) 10Gehel: elasticsearch: move prometheus classes to conform with best practice [puppet] - 10https://gerrit.wikimedia.org/r/398444 [17:47:35] !log otto@tin Started deploy [statsv/statsv@362d1a9]: statsv [17:47:37] !log otto@tin Finished deploy [statsv/statsv@362d1a9]: statsv (duration: 00m 02s) [17:47:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:47:53] 10Operations, 10Ops-Access-Requests: Requesting extended access to stat1005 for jdcc - https://phabricator.wikimedia.org/T184085#3872091 (10RobH) TL;DR: [] - need @Slaporte to confirm access extension (detailed below) [] - need @RStallman-legalteam to confirm the NDA has no expiry. (jdcc is listed on the nda... 
[17:47:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:48:07] (03PS1) 10Chad: Group1 to wmf.15 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401763 [17:48:09] (03CR) 10Chad: [C: 04-2] Group1 to wmf.15 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401763 (owner: 10Chad) [17:48:42] 10Operations, 10Ops-Access-Requests: Requesting extended access to stat1005 for jdcc - https://phabricator.wikimedia.org/T184085#3872093 (10RobH) Also, it may be @nuria who can sign off on the access, not sure! (I'd think either would be enough.) [17:51:00] 10Operations, 10Gerrit, 10Release-Engineering-Team: Add prometheus exporter to Gerrit - https://phabricator.wikimedia.org/T184086#3872132 (10fgiunchedi) To give an outline of what's needed: * install prometheus-jmx-exporter debian package * tweak jvm's options to include `-javaagent ...` * ship an empty con... [17:54:26] (03PS1) 10Jcrespo: mariadb: Decommission db2028- retire from mediawiki config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401765 (https://phabricator.wikimedia.org/T184090) [17:59:50] (03PS1) 10Jcrespo: mariadb: Decommission db2028 - set as spare [puppet] - 10https://gerrit.wikimedia.org/r/401766 (https://phabricator.wikimedia.org/T184090) [18:00:05] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: pc2005 crashed: CPU2 internal error - https://phabricator.wikimedia.org/T183750#3872212 (10Papaul) Moved CPU1 to CPU2 Upgrade IDRAC from version 2.21 to 2.50 Upgrade BIOS from version 2.1.7 to 2.6.0 leaving the task open for now to see if the problem... 
[18:01:49] (03CR) 10Gehel: [C: 032] elasticsearch: move prometheus classes to conform with best practice [puppet] - 10https://gerrit.wikimedia.org/r/398444 (owner: 10Gehel) [18:02:47] (03PS2) 10Jcrespo: dblist: Reorder db2028 & db2039 because master switchover [software] - 10https://gerrit.wikimedia.org/r/401710 (https://phabricator.wikimedia.org/T176243) [18:02:50] (03PS1) 10Jcrespo: dblist: Remove db2028 for decommission [software] - 10https://gerrit.wikimedia.org/r/401767 (https://phabricator.wikimedia.org/T184090) [18:02:55] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: pc2005 crashed: CPU2 internal error - https://phabricator.wikimedia.org/T183750#3872265 (10jcrespo) Pooling it back, as it will not be too dangerous. [18:03:14] (03PS1) 10Jcrespo: Revert "mariadb: "Depool" pc2005 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401768 [18:03:53] I am going to deploy a couple of configuration changes [18:05:53] 10Operations, 10Goal, 10User-Elukey, 10User-fgiunchedi: Add Prometheus exporter to Jenkins instances - https://phabricator.wikimedia.org/T182759#3872298 (10hashar) The Jenkins instance uses `/ci` path prefix, and from https://github.com/javamelody/javamelody/wiki/UserGuideAdvanced#exposing-metrics-to-prome... 
[18:06:29] 10Operations, 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10Goal, and 2 others: Add Prometheus exporter to Jenkins instances - https://phabricator.wikimedia.org/T182759#3872299 (10hashar) [18:07:35] (03PS9) 10Ottomata: Move statsv varnishkafka and service to use main Kafka cluster(s) [puppet] - 10https://gerrit.wikimedia.org/r/391705 (https://phabricator.wikimedia.org/T179093) [18:09:13] (03CR) 10Jcrespo: [C: 032] Revert "mariadb: "Depool" pc2005 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401768 (owner: 10Jcrespo) [18:10:04] (03CR) 10Jcrespo: [C: 032] dblist: Remove db2028 for decommission [software] - 10https://gerrit.wikimedia.org/r/401767 (https://phabricator.wikimedia.org/T184090) (owner: 10Jcrespo) [18:10:19] (03CR) 10Jcrespo: [C: 032] dblist: Reorder db2028 & db2039 because master switchover [software] - 10https://gerrit.wikimedia.org/r/401710 (https://phabricator.wikimedia.org/T176243) (owner: 10Jcrespo) [18:10:34] (03Merged) 10jenkins-bot: Revert "mariadb: "Depool" pc2005 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401768 (owner: 10Jcrespo) [18:10:46] (03CR) 10jenkins-bot: Revert "mariadb: "Depool" pc2005 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401768 (owner: 10Jcrespo) [18:10:56] (03CR) 10Jcrespo: [V: 032 C: 032] dblist: Remove db2028 for decommission [software] - 10https://gerrit.wikimedia.org/r/401767 (https://phabricator.wikimedia.org/T184090) (owner: 10Jcrespo) [18:11:21] (03PS2) 10Jcrespo: mariadb: Decommission db2028- retire from mediawiki config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401765 (https://phabricator.wikimedia.org/T184090) [18:13:17] (03PS10) 10Ottomata: Move statsv varnishkafka and service to use main Kafka cluster(s) [puppet] - 10https://gerrit.wikimedia.org/r/391705 (https://phabricator.wikimedia.org/T179093) [18:13:57] hola people - can someone with merge access to /zull can merge 
https://gerrit.wikimedia.org/r/#/c/401407/ ? we have GCI students blocked on this :-( sad [18:14:02] (03CR) 10Jcrespo: [C: 032] mariadb: Decommission db2028- retire from mediawiki config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401765 (https://phabricator.wikimedia.org/T184090) (owner: 10Jcrespo) [18:14:33] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: pc2005 crashed: CPU2 internal error - https://phabricator.wikimedia.org/T183750#3872321 (10Marostegui) >>! In T183750#3872212, @Papaul wrote: > Moved CPU1 to CPU2 > Upgrade IDRAC from version 2.21 to 2.50 > Upgrade BIOS from version 2.1.7 to 2.6.0 >... [18:15:27] (03Merged) 10jenkins-bot: mariadb: Decommission db2028- retire from mediawiki config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401765 (https://phabricator.wikimedia.org/T184090) (owner: 10Jcrespo) [18:16:27] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: pc2005 crashed: CPU2 internal error - https://phabricator.wikimedia.org/T183750#3872328 (10Papaul) @Marostegui that works for me [18:17:44] !log jynus@tin Synchronized wmf-config/db-codfw.php: Decom db2028, repool pc2005 (duration: 01m 01s) [18:17:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:18:25] 10Operations, 10Continuous-Integration-Config: tox 2.5.0 on phabricator-jessie-diffs fails with ERROR: Commands not specified - https://phabricator.wikimedia.org/T184060#3871088 (10hashar) D940 added a command to the env. Previously the tox.ini looked something like: ``` lang=ini [tox] skipsdist = True envlist... 
[18:18:51] (03CR) 10jenkins-bot: mariadb: Decommission db2028- retire from mediawiki config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401765 (https://phabricator.wikimedia.org/T184090) (owner: 10Jcrespo) [18:19:00] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Decom db2028 (duration: 01m 01s) [18:19:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:20:43] 10Operations, 10ORES, 10Graphite, 10Scoring-platform-team (Current), 10User-fgiunchedi: Regularly purge old ores graphite metrics - https://phabricator.wikimedia.org/T169969#3872358 (10fgiunchedi) >>! In T169969#3867919, @Halfak wrote: > @fgiunchedi, can you help me figure out what our next step should b... [18:21:37] (03CR) 10Ottomata: "Looking good. This might cause a bump in statsv based metrics while varnishkafkas start producing to main." [puppet] - 10https://gerrit.wikimedia.org/r/391705 (https://phabricator.wikimedia.org/T179093) (owner: 10Ottomata) [18:21:39] (03PS2) 10Jcrespo: mariadb: Decommission db2028 - set as spare [puppet] - 10https://gerrit.wikimedia.org/r/401766 (https://phabricator.wikimedia.org/T184090) [18:21:43] (03CR) 10Ottomata: [C: 032] Move statsv varnishkafka and service to use main Kafka cluster(s) [puppet] - 10https://gerrit.wikimedia.org/r/391705 (https://phabricator.wikimedia.org/T179093) (owner: 10Ottomata) [18:22:42] (03PS3) 10Jcrespo: mariadb: Decommission db2028 - set as spare [puppet] - 10https://gerrit.wikimedia.org/r/401766 (https://phabricator.wikimedia.org/T184090) [18:23:34] (03CR) 10Jcrespo: [C: 032] mariadb: Decommission db2028 - set as spare [puppet] - 10https://gerrit.wikimedia.org/r/401766 (https://phabricator.wikimedia.org/T184090) (owner: 10Jcrespo) [18:25:18] !log deploying change to produce statsv metrics to main kafka clusters from varnishkafka. statsv on hafnium will be restarted to consume from main. might cause a short blip in statsv metrics. 
[18:25:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:37:41] !log uploaded hhvm 3.18.5+dfsg-1+wmf2+deb9u1 for stretch-wikimedia to apt.wikimedia.org [18:37:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:43:04] jouncebot: next [18:43:04] In 0 hour(s) and 16 minute(s): Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180103T1900) [18:53:13] !log restarted ircecho on einsteinium [18:53:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:53:31] RECOVERY - puppet last run on cp1055 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:55:08] (03PS3) 10Hashar: Update portals submodule to master [puppet] - 10https://gerrit.wikimedia.org/r/394555 (https://phabricator.wikimedia.org/T181799) (owner: 10Jdrewniak) [18:55:28] (03CR) 10Hashar: [C: 031] Update portals submodule to master [puppet] - 10https://gerrit.wikimedia.org/r/394555 (https://phabricator.wikimedia.org/T181799) (owner: 10Jdrewniak) [18:55:51] RECOVERY - puppet last run on cp2023 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [18:56:01] RECOVERY - puppet last run on cp3030 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [18:56:01] (03CR) 10Bartosz Dziewoński: Add configuration deboosting scientific articles (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401612 (https://phabricator.wikimedia.org/T183510) (owner: 10Smalyshev) [18:56:11] RECOVERY - puppet last run on cp1052 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [18:56:21] RECOVERY - puppet last run on cp2004 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [18:56:31] RECOVERY - puppet last run on cp2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:56:46] jouncebot: refresh [18:56:49] I refreshed my 
knowledge about deployments. [18:56:57] (03PS3) 10Bartosz Dziewoński: Add configuration deboosting scientific articles [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401612 (https://phabricator.wikimedia.org/T183510) (owner: 10Smalyshev) [18:57:10] (03CR) 10Bartosz Dziewoński: "I tweaked the comments a little bit." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401612 (https://phabricator.wikimedia.org/T183510) (owner: 10Smalyshev) [18:57:21] RECOVERY - puppet last run on cp1065 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:57:22] RECOVERY - puppet last run on cp3033 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [18:57:32] (03CR) 10Smalyshev: Add configuration deboosting scientific articles (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401612 (https://phabricator.wikimedia.org/T183510) (owner: 10Smalyshev) [18:57:34] (03PS4) 10Smalyshev: Add configuration deboosting scientific articles [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401612 (https://phabricator.wikimedia.org/T183510) [18:58:02] RECOVERY - puppet last run on cp2010 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [18:58:27] (03PS5) 10Smalyshev: Add configuration deboosting scientific articles on Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401612 (https://phabricator.wikimedia.org/T183510) [18:58:55] RECOVERY - puppet last run on cp1054 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [19:00:04] addshore, hashar, anomie, no_justification, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Morning SWAT (Max 8 patches) . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180103T1900). 
[19:00:04] Smalyshev, Hauskatze, Amir1, MatmaRex, and eddiegp: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [19:00:09] sup [19:00:15] here [19:00:49] Present [19:01:08] I'm around [19:02:08] (03PS1) 10Elukey: profile::prometheus::alerts: fix Druid ingestion Prometheus query [puppet] - 10https://gerrit.wikimedia.org/r/401773 [19:02:56] (03CR) 10Hashar: "I have removed this change from the beta cluster puppetmaster. It was breaking puppet on deployment-tin." [puppet] - 10https://gerrit.wikimedia.org/r/392221 (owner: 10Aaron Schulz) [19:04:05] (03PS2) 10Elukey: profile::prometheus::alerts: fix Druid ingestion Prometheus query [puppet] - 10https://gerrit.wikimedia.org/r/401773 [19:04:33] who can SWAT today? :) [19:04:41] (03CR) 10Elukey: [C: 032] profile::prometheus::alerts: fix Druid ingestion Prometheus query [puppet] - 10https://gerrit.wikimedia.org/r/401773 (owner: 10Elukey) [19:06:00] (03CR) 10Hashar: "Actually I have restored the cherry pick and restored the old cherry picks :)" [puppet] - 10https://gerrit.wikimedia.org/r/392221 (owner: 10Aaron Schulz) [19:06:23] 10Operations, 10monitoring: ircecho doesn't reconnect on failure - https://phabricator.wikimedia.org/T184103#3872460 (10Volans) [19:06:26] soooo.... everybody's asleep? :) [19:06:31] ... [19:06:36] 10Operations, 10monitoring: ircecho doesn't reconnect on failure - https://phabricator.wikimedia.org/T184103#3872470 (10Volans) p:05Triage>03High [19:06:40] Niharika: wanna SWAT for us? :) [19:06:51] Here too, for the unlikely case we'll have time to do my patch (#8) ;) [19:07:27] 10Operations, 10monitoring: ircecho doesn't reconnect on failure - https://phabricator.wikimedia.org/T184103#3872460 (10Volans) Thanks to @cwdent for notifying us. [19:07:33] Hauskatze: I'd love to but it's way past midnight for me. I'm sure Reedy could be convinced to do it... 
:P [19:07:49] Niharika: :D [19:07:50] I can SWAT [19:07:53] yay [19:08:00] Awesome. [19:08:06] (03CR) 10Hashar: [C: 031] "Cherry picked on the beta cluster puppet master. The next Jenkins job should thus update the repo to the tip of the branch." [puppet] - 10https://gerrit.wikimedia.org/r/394555 (https://phabricator.wikimedia.org/T181799) (owner: 10Jdrewniak) [19:08:09] love y'all [19:08:18] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401612 (https://phabricator.wikimedia.org/T183510) (owner: 10Smalyshev) [19:08:23] coolio [19:09:18] (03CR) 10Ottomata: "Will respond to inline comments, but about general idea:" [puppet] - 10https://gerrit.wikimedia.org/r/379004 (https://phabricator.wikimedia.org/T174465) (owner: 10Ottomata) [19:09:50] (03Merged) 10jenkins-bot: Add configuration deboosting scientific articles on Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401612 (https://phabricator.wikimedia.org/T183510) (owner: 10Smalyshev) [19:10:04] (03CR) 10jenkins-bot: Add configuration deboosting scientific articles on Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401612 (https://phabricator.wikimedia.org/T183510) (owner: 10Smalyshev) [19:10:16] (03CR) 10Ottomata: [C: 032] debian: force maven repo directory [debs/prometheus-jmx-exporter] - 10https://gerrit.wikimedia.org/r/401758 (owner: 10Filippo Giunchedi) [19:10:32] SMalyshev: your change is live on mwdebug1002, check please [19:10:38] (03CR) 10Ottomata: [C: 032] debian: tweak gbp config [debs/prometheus-jmx-exporter] - 10https://gerrit.wikimedia.org/r/401759 (owner: 10Filippo Giunchedi) [19:10:38] checking [19:11:09] (03PS4) 10Thcipriani: Setup some namespace aliases for eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/399982 (https://phabricator.wikimedia.org/T183612) (owner: 10MarcoAurelio) [19:11:36] eswiki, that's me [19:11:45] :) [19:12:06] thcipriani: yep, seems to be fine [19:12:22] ok going live [19:13:56] phew scap 
logging [19:14:08] too verbose? [19:14:15] !log thcipriani@tin Synchronized wmf-config/Wikibase-production.php: SWAT: [[gerrit:401612|Add configuration deboosting scientific articles on Wikidata]] T183510 (duration: 01m 02s) [19:14:20] ^ SMalyshev [19:14:25] live now [19:14:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:14:26] T183510: Deboost Q13442814 (scientific article) in wikidata search - https://phabricator.wikimedia.org/T183510 [19:14:29] thcipriani: great, thanks [19:15:27] 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests, 10Patch-For-Review: Decommission db1050 - https://phabricator.wikimedia.org/T178162#3872550 (10Cmjohnson) [19:15:37] 10Operations, 10DBA, 10Goal, 10Patch-For-Review: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#3872552 (10Cmjohnson) [19:15:41] 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests, 10Patch-For-Review: Decommission db1050 - https://phabricator.wikimedia.org/T178162#3682923 (10Cmjohnson) 05Open>03Resolved [19:15:49] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/399982 (https://phabricator.wikimedia.org/T183612) (owner: 10MarcoAurelio) [19:17:47] (03Merged) 10jenkins-bot: Setup some namespace aliases for eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/399982 (https://phabricator.wikimedia.org/T183612) (owner: 10MarcoAurelio) [19:18:59] Hauskatze: ^ is live on mwdebug1002, check please [19:19:08] thcipriani: ack, checking [19:19:14] (03PS4) 10Thcipriani: Set category collation to uca-es-u-kn for eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401081 (https://phabricator.wikimedia.org/T183802) (owner: 10MarcoAurelio) [19:20:01] thcipriani: seems to work just fine [19:20:16] ok, going live then I'll run namespacedupes on terbium [19:20:26] thcipriani: awesome, thanks [19:21:42] (03CR) 10jenkins-bot: Setup some namespace aliases for eswiki [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/399982 (https://phabricator.wikimedia.org/T183612) (owner: 10MarcoAurelio) [19:22:25] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:399982|Setup some namespace aliases for eswiki]] T183612 (duration: 01m 02s) [19:22:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:22:36] T183612: Setup various namespace aliases for eswiki - https://phabricator.wikimedia.org/T183612 [19:23:27] PROBLEM - EventBus HTTP Error Rate -4xx + 5xx- on graphite1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [10.0] https://grafana.wikimedia.org/dashboard/db/eventbus?panelId=1fullscreenorgId=1 [19:23:45] ^ [19:23:46] hm [19:24:23] thcipriani: all conflicts resolvable? [19:24:31] Hauskatze: still running [19:24:36] PROBLEM - Host mr1-eqiad.oob IPv6 is DOWN: CRITICAL - Destination Unreachable (2607:f6f0:205::153) [19:24:56] PROBLEM - Host mr1-eqiad.oob is DOWN: PING CRITICAL - Packet loss = 100% [19:25:02] Pchelolo: MessageSizeTooLargeError: The message is 5175333 bytes when serialized which is larger than the maximum request size you have configured with the max_request_size configuration [19:25:32] hmm, no that's old. [19:25:34] not that old [19:25:35] did happen [19:26:02] ottomata: which kafka box is it on? 
[19:26:12] 10Operations, 10RESTBase, 10Services (next), 10User-mobrovac: Set up RESTBase on Cassandra 3 nodes - https://phabricator.wikimedia.org/T184110#3872628 (10mobrovac) p:05Triage>03High [19:26:26] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401081 (https://phabricator.wikimedia.org/T183802) (owner: 10MarcoAurelio) [19:26:53] i'm also seeing kafka timeouts [19:26:57] on 1002 [19:26:57] 10Operations, 10RESTBase, 10Services (next), 10User-mobrovac: Set up RESTBase on Cassandra 3 nodes - https://phabricator.wikimedia.org/T184110#3872642 (10mobrovac) [19:27:01] the max message was on 1001 [19:27:03] but was from yesterday [19:27:44] 10Operations, 10RESTBase, 10Services (next), 10User-mobrovac: Set up RESTBase on Cassandra 3 nodes - https://phabricator.wikimedia.org/T184110#3872628 (10mobrovac) [19:27:46] PROBLEM - Check size of conntrack table on mw1336 is CRITICAL: CRITICAL: nf_conntrack is 92 % full [19:28:12] 10Operations, 10RESTBase, 10Services (blocked), 10User-mobrovac: Set up RESTBase on Cassandra 3 nodes - https://phabricator.wikimedia.org/T184110#3872628 (10mobrovac) This is effectively blocked on moving Mathoid over. [19:28:19] ottomata: let's chat in -services, too much notifications going on here [19:28:49] ok [19:29:01] Hauskatze: aaaand still running... 
:) [19:29:46] RECOVERY - Host mr1-eqiad.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 9.01 ms [19:30:07] RECOVERY - Host mr1-eqiad.oob is UP: PING OK - Packet loss = 0%, RTA = 1.58 ms [19:30:41] thcipriani: hmm, we maybe should've screened it [19:30:47] RECOVERY - Check size of conntrack table on mw1336 is OK: OK: nf_conntrack is 73 % full [19:30:50] but it usually does not take so long [19:30:57] namespaceDupes I mean [19:31:09] we did a dry-run hours ago and there was no conflicts [19:31:27] RECOVERY - EventBus HTTP Error Rate -4xx + 5xx- on graphite1001 is OK: OK: Less than 50.00% above the threshold [1.0] https://grafana.wikimedia.org/dashboard/db/eventbus?panelId=1fullscreenorgId=1 [19:32:10] seems to be plugging along just fune [19:32:12] fine [19:33:08] (03CR) 10Herron: "Hey Jeff! The prod mx cluster is not yet configured to handle mail for this domain, so a corresponding exim change would be necessary if " [dns] - 10https://gerrit.wikimedia.org/r/401604 (owner: 10Jgreen) [19:33:21] (03Merged) 10jenkins-bot: Set category collation to uca-es-u-kn for eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401081 (https://phabricator.wikimedia.org/T183802) (owner: 10MarcoAurelio) [19:33:31] (03CR) 10jenkins-bot: Set category collation to uca-es-u-kn for eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401081 (https://phabricator.wikimedia.org/T183802) (owner: 10MarcoAurelio) [19:34:21] Hauskatze: in the interim I've pulled ^ to mwdebug1002, check please [19:35:00] thcipriani: I'm not sure that change is at all testable; it changes the category collation and unless the script is run I don't think I'll see a difference [19:35:07] PROBLEM - MariaDB Slave Lag: s7 on db2040 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 407.18 seconds [19:35:57] PROBLEM - MariaDB Slave Lag: s7 on db2047 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 304.87 seconds [19:36:06] PROBLEM - Host mr1-eqiad.oob IPv6 is DOWN: CRITICAL - Destination 
Unreachable (2607:f6f0:205::153) [19:36:26] (03CR) 10Hashar: [V: 031 C: 031] "Should be good:" [puppet] - 10https://gerrit.wikimedia.org/r/394555 (https://phabricator.wikimedia.org/T181799) (owner: 10Jdrewniak) [19:36:47] PROBLEM - Host mr1-eqiad.oob is DOWN: PING CRITICAL - Packet loss = 100% [19:36:56] PROBLEM - Check size of conntrack table on mw1337 is CRITICAL: CRITICAL: nf_conntrack is 92 % full [19:36:57] PROBLEM - MariaDB Slave Lag: s7 on db2068 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 315.65 seconds [19:37:24] Hauskatze: thcipriani: i think the only thing you can test without the update script is to see if category pages render at all. if you made a typo in the config, they will throw exceptions :) [19:37:46] (03PS1) 10Ottomata: Force more exact protocol version for varnishkafka statsv [puppet] - 10https://gerrit.wikimedia.org/r/401778 (https://phabricator.wikimedia.org/T179093) [19:37:50] just did that and the categories do display MatmaRex thcipriani [19:38:03] will do some more random cats on mwdebug1002 [19:38:08] k, I'll sync it out and run the update script after namespacedupes [19:38:16] (03CR) 10jerkins-bot: [V: 04-1] Force more exact protocol version for varnishkafka statsv [puppet] - 10https://gerrit.wikimedia.org/r/401778 (https://phabricator.wikimedia.org/T179093) (owner: 10Ottomata) [19:39:06] PROBLEM - MariaDB Slave Lag: s7 on db2054 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 309.94 seconds [19:39:09] thcipriani: ok, no category errors so far on 1002 [19:39:20] (03PS2) 10Ottomata: Force more exact protocol version for varnishkafka statsv [puppet] - 10https://gerrit.wikimedia.org/r/401778 (https://phabricator.wikimedia.org/T179093) [19:39:26] thcipriani: this script will take much longer btw (a few hours probably) [19:39:27] PROBLEM - MariaDB Slave Lag: s7 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 747.01 seconds [19:39:49] (03CR) 10jerkins-bot: [V: 04-1] Force more exact protocol version for 
varnishkafka statsv [puppet] - 10https://gerrit.wikimedia.org/r/401778 (https://phabricator.wikimedia.org/T179093) (owner: 10Ottomata) [19:40:26] (03CR) 10Ottomata: [V: 032 C: 032] Force more exact protocol version for varnishkafka statsv [puppet] - 10https://gerrit.wikimedia.org/r/401778 (https://phabricator.wikimedia.org/T179093) (owner: 10Ottomata) [19:40:32] MatmaRex: good looking out, I can start in a screen :) [19:40:42] thcipriani: I pointed that in a task [19:40:47] *in the [19:41:16] RECOVERY - Host mr1-eqiad.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 229.73 ms [19:41:19] and if I knew namespaceDupes would take so much time I'd have suggested that as well [19:41:20] (03PS1) 10Mobrovac: RESTBase: Set up RESTBase for the production_ng role as well [puppet] - 10https://gerrit.wikimedia.org/r/401784 (https://phabricator.wikimedia.org/T184110) [19:41:56] RECOVERY - Check size of conntrack table on mw1337 is OK: OK: nf_conntrack is 78 % full [19:41:56] RECOVERY - Host mr1-eqiad.oob is UP: PING OK - Packet loss = 0%, RTA = 1.63 ms [19:42:32] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:401081|Set category collation to uca-es-u-kn for eswiki]] T183802 (duration: 01m 02s) [19:42:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:42:43] T183802: Set uca-es-u-kn as category collation for es.wikipedia - https://phabricator.wikimedia.org/T183802 [19:42:46] ^ Hauskatze live now, FYI [19:43:03] thcipriani: thanks [19:43:13] screen mwscript updateCollation.php --wiki=eswiki --force [19:43:15] i think [19:43:21] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/385953 (https://phabricator.wikimedia.org/T178793) (owner: 10MarcoAurelio) [19:43:31] thanks :) [19:44:25] (03PS2) 10Mobrovac: RESTBase: Set up RESTBase for the production_ng role as well [puppet] - 10https://gerrit.wikimedia.org/r/401784 (https://phabricator.wikimedia.org/T184110) [19:46:28] 
10Operations, 10Ops-Access-Requests: Requesting extended access to stat1005 for jdcc - https://phabricator.wikimedia.org/T184085#3872750 (10Slaporte) >>! In T184085#3872091, @RobH wrote: > @slaporte: I don't want to assume things. Are you the WMF staff responsible for overseeing and managing @Jdcc-berkman's c... [19:46:37] 10Puppet, 10Beta-Cluster-Infrastructure, 10Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259#3872756 (10hashar) [19:47:02] (03Merged) 10jenkins-bot: Extension:Translate default permissions for Wikimedia wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/385953 (https://phabricator.wikimedia.org/T178793) (owner: 10MarcoAurelio) [19:47:12] (03CR) 10jenkins-bot: Extension:Translate default permissions for Wikimedia wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/385953 (https://phabricator.wikimedia.org/T178793) (owner: 10MarcoAurelio) [19:48:15] Hauskatze: ^ is live on mwdebug1002, check if possible please [19:48:22] ack, doing [19:49:11] thcipriani: worksforme [19:49:13] Hauskatze: also, namespacedupes is done, but I got > 719131 links to fix, 719128 were resolvable [19:49:27] o.m.g [19:49:27] (03CR) 10Mobrovac: [C: 04-1] "PCC looks ok in theory - https://puppet-compiler.wmflabs.org/compiler02/9537/" [puppet] - 10https://gerrit.wikimedia.org/r/401784 (https://phabricator.wikimedia.org/T184110) (owner: 10Mobrovac) [19:49:41] Hauskatze: ok, going live with 385953 [19:50:01] 719131 links to fix, 719128 were resolvable <-- Platonides: this isn't normal, or is it? [19:50:36] 10Operations, 10Patch-For-Review, 10Prometheus-metrics-monitoring, 10User-fgiunchedi: Move deployment-prep redis instances to stretch - https://phabricator.wikimedia.org/T179371#3722645 (10hashar) deployment-redis01 and deployment-redis02 have puppet failure due to the prometheus redis_exporter requiring s... 
[19:52:59] !log thcipriani@tin Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:385953|Extension:Translate default permissions for Wikimedia wikis]] T178793 (duration: 01m 02s) [19:53:07] ^ Hauskatze live now [19:53:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:53:10] T178793: Bureaucrats on WMF wikis to add and remove 'translationadmin' by default - https://phabricator.wikimedia.org/T178793 [19:53:24] (03PS7) 10Thcipriani: Close wikimania2017.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/396581 (https://phabricator.wikimedia.org/T182493) (owner: 10MarcoAurelio) [19:53:24] thcipriani: thanks [19:55:20] MatmaRex: does your change require a full scap? [19:55:42] thcipriani: no, but it has a bunch of files [19:55:45] sync-dir i guess [19:55:47] k [19:55:48] if that's a thing [19:56:59] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/396581 (https://phabricator.wikimedia.org/T182493) (owner: 10MarcoAurelio) [19:57:16] RECOVERY - MariaDB Slave Lag: s7 on db2054 is OK: OK slave_sql_lag Replication lag: 0.26 seconds [19:57:18] it's a thing, mostly :) [19:57:47] sync-file and sync-dir are the same thing but they are both things [19:58:29] (03Merged) 10jenkins-bot: Close wikimania2017.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/396581 (https://phabricator.wikimedia.org/T182493) (owner: 10MarcoAurelio) [19:58:39] (03CR) 10jenkins-bot: Close wikimania2017.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/396581 (https://phabricator.wikimedia.org/T182493) (owner: 10MarcoAurelio) [19:59:04] Amir1: is Eranroz -1 on your change still valid? [19:59:17] thcipriani: nope, PM decision [19:59:20] k [19:59:27] not testable [19:59:57] ok [20:00:05] no_justification: #bothumor My software never has bugs. It just develops random features. Rise for MediaWiki train. 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180103T2000). [20:00:05] No GERRIT patches in the queue for this window AFAICS. [20:00:16] Hauskatze: wikimania2017 patch is live on mwdebug1002, check please [20:00:24] thcipriani: could you provide the list of non-resolvable links for eswiki when possible? thanks! [20:00:31] ack, checking [20:00:34] no_justification: SWAT is running a bit long :( [20:00:59] thcipriani: lgtm [20:01:05] Hauskatze: ok going live [20:01:16] I'll amend later the sitenotice [20:01:30] Hauskatze: I lost my scrollback from the namespacedupes output, is there another way to get that list? [20:01:33] stewards can edit the closed wikis [20:01:53] thcipriani: I guess namespaceDupes.php --wiki=eswiki will output the links to be fixed? [20:02:08] (03CR) 10Dzahn: [C: 032] rancid: convert role to profile [puppet] - 10https://gerrit.wikimedia.org/r/399968 (owner: 10Dzahn) [20:02:11] (03PS5) 10Dzahn: rancid: convert role to profile [puppet] - 10https://gerrit.wikimedia.org/r/399968 [20:02:18] it's in dry-run mode so no changes will be done [20:03:04] 10Operations, 10ops-eqiad: Degraded RAID on ms-be1033 - https://phabricator.wikimedia.org/T183896#3872815 (10Cmjohnson) a case has been opened with HP Support. Your case was successfully submitted. Please note your Case ID: 5325901818 for future reference. An email confirmation will be sent to the case conta... 
[20:03:10] but in any case it can wait until you finish swat [20:04:36] !log thcipriani@tin Synchronized dblists/closed.dblist: SWAT: [[gerrit:396581|Close wikimania2017.wikimedia.org]] PART I T182493 (duration: 01m 02s) [20:04:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:04:47] T182493: wikimania2017wiki: close the wiki - https://phabricator.wikimedia.org/T182493 [20:06:06] RECOVERY - MariaDB Slave Lag: s7 on db2047 is OK: OK slave_sql_lag Replication lag: 46.60 seconds [20:06:16] (03PS2) 10Thcipriani: Don't enable lua fine grained tracking for any wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/398823 (https://phabricator.wikimedia.org/T172914) (owner: 10Ladsgroup) [20:06:21] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:396581|Close wikimania2017.wikimedia.org]] PART II T182493 (duration: 01m 04s) [20:06:22] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/398823 (https://phabricator.wikimedia.org/T172914) (owner: 10Ladsgroup) [20:06:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:06:36] Hauskatze: last patch should be live, FYI [20:06:51] thcipriani: thanks [20:07:00] and sorry that my patches took so long :( [20:07:07] RECOVERY - MariaDB Slave Lag: s7 on db2068 is OK: OK slave_sql_lag Replication lag: 0.00 seconds [20:07:51] (03Merged) 10jenkins-bot: Don't enable lua fine grained tracking for any wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/398823 (https://phabricator.wikimedia.org/T172914) (owner: 10Ladsgroup) [20:08:01] (03CR) 10jenkins-bot: Don't enable lua fine grained tracking for any wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/398823 (https://phabricator.wikimedia.org/T172914) (owner: 10Ladsgroup) [20:09:09] Hauskatze: no worries :) [20:09:25] MatmaRex: you patch is live on mwdebug1002, check please [20:09:29] *your [20:10:34] thcipriani: that's fine I'm running 
late anyway [20:11:52] looking [20:13:18] thcipriani: looks good. (sorry, took me a while because i tested on a wrong wiki at first) [20:13:40] MatmaRex: np, going live now [20:15:37] PROBLEM - Check size of conntrack table on mw1335 is CRITICAL: CRITICAL: nf_conntrack is 92 % full [20:16:07] !log thcipriani@tin Synchronized php-1.31.0-wmf.15/extensions/VisualEditor/lib/ve: SWAT: [[gerrit:401771|Update VE core submodule to master]] T182907 T183590 (duration: 01m 06s) [20:16:16] ^ MatmaRex live now [20:16:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:16:18] T183590: Multi-part templates are rendered backwards and have a slug inside them - https://phabricator.wikimedia.org/T183590 [20:16:18] T182907: Link inspector overlapped by toolbar and extends past the screen when it opens upwards (only when editing links in tables and references) - https://phabricator.wikimedia.org/T182907 [20:16:26] thanks [20:17:11] Amir1: can't check yours, correct? Just go live? [20:17:22] thcipriani: yup [20:17:26] k going [20:17:46] That's basically turning off a feature before it going live [20:18:37] RECOVERY - Check size of conntrack table on mw1335 is OK: OK: nf_conntrack is 78 % full [20:19:08] (03PS4) 10Thcipriani: Restrict sending mails to new users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/397768 (https://phabricator.wikimedia.org/T182541) (owner: 10EddieGP) [20:19:15] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:398823|Do not enable lua fine grained tracking for any wiki]] T172914 (duration: 01m 02s) [20:19:21] Amir1: live now ^ [20:19:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:19:25] T172914: [Tracking] Fine-grained change notifications based on tracking from Lua getters via __index - https://phabricator.wikimedia.org/T172914 [20:19:29] eddiegp: still around? 
[20:19:30] great thanks [20:19:54] thcipriani: Yep, but I've already moved that patch to EU swat tomorrow. [20:19:56] thcipriani: I can babysit if he's not around [20:20:08] Wanted to do it then originally anyways. [20:20:20] eddiegp: ok, thank you, that will work better since I ran over by a lot, sorry [20:20:35] no_justification: should be all clear for train [20:20:51] thanks both for expanding the time a bit [20:21:00] sorry for eddiegp to wait another day [20:21:11] thcipriani: That's fine, I expected that I might have to do that as it was patch #8 ;) [20:21:35] proves that maybe swat should be 1,30 hours [20:22:04] depends on the patches, deployers, and zuul: sometimes 8 is just fine [20:22:14] thcipriani: how's updateCollation going? [20:22:24] although most times 8 is too much :( [20:22:26] RECOVERY - MariaDB Slave Lag: s7 on db2040 is OK: OK slave_sql_lag Replication lag: 0.22 seconds [20:22:37] Hauskatze: just started [20:22:45] what?!? :) [20:22:49] :D [20:23:21] !log updateCollation for eswiki running in screen as thcipriani on terbium [20:23:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:23:38] won't that block the train? [20:23:51] oh well, eswiki is not on group0 anyways [20:24:11] we're on group1 today [20:24:11] it won't [20:24:17] but eswiki is not group1 either :) [20:24:28] it can run in the background happily, i'm pretty sure [20:24:35] even during a deployment [20:24:47] PROBLEM - Host mr1-eqiad.oob is DOWN: PING CRITICAL - Packet loss = 100% [20:24:55] thcipriani: well you can list the broken links then :) [20:25:14] (03PS1) 10Cmjohnson: Adding mgmt dns labvirt1021/22 [dns] - 10https://gerrit.wikimedia.org/r/401793 (https://phabricator.wikimedia.org/T183937) [20:25:45] * Hauskatze finds strange that namespaceDupes had to check so many links... 
there were a couple of thousand WP: link links, not nearly 80k [20:26:05] (03CR) 10Cmjohnson: [C: 032] Adding mgmt dns labvirt1021/22 [dns] - 10https://gerrit.wikimedia.org/r/401793 (https://phabricator.wikimedia.org/T183937) (owner: 10Cmjohnson) [20:26:10] sure lemme dump that to a paste [20:29:56] RECOVERY - Host mr1-eqiad.oob is UP: PING OK - Packet loss = 0%, RTA = 1.59 ms [20:30:37] Hauskatze: https://phabricator.wikimedia.org/P6522 [20:31:24] 10Operations, 10Goal, 10Patch-For-Review, 10User-fgiunchedi, 10cloud-services-team (Kanban): Port non-deprecated Diamond collectors to Prometheus - https://phabricator.wikimedia.org/T177196#3872906 (10chasemp) [20:31:26] 10Operations, 10Goal, 10Patch-For-Review, 10User-fgiunchedi: Port nutcracker statistics to Prometheus - https://phabricator.wikimedia.org/T181995#3872903 (10chasemp) 05Resolved>03Open Tentatively reopening as I'm not sure what's up with this but want to keep this tasks narrative together. It seems the... [20:31:28] xionox did you see mr1-eqiad? 
[20:31:43] 10Operations, 10IRCecho, 10monitoring: ircecho doesn't reconnect on failure - https://phabricator.wikimedia.org/T184103#3872907 (10Peachey88) [20:34:56] (03PS1) 10Cmjohnson: Adding mgmt dns for notebook1003/4 [dns] - 10https://gerrit.wikimedia.org/r/401797 [20:35:16] PROBLEM - Host mr1-eqiad.oob IPv6 is DOWN: CRITICAL - Destination Unreachable (2607:f6f0:205::153) [20:36:36] PROBLEM - Host mr1-eqiad.oob is DOWN: PING CRITICAL - Packet loss = 100% [20:38:55] thcipriani: https://phabricator.wikimedia.org/P6522#36656 [20:38:59] bbl [20:40:26] RECOVERY - Host mr1-eqiad.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 3.12 ms [20:40:42] (03CR) 10Dzahn: [C: 032] "yup, per "labs" and "already cherry-picked"" [puppet] - 10https://gerrit.wikimedia.org/r/401677 (owner: 10Legoktm) [20:40:50] (03PS4) 10Dzahn: contint: Install php-xdebug (disabled by default) for PHP 7 [puppet] - 10https://gerrit.wikimedia.org/r/401677 (owner: 10Legoktm) [20:41:46] RECOVERY - Host mr1-eqiad.oob is UP: PING OK - Packet loss = 0%, RTA = 1.61 ms [20:41:50] (03CR) 10Chad: [C: 032] Group1 to wmf.15 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401763 (owner: 10Chad) [20:44:10] (03PS17) 10Aaron Schulz: [WIP] Add mcrouter module and mcrouter_wancache profile [puppet] - 10https://gerrit.wikimedia.org/r/392221 [20:45:01] (03Merged) 10jenkins-bot: Group1 to wmf.15 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401763 (owner: 10Chad) [20:46:43] (03PS3) 10Cmjohnson: decom: remove uranium from site,DHCP,netboot [puppet] - 10https://gerrit.wikimedia.org/r/399684 (https://phabricator.wikimedia.org/T183209) (owner: 10Dzahn) [20:46:48] (03CR) 10jenkins-bot: Group1 to wmf.15 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401763 (owner: 10Chad) [20:47:04] (03CR) 10Cmjohnson: [C: 032] Adding mgmt dns for notebook1003/4 [dns] - 10https://gerrit.wikimedia.org/r/401797 (owner: 10Cmjohnson) [20:47:08] (03CR) 10Dzahn: [C: 032] decom: remove uranium from site,DHCP,netboot [puppet] - 
10https://gerrit.wikimedia.org/r/399684 (https://phabricator.wikimedia.org/T183209) (owner: 10Dzahn) [20:50:35] !log uranium (ex-ganglia-web) is going into eternal downtime on Icinga.. shutdown -h RIP (T183209) [20:50:41] (03CR) 10Jgreen: "> Hey Jeff! The prod mx cluster is not yet configured to handle mail" [dns] - 10https://gerrit.wikimedia.org/r/401604 (owner: 10Jgreen) [20:50:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:50:48] T183209: decom uranium - https://phabricator.wikimedia.org/T183209 [20:52:51] 10Operations, 10ops-eqiad, 10hardware-requests, 10monitoring, and 2 others: decom uranium - https://phabricator.wikimedia.org/T183209#3872999 (10Dzahn) [20:55:48] (03PS2) 10Dzahn: remove uranium.wikimedia.org, v4 + v6 [dns] - 10https://gerrit.wikimedia.org/r/399125 (https://phabricator.wikimedia.org/T183209) [20:56:09] !log uranium - revoked puppet cert, node deactivate, removing from DNS (T183209) [20:56:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:56:21] T183209: decom uranium - https://phabricator.wikimedia.org/T183209 [20:56:31] (03Draft1) 10Paladox: Gerrit: [puppet] - 10https://gerrit.wikimedia.org/r/401799 [20:56:34] (03PS2) 10Paladox: Gerrit: Set gitiles configuation to be used as the repo viewer [puppet] - 10https://gerrit.wikimedia.org/r/401799 (https://phabricator.wikimedia.org/T184116) [20:57:47] (03CR) 10Dzahn: [C: 032] remove uranium.wikimedia.org, v4 + v6 [dns] - 10https://gerrit.wikimedia.org/r/399125 (https://phabricator.wikimedia.org/T183209) (owner: 10Dzahn) [20:59:30] 10Operations, 10ops-eqiad, 10hardware-requests, 10monitoring, and 2 others: decom uranium - https://phabricator.wikimedia.org/T183209#3873006 (10Dzahn) [20:59:53] 10Operations, 10ops-eqiad, 10hardware-requests, 10monitoring, and 2 others: decom uranium - https://phabricator.wikimedia.org/T183209#3846949 (10Dzahn) a:03Cmjohnson [21:00:05] cscott, arlolra, subbu, bearND, halfak, and Amir1: (Dis)respected 
human, time to deploy Services – Parsoid / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180103T2100). Please do the needful. [21:00:05] No GERRIT patches in the queue for this window AFAICS. [21:01:00] 10Operations, 10ops-eqiad, 10hardware-requests, 10monitoring, and 2 others: decom uranium - https://phabricator.wikimedia.org/T183209#3846949 (10Dzahn) [21:01:06] Nothing for ORES [21:01:20] !log deleting stale topics from main kafka clusters: T149594 [21:01:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:01:31] T149594: Delete stale topics from main Kafka clusters - https://phabricator.wikimedia.org/T149594 [21:03:21] nothing for MCS today [21:04:38] (03PS2) 10Dzahn: releases-jenkins-apache: Remove trailing slash, $prefix already does that [puppet] - 10https://gerrit.wikimedia.org/r/399892 (owner: 10Chad) [21:04:56] (03PS3) 10Dzahn: releases-jenkins-apache: Remove trailing slash, $prefix already does that [puppet] - 10https://gerrit.wikimedia.org/r/399892 (owner: 10Chad) [21:07:28] (03PS3) 10Smalyshev: Add loading DCAT-AP data into dcatap namespace on WDQS [puppet] - 10https://gerrit.wikimedia.org/r/399954 (https://phabricator.wikimedia.org/T178978) [21:07:50] (03CR) 10jerkins-bot: [V: 04-1] Add loading DCAT-AP data into dcatap namespace on WDQS [puppet] - 10https://gerrit.wikimedia.org/r/399954 (https://phabricator.wikimedia.org/T178978) (owner: 10Smalyshev) [21:09:01] (03PS4) 10Smalyshev: Add loading DCAT-AP data into dcatap namespace on WDQS [puppet] - 10https://gerrit.wikimedia.org/r/399954 (https://phabricator.wikimedia.org/T178978) [21:09:47] (03CR) 10Dzahn: [C: 032] releases-jenkins-apache: Remove trailing slash, $prefix already does that [puppet] - 10https://gerrit.wikimedia.org/r/399892 (owner: 10Chad) [21:10:13] (03PS1) 1020after4: group1 wikis to 1.31.0-wmf.15 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401803 [21:10:15] (03CR) 1020after4: [C: 032] group1 
wikis to 1.31.0-wmf.15 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401803 (owner: 1020after4) [21:11:50] (03CR) 10Dzahn: "deployed on releases1001" [puppet] - 10https://gerrit.wikimedia.org/r/399892 (owner: 10Chad) [21:12:18] mutante: Ty for the merge. Guess it needs an apache restart too [21:12:38] (03PS4) 10Dzahn: librenms: convert role to profile, variables to params [puppet] - 10https://gerrit.wikimedia.org/r/399966 [21:12:48] no_justification: ehm.. Notice: /Stage[main]/Apache/Service[apache2]: Triggered 'refresh' from 1 events [21:12:52] ok.. [21:12:52] Ah ok [21:12:55] Automagic [21:12:55] :) [21:13:00] Didn't realize that [21:13:04] well i just assumed [21:13:10] and it's not broken, heh [21:13:26] - ProxyPass / http://127.0.0.1:8080// retry=0 nocanon [21:13:27] + ProxyPass / http://127.0.0.1:8080/ retry=0 nocanon [21:13:32] :) [21:13:44] Yeah. Basically the extra slash was breaking some of Jenkins' URL building [21:13:54] Sometimes you'd end up with URLs that were like https:///foo [21:13:58] Where it stripped the hostname [21:14:15] PROBLEM - Check size of conntrack table on mw1337 is CRITICAL: CRITICAL: nf_conntrack is 91 % full [21:14:47] yea, *nod*. sounded like it would break URLs [21:15:46] (03Merged) 10jenkins-bot: group1 wikis to 1.31.0-wmf.15 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401803 (owner: 1020after4) [21:16:45] (03CR) 10jenkins-bot: group1 wikis to 1.31.0-wmf.15 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401803 (owner: 1020after4) [21:19:14] RECOVERY - Check size of conntrack table on mw1337 is OK: OK: nf_conntrack is 77 % full [21:20:41] (03CR) 10Dzahn: [C: 032] librenms: convert role to profile, variables to params [puppet] - 10https://gerrit.wikimedia.org/r/399966 (owner: 10Dzahn) [21:21:10] (03PS1) 10Chad: Releases: Install composer alongside Jenkins [puppet] - 10https://gerrit.wikimedia.org/r/401804 [21:21:13] and i see my own typo one second later.. 
duh [21:21:23] "hirea" lookup [21:22:06] 10Operations, 10Cloud-Services: create-dbusers service failing on labstore1004 - https://phabricator.wikimedia.org/T151310#3873066 (10chasemp) 05Open>03Resolved a:03chasemp Haven't seen this for a long time now. [21:23:19] mutante: add it to the blacklist ;) [21:23:38] yes, i was thinking that, there is already "heira", i will [21:25:05] (03PS5) 10Dzahn: librenms: convert role to profile, variables to params [puppet] - 10https://gerrit.wikimedia.org/r/399966 [21:26:04] !log deploying 1.31.0-wmf.15 to "Group 1" wikis [21:26:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:26:50] !log twentyafterfour@tin rebuilt and synchronized wikiversions files: group1 wikis to 1.31.0-wmf.15 [21:26:54] (03PS1) 10Dzahn: typos: add 'hirea', another variant of heira/hiera [puppet] - 10https://gerrit.wikimedia.org/r/401806 [21:27:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:27:52] !log twentyafterfour@tin Synchronized php: group1 wikis to 1.31.0-wmf.15 (duration: 01m 01s) [21:28:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:28:25] PROBLEM - Host mr1-eqiad.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100% [21:28:52] (03CR) 10Dzahn: [C: 032] librenms: convert role to profile, variables to params [puppet] - 10https://gerrit.wikimedia.org/r/399966 (owner: 10Dzahn) [21:29:45] PROBLEM - Host mr1-eqiad.oob is DOWN: PING CRITICAL - Packet loss = 100% [21:31:12] (03CR) 10Dzahn: "no changes on netmon1002,netmon2001. the puppet issues there are related to Netbox and psql for it" [puppet] - 10https://gerrit.wikimedia.org/r/399966 (owner: 10Dzahn) [21:31:45] RECOVERY - Long running screen/tmux on labcontrol1001 is OK: OK: No SCREEN or tmux processes detected. 
[21:32:03] (03CR) 10Dzahn: [C: 032] typos: add 'hirea', another variant of heira/hiera [puppet] - 10https://gerrit.wikimedia.org/r/401806 (owner: 10Dzahn) [21:32:08] (03PS2) 10Dzahn: typos: add 'hirea', another variant of heira/hiera [puppet] - 10https://gerrit.wikimedia.org/r/401806 [21:33:34] RECOVERY - Host mr1-eqiad.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 96.30 ms [21:34:25] (03CR) 10Dzahn: "how do we install the plugin?" [puppet] - 10https://gerrit.wikimedia.org/r/401799 (https://phabricator.wikimedia.org/T184116) (owner: 10Paladox) [21:34:54] RECOVERY - Host mr1-eqiad.oob is UP: PING OK - Packet loss = 0%, RTA = 1.63 ms [21:35:06] (03CR) 10Paladox: "Chad would build the plugin and upload it to archiva as git-fat and it would be in https://gerrit.wikimedia.org/r/#/admin/projects/operati" [puppet] - 10https://gerrit.wikimedia.org/r/401799 (https://phabricator.wikimedia.org/T184116) (owner: 10Paladox) [21:37:13] (03CR) 10Dzahn: "gotcha, thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/401799 (https://phabricator.wikimedia.org/T184116) (owner: 10Paladox) [21:37:45] (03CR) 10Chad: [C: 04-1] "Not doing it until 2.14.x" [puppet] - 10https://gerrit.wikimedia.org/r/401799 (https://phabricator.wikimedia.org/T184116) (owner: 10Paladox) [21:44:54] PROBLEM - puppet last run on californium is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [21:47:15] californium eh, that's Horizon [21:48:05] false alert again [21:49:54] RECOVERY - puppet last run on californium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:52:40] 10Operations, 10ops-eqiad, 10Analytics, 10Patch-For-Review: rack/setup/install noteboot100[34] - https://phabricator.wikimedia.org/T183935#3873157 (10Cmjohnson) [21:55:00] 10Operations, 10ops-eqiad, 10Patch-For-Review: rack/setup/install labvirt102[12] - https://phabricator.wikimedia.org/T183937#3873165 (10Cmjohnson) [21:56:12] 10Operations, 10netops, 10Patch-For-Review: Evaluate NetBox as a Racktables replacement & IPAM - https://phabricator.wikimedia.org/T170144#3420547 (10Dzahn) resolved? (https://netbox.wikimedia.org/login/?next=/) P.S. just a note that there is a puppet error on netmon2001 that doesn't exist on netmon1002, w... [22:05:01] 10Operations, 10ops-codfw, 10DBA: db2054: Disk with predictive failure - https://phabricator.wikimedia.org/T183887#3873174 (10Papaul) Dear Mr Papaul Tshibamba, Hewlett Packard Enterprise Reference Number: 5325864400 STATUS: Customer Self Repair Part has been shipped Part/s shipped: 653952-001 Part descrip... [22:10:05] 10Operations, 10ops-codfw, 10Cloud-VPS: Connect labtestvirt2003 eth1 and eth2 interface(s) to switch fabric - https://phabricator.wikimedia.org/T183167#3873186 (10RobH) a:03Papaul @papaul, please document which ports you connect to so we can update the switch config accordingly. thanks! 
[22:10:53] (03PS3) 10Dzahn: microsites::peopleweb: convert role to profile [puppet] - 10https://gerrit.wikimedia.org/r/400245 [22:12:54] RECOVERY - MariaDB Slave Lag: s7 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 265.12 seconds [22:14:03] (03CR) 10Dzahn: [C: 032] microsites::peopleweb: convert role to profile [puppet] - 10https://gerrit.wikimedia.org/r/400245 (owner: 10Dzahn) [22:16:07] (03PS4) 10Dzahn: Update portals submodule to master [puppet] - 10https://gerrit.wikimedia.org/r/394555 (https://phabricator.wikimedia.org/T181799) (owner: 10Jdrewniak) [22:18:25] (03CR) 10Dzahn: "nothing on rutherfordium" [puppet] - 10https://gerrit.wikimedia.org/r/400245 (owner: 10Dzahn) [22:20:01] (03PS5) 10Dzahn: beta: Update portals submodule to master [puppet] - 10https://gerrit.wikimedia.org/r/394555 (https://phabricator.wikimedia.org/T181799) (owner: 10Jdrewniak) [22:21:10] Reedy: thcipriani - does --fix resolve P6522 issues? [22:21:22] pagelinks I mean [22:22:33] (03CR) 10Dzahn: [C: 032] beta: Update portals submodule to master [puppet] - 10https://gerrit.wikimedia.org/r/394555 (https://phabricator.wikimedia.org/T181799) (owner: 10Jdrewniak) [22:26:58] jouncebot: next [22:26:59] In 1 hour(s) and 33 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180104T0000) [22:31:18] !log bd808@tin Started deploy [striker/deploy@69f1b15]: Enhance membership request workflow and fix Diffusion repo creation (T168027, T182142) [22:31:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:31:29] T168027: Enhance workflow for toolforge membership requests - https://phabricator.wikimedia.org/T168027 [22:31:29] T182142: Diffusion repository creation fails via toolsadmin - https://phabricator.wikimedia.org/T182142 [22:31:43] 10Operations, 10ORES, 10Graphite, 10Scoring-platform-team (Current), 10User-fgiunchedi: Regularly purge old ores graphite metrics - 
https://phabricator.wikimedia.org/T169969#3873249 (10Halfak) 1 day aggregation for 5 years is practically indefinite to me. That's OK. I'm a fan of removing anything tha... [22:31:48] !log bd808@tin Finished deploy [striker/deploy@69f1b15]: Enhance membership request workflow and fix Diffusion repo creation (T168027, T182142) (duration: 00m 31s) [22:31:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:33:25] Hauskatze: no idea... [22:33:49] although 3 of them failing doesn't seem too worrying :P [22:34:22] Platonides: quite a few more are failing now [22:39:00] where is that? [22:39:14] Platonides: I'll tell you in -es-ops [22:52:15] PROBLEM - puppet last run on pc1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:00:20] Hauskatze: I did run it with --fix initially, FWIW [23:00:36] I could try running it with --fix again: should I try that? [23:01:24] PROBLEM - puppet last run on elastic1043 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:01:39] thcipriani: that'd fix pagelinks I think, please do [23:01:44] * thcipriani does [23:01:56] I'm slowly fixing the other section using API [23:02:05] PROBLEM - puppet last run on nitrogen is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:02:15] PROBLEM - puppet last run on db1072 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:02:15] PROBLEM - puppet last run on mw1238 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:02:15] PROBLEM - puppet last run on mw1262 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:02:15] PROBLEM - puppet last run on rhodium is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [23:02:15] PROBLEM - puppet last run on analytics1065 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:02:26] has category collation script finished already? [23:02:29] !log restarted apache on phab1001 to clear hung workers (refs T182832) [23:02:37] Hauskatze: hrm > 460 links to fix, 457 were resolvable [23:02:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:02:41] T182832: Phab is having issues - https://phabricator.wikimedia.org/T182832 [23:02:48] collation still running [23:02:54] PROBLEM - puppet last run on ms-be1018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:03:03] nitrogen may be puppetdb again. [23:03:05] PROBLEM - puppet last run on mw1275 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:03:05] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:03:24] PROBLEM - puppet last run on logstash1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:03:54] PROBLEM - puppet last run on db1094 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:03:55] PROBLEM - puppet last run on conf1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:03:55] PROBLEM - puppet last run on dysprosium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:04:05] PROBLEM - puppet last run on hafnium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:04:05] PROBLEM - puppet last run on restbase1013 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [23:04:14] PROBLEM - puppet last run on labvirt1013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:04:14] PROBLEM - puppet last run on baham is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:04:24] PROBLEM - puppet last run on contint2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:04:44] PROBLEM - puppet last run on conf1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:04:47] ops ^^ [23:05:24] PROBLEM - Check size of conntrack table on mw1336 is CRITICAL: CRITICAL: nf_conntrack is 91 % full [23:05:54] thcipriani: hmm, okay thanks :| [23:06:14] PROBLEM - Check size of conntrack table on mw1335 is CRITICAL: CRITICAL: nf_conntrack is 91 % full [23:07:24] RECOVERY - Check size of conntrack table on mw1336 is OK: OK: nf_conntrack is 76 % full [23:07:28] 10Operations, 10Phabricator: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3873439 (10mmodell) Adding #operations because I could use some Apache expertise on this one. [23:07:45] 10Operations, 10Phabricator: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3873441 (10mmodell) @paladox: 9-27 was the previous deployment [23:07:49] paladox: yea, it's fine when running manually [23:08:21] 10Operations, 10Phabricator: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3873443 (10Paladox) Ok thanks. So between september 27 and movement 15. 
[23:09:22] 10Operations, 10Phabricator: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3873448 (10mmodell) oh there was a deploy on October 11th as well. [23:09:44] RECOVERY - puppet last run on conf1003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:10:14] RECOVERY - Check size of conntrack table on mw1335 is OK: OK: nf_conntrack is 77 % full [23:10:36] 10Operations, 10Phabricator: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3873451 (10mmodell) rPHAB893664b...rPHAB7d3ec13 [23:11:19] thcipriani: I think I know what's happening -- WP:OTRS is also a broken link so once I get that deleted the list of pagelinks should be greatly improved [23:12:01] Hi people; I just noticed I have a wrong e-mail subscribed to one of the mailing lists, ie. I can see a message bounced back from one of my e-mail addresses. [23:12:16] Any chance someone could check what list that was so I could update my e-mail there? [23:13:54] you can see a message bounced back from one of your e-mail addresses? [23:14:27] Yes, my hosting provider allows me to see this [23:15:34] So I can see I responded to a message with "No Such User Here" at 1 January, and that response was sent to mailman-bounces@lists.wikimedia.org [23:15:49] Meaning I have an old e-mail subscribed to a list somewhere [23:16:17] so you what, own a domain, and your provider shows a bounce against a random user in it, and you think you have subscribed the wrong username to a mailing list? [23:17:00] Exactly. [23:17:18] I believe you should be able to see this in the bounce log, too, from your side [23:17:32] odder, I think mailman-bounces@ identifies the list as 'mailman' [23:18:21] no idea who is on it [23:19:00] odder: do you just want to know if your email is subscribed to any specific list? 
[23:19:15] sorry, i have like 5 things open and i may have misparsed your question [23:19:33] I think he'd like to give you an address and have you look up the list of lists it's subscribed to [23:19:51] Yes. [23:20:02] Then I'll just contact people to update it [23:21:07] 10Operations, 10Phabricator: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3873497 (10Paladox) Maybe this https://serverfault.com/questions/513448/httpd-processes-using-more-memory-over-time will help? [23:21:25] if you file a task for it under #wikimedia-mailing-lists i can take care of it. [23:21:36] that way i can check your email against phab's list of emails, ensure its the same [23:21:41] and then strip it off any lists you want [23:21:53] you wont need to contact each mail list admin if its done via the phab task =] [23:22:12] if it's bouncing email I doubt it'll be his phab address [23:22:15] RECOVERY - puppet last run on pc1005 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [23:22:15] having the task to audit your request against your email and account seems a lot nicer than 'i did all this stuff cuz it was asked in irc!) [23:22:24] What Krenair said [23:22:26] oh, i misunderstood [23:22:34] its just some rando and you noticed the bounce [23:22:39] Yup [23:22:52] ok, same thing and seems like i can just strip it off everything i find, ill look now but file a task so there is audit trail? [23:22:59] Sure thing [23:27:33] 10Operations, 10Analytics, 10Analytics-Wikistats, 10Regression: [Regression] stats.wikipedia.org redirect no longer works ("Domain not served here") - https://phabricator.wikimedia.org/T126281#3873533 (10Nuria) I think this ticket can be closed, while redirect might have been valid we do not seem to miss i... 
[23:27:45] 10Operations, 10Analytics, 10Analytics-Wikistats, 10Regression: [Regression] stats.wikipedia.org redirect no longer works ("Domain not served here") - https://phabricator.wikimedia.org/T126281#3873536 (10Nuria) 05Open>03declined [23:28:10] echoing here just for others following along: seems that user isnt on any mailing list at this time. [23:28:55] RECOVERY - puppet last run on db1094 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:28:55] RECOVERY - puppet last run on conf1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:28:55] RECOVERY - puppet last run on dysprosium is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [23:29:05] RECOVERY - puppet last run on labvirt1013 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [23:29:05] RECOVERY - puppet last run on hafnium is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [23:29:05] RECOVERY - puppet last run on restbase1013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:29:14] RECOVERY - puppet last run on baham is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [23:29:24] RECOVERY - puppet last run on contint2001 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [23:31:24] RECOVERY - puppet last run on elastic1043 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [23:32:05] RECOVERY - puppet last run on nitrogen is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [23:32:14] RECOVERY - puppet last run on analytics1065 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [23:32:15] RECOVERY - puppet last run on mw1262 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [23:32:15] RECOVERY - puppet last run on mw1238 is OK: OK: Puppet is currently 
enabled, last run 3 minutes ago with 0 failures [23:32:15] RECOVERY - puppet last run on db1072 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [23:32:15] RECOVERY - puppet last run on rhodium is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [23:32:45] RECOVERY - puppet last run on ms-be1018 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [23:33:05] RECOVERY - puppet last run on mw1275 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [23:33:05] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [23:33:24] RECOVERY - puppet last run on logstash1005 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [23:33:28] 10Operations, 10Analytics, 10Discovery, 10EventBus, and 8 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#3873565 (10Krinkle) [23:35:34] cmjohnson1: hello [23:39:03] I am sure ops are aware of https://spectreattack.com/ [23:39:09] Yes, they are [23:39:21] thanks Reedy [23:39:55] matanya: legoktm already uploaded their images to commons [23:40:17] sheesh, that was quick! [23:40:32] i got back to reboot to death [23:40:50] *i'll [23:43:16] matanya: https://en.wikipedia.org/wiki/Meltdown_(security_bug) [23:43:36] oh, no, another attack... [23:48:10] Not just any attack ;) [23:49:34] legoktm, could've merged into Intel KPTI flaw... [23:50:10] Krenair: I merged the content that was useful, but that page was mostly outdated (it was written yesterday) and everyone is going to be searching for "meltdown" [23:50:23] bawolff: can it be patched? [23:51:05] I haven't read the google post yet, but signs seem to point to it can be sort of mitigated, but really you want a new CPU [23:51:19] one that hasn't been released yet [23:51:22] And possibly not even designed [23:51:25] yeah [23:51:38] lawsuits flying to microsoft in 3, 2, 1... 
[23:51:46] I mean, Intel [23:52:10] And AMD [23:52:13] And Qualcomm [23:52:14] hell no, I'm not buying a new PC; I've just got a new one [23:52:16] And Samsung [23:52:17] And... [23:52:51] Hauskatze: As far as these things go, its all pretty bad [23:52:57] and there is very little you can do about it [23:53:14] turn your computer off [23:53:20] And your phone [23:53:39] Go to library, get a book [23:54:11] I already do that [23:54:13] meh [23:54:29] I still use my typewriter sometimes [23:54:35] that can't be hacked :P [23:54:37] Reedy, thought AMD had said they weren't vulnerable [23:54:41] (03PS21) 10TerraCodes: Add wikidata and mediawiki.org to $wgLocalVirtualHosts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392999 (https://phabricator.wikimedia.org/T117302) [23:54:53] Krenair: If they've said it... It's been proved wrong since [23:54:55] (03PS36) 10TerraCodes: $wmf* -> $wmg* [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392184 (https://phabricator.wikimedia.org/T45956) [23:54:56] Hauskatze: what about emission security and audio eavesdropping [23:55:06] Krenair: https://googleprojectzero.blogspot.co.uk/2018/01/reading-privileged-memory-with-side.html [23:55:12] Hauskatze: https://arstechnica.com/information-technology/2015/10/how-soviets-used-ibm-selectric-keyloggers-to-spy-on-us-diplomats/ [23:55:12] https://newsroom.intel.com/news/intel-responds-to-security-research-findings/ [23:55:19] My impression is that intel is super vulnerable, and amd is less vulnerable but still vulnerable [23:55:21] "Recent reports that these exploits are caused by a “bug” or a “flaw” and are unique to Intel products are incorrect." [23:55:23] what would WMF do on its servers using affected chips? 
[23:55:24] how on earth has that LocalVirtualHosts change got to 21 PSes [23:55:42] I only skimmed the google page though [23:56:15] Hauskatze: Kernel updates [23:56:43] Skimming forum comments it seems like intel you can read kernel memory, or maybe even a different OS host in a virtual situation, where AMD its more potential of user mode memory but not kernel [23:56:55] but I haven't even read the thing yet, so I may be wrong [23:57:25] I am reading google post [23:57:29] there are several attacks [23:57:37] and Intel seems vulnerable to more than AMD [23:58:06] basically, the workaround seems to be: use different hardware :P [23:58:15] Hauskatze: from what I understand, the dangers are mostly to desktop systems which can be attacked by things like javascript exploits and to shared use computers like AWS or our Cloud VPS hosts. The wikis are not directly at risk unless we install some kind of backdoored app that uses the exploit and somehow sends data out of the internal network. [23:58:42] I don't think the wmf is specially affected [23:59:08] Although if people can exploit via javascript jit engines, then maybe LUA jit engines would be a possible vector [23:59:15] maybe the VM machine... [23:59:24] use Itanium [23:59:42] WMF has OpenStack and some other VM thing going in prod