[00:24:36] RECOVERY - puppet last run on db1028 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[00:30:48] (PS1) Muehlenhoff: Record extended account validity for nithum [puppet] - https://gerrit.wikimedia.org/r/359365
[00:40:38] (CR) Muehlenhoff: [C: 2] Record extended account validity for nithum [puppet] - https://gerrit.wikimedia.org/r/359365 (owner: Muehlenhoff)
[00:58:38] Operations, Labs, hardware-requests: Eqiad: (2) hardware access request for labnet1003/1004 - https://phabricator.wikimedia.org/T158204#3353676 (faidon)
[00:58:41] Operations, Datasets-General-or-Unknown, Dumps-Generation, Labs, hardware-requests: Eqiad: Hardware request for labstore1006/7, dataset1002/3 - https://phabricator.wikimedia.org/T161311#3353679 (faidon)
[00:58:44] Operations, hardware-requests: eqiad: (2) hardware access request for dedicated Labs puppetmasters - https://phabricator.wikimedia.org/T147053#3353682 (faidon)
[00:58:47] Operations, Labs, hardware-requests: Codfw: (1) hardware access request for labtestvirt2003 [region 2] - https://phabricator.wikimedia.org/T161765#3353685 (faidon)
[01:22:55] Operations, Ops-Access-Requests, Patch-For-Review: Grant root access for Bryan Davis on labstore* and admin for maintain scripts for labsdb* - https://phabricator.wikimedia.org/T166310#3353701 (Dzahn) user has also been created on labsdb1001 and labsdb1003
[01:24:40] (PS1) Andrew Bogott: openstack: add 'puppetleaks' script [puppet] - https://gerrit.wikimedia.org/r/359370
[01:31:26] Operations, netops: Find a new PIM RP IP - https://phabricator.wikimedia.org/T167842#3353703 (faidon) Oh, thanks for that, that audit is great! These two are indeed surprising, and I think the fact that they are surprising is a good argument for us to get rid of multicast :)) The 239.77.124.213 one is [...
[01:49:41] Operations, vm-requests, Patch-For-Review: Site: 2 VM request for tendril (switch tendril from einsteinium to dbmonitor*) - https://phabricator.wikimedia.org/T149557#3353712 (Dzahn) ``` 34 -- Grants for 'tendril'@'10.%' (tendril) 35 36 GRANT PROCESS, REPLICATION CLIENT, SELECT, SHOW DATABASES 37...
[01:58:40] Operations, vm-requests, Patch-For-Review: Site: 2 VM request for tendril (switch tendril from einsteinium to dbmonitor*) - https://phabricator.wikimedia.org/T149557#3353731 (Dzahn) >>! In T149557#3350742, @akosiaris wrote: > * Make sure dbmonitor1001, dbmonitor2001 run the same tendril version as te...
[02:01:19] (PS1) Dzahn: switch tendril from einsteinium to dbmonitor1001 [dns] - https://gerrit.wikimedia.org/r/359372 (https://phabricator.wikimedia.org/T149557)
[02:05:20] (PS1) Dzahn: mariadb: add GRANT for tendril@dbmonitor1001 [puppet] - https://gerrit.wikimedia.org/r/359373 (https://phabricator.wikimedia.org/T149557)
[02:15:15] mutante: thanks for merging that permissions patch for me
[02:30:41] !log l10nupdate@tin scap sync-l10n completed (1.30.0-wmf.5) (duration: 07m 02s)
[02:30:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:34:40] (PS2) Reedy: Do the echo when running update.php [puppet] - https://gerrit.wikimedia.org/r/354932
[02:35:30] (CR) jerkins-bot: [V: -1] Do the echo when running update.php [puppet] - https://gerrit.wikimedia.org/r/354932 (owner: Reedy)
[02:36:38] (PS3) Reedy: Do the echo when running update.php [puppet] - https://gerrit.wikimedia.org/r/354932
[02:37:05] !log l10nupdate@tin ResourceLoader cache refresh completed at Fri Jun 16 02:37:05 UTC 2017 (duration 6m 25s)
[02:37:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:37:37] (CR) jerkins-bot: [V: -1] Do the echo when running update.php [puppet] - https://gerrit.wikimedia.org/r/354932 (owner: Reedy)
[02:39:04] (PS4) Reedy: Do the echo when running update.php [puppet] - https://gerrit.wikimedia.org/r/354932
[02:39:53] (CR) jerkins-bot: [V: -1] Do the echo when running update.php [puppet] - https://gerrit.wikimedia.org/r/354932 (owner: Reedy)
[02:40:41] srsly
[02:40:41] (PS5) Reedy: Do the echo when running update.php [puppet] - https://gerrit.wikimedia.org/r/354932
[02:46:39] (PS1) Andrew Bogott: designate: Clean up puppet config for deleted instances. [puppet] - https://gerrit.wikimedia.org/r/359374 (https://phabricator.wikimedia.org/T147878)
[02:49:24] (CR) Reedy: [C: 1] "Woo, now it does what was expected originally" [puppet] - https://gerrit.wikimedia.org/r/354932 (owner: Reedy)
[02:49:35] (CR) Andrew Bogott: [C: 2] openstack: add 'puppetleaks' script [puppet] - https://gerrit.wikimedia.org/r/359370 (owner: Andrew Bogott)
[03:19:23] Operations, Wikimedia-General-or-Unknown: Json queries fail "Too Many Requests" - https://phabricator.wikimedia.org/T168033#3353800 (Reedy) What are you querying? The MW API? Are you making a lot of requests simultaneously?
[03:24:08] Operations, Wikimedia-General-or-Unknown: Json queries fail "Too Many Requests" - https://phabricator.wikimedia.org/T168033#3353803 (Yurivict) I use MW API. I run requests from several threads. How do you define "a lot"?
[03:27:03] Operations, Wikimedia-General-or-Unknown: Json queries fail "Too Many Requests" - https://phabricator.wikimedia.org/T168033#3353804 (Reedy) >>! In T167920#3350376, @BBlack wrote: > For the API you're using, we currently have a per-client-IP ratelimiter in place that will limit at 600 reqs/min/clientip, w...
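The rate-limit figure quoted at 03:27:03 (600 requests per minute per client IP) applies directly to a multi-threaded API client like the one described at 03:24:08. Below is a minimal Python sketch of a client-side throttle. It assumes a simple 60-second sliding window shared by all threads. The class name `SlidingWindowLimiter` is hypothetical, and the real server-side limiter may count requests differently.

```python
import threading
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical client-side throttle: at most `limit` requests per `window` seconds.

    Defaults assume the 600 requests/minute/client-IP figure quoted for the
    MW API; the server may count requests differently, so treat this as a
    conservative guard rather than an exact mirror of its limiter.
    """

    def __init__(self, limit=600, window=60.0, clock=time.monotonic, sleep=time.sleep):
        self.limit = limit
        self.window = window
        self.clock = clock            # injectable so the logic can be tested without real waits
        self.sleep = sleep
        self.stamps = deque()         # send times of requests still inside the window
        self.lock = threading.Lock()  # one instance is shared by all worker threads

    def acquire(self):
        """Block until one more request fits in the window, then record it."""
        while True:
            with self.lock:
                now = self.clock()
                # Forget requests that have aged out of the window.
                while self.stamps and now - self.stamps[0] >= self.window:
                    self.stamps.popleft()
                if len(self.stamps) < self.limit:
                    self.stamps.append(now)
                    return
                # Window is full: wait until the oldest request expires.
                wait = self.window - (now - self.stamps[0])
            # Sleep outside the lock so the critical section stays short.
            self.sleep(wait)
```

Each worker thread would call `acquire()` on one shared instance immediately before sending an API request. If a client still receives HTTP 429 while using this, it should back off rather than retry immediately.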
[03:28:55] Operations, media-storage: uploads.wm.o commons archive 20170615014039!Adsalm.webm visible despite file deleted on Commons - https://phabricator.wikimedia.org/T168002#3353808 (Reedy)
[03:32:32] (PS6) Reedy: Do the echo when running update.php [puppet] - https://gerrit.wikimedia.org/r/354932
[04:27:15] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=450.70 Read Requests/Sec=679.40 Write Requests/Sec=29.70 KBytes Read/Sec=41069.60 KBytes_Written/Sec=144.40
[04:36:25] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=42.00 Read Requests/Sec=19.70 Write Requests/Sec=138.70 KBytes Read/Sec=326.40 KBytes_Written/Sec=964.00
[04:59:56] Operations, Wikimedia-General-or-Unknown: Json queries fail "Too Many Requests" - https://phabricator.wikimedia.org/T168033#3353837 (Marostegui) p:Triage>Normal
[05:12:25] Operations, ops-codfw, DBA: Degraded RAID on db2070 - https://phabricator.wikimedia.org/T167667#3353844 (Marostegui) Open>Resolved The rebuilt finished correctly: ``` logicaldrive 1 (3.3 TB, RAID 1+0, OK) physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 600 GB, OK) physicald...
[05:13:04] Operations, media-storage: uploads.wm.o commons archive 20170615014039!Adsalm.webm visible despite file deleted on Commons - https://phabricator.wikimedia.org/T168002#3353846 (Marostegui) p:Triage>Normal
[05:19:31] (Abandoned) Tim Starling: Rename all WMF-specific configuration variables to have a wgWMF prefix [mediawiki-config] - https://gerrit.wikimedia.org/r/347541 (owner: Tim Starling)
[05:24:36] Operations, Wikimedia-Language-setup, Wikimedia-Site-requests, MW-1.30-release-notes (WMF-deploy-2017-06-20_(1.30.0-wmf.6)), Patch-For-Review: Create Atikamekw Wikipedia - https://phabricator.wikimedia.org/T167714#3353851 (Marostegui)
[05:27:29] Operations, MobileFrontend, Reading-Web-Backlog, Traffic: Remove disableImages handling from VCL - https://phabricator.wikimedia.org/T168013#3353852 (Marostegui) p:Triage>Normal
[05:45:27] Operations, Ops-Access-Requests, Scoring-platform-team, User-Zppix: Graphite access for Zppix - https://phabricator.wikimedia.org/T168014#3353061 (Legoktm) The search function on graphite is private (requires NDA) but if there's a specific metric you want, all the data is publicly accessible if y...
[05:45:41] !log increase enwiki_content replicas on codfw from 2 to 3 to match eqiad
[05:45:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:49:26] (PS1) EBernhardson: Update cirrus server counts to match reality [mediawiki-config] - https://gerrit.wikimedia.org/r/359376
[05:50:29] (CR) jerkins-bot: [V: -1] Update cirrus server counts to match reality [mediawiki-config] - https://gerrit.wikimedia.org/r/359376 (owner: EBernhardson)
[05:57:23] (PS2) EBernhardson: [WIP] Update cirrus server counts to match reality [mediawiki-config] - https://gerrit.wikimedia.org/r/359376
[05:58:24] (CR) jerkins-bot: [V: -1] [WIP] Update cirrus server counts to match reality [mediawiki-config] - https://gerrit.wikimedia.org/r/359376 (owner: EBernhardson)
[05:59:28] (PS3) EBernhardson: [WIP] Update cirrus server counts to match reality [mediawiki-config] - https://gerrit.wikimedia.org/r/359376
[06:24:26] (PS3) KartikMistry: Explicitly set cookieDomain for ContentTranslationSiteTemplates [mediawiki-config] - https://gerrit.wikimedia.org/r/320200 (https://phabricator.wikimedia.org/T149879)
[06:50:25] Operations, Ops-Access-Requests, Scoring-platform-team, User-Zppix: Graphite access for Zppix - https://phabricator.wikimedia.org/T168014#3353881 (Marostegui) p:Triage>Normal @MoritzMuehlenhoff can you shed some light on the NDA requirement?
[07:06:35] PROBLEM - HHVM rendering on mw1190 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:08:35] RECOVERY - HHVM rendering on mw1190 is OK: HTTP OK: HTTP/1.1 200 OK - 78645 bytes in 4.705 second response time
[07:11:35] Operations, Ops-Access-Requests, Scoring-platform-team, User-Zppix: Graphite access for Zppix - https://phabricator.wikimedia.org/T168014#3353910 (MoritzMuehlenhoff) Is is very similar to the process listed here (which applies to for adding people to the WMF-NDA project in Phabricator): https://w...
[07:16:45] PROBLEM - HHVM rendering on mw1190 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:16:45] PROBLEM - Apache HTTP on mw1190 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:16:45] PROBLEM - Nginx local proxy to apache on mw1190 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:17:35] RECOVERY - Apache HTTP on mw1190 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.205 second response time
[07:17:35] RECOVERY - HHVM rendering on mw1190 is OK: HTTP OK: HTTP/1.1 200 OK - 78613 bytes in 1.009 second response time
[07:17:35] RECOVERY - Nginx local proxy to apache on mw1190 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 0.222 second response time
[07:26:02] (CR) Hashar: [C: 1] Do the echo when running update.php [puppet] - https://gerrit.wikimedia.org/r/354932 (owner: Reedy)
[07:28:12] (PS1) Jcrespo: mariadb: Pool db1099 & db1101 as eqiad's temp. pc2 and pc3 hosts [mediawiki-config] - https://gerrit.wikimedia.org/r/359379 (https://phabricator.wikimedia.org/T167784)
[07:29:02] (CR) jerkins-bot: [V: -1] mariadb: Pool db1099 & db1101 as eqiad's temp. pc2 and pc3 hosts [mediawiki-config] - https://gerrit.wikimedia.org/r/359379 (https://phabricator.wikimedia.org/T167784) (owner: Jcrespo)
[07:30:23] (PS2) Jcrespo: mariadb: Pool db1099 & db1101 as eqiad's temp. pc2 and pc3 hosts [mediawiki-config] - https://gerrit.wikimedia.org/r/359379 (https://phabricator.wikimedia.org/T167784)
[07:48:58] (CR) Marostegui: [C: 1] mariadb: Pool db1099 & db1101 as eqiad's temp. pc2 and pc3 hosts [mediawiki-config] - https://gerrit.wikimedia.org/r/359379 (https://phabricator.wikimedia.org/T167784) (owner: Jcrespo)
[07:58:06] (PS1) Jcrespo: prometheus-mysqld-exporter: add db1099 and db1101 as parsercaches [puppet] - https://gerrit.wikimedia.org/r/359383 (https://phabricator.wikimedia.org/T167784)
[07:58:14] (Draft1) Paladox: contint python: use libmariadb-dev over libmysqlclient-dev on stretch [puppet] - https://gerrit.wikimedia.org/r/359382 (https://phabricator.wikimedia.org/T166611)
[07:58:17] (PS2) Paladox: contint python: use libmariadb-dev over libmysqlclient-dev on stretch [puppet] - https://gerrit.wikimedia.org/r/359382 (https://phabricator.wikimedia.org/T166611)
[07:59:39] (CR) Jcrespo: [C: 2] prometheus-mysqld-exporter: add db1099 and db1101 as parsercaches [puppet] - https://gerrit.wikimedia.org/r/359383 (https://phabricator.wikimedia.org/T167784) (owner: Jcrespo)
[08:07:15] Operations, Wikimedia-General-or-Unknown: Json queries fail "Too Many Requests" - https://phabricator.wikimedia.org/T168033#3353970 (Aklapper) @Yurivict: Please provide clear steps and information to reproduce the problem. See https://mediawiki.org/wiki/How_to_report_a_bug for more information. Thanks!...
[08:07:50] (CR) Paladox: mysql: Fix installing package on stretch (1 comment) [puppet] - https://gerrit.wikimedia.org/r/354131 (owner: Paladox)
[08:07:56] (PS6) Paladox: mysql: Fix installing package on stretch [puppet] - https://gerrit.wikimedia.org/r/354131
[08:13:46] Operations, Ops-Access-Requests, Scoring-platform-team, User-Zppix: Graphite access for Zppix - https://phabricator.wikimedia.org/T168014#3353988 (Marostegui) Thanks @MoritzMuehlenhoff! @halfak and @Zppix can you guys follow https://wikitech.wikimedia.org/wiki/Volunteer_NDA remaining steps? Tha...
[08:18:24] (CR) Jcrespo: [C: 2] mariadb: Pool db1099 & db1101 as eqiad's temp. pc2 and pc3 hosts [mediawiki-config] - https://gerrit.wikimedia.org/r/359379 (https://phabricator.wikimedia.org/T167784) (owner: Jcrespo)
[08:19:42] (Merged) jenkins-bot: mariadb: Pool db1099 & db1101 as eqiad's temp. pc2 and pc3 hosts [mediawiki-config] - https://gerrit.wikimedia.org/r/359379 (https://phabricator.wikimedia.org/T167784) (owner: Jcrespo)
[08:19:52] (CR) jenkins-bot: mariadb: Pool db1099 & db1101 as eqiad's temp. pc2 and pc3 hosts [mediawiki-config] - https://gerrit.wikimedia.org/r/359379 (https://phabricator.wikimedia.org/T167784) (owner: Jcrespo)
[08:20:01] Operations, Ops-Access-Requests, Scoring-platform-team, User-Zppix: Graphite access for Zppix - https://phabricator.wikimedia.org/T168014#3353991 (MoritzMuehlenhoff) Wait, that wikitech page is for getting access to teh WMF-NDA group in Phabricator (I'll make a note to update it when I find some...
[08:20:16] !log about to swithover pc1005 and pc1006 to db1099 and db1001
[08:20:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:23:01] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Switchover pc1005 and pc1006 to db1099 and db1001 (duration: 00m 45s)
[08:23:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:27:11] Operations, Discovery, Traffic, Wikidata, Wikidata-Query-Service: runUpdate.sh script in wikidata stand-alone has abruptly started incurring numerous 429 errors. - https://phabricator.wikimedia.org/T168019#3354007 (Peachey88)
[08:33:13] Operations, Ops-Access-Requests, Scoring-platform-team, User-Zppix: Graphite access for Zppix - https://phabricator.wikimedia.org/T168014#3354012 (MoritzMuehlenhoff) I've updated the docs to differentiate between Phabricator NDA access and LDAP/shell access: https://wikitech.wikimedia.org/wiki/Vo...
[08:35:19] Operations, Ops-Access-Requests, Scoring-platform-team, User-Zppix: Graphite access for Zppix - https://phabricator.wikimedia.org/T168014#3354014 (Marostegui) Thank you!
[08:36:51] (PS1) Marostegui: Revert "db-eqiad.php: Add comment to db1018 status" [mediawiki-config] - https://gerrit.wikimedia.org/r/359386
[08:36:56] (PS2) Marostegui: Revert "db-eqiad.php: Add comment to db1018 status" [mediawiki-config] - https://gerrit.wikimedia.org/r/359386
[08:37:05] (CR) Marostegui: [C: -2] "Wait until maintenance is done" [mediawiki-config] - https://gerrit.wikimedia.org/r/359386 (owner: Marostegui)
[08:38:21] (PS1) DatGuy: Add sandbox link for dtywiki [mediawiki-config] - https://gerrit.wikimedia.org/r/359387 (https://phabricator.wikimedia.org/T168038)
[08:40:14] !log jynus@tin Synchronized wmf-config/db-codfw.php: Add db1099 and db1001 hosts to config (duration: 00m 41s)
[08:40:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:45:49] (CR) Marostegui: "Cloud Team - let us know if you agree with this so we can deploy it and fix: T167961" [puppet] - https://gerrit.wikimedia.org/r/359152 (https://phabricator.wikimedia.org/T167961) (owner: Marostegui)
[08:50:53] !log bringing down pc1005 and pc1006 for maintenance T167567
[08:51:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:51:04] T167567: Migrate parsercache hosts to file per table - https://phabricator.wikimedia.org/T167567
[08:53:22] (CR) Hashar: "This is a dupe of https://gerrit.wikimedia.org/r/#/c/356246/ you did a couple weeks ago -:-]" [puppet] - https://gerrit.wikimedia.org/r/359382 (https://phabricator.wikimedia.org/T166611) (owner: Paladox)
[08:54:08] (PS6) Paladox: contint: Only install libmysqlclient-dev if on trusty or jessie [puppet] - https://gerrit.wikimedia.org/r/356246 (https://phabricator.wikimedia.org/T166611)
[08:54:11] (PS7) Paladox: contint: Only install libmysqlclient-dev if on trusty or jessie [puppet] - https://gerrit.wikimedia.org/r/356246 (https://phabricator.wikimedia.org/T166611)
[08:54:14] (Abandoned) Paladox: contint python: use libmariadb-dev over libmysqlclient-dev on stretch [puppet] - https://gerrit.wikimedia.org/r/359382 (https://phabricator.wikimedia.org/T166611) (owner: Paladox)
[08:54:28] (CR) Paladox: "> This is a dupe of https://gerrit.wikimedia.org/r/#/c/356246/ you" [puppet] - https://gerrit.wikimedia.org/r/359382 (https://phabricator.wikimedia.org/T166611) (owner: Paladox)
[09:02:21] Operations, Discovery, Traffic, Wikidata, Wikidata-Query-Service: runUpdate.sh script in wikidata stand-alone has abruptly started incurring numerous 429 errors. - https://phabricator.wikimedia.org/T168019#3354044 (Peachey88)
[09:13:35] !log re-enabled puppet on mw2129 (no reason was given why it was disabled(
[09:13:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:14:07] Operations, Discovery, Traffic, Wikidata, Wikidata-Query-Service: runUpdate.sh script in wikidata stand-alone has abruptly started incurring numerous 429 errors. - https://phabricator.wikimedia.org/T168019#3354055 (ema) p:Triage>Normal
[09:14:22] Operations, Puppet, Continuous-Integration-Config: Get rid of "import realm.pp" in manifests/site.pp - https://phabricator.wikimedia.org/T154915#3354056 (hashar)
[09:17:29] Operations, Puppet, Continuous-Integration-Config: Get rid of "import realm.pp" in manifests/site.pp - https://phabricator.wikimedia.org/T154915#3354057 (hashar)
[09:23:05] (CR) Alexandros Kosiaris: [C: -1] "I don't follow why libtemplate-perl is also added. The rest LGTM" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/359304 (owner: BryanDavis)
[09:25:00] Operations, Puppet, Continuous-Integration-Config, Release-Engineering-Team: Get rid of "import realm.pp" in manifests/site.pp - https://phabricator.wikimedia.org/T154915#3354059 (hashar) I have poked the `ops` mailing list about it.
[09:25:27] (CR) Muehlenhoff: Add vim-scripts as a standard package (1 comment) [puppet] - https://gerrit.wikimedia.org/r/359304 (owner: BryanDavis)
[09:32:13] Operations, MediaWiki-JobRunner, Release-Engineering-Team, Beta-Cluster-reproducible: jobrunner / jobchron systemd services are in error state after a stop - https://phabricator.wikimedia.org/T168044#3354078 (hashar)
[09:33:15] (PS2) Hashar: jobrunner: add exit codes to services units [puppet] - https://gerrit.wikimedia.org/r/357362 (https://phabricator.wikimedia.org/T168044)
[09:33:35] (CR) Hashar: "Rebased and attached to T168044" [puppet] - https://gerrit.wikimedia.org/r/357362 (https://phabricator.wikimedia.org/T168044) (owner: Hashar)
[09:34:14] Operations, MediaWiki-JobRunner, Release-Engineering-Team, Beta-Cluster-reproducible, Patch-For-Review: jobrunner / jobchron systemd services are in error state after a stop - https://phabricator.wikimedia.org/T168044#3354093 (hashar) The alternative is to have `redisJobChronService` and `red...
[09:37:03] (CR) Hashar: "IIRC that was intended to help with some DNS migration. However CI does not have Maxmind GeoIP database and hence we can't really build an" [puppet] - https://gerrit.wikimedia.org/r/343747 (owner: Hashar)
[09:39:57] (PS4) Hashar: zuul: rspec tests [puppet] - https://gerrit.wikimedia.org/r/299151
[09:40:25] PROBLEM - Apache HTTP on mw2127 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:40:27] (CR) Alexandros Kosiaris: "Yeah it will be a duplicate if 2 exactly the same backup::set resources exist." [puppet] - https://gerrit.wikimedia.org/r/359089 (https://phabricator.wikimedia.org/T164030) (owner: Dzahn)
[09:41:15] RECOVERY - Apache HTTP on mw2127 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.121 second response time
[09:52:01] !log installing guile security updates
[09:52:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:06:08] (PS1) Jcrespo: mariadb: Depool db1091 for performance testing [mediawiki-config] - https://gerrit.wikimedia.org/r/359394 (https://phabricator.wikimedia.org/T168010)
[10:07:45] (CR) Marostegui: [C: 1] mariadb: Depool db1091 for performance testing [mediawiki-config] - https://gerrit.wikimedia.org/r/359394 (https://phabricator.wikimedia.org/T168010) (owner: Jcrespo)
[10:07:56] (CR) Jcrespo: [C: 2] mariadb: Depool db1091 for performance testing [mediawiki-config] - https://gerrit.wikimedia.org/r/359394 (https://phabricator.wikimedia.org/T168010) (owner: Jcrespo)
[10:09:22] (Merged) jenkins-bot: mariadb: Depool db1091 for performance testing [mediawiki-config] - https://gerrit.wikimedia.org/r/359394 (https://phabricator.wikimedia.org/T168010) (owner: Jcrespo)
[10:09:31] (CR) jenkins-bot: mariadb: Depool db1091 for performance testing [mediawiki-config] - https://gerrit.wikimedia.org/r/359394 (https://phabricator.wikimedia.org/T168010) (owner: Jcrespo)
[10:11:01] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1091 for performance testing (duration: 00m 42s)
[10:11:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:18:32] !log running analyze on db1091 (depooled), may create lag
[10:18:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:24:02] (PS7) Hashar: Test the future parser in puppet compiler [puppet] - https://gerrit.wikimedia.org/r/322898 (https://phabricator.wikimedia.org/T154915) (owner: Alexandros Kosiaris)
[10:24:24] (CR) Hashar: "Attached to T154915 :-] What is the .configs file for ? Is that solely for the puppet compiler?" [puppet] - https://gerrit.wikimedia.org/r/322898 (https://phabricator.wikimedia.org/T154915) (owner: Alexandros Kosiaris)
[10:27:20] (PS1) Alexandros Kosiaris: Add ganeti100{5,6,7,8} to the fleet [puppet] - https://gerrit.wikimedia.org/r/359395 (https://phabricator.wikimedia.org/T166076)
[10:31:48] (PS1) Hashar: test puppet syntax with future parser [puppet] - https://gerrit.wikimedia.org/r/359396
[10:32:56] (CR) Hashar: "That makes puppet-syntax to always use the future parser when on Puppet 3.x." [puppet] - https://gerrit.wikimedia.org/r/359396 (owner: Hashar)
[10:41:40] Operations, MediaWiki-JobRunner, Release-Engineering-Team, Beta-Cluster-reproducible, Patch-For-Review: jobrunner / jobchron systemd services are in error state after a stop - https://phabricator.wikimedia.org/T168044#3354172 (Marostegui) p:Triage>Normal I would rather go for for your...
[10:44:09] Operations, ops-eqiad, Patch-For-Review: rack/setup/install ganeti1005-ganeti1008 - https://phabricator.wikimedia.org/T166076#3354177 (akosiaris) >>! In T166076#3290280, @Cmjohnson wrote: > Racked and labeled 2 in row A (a4 and a6) 1 in row B4 and 1 in row B3. Racktables updated This is unfortunatel...
[10:50:52] (CR) Alexandros Kosiaris: "Yes please do. We 'd like to have both current AND future parser while the migration to the future parser is ongoing." [puppet] - https://gerrit.wikimedia.org/r/359396 (owner: Hashar)
[10:51:47] (CR) Hashar: "I guess I found a pet project for the week-end :-]" [puppet] - https://gerrit.wikimedia.org/r/359396 (owner: Hashar)
[10:55:05] (PS2) Hashar: wikidatabuilder: ship Gerrit ssh host key via a role [puppet] - https://gerrit.wikimedia.org/r/337284 (https://phabricator.wikimedia.org/T157912)
[10:58:04] (CR) Alexandros Kosiaris: [C: 2] build: allow usage of a different puppet version [puppet] - https://gerrit.wikimedia.org/r/338633 (owner: Hashar)
[10:58:08] (PS5) Alexandros Kosiaris: build: allow usage of a different puppet version [puppet] - https://gerrit.wikimedia.org/r/338633 (owner: Hashar)
[10:58:11] (CR) Alexandros Kosiaris: [V: 2 C: 2] build: allow usage of a different puppet version [puppet] - https://gerrit.wikimedia.org/r/338633 (owner: Hashar)
[10:59:16] (PS2) Alexandros Kosiaris: Add ganeti100{5,6,7,8} to the fleet [puppet] - https://gerrit.wikimedia.org/r/359395 (https://phabricator.wikimedia.org/T166076)
[11:00:19] (CR) Alexandros Kosiaris: [V: 2 C: 2] Add ganeti100{5,6,7,8} to the fleet [puppet] - https://gerrit.wikimedia.org/r/359395 (https://phabricator.wikimedia.org/T166076) (owner: Alexandros Kosiaris)
[11:09:16] (CR) Alexandros Kosiaris: [C: 1] mariadb: add GRANT for tendril@dbmonitor1001 [puppet] - https://gerrit.wikimedia.org/r/359373 (https://phabricator.wikimedia.org/T149557) (owner: Dzahn)
[11:14:44] Operations, Discovery, Traffic, Wikidata, Wikidata-Query-Service: runUpdate.sh script in wikidata stand-alone has abruptly started incurring numerous 429 errors. - https://phabricator.wikimedia.org/T168019#3353215 (ema) I took a look at rate limited requests with User-Agent 'Wikidata Query Se...
[11:18:12] 10Operations, 10HyperSwitch, 10RESTBase-API, 10Traffic, 10Services (next): Respect host header in RESTBase, and redirect /rest_v1 to /rest_v1/ - https://phabricator.wikimedia.org/T167972#3354312 (10ema) p:05Triage>03Normal [11:24:13] (03PS1) 10Ema: VCL: apply API rate limits to wikidata too [puppet] - 10https://gerrit.wikimedia.org/r/359401 (https://phabricator.wikimedia.org/T163233) [11:31:10] (03PS2) 10Ema: VCL: apply API rate limits to wikidata too [puppet] - 10https://gerrit.wikimedia.org/r/359401 (https://phabricator.wikimedia.org/T163233) [11:32:58] (03CR) 10Ema: [C: 032] VCL: apply API rate limits to wikidata too [puppet] - 10https://gerrit.wikimedia.org/r/359401 (https://phabricator.wikimedia.org/T163233) (owner: 10Ema) [11:53:41] 10Operations, 10MW-1.30-release-notes, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests, 10Patch-For-Review: Create Atikamekw Wikipedia - https://phabricator.wikimedia.org/T167714#3354405 (10Xqt) [11:55:21] (03CR) 10Jcrespo: "The change is ok, but this seems to add dbmonitor1001.wikimedia.org, and not 2001, is that intended?" [puppet] - 10https://gerrit.wikimedia.org/r/359373 (https://phabricator.wikimedia.org/T149557) (owner: 10Dzahn) [11:57:32] (03CR) 10Jcrespo: [C: 031] "I think it is intended, but can I ask why not adding 2001? Is it because cross-dc queries and that we are actually planning on have a tend" [puppet] - 10https://gerrit.wikimedia.org/r/359373 (https://phabricator.wikimedia.org/T149557) (owner: 10Dzahn) [11:58:36] (03PS1) 10Cmjohnson: Updating netboot.cfg for new wtp1025-1048 [puppet] - 10https://gerrit.wikimedia.org/r/359406 [12:00:16] (03CR) 10Cmjohnson: [C: 032] Updating netboot.cfg for new wtp1025-1048 [puppet] - 10https://gerrit.wikimedia.org/r/359406 (owner: 10Cmjohnson) [12:38:45] (03CR) 10Faidon Liambotis: [C: 04-1] "Let's fix https://phabricator.wikimedia.org/T166888#3333018 first, too slow otherwise?" 
[puppet] - 10https://gerrit.wikimedia.org/r/359396 (owner: 10Hashar) [12:39:26] (03PS2) 10Faidon Liambotis: ssl: cleanup a bunch of expired/obsolete certs [puppet] - 10https://gerrit.wikimedia.org/r/359221 [12:40:17] (03Abandoned) 10Ema: Redirect /api/rest_v1 to RESTBase docs page [puppet] - 10https://gerrit.wikimedia.org/r/306979 (https://phabricator.wikimedia.org/T125226) (owner: 10Ppchelko) [12:41:55] (03PS1) 10Jcrespo: Revert "mariadb: Depool db1091 for performance testing" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/359416 [12:42:06] (03CR) 10Faidon Liambotis: [C: 032] ssl: cleanup a bunch of expired/obsolete certs [puppet] - 10https://gerrit.wikimedia.org/r/359221 (owner: 10Faidon Liambotis) [12:43:43] (03PS1) 10Ema: VCL: remove disableImages handling [puppet] - 10https://gerrit.wikimedia.org/r/359417 (https://phabricator.wikimedia.org/T168013) [12:44:46] (03CR) 10Ema: [C: 04-1] "The disable images on mobile functionality is getting removed. This change should be merged one month after https://gerrit.wikimedia.org/r" [puppet] - 10https://gerrit.wikimedia.org/r/359417 (https://phabricator.wikimedia.org/T168013) (owner: 10Ema) [12:46:39] (03PS6) 10Faidon Liambotis: scap: fix rubocop warnings [puppet] - 10https://gerrit.wikimedia.org/r/357720 [12:46:41] (03PS6) 10Faidon Liambotis: rubocop: update rubocop to rubocop 0.49.1 [puppet] - 10https://gerrit.wikimedia.org/r/357722 [12:46:43] (03PS2) 10Faidon Liambotis: rubocop: remove stale comments from _todo.yml [puppet] - 10https://gerrit.wikimedia.org/r/357801 [12:46:45] (03PS3) 10Faidon Liambotis: Bump puppet & rake versions in the Gemfile [puppet] - 10https://gerrit.wikimedia.org/r/357810 [12:48:54] (03PS7) 10Faidon Liambotis: scap: fix rubocop warnings [puppet] - 10https://gerrit.wikimedia.org/r/357720 [12:55:33] (03CR) 10Faidon Liambotis: [C: 032] scap: fix rubocop warnings [puppet] - 10https://gerrit.wikimedia.org/r/357720 (owner: 10Faidon Liambotis) [12:55:46] (03CR) 10Jcrespo: [C: 032] Revert 
"mariadb: Depool db1091 for performance testing" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/359416 (owner: 10Jcrespo) [12:56:14] (03PS7) 10Faidon Liambotis: rubocop: update rubocop to rubocop 0.49.1 [puppet] - 10https://gerrit.wikimedia.org/r/357722 [12:56:16] (03PS3) 10Faidon Liambotis: rubocop: remove stale comments from _todo.yml [puppet] - 10https://gerrit.wikimedia.org/r/357801 [12:56:18] (03PS4) 10Faidon Liambotis: Bump puppet & rake versions in the Gemfile [puppet] - 10https://gerrit.wikimedia.org/r/357810 [12:57:24] (03CR) 10Hashar: "I had that task in mind. Though for the manifests syntax validation CI only lints files changed in HEAD (rake syntax:manifests_head). Su" [puppet] - 10https://gerrit.wikimedia.org/r/359396 (owner: 10Hashar) [12:58:50] (03Merged) 10jenkins-bot: Revert "mariadb: Depool db1091 for performance testing" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/359416 (owner: 10Jcrespo) [12:59:00] (03CR) 10jenkins-bot: Revert "mariadb: Depool db1091 for performance testing" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/359416 (owner: 10Jcrespo) [12:59:09] hashar: "death by a thousand cuts" :) [12:59:31] i.e., every addition only has a minimal performance hit, takes a few seconds more or something [12:59:46] and then three years later, you look at it in aggregate and it takes 2 minutes to run [13:00:00] (03CR) 10Faidon Liambotis: [C: 032] rubocop: update rubocop to rubocop 0.49.1 [puppet] - 10https://gerrit.wikimedia.org/r/357722 (owner: 10Faidon Liambotis) [13:00:14] (03CR) 10Faidon Liambotis: [C: 032] rubocop: remove stale comments from _todo.yml [puppet] - 10https://gerrit.wikimedia.org/r/357801 (owner: 10Faidon Liambotis) [13:00:16] (03CR) 10Faidon Liambotis: [C: 032] Bump puppet & rake versions in the Gemfile [puppet] - 10https://gerrit.wikimedia.org/r/357810 (owner: 10Faidon Liambotis) [13:00:36] yeah, if jenkins will pass the list of modified files to tox also I can modify my change to lint the files without 
extensions [13:00:44] to cycle over the changed ones only [13:00:50] yup [13:01:23] hashar: do you know already if it will be a parameter, an ENV variable or what? [13:01:40] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1091 after performance testing (duration: 00m 41s) [13:01:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:02:16] (PS2) Faidon Liambotis: Add vim-scripts as a standard package [puppet] - https://gerrit.wikimedia.org/r/359304 (owner: BryanDavis) [13:02:22] (PS3) Faidon Liambotis: Add vim-scripts as a standard package [puppet] - https://gerrit.wikimedia.org/r/359304 (owner: BryanDavis) [13:02:42] given the indirection jenkins -> tox -> tox.ini where the script is defined and run [13:02:48] akosiaris, moritzm: any disagreements about vim-scripts? [13:03:20] volans: my idea was to run tox from the Rakefile, but if it's something that can happen at a different layer, that works too [13:03:56] that's also a possibility, the only drawback that I see is that in jenkins it will be a single longer job instead of multiple parallel ones [13:04:16] is it different jobs now?
[13:04:16] paravoid: don't use it myself, but fine with me [13:04:39] paravoid: no, right, it is the same; I thought they were split [13:04:42] moritzm: yeah me neither, I suggested it because bryan was adding copies of various plugins to his ~ [13:04:51] yep, saw the original task [13:04:57] volans: I think they were merged some time ago, but I may be misremembering [13:05:46] (CR) Faidon Liambotis: [C: 2] Add vim-scripts as a standard package [puppet] - https://gerrit.wikimedia.org/r/359304 (owner: BryanDavis) [13:05:48] Operations, Continuous-Integration-Infrastructure, Patch-For-Review: CI for operations/puppet is taking too long - https://phabricator.wikimedia.org/T166888#3354713 (hashar) [13:05:51] could be, also multiple jobs mean multiple git clones/pulls, so not necessarily a win per se [13:06:03] (PS11) Faidon Liambotis: bd808's dotfiles [puppet] - https://gerrit.wikimedia.org/r/353937 (owner: BryanDavis) [13:06:15] (CR) Faidon Liambotis: [V: 2 C: 2] bd808's dotfiles [puppet] - https://gerrit.wikimedia.org/r/353937 (owner: BryanDavis) [13:07:20] empty /home/bd808/.vim/.gitignore could probably be dropped too ;) [13:07:48] the whole .vim dir i mean [13:07:59] (CR) Faidon Liambotis: [C: 1] "LGTM, but I'll leave for #Traffic." [puppet] - https://gerrit.wikimedia.org/r/355869 (https://phabricator.wikimedia.org/T164810) (owner: Dzahn) [13:08:20] oops, good point [13:08:28] unless he symlinks stuff there [13:08:38] I think that was one of the options for enabling vim-scripts [13:09:00] right, got it [13:11:14] Operations, ops-codfw, DBA: Degraded RAID on db2070 - https://phabricator.wikimedia.org/T167667#3354715 (Papaul) Return label information {F8467699} [13:12:19] any package installation ongoing? [13:13:17] jynus: not sure what you mean, https://gerrit.wikimedia.org/r/#/c/359304/ was merged a few minutes ago [13:13:21] I think it is just faidon's [13:13:54] why then did dpkg complain?
ah, because the check and the puppet run happened to be at the same time [13:14:28] akosiaris: none, I've already +1ed it [13:15:59] wrong person pinged maybe? [13:16:43] Operations, ops-codfw, Labs, Labs-Infrastructure: rack/setup/install labtestpuppetmaster2001 - https://phabricator.wikimedia.org/T167157#3354726 (Papaul) I troubleshoot this with Daniel by replacing replacing eth0 with eth1 MAC address in the DHCP file but same problem can not boot also from eth... [13:17:00] talking to myself [13:17:07] sigh [13:17:14] first sign of dementia [13:17:16] I think it is a bug [13:17:16] it's friday [13:17:32] what is the problem you're seeing jynus? [13:17:33] my client also autocompletes a reference to me [13:17:36] paravoid: no I am fine with vim-scripts, only reason I did not +1 bryan's change was libtemplate-perl [13:17:52] j writes "jynus: " [13:17:53] akosiaris: I amended it to add a comment for the explanation that moritzm gave [13:18:06] paravoid: sorry, not talking about the patch [13:18:22] PROBLEM - puppet last run on mendelevium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:18:29] paravoid: he means me [13:18:32] I am the bug :P [13:18:42] no, our clients have a small bug [13:18:47] IRC clients [13:18:50] phab down? [13:19:05] seems so [13:19:09] indeed [13:19:14] jynus: says something about upgrade databases?
[13:19:32] my fault [13:19:33] fixing [13:19:38] I did not touch anything [13:19:41] ok, you had me scared there for a moment [13:19:45] that it was my scap change or something :) [13:19:45] me too [13:19:52] yeah, I thought of something else [13:20:02] PROBLEM - https://phabricator.wikimedia.org on iridium is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - string focus on bug not found on https://phabricator.wikimedia.org:443https://phabricator.wikimedia.org/ - 4728 bytes in 0.107 second response time [13:20:04] paravoid: I thought security [13:20:25] !log fixing phab database migrations [13:20:27] twentyafterfour: can I help, do you need a backup or something, or are you on it? [13:20:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:20:42] PROBLEM - https://phabricator.wikimedia.org on phab2001 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - string focus on bug not found on https://phabricator.wikimedia.org:443https://phabricator.wikimedia.org/ - 4729 bytes in 0.113 second response time [13:20:47] jynus: I don't think I need a backup ...
gimme one sec [13:22:17] ACKNOWLEDGEMENT - https://phabricator.wikimedia.org on iridium is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - string focus on bug not found on https://phabricator.wikimedia.org:443https://phabricator.wikimedia.org/ - 4729 bytes in 0.102 second response time 20after4 I'm on it [13:22:37] running the script, but it's taking a minute to run :-/ [13:22:48] I didn't mean to need that [13:25:42] RECOVERY - https://phabricator.wikimedia.org on phab2001 is OK: HTTP OK: HTTP/1.1 200 OK - 33950 bytes in 0.224 second response time [13:26:02] RECOVERY - https://phabricator.wikimedia.org on iridium is OK: HTTP OK: HTTP/1.1 200 OK - 33951 bytes in 0.219 second response time [13:26:04] I know this was unintentional, but before running a script or upgrade, no matter how trivial, ask me to stop the slave - it takes 0 seconds and avoids issues [13:26:07] !log fixed phabricator "upgrade database" error. [13:26:12] * twentyafterfour writes incident report [13:26:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:26:25] jynus: oh, ok [13:26:26] I can even program it to stop on thursdays [13:26:44] I didn't intend to need to run that, totally a mistake [13:26:54] yes, I know, no big deal :-) [13:27:05] but I will definitely remember that in the future, it would be useful to make some things run faster I think [13:27:08] I think we have m3 delayed [13:27:22] we can even failover to a read-only slave if it takes some time [13:27:30] Thanks for fixing Phab!
:) [13:27:30] (PS1) Ema: VCL: add support for X-Applayer-Cost [puppet] - https://gerrit.wikimedia.org/r/359419 [13:27:43] yes, thank you, twentyafterfour [13:29:26] hmm [13:29:32] i am getting these errors [13:29:34] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Duplicate declaration: Package[libtemplate-perl] is already declared; cannot redeclare at /etc/puppet/modules/contint/manifests/packages/analytics.pp:25 on node jenkins-slave-01.git.eqiad.wmflabs [13:30:06] yes, we have phabricator one day delayed on a couple of hosts [13:30:27] but when in doubt, ping me, and I will be glad to help! [13:30:49] (PS3) Marostegui: Revert "db-eqiad.php: Add comment to db1018 status" [mediawiki-config] - https://gerrit.wikimedia.org/r/359386 [13:31:04] aha [13:31:05] https://github.com/wikimedia/puppet/commit/bdc731a7a826cb83cab9552dc8352caf8e7f969b [13:31:07] caused by [13:31:08] ^^ [13:31:13] paladox: it gets installed in standard_packages in base now, same package [13:31:22] yes, that [13:31:24] mutante yeh just figured that out :) [13:31:28] need to remove it from contint [13:32:00] or use require_package ? [13:32:02] thanks for your patience everybody, I'm writing an incident report for ~8 minutes of unscheduled downtime [13:32:08] and OTRS too for that matter [13:32:13] akosiaris: ^^^ [13:32:55] damn, I'll do a require_package for OTRS [13:33:03] paladox: ^ do that too for contint [13:33:14] Ok [13:33:34] mutante require_package?
[13:33:44] yes, it will not cause the duplicate like that [13:34:01] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Add comment to db1018 status" [mediawiki-config] - https://gerrit.wikimedia.org/r/359386 (owner: Marostegui) [13:34:33] Operations: Possible issue with 2FA tokens - https://phabricator.wikimedia.org/T168064#3354760 (Peachey88) p:Triage>High [13:34:38] (Draft1) Paladox: contint: Remove package libtemplate-perl from analytics.pp [puppet] - https://gerrit.wikimedia.org/r/359420 [13:34:40] (PS2) Paladox: contint: Remove package libtemplate-perl from analytics.pp [puppet] - https://gerrit.wikimedia.org/r/359420 [13:34:43] mutante ^^ :) [13:34:49] thanks [13:34:55] Operations: Possible issue with 2FA tokens - https://phabricator.wikimedia.org/T168064#3354765 (Peachey88) p:High>Unbreak! [13:35:03] (PS1) Faidon Liambotis: Fix 3 CommandLiteral and MultilineIfThen infractions [puppet] - https://gerrit.wikimedia.org/r/359421 [13:35:05] (PS1) Faidon Liambotis: gridengine: fix rubocop infractions for the type [puppet] - https://gerrit.wikimedia.org/r/359422 [13:35:07] (PS1) Faidon Liambotis: base: make rubocop happier with physicalcorecount [puppet] - https://gerrit.wikimedia.org/r/359423 [13:35:10] (PS1) Faidon Liambotis: wmflib/ipresolve: fix multiple rubocop infractions [puppet] - https://gerrit.wikimedia.org/r/359424 [13:35:16] (PS3) Paladox: contint: Remove package libtemplate-perl from analytics.pp [puppet] - https://gerrit.wikimedia.org/r/359420 [13:35:19] (PS1) Alexandros Kosiaris: otrs: Use require_package instead of package resource [puppet] - https://gerrit.wikimedia.org/r/359425 [13:35:27] (Merged) jenkins-bot: Revert "db-eqiad.php: Add comment to db1018 status" [mediawiki-config] - https://gerrit.wikimedia.org/r/359386 (owner: Marostegui) [13:35:43] (CR) Alexandros Kosiaris: [V: 2 C: 2] otrs: Use require_package instead of package resource [puppet] -
https://gerrit.wikimedia.org/r/359425 (owner: Alexandros Kosiaris) [13:36:05] Operations, MediaWiki-extensions-CentralAuth: Possible issue with 2FA tokens - https://phabricator.wikimedia.org/T168064#3354769 (Paladox) [13:36:33] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Remove comments from db1018 current status - T166205 (duration: 00m 41s) [13:36:35] (CR) jenkins-bot: Revert "db-eqiad.php: Add comment to db1018 status" [mediawiki-config] - https://gerrit.wikimedia.org/r/359386 (owner: Marostegui) [13:36:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:36:42] T166205: Convert unique keys into primary keys for some wiki tables on s2 - https://phabricator.wikimedia.org/T166205 [13:37:18] (PS4) Alexandros Kosiaris: contint: Remove package libtemplate-perl from analytics.pp [puppet] - https://gerrit.wikimedia.org/r/359420 (owner: Paladox) [13:37:23] (CR) Alexandros Kosiaris: [V: 2 C: 2] contint: Remove package libtemplate-perl from analytics.pp [puppet] - https://gerrit.wikimedia.org/r/359420 (owner: Paladox) [13:37:32] RECOVERY - puppet last run on mendelevium is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [13:38:01] thanks akosiaris :) [13:38:52] you are welcome [13:41:04] (PS1) Faidon Liambotis: Remove lldp.rb references from .rubocop_todo.yml [puppet] - https://gerrit.wikimedia.org/r/359426 [13:41:52] (CR) Faidon Liambotis: [C: 2] Remove lldp.rb references from .rubocop_todo.yml [puppet] - https://gerrit.wikimedia.org/r/359426 (owner: Faidon Liambotis) [13:44:31] (PS2) Alexandros Kosiaris: Fix 3 CommandLiteral and MultilineIfThen infractions [puppet] - https://gerrit.wikimedia.org/r/359421 (owner: Faidon Liambotis) [13:44:35] (CR) Alexandros Kosiaris: [V: 2 C: 2] Fix 3 CommandLiteral and MultilineIfThen infractions [puppet] - https://gerrit.wikimedia.org/r/359421 (owner: Faidon Liambotis) [13:45:23]
Operations, MediaWiki-extensions-CentralAuth: Possible issue with 2FA tokens - https://phabricator.wikimedia.org/T168064#3354727 (Anomie) > **Steps to reproduce** > 1. Log out > 2. Log back in (username + password) > 3. Be asked for token, use Google Auth to retrieve > 4. Enter token, get `Verification f... [13:48:00] Operations, MediaWiki-extensions-CentralAuth: Possible issue with 2FA tokens - https://phabricator.wikimedia.org/T168064#3354816 (TNTPublic) @Anomie for what it's worth Chrissymad uses FreeOTP as well. Thanks for clarity though, doesn't look like it's a direct issue with the 2FA functionality as there ar... [13:48:19] (PS2) Alexandros Kosiaris: gridengine: fix rubocop infractions for the type [puppet] - https://gerrit.wikimedia.org/r/359422 (owner: Faidon Liambotis) [13:48:23] (CR) Alexandros Kosiaris: [V: 2 C: 2] gridengine: fix rubocop infractions for the type [puppet] - https://gerrit.wikimedia.org/r/359422 (owner: Faidon Liambotis) [13:48:33] (PS2) Alexandros Kosiaris: base: make rubocop happier with physicalcorecount [puppet] - https://gerrit.wikimedia.org/r/359423 (owner: Faidon Liambotis) [13:48:39] (CR) Alexandros Kosiaris: [V: 2 C: 2] base: make rubocop happier with physicalcorecount [puppet] - https://gerrit.wikimedia.org/r/359423 (owner: Faidon Liambotis) [13:50:20] (PS2) Alexandros Kosiaris: wmflib/ipresolve: fix multiple rubocop infractions [puppet] - https://gerrit.wikimedia.org/r/359424 (owner: Faidon Liambotis) [13:50:25] (CR) Alexandros Kosiaris: [V: 2 C: 2] wmflib/ipresolve: fix multiple rubocop infractions [puppet] - https://gerrit.wikimedia.org/r/359424 (owner: Faidon Liambotis) [13:51:38] Operations, MediaWiki-extensions-CentralAuth: Possible issue with 2FA tokens - https://phabricator.wikimedia.org/T168064#3354727 (Marostegui) Hi, I just tried phabricator and wikitech and worked fine.
Disabled it on wikitech, logged out, logged in, enabled it again, logged out and logged in finely. For... [13:55:19] (PS2) Ema: VCL: add support for X-Applayer-Cost [puppet] - https://gerrit.wikimedia.org/r/359419 [13:58:25] Operations, ops-eqiad, Patch-For-Review: rack/setup/install ganeti1005-ganeti1008 - https://phabricator.wikimedia.org/T166076#3354872 (akosiaris) [13:59:02] Operations, ops-eqiad, Patch-For-Review: rack/setup/install ganeti1005-ganeti1008 - https://phabricator.wikimedia.org/T166076#3284602 (akosiaris) [13:59:57] Operations, MediaWiki-extensions-CentralAuth: Possible issue with 2FA tokens - https://phabricator.wikimedia.org/T168064#3354881 (Tgr) p:Unbreak!>Normal Works fine with Google Authenticator as well. I can think of too things: your clock is really off (minutes, at least) or you have updated your... [13:59:59] Operations, ops-eqiad, Patch-For-Review: rack/setup/install ganeti1005-ganeti1008 - https://phabricator.wikimedia.org/T166076#3284602 (akosiaris) I 've update the task to point out that ganeti1005, ganeti1006 are fine. I am already adding them to the ganeti cluster. It's just ganeti1007, ganeti1008 t... [14:00:15] https://wikitech.wikimedia.org/wiki/Incident_documentation/20170616-PHAB [14:00:54] Operations, MW-1.30-release-notes, Wikimedia-Language-setup, Wikimedia-Site-requests, Patch-For-Review: Create Atikamekw Wikipedia - https://phabricator.wikimedia.org/T167714#3354884 (Amire80) >>! In T167714#3350157, @Reedy wrote: > The wiki is now for all intents and purposes, created. > >...
[14:01:45] (PS1) Hashar: (DO NOT SUBMIT) cache save test [puppet] - https://gerrit.wikimedia.org/r/359436 (https://phabricator.wikimedia.org/T168063) [14:02:15] Operations, MW-1.30-release-notes, Wikimedia-Language-setup, Wikimedia-Site-requests, Patch-For-Review: Create Atikamekw Wikipedia - https://phabricator.wikimedia.org/T167714#3341896 (DatGuy) Seems like it's already imported. [14:02:50] Operations, Commons, Multimedia, Traffic, and 2 others: Disable serving unpatrolled new files to Wikipedia Zero users - https://phabricator.wikimedia.org/T167400#3354893 (Jeff_G) >>! In T167400#3350622, @Bawolff wrote: > Downsides: If anyone using a zero-rated connection is a file patroller, they... [14:04:07] Operations, MediaWiki-extensions-CentralAuth: Possible issue with 2FA tokens - https://phabricator.wikimedia.org/T168064#3354727 (TheDJ) For anyone who has made use of scratch codes and succeeded logging in: Please remember that because of {T131788} and {T150601} it's not yet possible to get a new set of... [14:04:25] Operations, MW-1.30-release-notes, Wikimedia-Language-setup, Wikimedia-Site-requests, Patch-For-Review: Create Atikamekw Wikipedia - https://phabricator.wikimedia.org/T167714#3354902 (Benoit_Rochon) Interwiki link in Wikidata seems to not work. https://www.wikidata.org/w/index.php?title=Q151... [14:06:33] Operations, MediaWiki-extensions-CentralAuth: Possible issue with 2FA tokens - https://phabricator.wikimedia.org/T168064#3354905 (TNTPublic) @Tgr I'll show myself out, it was a timing issue with GA. For the future, the solution is; `Google Authenticator --> Settings --> Time correction for codes --> Syn...
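The 2FA thread above was traced to a clock problem on the phone ("it was a timing issue with GA"). A minimal Ruby sketch of RFC 6238 TOTP shows why that produces rejected tokens: the code is an HMAC of the current 30-second time step, so a device clock that drifts past the verifier's tolerance computes a different step and therefore a different code. The `totp` and `totp_valid?` names and the one-step tolerance window are assumptions for illustration; the log does not say what window the wikis' verifier actually accepts.

```ruby
require 'openssl'

# Minimal RFC 6238 TOTP sketch (HMAC-SHA1, 30 s steps); illustrative only.
def totp(secret, unix_time, step: 30, digits: 6)
  counter = unix_time.to_i / step                 # index of the current time step
  msg = [counter].pack('Q>')                      # 8-byte big-endian counter
  mac = OpenSSL::HMAC.digest(OpenSSL::Digest.new('SHA1'), secret, msg)
  offset = mac.bytes.last & 0x0f                  # RFC 4226 dynamic truncation
  code = (mac.byteslice(offset, 4).unpack('N').first & 0x7fffffff) % (10**digits)
  code.to_s.rjust(digits, '0')
end

# Accept codes from `window` steps either side of the server's clock.
# The window size here is an assumption, not something stated in the log.
def totp_valid?(secret, code, now, window: 1, step: 30, digits: 6)
  (-window..window).any? do |delta|
    totp(secret, now + delta * step, step: step, digits: digits) == code
  end
end
```

Checked against the RFC 6238 SHA-1 test vector, `totp('12345678901234567890', 59, digits: 8)` gives `"94287082"`. With a one-step window, a code generated 30 seconds earlier still verifies, while a phone clock that is minutes off falls outside the window, which is consistent with Tgr's "your clock is really off" suggestion in the task.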
[14:07:23] (CR) Hashar: "recheck" [puppet] - https://gerrit.wikimedia.org/r/359436 (https://phabricator.wikimedia.org/T168063) (owner: Hashar) [14:09:04] Operations, Continuous-Integration-Infrastructure, Patch-For-Review: CI for operations/puppet is taking too long - https://phabricator.wikimedia.org/T166888#3354917 (hashar) [14:09:17] Operations: Look into feasibility of disabling sha-1 host keys on our ssh daemons - https://phabricator.wikimedia.org/T167966#3354919 (Marostegui) p:Triage>Normal [14:10:01] tgr: good call with timing, thank you - someone else is still having issues but I'm going to guess it's a similar thing [14:10:06] Operations, MW-1.30-release-notes, Wikimedia-Language-setup, Wikimedia-Site-requests, Patch-For-Review: Create Atikamekw Wikipedia - https://phabricator.wikimedia.org/T167714#3354921 (Amire80) >>! In T167714#3354889, @DatGuy wrote: > Seems like it's already imported. Oh indeed! I really shou... [14:18:45] (CR) Hashar: "recheck" [puppet] - https://gerrit.wikimedia.org/r/359436 (https://phabricator.wikimedia.org/T168063) (owner: Hashar) [14:20:20] Operations, Continuous-Integration-Infrastructure, Patch-For-Review: CI for operations/puppet is taking too long - https://phabricator.wikimedia.org/T166888#3354931 (hashar) > Faidon wrote: >> Runs xUnit and castor-save, both of which emit failures in the logs ("Result was FAILURE") (3s). > castor-sa... [14:26:58] Operations, MW-1.30-release-notes, Wikimedia-Language-setup, Wikimedia-Site-requests, Patch-For-Review: Create Atikamekw Wikipedia - https://phabricator.wikimedia.org/T167714#3354954 (Benoit_Rochon) {F8468018} {F8468021} [14:28:57] Operations, Commons, Multimedia, Traffic, and 2 others: Disable serving unpatrolled new files to Wikipedia Zero users - https://phabricator.wikimedia.org/T167400#3331800 (BBlack) >>! In T167400#3350622, @Bawolff wrote: >> Wikipedia Zero traffic is tied to IP addresses, not users. So it definitely...
So it definitely... [14:32:14] PROBLEM - pdfrender on scb2004 is CRITICAL: connect to address 10.192.16.36 and port 5252: Connection refused [14:32:34] PROBLEM - pdfrender on scb2002 is CRITICAL: connect to address 10.192.48.43 and port 5252: Connection refused [14:53:06] 10Operations, 10Commons, 10Multimedia, 10Traffic, and 2 others: Disable serving unpatrolled new files to Wikipedia Zero users - https://phabricator.wikimedia.org/T167400#3355085 (10Nemo_bis) > Why restrict this mechanism to Zero, making Zero different from other access? We could instead deny access to unpa... [14:53:28] (03PS1) 10Mforns: [WIP] Modify EL purging script to not use limit/offset [puppet] - 10https://gerrit.wikimedia.org/r/359442 (https://phabricator.wikimedia.org/T168071) [14:54:31] (03CR) 10jerkins-bot: [V: 04-1] [WIP] Modify EL purging script to not use limit/offset [puppet] - 10https://gerrit.wikimedia.org/r/359442 (https://phabricator.wikimedia.org/T168071) (owner: 10Mforns) [14:54:43] (03CR) 10Hashar: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/359436 (https://phabricator.wikimedia.org/T168063) (owner: 10Hashar) [14:58:09] (03PS2) 10Andrew Bogott: designate: Clean up puppet config for deleted instances. [puppet] - 10https://gerrit.wikimedia.org/r/359374 (https://phabricator.wikimedia.org/T147878) [14:58:11] (03PS1) 10Andrew Bogott: Labspuppetbackend: add endpoint to get all used roles. [puppet] - 10https://gerrit.wikimedia.org/r/359443 (https://phabricator.wikimedia.org/T151522) [14:58:46] 10Operations, 10ops-eqiad, 10Scoring-platform-team: rack/setup/install ores1001-1009 - https://phabricator.wikimedia.org/T165171#3355103 (10Halfak) [15:01:09] 10Operations, 10Commons, 10Multimedia, 10Traffic, and 2 others: Disable serving unpatrolled new files to Wikipedia Zero users - https://phabricator.wikimedia.org/T167400#3355105 (10BBlack) >>! In T167400#3355085, @Nemo_bis wrote: >> Why restrict this mechanism to Zero, making Zero different from other acce... 
[15:03:42] (Abandoned) Hashar: (DO NOT SUBMIT) cache save test [puppet] - https://gerrit.wikimedia.org/r/359436 (https://phabricator.wikimedia.org/T168063) (owner: Hashar) [15:06:12] (PS2) Andrew Bogott: Labspuppetbackend: add endpoint to get all used roles. [puppet] - https://gerrit.wikimedia.org/r/359443 (https://phabricator.wikimedia.org/T151522) [15:06:14] (PS3) Andrew Bogott: designate: Clean up puppet config for deleted instances. [puppet] - https://gerrit.wikimedia.org/r/359374 (https://phabricator.wikimedia.org/T147878) [15:06:22] Operations, ops-eqiad, Scoring-platform-team: rack/setup/install ores1001-1009 - https://phabricator.wikimedia.org/T165171#3258960 (Halfak) [15:07:38] Operations, MediaWiki-extensions-CentralAuth: Possible issue with 2FA tokens - https://phabricator.wikimedia.org/T168064#3355131 (Samtar) Chrissymad still cannot use 2FA, and still gets the invalid token error on both the TOTP code and scratch codes. She has checked her phone's time and it is correct. A... [15:08:21] (CR) Andrew Bogott: [C: 2] Labspuppetbackend: add endpoint to get all used roles. [puppet] - https://gerrit.wikimedia.org/r/359443 (https://phabricator.wikimedia.org/T151522) (owner: Andrew Bogott) [15:08:35] Operations, ops-eqiad, Scoring-platform-team: rack/setup/install ores1001-1009 - https://phabricator.wikimedia.org/T165171#3355133 (Halfak) It looks like these servers are being installed in CODFW but the named include numbers in the 1000s. It seems common to have servers in CODFW have names in the...
[15:12:05] (PS1) Faidon Liambotis: wmflib: fix all Hiera backends' Rubocop infractions [puppet] - https://gerrit.wikimedia.org/r/359447 [15:12:07] (PS1) Faidon Liambotis: Kill module puppet_statsd [puppet] - https://gerrit.wikimedia.org/r/359448 [15:12:09] (PS1) Faidon Liambotis: wmflib: cleanup secret.rb a little bit [puppet] - https://gerrit.wikimedia.org/r/359449 [15:12:11] (PS1) Faidon Liambotis: wmflib/to_milliseconds: fix two minor rubocop cops [puppet] - https://gerrit.wikimedia.org/r/359450 [15:12:13] (PS1) Faidon Liambotis: graphite: cleanup configparser_format a little bit [puppet] - https://gerrit.wikimedia.org/r/359451 [15:12:15] (PS1) Faidon Liambotis: Fix Style/FormatString Rucobop across all Rakefiles [puppet] - https://gerrit.wikimedia.org/r/359452 [15:12:17] (PS1) Faidon Liambotis: wmflib: fix rubocop infractions in serializers [puppet] - https://gerrit.wikimedia.org/r/359453 [15:12:24] akosiaris: ^ :) [15:12:32] Operations, ops-eqiad, Scoring-platform-team: rack/setup/install ores1001-1009 - https://phabricator.wikimedia.org/T165171#3355138 (Cmjohnson) @halfak, these 9 are being installed in eqiad. [15:12:54] Operations, ops-eqiad, Scoring-platform-team: rack/setup/install ores1001-1009 - https://phabricator.wikimedia.org/T165171#3258960 (Dzahn) @halfak since Chris racked them i'm pretty sure they are physically in EQIAD, so 1xxx names would be right. The request at T142578 says: The Site/Location: EQI... [15:13:33] Operations, ops-eqiad, Scoring-platform-team: rack/setup/install ores1001-1009 - https://phabricator.wikimedia.org/T165171#3355157 (Halfak) Aha! My mistake! Thanks for the clarification. /me goes to fix related tasks where he wrote "codfw".
:) [15:15:41] (PS2) Hashar: Rake: optimize typos task for CI [puppet] - https://gerrit.wikimedia.org/r/357804 (https://phabricator.wikimedia.org/T166888) [15:16:40] hashar: did you see the description of a new reworked Rakefile I gave on the task? [15:17:24] paravoid: yeah I have a draft to reply to that [15:17:31] (CR) jerkins-bot: [V: -1] Rake: optimize typos task for CI [puppet] - https://gerrit.wikimedia.org/r/357804 (https://phabricator.wikimedia.org/T166888) (owner: Hashar) [15:18:45] (PS3) Hashar: Rake: optimize typos task for CI [puppet] - https://gerrit.wikimedia.org/r/357804 (https://phabricator.wikimedia.org/T166888) [15:19:09] (CR) Hashar: "PS2 check all files for typos whenever ./typos has been altered. Still have to properly test it locally though." [puppet] - https://gerrit.wikimedia.org/r/357804 (https://phabricator.wikimedia.org/T166888) (owner: Hashar) [15:19:23] * hashar shakes fists at ruby output buffering [15:20:53] (CR) Hashar: [C: 1] wmflib/to_milliseconds: fix two minor rubocop cops [puppet] - https://gerrit.wikimedia.org/r/359450 (owner: Faidon Liambotis) [15:22:12] (CR) Hashar: [C: 1] "That actually makes the code nicer to read. Nested ternary operators are nasty." [puppet] - https://gerrit.wikimedia.org/r/359451 (owner: Faidon Liambotis) [15:22:35] hashar: I really think doing it that way will make it clearer and more readable [15:22:45] rather than just embedding git_changed_in_files in every one of those tasks [15:22:57] and re-running git on every one of those invocations, btw [15:22:57] then I am not sure it is going to save much time [15:23:26] but that would surely end up with much cleaner / easier to follow code [15:23:40] what is? [15:23:53] my proposal you mean?
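paravoid's objection in this exchange is that every Rake task calls git on its own to find the files changed in HEAD. A minimal Ruby sketch of the alternative he argues for: ask git once, memoize the list, and let each lint task filter it by extension (passing `''` picks up extensionless files, the case volans raised earlier for tox). `changed_files` and `files_with_extension` are hypothetical helpers, not the actual operations/puppet Rakefile code, and `git diff --name-only HEAD~1 HEAD` is an assumed way of listing the change. The `$stdout.sync = true` line is one common way to get the unbuffered `puts` output hashar is after; the log does not show exactly how his change does it.

```ruby
require 'open3'

# Flush every `puts` immediately so Jenkins console timestamps are accurate.
# One common way to disable Ruby stdout buffering (assumption, see lead-in).
$stdout.sync = true

# Hypothetical helper: files touched by HEAD, computed by a single git call
# and memoized so later lint tasks reuse the same list.
def changed_files
  @changed_files ||= begin
    out, status = Open3.capture2('git', 'diff', '--name-only', 'HEAD~1', 'HEAD')
    raise 'git diff failed' unless status.success?
    out.split("\n")
  end
end

# Hypothetical helper: keep only files whose extension is in `exts`.
# Pass '' to select files without an extension (e.g. scripts).
def files_with_extension(files, *exts)
  files.select { |f| exts.include?(File.extname(f)) }
end
```

A manifests task would then iterate over `files_with_extension(changed_files, '.pp')`, and only the first task to call `changed_files` pays for the git subprocess.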
[15:24:42] brb [15:27:56] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 212, down: 1, dormant: 0, excluded: 0, unused: 0; xe-4/1/3: down - Core: cr2-esams:xe-0/1/3 (Level3, BDFS2448, 84ms) {#2013} [10Gbps wave] [15:28:16] PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 57, down: 1, dormant: 0, excluded: 0, unused: 0; xe-0/1/3: down - Core: cr2-eqiad:xe-4/1/3 (Level3, BDFS2448, 84ms) {#A0010621} [10Gbps wave] [15:29:22] XioNoX: expected? ^^^ [15:29:50] * XioNoX looks at the calendar [15:30:40] * volans looked already at it and didn't find anything obvious [15:31:56] traffic routed around seamlessly [15:32:21] good :) [15:34:03] I'm going to contact level3 as well [15:35:16] RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 59, down: 0, dormant: 0, excluded: 0, unused: 0 [15:35:33] well, I guess not [15:35:56] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 214, down: 0, dormant: 0, excluded: 0, unused: 0 [15:35:59] lol [15:38:18] Operations, Commons, Multimedia, Traffic, and 2 others: Disable serving unpatrolled new files to Wikipedia Zero users - https://phabricator.wikimedia.org/T167400#3355220 (Nemo_bis) > Why are we restricting Zero-rated users to only the "not a wiki" version of our wikis? That's a good question, an... [15:42:38] Operations, Commons, Multimedia, Traffic, and 2 others: Disable serving unpatrolled new files to Wikipedia Zero users - https://phabricator.wikimedia.org/T167400#3355223 (FDMS) Can't Zero users opt to get a non-Zero version of Wikimedia projects, i.e. simply use their data allowance to access unp... [15:46:10] (PS1) Hashar: tests: disable ruby output buffering [puppet] - https://gerrit.wikimedia.org/r/359457 [15:50:30] (CR) Hashar: "Another little gems!
Will cause "puts" to flush the buffer and thus the timing in the Jenkins console to be more or less reliable :-}" [puppet] - https://gerrit.wikimedia.org/r/359457 (owner: Hashar) [15:50:59] (CR) Faidon Liambotis: [C: -1] "This would run git_changed_in_head for the third time in a CI job. Let's fix this properly?" [puppet] - https://gerrit.wikimedia.org/r/357804 (https://phabricator.wikimedia.org/T166888) (owner: Hashar) [15:54:49] (CR) Hashar: "I am going with the low hanging fruits." [puppet] - https://gerrit.wikimedia.org/r/357804 (https://phabricator.wikimedia.org/T166888) (owner: Hashar) [16:25:57] Operations, MediaWiki-extensions-CentralAuth: Possible issue with 2FA tokens - https://phabricator.wikimedia.org/T168064#3354727 (Framawiki) I can confirm Anomie's comment : I can logout/login without problems on frwiki, with FreeOTP. >>! In T168064#3354905, @TNTPublic wrote: > @Tgr I'll show myself out... [16:28:31] Operations, Commons, Multimedia, Traffic, and 2 others: Disable serving unpatrolled new files to Wikipedia Zero users - https://phabricator.wikimedia.org/T167400#3355292 (zhuyifei1999) >>! In T167400#3355220, @Nemo_bis wrote: > to be reversed when MediaWiki manages to block the invalid files, rig... [16:29:28] (CR) Dzahn: "tbh i just did 1001 only because the TODO list says that for now we just point the DNS name to 1001 and i like to just do one thing at a t" [puppet] - https://gerrit.wikimedia.org/r/359373 (https://phabricator.wikimedia.org/T149557) (owner: Dzahn) [16:33:50] (PS2) Dzahn: mariadb: add GRANTs for tendril@dbmonitor1001, tendriL@dbmonitor2001 [puppet] - https://gerrit.wikimedia.org/r/359373 (https://phabricator.wikimedia.org/T149557) [16:34:51] (CR) Dzahn: "added both now. should we also delete the grant from 10.% after we don't need it anymore?"
[puppet] - https://gerrit.wikimedia.org/r/359373 (https://phabricator.wikimedia.org/T149557) (owner: Dzahn) [16:35:03] (CR) Jcrespo: [C: 1] "Please note I voted +1 to the previous one, too. Either are ok to me." [puppet] - https://gerrit.wikimedia.org/r/359373 (https://phabricator.wikimedia.org/T149557) (owner: Dzahn) [16:35:23] (CR) Dzahn: "and why does it currently work?:)" [puppet] - https://gerrit.wikimedia.org/r/359373 (https://phabricator.wikimedia.org/T149557) (owner: Dzahn) [16:36:14] (CR) Dzahn: [C: 1] "ok, cool, thanks. but i understand this needs deployment too" [puppet] - https://gerrit.wikimedia.org/r/359373 (https://phabricator.wikimedia.org/T149557) (owner: Dzahn) [16:37:18] Operations, Commons, Multimedia, Traffic, and 2 others: Disable serving unpatrolled new files to Wikipedia Zero users - https://phabricator.wikimedia.org/T167400#3331800 (Steinsplitter) We have to take a measure to stop WP zero abuse, we have 1 terabyte of pirated content yet. [16:41:01] Operations, MW-1.30-release-notes, Wikimedia-Language-setup, Wikimedia-Site-requests, Patch-For-Review: Create Atikamekw Wikipedia - https://phabricator.wikimedia.org/T167714#3355329 (Benoit_Rochon) Where should I ask to be granted bureaucrat on that Wiki ? [16:41:57] (PS1) Volans: ClusterShell: allow to set a timeout per command [software/cumin] - https://gerrit.wikimedia.org/r/359466 (https://phabricator.wikimedia.org/T164838) [16:41:59] (PS1) Volans: CLI: migrate to timeout per command [software/cumin] - https://gerrit.wikimedia.org/r/359467 (https://phabricator.wikimedia.org/T164838) [16:42:51] Operations, MW-1.30-release-notes, Wikimedia-Language-setup, Wikimedia-Site-requests, Patch-For-Review: Create Atikamekw Wikipedia - https://phabricator.wikimedia.org/T167714#3355339 (DatGuy) @MarcoAurelio is a steward that can grant you it. Not sure if it has the same policy as CU/OS where t...
[16:44:59] Operations, Commons, Multimedia, Traffic, and 2 others: Disable serving unpatrolled new files to Wikipedia Zero users - https://phabricator.wikimedia.org/T167400#3355346 (Steinsplitter) WP zero pirates are sharing links to the content in Facebook Groups (FB is zero rating traffic as well), for ex...
[16:47:19] (PS3) Volans: CLI: improve configuration error handling [software/cumin] - https://gerrit.wikimedia.org/r/357234 (https://phabricator.wikimedia.org/T158747)
[16:54:26] Operations, MW-1.30-release-notes, Wikimedia-Language-setup, Wikimedia-Site-requests, Patch-For-Review: Create Atikamekw Wikipedia - https://phabricator.wikimedia.org/T167714#3355377 (Benoit_Rochon) If it needs to be at least two, grant also @Amqui who is the initiator of Atikamekw project....
[17:02:26] Operations, MediaWiki-extensions-CentralAuth: Possible issue with 2FA tokens - https://phabricator.wikimedia.org/T168064#3355386 (Tgr) >>! In T168064#3355131, @Samtar wrote: > As mentioned above, I have pasted on her behalf a scratch code at P5588 (visible only to members of Security) which may be helpfu...
[17:14:38] (PS2) Nschaaf: Update recommendation-api module and role [puppet] - https://gerrit.wikimedia.org/r/358026 (https://phabricator.wikimedia.org/T167113)
[17:15:27] (PS3) Nschaaf: Update recommendation-api module and role [puppet] - https://gerrit.wikimedia.org/r/358026 (https://phabricator.wikimedia.org/T167113)
[17:24:13] Operations, MW-1.30-release-notes, Wikimedia-Language-setup, Wikimedia-Site-requests, Patch-For-Review: Create Atikamekw Wikipedia - https://phabricator.wikimedia.org/T167714#3355437 (MF-Warburg) Sysop requests can be posted on m:SRP after a local discussion happened. Bureaucrat rights will n...
[17:27:44] (CR) Dzahn: [C: 2] "yep, per hashar and test you did. also, it seems the current version has a syntax issue anyways? " if os_version('debian jessie') "" [puppet] - https://gerrit.wikimedia.org/r/356246 (https://phabricator.wikimedia.org/T166611) (owner: Paladox)
[17:28:06] (PS8) Dzahn: contint: Only install libmysqlclient-dev if on trusty or jessie [puppet] - https://gerrit.wikimedia.org/r/356246 (https://phabricator.wikimedia.org/T166611) (owner: Paladox)
[17:29:10] (CR) Dzahn: [C: 2] "installs libmariadbclient-dev on stretch" [puppet] - https://gerrit.wikimedia.org/r/356246 (https://phabricator.wikimedia.org/T166611) (owner: Paladox)
[17:29:45] (PS2) Faidon Liambotis: wmflib: fix all Hiera backends' Rubocop infractions [puppet] - https://gerrit.wikimedia.org/r/359447
[17:29:47] (PS2) Faidon Liambotis: Kill module puppet_statsd [puppet] - https://gerrit.wikimedia.org/r/359448
[17:29:49] (PS2) Faidon Liambotis: wmflib: cleanup secret.rb a little bit [puppet] - https://gerrit.wikimedia.org/r/359449
[17:29:51] (PS2) Faidon Liambotis: wmflib/to_milliseconds: fix two minor rubocop cops [puppet] - https://gerrit.wikimedia.org/r/359450
[17:29:53] (PS2) Faidon Liambotis: graphite: cleanup configparser_format a little bit [puppet] - https://gerrit.wikimedia.org/r/359451
[17:29:55] (PS2) Faidon Liambotis: Fix Style/FormatString Rucobop across all Rakefiles [puppet] - https://gerrit.wikimedia.org/r/359452
[17:29:57] (PS2) Faidon Liambotis: wmflib: fix rubocop infractions in serializers [puppet] - https://gerrit.wikimedia.org/r/359453
[17:29:59] (PS1) Faidon Liambotis: Remove Rubocop exception for non-existent file [puppet] - https://gerrit.wikimedia.org/r/359477
[17:30:01] (PS1) Faidon Liambotis: Fix more whitespace-related Rubocop across the tree [puppet] - https://gerrit.wikimedia.org/r/359478
[17:30:05] (PS1) Faidon Liambotis: Fix Style/RegexpLiteral Rubocop offenses [puppet] - https://gerrit.wikimedia.org/r/359479
[17:30:07] (PS1) Faidon Liambotis: wmflib: fix another couple minor rubocop offenses [puppet] - https://gerrit.wikimedia.org/r/359480
[17:30:09] (PS1) Faidon Liambotis: wmflib, admin: fix Rubocop Style/For offenses [puppet] - https://gerrit.wikimedia.org/r/359481
[17:30:11] (PS1) Faidon Liambotis: base: fix Rubocop MethodCallWithoutArgsParentheses [puppet] - https://gerrit.wikimedia.org/r/359482
[17:30:13] (PS1) Faidon Liambotis: utils/expanderrb.rb: fix Style/SpecialGlobalVars [puppet] - https://gerrit.wikimedia.org/r/359483
[17:30:15] (PS1) Faidon Liambotis: Fix Style/NegatedIf Rubocop offense across the tree [puppet] - https://gerrit.wikimedia.org/r/359484
[17:30:17] (PS1) Faidon Liambotis: rubocop: move three ignores to .rubocop.yml [puppet] - https://gerrit.wikimedia.org/r/359485
[17:32:02] Operations, ops-codfw, DC-Ops, Discovery, and 3 others: elastic2020 is powered off and does not want to restart - https://phabricator.wikimedia.org/T149006#3355492 (debt) Open>Resolved Yay! :)
[17:33:30] heh, i won't click submit right now. that would be mean
[17:33:40] making you rebase all that again right the second
[17:33:43] no, do, these will all sit for review
[17:33:46] ok :)
[17:34:34] PROBLEM - puppet last run on ms-be1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[mkfs-/dev/sdj1]
[17:34:35] paladox: wanna confirm on jenkins-slave again?
[17:34:39] Ok
[17:36:15] (CR) BBlack: [C: 1] Fix Style/FormatString Rucobop across all Rakefiles [puppet] - https://gerrit.wikimedia.org/r/359452 (owner: Faidon Liambotis)
[17:37:52] ran it on jenkin-slave-01 .. no-op as it should
[17:37:57] you do the stretch one :)
[17:38:07] (CR) BBlack: [C: 1] "On careful examination the changes look logically-sound, but I haven't tried to execute the new code!" [puppet] - https://gerrit.wikimedia.org/r/359449 (owner: Faidon Liambotis)
[17:38:16] nevermind, this is already stretch
[17:38:37] did you say that was in-place upgrade?
[17:38:57] (PS1) BBlack: numa.rb: add device_to_node data [puppet] - https://gerrit.wikimedia.org/r/359487
[17:38:59] (PS1) BBlack: add numactl to base packages [puppet] - https://gerrit.wikimedia.org/r/359488
[17:39:01] (PS1) BBlack: tlsproxy: restrict whole daemon to relevant NUMA node(s) [puppet] - https://gerrit.wikimedia.org/r/359489
[17:39:04] PROBLEM - Work requests waiting in Zuul Gearman server https://grafana.wikimedia.org/dashboard/db/zuul-gearman on contint1001 is CRITICAL: CRITICAL: 42.86% of data above the critical threshold [140.0]
[17:39:15] paladox: looks all good to me, has libmariadb dev package and no errors
[17:39:36] mutante /me is rebasing puppet master. It has merge conflicts
[17:39:48] paladox: oh? the contint master?
[17:39:56] nope puppet-paladox3
[17:40:00] ok
[17:40:15] (CR) jerkins-bot: [V: -1] tlsproxy: restrict whole daemon to relevant NUMA node(s) [puppet] - https://gerrit.wikimedia.org/r/359489 (owner: BBlack)
[17:40:26] mutante: i got a question you may not know the asnwer, but do you know in site.pp for icinga what the ores worker stuff is all about, if not i can ask other people.
[17:40:33] either way, jenkins-slave-01 is stretch and lgtm
[17:40:40] but what is with the zuul gearman alert there now
[17:40:55] mutante: iirc that just means zuul is being slower than normal
[17:41:55] mutante that means someone sent alot of jobs to zuul
[17:41:55] Zppix: site.pp for icinga? do you mean that puppet class that has the ores monitoring?
[17:42:02] yes mutante sorry
[17:42:05] Zppix: paladox: ok
[17:42:07] aha
[17:42:08] https://integration.wikimedia.org/zuul/
[17:42:13] (PS2) BBlack: tlsproxy: restrict whole daemon to relevant NUMA node(s) [puppet] - https://gerrit.wikimedia.org/r/359489
[17:42:13] nodepool is not working
[17:42:19] mutante ^^
[17:42:36] RainbowSprinkles: nodepool is haivng issues fyi
[17:43:03] Zppix: RainbowSprinkles isn't the nodepool maintainer
[17:43:25] greg-g: I thought releng in gernal was... am i wrong?
[17:43:32] general*
[17:43:40] Zppix: so .. like i said. i know what puppet does to add the service checks to icinga. but doesnt really mean i know the grafana side of it
[17:43:50] stuff is running, not sure why you are saying it's not working
[17:44:15] I'm ignoring, things are running
[17:44:15] (CR) BBlack: [C: 2] numa.rb: add device_to_node data [puppet] - https://gerrit.wikimedia.org/r/359487 (owner: BBlack)
[17:44:21] Zppix: so there is existing monitoring, that checks HTTP on the worker nodes. i know about that
[17:44:22] it just started working
[17:44:33] Zppix: but that is seprate from 5xx checking in the graphs
[17:44:35] greg-g: nevermind its fine now... weird
[17:44:55] greg-g: i asked because of "PROBLEM - Work requests waiting in Zuul Gearman server"
[17:45:40] mutante: alright thanks for the info again!
[17:45:41] maybe something restarted if it just came back
[17:47:14] Zppix: if you look at "class icinga::monitor::ores", you will see some "@monitoring::host" those are to add hosts to icinga
[17:47:39] Zppix: then you will see some "monitoring::service" these are adding service checks on host(s)
[17:48:04] RECOVERY - Work requests waiting in Zuul Gearman server https://grafana.wikimedia.org/dashboard/db/zuul-gearman on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0]
[17:48:14] the warning/PROBLEM was just that there was a lot of instantaneous backlog (faidon's patches). nothing was actually wrong with the software
[17:48:14] mutante: is there a list of supported checks anywhere that you know of, im not familar with icinga prod runs compared to the icinga2 instance i help maintain
[17:48:15] they have a check_command => that is what it is actually running. Like "check_ores_workers", then you search for that check command
[17:48:17] or the processes
[17:48:24] PROBLEM - MegaRAID on ms-be1001 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline)
[17:48:25] ACKNOWLEDGEMENT - MegaRAID on ms-be1001 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T168081
[17:48:28] Operations, ops-eqiad: Degraded RAID on ms-be1001 - https://phabricator.wikimedia.org/T168081#3355528 (ops-monitoring-bot)
[17:48:32] Zppix: mutante paladox ^
[17:48:35] what I said
[17:48:40] greg-g: got it, cool:) just making sure since we also merged something
[17:48:48] * greg-g nods
[17:48:48] ok thanks
[17:49:01] sorry, I was just wrapping up a 1:1
[17:49:21] greg-g: sorry, then i just know in the past nodepool had problems and i wanted someone to know before it got out of hand.
[17:50:00] Zppix: for one, we install nagios-plugin-* packages from Debian, so part of "list of supported checks" is looking at their contents, f.e. dpkg -L
[17:50:02] sure, but blind statements like "nodepool isn't working" when it was aren't helpful to anyone
[17:50:31] Zppix: the other part is our own custom commands.. they can be bash scripts, or python or anything really
[17:51:04] Zppix: grep -r "check_command" * in puppet repo gives you an idea
[17:51:09] mutante: alright, thanks ill have to look into that myself... thanks
[17:51:40] (PS3) BBlack: tlsproxy: restrict whole daemon to relevant NUMA node(s) [puppet] - https://gerrit.wikimedia.org/r/359489
[17:54:02] Zppix: and finally if it's not in packages and you don't want to re-invent the wheel.. there is https://exchange.nagios.org/
[17:54:56] Operations, MW-1.30-release-notes, Wikimedia-Language-setup, Wikimedia-Site-requests, Patch-For-Review: Create Atikamekw Wikipedia - https://phabricator.wikimedia.org/T167714#3355546 (Benoit_Rochon) Do I need to contact anyone to configure Wikidata to have interwiki working, or this is will b...
[18:03:18] (Restored) Paladox: redis: add support for stretch [puppet] - https://gerrit.wikimedia.org/r/354041 (owner: Paladox)
[18:03:24] PROBLEM - Disk space on ms-be1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:03:24] PROBLEM - swift-account-server on ms-be1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:03:24] PROBLEM - swift-container-server on ms-be1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:03:24] PROBLEM - Check whether ferm is active by checking the default input chain on ms-be1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:03:34] PROBLEM - swift-object-replicator on ms-be1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:03:34] PROBLEM - swift-object-updater on ms-be1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:03:44] PROBLEM - swift-account-auditor on ms-be1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:03:54] PROBLEM - dhclient process on ms-be1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:03:54] PROBLEM - configured eth on ms-be1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:03:54] PROBLEM - very high load average likely xfs on ms-be1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:03:54] PROBLEM - swift-container-auditor on ms-be1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:03:54] PROBLEM - swift-container-updater on ms-be1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:03:55] PROBLEM - swift-account-replicator on ms-be1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:04:14] PROBLEM - swift-object-auditor on ms-be1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:04:14] PROBLEM - swift-container-replicator on ms-be1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:04:14] PROBLEM - swift-account-reaper on ms-be1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:04:14] PROBLEM - DPKG on ms-be1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:04:14] PROBLEM - MD RAID on ms-be1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:04:14] PROBLEM - salt-minion processes on ms-be1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:04:15] PROBLEM - Check size of conntrack table on ms-be1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:04:15] PROBLEM - swift-object-server on ms-be1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:08:45] !log ms-be1001 - powercycling crashed server - "[14076481.245487] general protection fault: 0000 [#4] SMP
[18:08:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:10:24] PROBLEM - Host ms-be1001 is DOWN: PING CRITICAL - Packet loss = 100%
[18:10:38] !log ms-be1001 - The following VDs are missing: 09
[18:10:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:11:02] (Draft1) Paladox: contint: Disable hhvm temporarily on stretch [puppet] - https://gerrit.wikimedia.org/r/359492
[18:11:05] (PS2) Paladox: contint: Disable hhvm temporarily on stretch [puppet] - https://gerrit.wikimedia.org/r/359492
[18:13:01] (PS3) Paladox: contint: Disable hhvm temporarily on stretch [puppet] - https://gerrit.wikimedia.org/r/359492 (https://phabricator.wikimedia.org/T166611)
[18:14:32] !log ms-be1001: did not change config, tried again, now detected 13 drives again, coming back
[18:14:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:14:44] RECOVERY - configured eth on ms-be1001 is OK: OK - interfaces up
[18:14:44] RECOVERY - dhclient process on ms-be1001 is OK: PROCS OK: 0 processes with command name dhclient
[18:14:45] RECOVERY - very high load average likely xfs on ms-be1001 is OK: OK - load average: 0.69, 0.15, 0.05
[18:14:45] RECOVERY - swift-container-auditor on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[18:14:45] RECOVERY - swift-container-updater on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater
[18:14:54] RECOVERY - swift-account-replicator on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[18:14:54] RECOVERY - Host ms-be1001 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms
[18:15:04] RECOVERY - swift-container-replicator on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator
[18:15:04] RECOVERY - swift-account-reaper on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[18:15:04] RECOVERY - DPKG on ms-be1001 is OK: All packages OK
[18:15:04] RECOVERY - swift-object-auditor on ms-be1001 is OK: PROCS OK: 3 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[18:15:04] RECOVERY - MD RAID on ms-be1001 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[18:15:05] RECOVERY - Check size of conntrack table on ms-be1001 is OK: OK: nf_conntrack is 0 % full
[18:15:05] RECOVERY - salt-minion processes on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[18:15:15] RECOVERY - swift-object-server on ms-be1001 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[18:15:15] RECOVERY - Disk space on ms-be1001 is OK: DISK OK
[18:15:15] RECOVERY - swift-account-server on ms-be1001 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server
[18:15:15] RECOVERY - swift-container-server on ms-be1001 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server
[18:15:24] RECOVERY - Check whether ferm is active by checking the default input chain on ms-be1001 is OK: OK ferm input default policy is set
[18:15:34] RECOVERY - swift-object-replicator on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[18:15:34] RECOVERY - swift-object-updater on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater
[18:15:34] RECOVERY - swift-account-auditor on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[18:18:24] RECOVERY - MegaRAID on ms-be1001 is OK: OK: optimal, 13 logical, 13 physical
[18:22:09] (PS7) Paladox: redis: add support for stretch [puppet] - https://gerrit.wikimedia.org/r/354041
[18:22:14] (PS8) Paladox: redis: add support for stretch [puppet] - https://gerrit.wikimedia.org/r/354041
[18:22:45] (CR) Dzahn: [C: 1] "though i don't know if it will really just skip a few tests and not have another dependency issue" [puppet] - https://gerrit.wikimedia.org/r/359492 (https://phabricator.wikimedia.org/T166611) (owner: Paladox)
[18:23:02] (CR) Paladox: "> though i don't know if it will really just skip a few tests and not" [puppet] - https://gerrit.wikimedia.org/r/359492 (https://phabricator.wikimedia.org/T166611) (owner: Paladox)
[18:40:15] (CR) Dzahn: [C: 1] "afaict. seems reasonable to assume the same additional settings as on jessie, at least at first" [puppet] - https://gerrit.wikimedia.org/r/354041 (owner: Paladox)
[18:53:05] Operations, Recommendation-API, Service-deployment-requests, Services (doing), User-mobrovac: New Service Request: recommendation-api - https://phabricator.wikimedia.org/T167664#3355689 (mobrovac) @akosiaris let's talk ETAs :)
[19:01:41] Operations, ops-codfw, ORES, Scoring-platform-team, Patch-For-Review: rack/setup/install ores2001-2009 - https://phabricator.wikimedia.org/T165170#3355718 (Halfak)
[19:05:04] Operations, ops-requests: Server admin log replication to social networking - https://phabricator.wikimedia.org/T82205#3355730 (Dzahn)
[19:10:42] mutante: re:ms-be1001, it's listed for decom, see T166489
[19:10:43] T166489: Decommission ms-be1001 - ms-be1012 - https://phabricator.wikimedia.org/T166489
[19:11:47] Operations, Patch-For-Review, Technical-Debt: Supersede RT tickets references - https://phabricator.wikimedia.org/T165733#3355736 (Dzahn) RT #4566 was "status of wikipedia.ee and wikimedia.ee" it was just me asking about the status and then in the end " i have just been bold and redirected wikipedia....
[19:12:15] volans: thanks, ok
[19:12:32] (PS2) EBernhardson: [WIP] Add ltr-query 0.1.1 snapshot [software/elasticsearch/plugins] - https://gerrit.wikimedia.org/r/359359
[19:13:10] mutante: could be that was already down and was just an expired downtime? also T168081 could probably be closed as won't fix
[19:13:10] T168081: Degraded RAID on ms-be1001 - https://phabricator.wikimedia.org/T168081
[19:13:32] volans: no, it actually crashed, i saw it on console
[19:13:34] !log restarting elasticesarch on relforge to pick up new ltr-query plugin version
[19:13:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:13:45] volans: ok, closing it
[19:14:00] mutante: ok, thanks!
[19:14:29] Operations, ops-eqiad: Degraded RAID on ms-be1001 - https://phabricator.wikimedia.org/T168081#3355746 (Dzahn) Open>declined won't fix because it's already scheduled for decom in T166489
[19:16:45] Operations, User-fgiunchedi: Decommission ms-be1001 - ms-be1012 - https://phabricator.wikimedia.org/T166489#3297629 (Dzahn) can they be shutdown at this point? ms-be1001 had a hardware fail today and i powercycled it before realizing they are already scheduled for decom. could i continue by shutting th...
[19:18:09] Operations, Commons, Multimedia, Traffic, and 2 others: Disable serving unpatrolled new files to Wikipedia Zero users - https://phabricator.wikimedia.org/T167400#3355774 (Bawolff) >>! In T167400#3354974, @BBlack wrote: >>>! In T167400#3350622, @Bawolff wrote: >>> Wikipedia Zero traffic is tied to...
[19:25:01] Operations, Patch-For-Review, Technical-Debt: Supersede RT tickets references - https://phabricator.wikimedia.org/T165733#3355796 (Dzahn) > RT #4579 is actually here T82205 and i made it public Eh, that was #4679 in the original comment, nevermind. but anywas RT #4679 is "wekipedia.com - unconfigur...
[19:31:15] Operations, MW-1.30-release-notes, Wikimedia-Language-setup, Wikimedia-Site-requests, Patch-For-Review: Create Atikamekw Wikipedia - https://phabricator.wikimedia.org/T167714#3355819 (Reedy) >>! In T167714#3354884, @Amire80 wrote: >>>! In T167714#3350157, @Reedy wrote: >> The wiki is now for...
[19:31:30] Operations: make labs instance's /var/log/ readable by default - https://phabricator.wikimedia.org/T80830#3355821 (Dzahn)
[19:34:59] Operations, Patch-For-Review, Technical-Debt: Supersede RT tickets references - https://phabricator.wikimedia.org/T165733#3355833 (Dzahn) I found 2 more: RT-2712 (modules/base/manifests/syslogs.pp) is T80830 RT-2121 ( 6 x in modules/mediawiki/files/apache/sites/main.conf) is T80309 Thew few left...
[19:36:23] (PS1) Dzahn: replace the last RT references with Phab links [puppet] - https://gerrit.wikimedia.org/r/359504 (https://phabricator.wikimedia.org/T165733)
[19:41:57] (PS2) Dzahn: replace the last RT references with Phab links [puppet] - https://gerrit.wikimedia.org/r/359504 (https://phabricator.wikimedia.org/T165733)
[19:44:35] (CR) Dzahn: [C: 2] "comments-only" [puppet] - https://gerrit.wikimedia.org/r/359504 (https://phabricator.wikimedia.org/T165733) (owner: Dzahn)
[19:45:58] Operations, Patch-For-Review, Technical-Debt: Supersede RT tickets references - https://phabricator.wikimedia.org/T165733#3355841 (Dzahn) Open>Resolved Alright, this is as good as it gets, i claim resolved.
[19:47:31] Operations: investigate shared inbox options - https://phabricator.wikimedia.org/T146746#3355843 (Dzahn) I got positive feedback from at least 3 people using the new Google group. I think i can safely say we are staying with this as thew new solution. We can turn off the email flow into RT i think.
[19:48:22] Operations: investigate shared inbox options - https://phabricator.wikimedia.org/T146746#3355845 (Dzahn) p:High>Normal
[19:54:43] !log disabled cluster 2fa for Chrissymad for T168064 (confirmed by email)
[19:54:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:54:52] T168064: Possible issue with 2FA tokens - https://phabricator.wikimedia.org/T168064
[20:11:40] (PS1) Jforrester: Enable OOjs UI buttons on EditPage for plwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/359514 (https://phabricator.wikimedia.org/T162849)
[20:25:13] Operations, Discovery, Traffic, Wikidata, and 2 others: runUpdate.sh script in wikidata stand-alone has abruptly started incurring numerous 429 errors. - https://phabricator.wikimedia.org/T168019#3355978 (Smalyshev) > In the specific case of 429 responses, it should definitely not crash but rathe...
[20:44:09] mwlog1001 should be added to https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints
[20:54:04] PROBLEM - mediawiki originals uploads -hourly- for eqiad-prod on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [3000.0]
[21:00:24] PROBLEM - HHVM rendering on mw2121 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:01:14] RECOVERY - HHVM rendering on mw2121 is OK: HTTP OK: HTTP/1.1 200 OK - 78207 bytes in 0.296 second response time
[21:03:04] RECOVERY - mediawiki originals uploads -hourly- for eqiad-prod on graphite1001 is OK: OK: Less than 80.00% above the threshold [2000.0]
[21:55:04] PROBLEM - mediawiki originals uploads -hourly- for eqiad-prod on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [3000.0]
[22:03:04] RECOVERY - mediawiki originals uploads -hourly- for eqiad-prod on graphite1001 is OK: OK: Less than 80.00% above the threshold [2000.0]
[22:25:11] Operations, Continuous-Integration-Infrastructure, Patch-For-Review, Release-Engineering-Team (Watching / External), Zuul: Migrate zuul-server behind systemd service - https://phabricator.wikimedia.org/T167845#3356215 (greg) (meta: moving to #releng-external as this is not being worked on by...
[22:27:28] Operations, Continuous-Integration-Infrastructure, Patch-For-Review, Release-Engineering-Team (Watching / External), Zuul: Migrate zuul-server behind systemd service - https://phabricator.wikimedia.org/T167845#3356225 (Paladox) Your welcome :)
[23:06:32] (CR) BryanDavis: "LGTM. There could probably be a follow up patch after this is seen to work that cleans up the oath_dbpass from hiera and removes role::mar" [puppet] - https://gerrit.wikimedia.org/r/359152 (https://phabricator.wikimedia.org/T167961) (owner: Marostegui)
[23:07:28] (CR) BryanDavis: "I think Andrew wanted to be sure to be around when this rolls out to check for unexpected weirdness, so please coordinate with him." [puppet] - https://gerrit.wikimedia.org/r/359152 (https://phabricator.wikimedia.org/T167961) (owner: Marostegui)
[23:30:27] hello! anyone able to help me add a user to a gerrit group? https://gerrit.wikimedia.org/r/#/admin/groups/539,members mutante ?
[23:30:42] i need to add barrybrowsertestbot
[23:31:04] but i have no idea how to make that happen
[23:32:21] I don't have the needed rights
[23:33:55] Operations, media-storage: uploads.wm.o commons archive 20170615014039!Adsalm.webm visible despite file deleted on Commons - https://phabricator.wikimedia.org/T168002#3352611 (Younes19956) Can i ask You ? how to delete any file is Dangerous on wikimedia commons ?
[23:38:58] Operations, media-storage: uploads.wm.o commons archive 20170615014039!Adsalm.webm visible despite file deleted on Commons - https://phabricator.wikimedia.org/T168002#3352611 (Reedy) Interestingly, I just did https://commons.wikimedia.org/wiki/File:Adsalm.webm?action=purge and now https://upload.wikimedi...
[23:46:45] Operations, media-storage: uploads.wm.o commons archive 20170615014039!Adsalm.webm visible despite file deleted on Commons - https://phabricator.wikimedia.org/T168002#3356371 (zhuyifei1999) >>! In T168002#3356365, @Reedy wrote: > Interestingly, I just did https://commons.wikimedia.org/wiki/File:Adsalm.we...