[00:46:12] (CR) BryanDavis: [C: 1] "comments/help text only" [puppet] - https://gerrit.wikimedia.org/r/367004 (owner: Krinkle)
[00:56:04] Operations, MediaWiki-JobRunner, Performance-Team: Investigate 30x increase in Jobrunner errors - https://phabricator.wikimedia.org/T171371#3464398 (bd808) > Which probably means it was merged without deploying, and then accidentally rolled out as part of preparing 1.30.0-wmf.9 It was [[https://tool...
[00:59:47] Operations, MediaWiki-JobRunner, Performance-Team: Investigate 30x increase in Jobrunner errors - https://phabricator.wikimedia.org/T171371#3464400 (bd808) Broadly this looks a whole lot like {T87360} from 2.5 years ago. There are cleanup steps documented in that task for how we made the jobrunner fi...
[01:11:15] Operations, MediaWiki-JobRunner, Performance-Team: Investigate 30x increase in Jobrunner errors - https://phabricator.wikimedia.org/T171371#3464403 (bd808) >>! In T171371#3464398, @bd808 wrote: >> Which probably means it was merged without deploying, and then accidentally rolled out as part of prepar...
[02:29:54] Operations, MediaWiki-extensions-Scribunto: Build and push a new hhvm-luasandbox package - https://phabricator.wikimedia.org/T171166#3464420 (tstarling)
[03:05:00] !log l10nupdate@tin LocalisationUpdate failed: git pull of extensions failed
[03:05:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:11:06] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=7162.70 Read Requests/Sec=3939.00 Write Requests/Sec=412.90 KBytes Read/Sec=40463.60 KBytes_Written/Sec=4710.40
[04:16:07] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=1.70 Read Requests/Sec=0.00 Write Requests/Sec=0.80 KBytes Read/Sec=0.00 KBytes_Written/Sec=18.80
[04:26:58] (CR) Subramanya Sastry: [C: 1] visualdiff: Remove manually built `uprightdiff` [puppet] - https://gerrit.wikimedia.org/r/367131 (owner: Legoktm)
[05:40:34] !log Configure and start s2 replication on labsdb1011 - T153743
[05:40:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:40:46] T153743: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743
[05:57:20] Operations, DBA, Performance-Team, Availability (Multiple-active-datacenters): Perform testing for TLS effect on connection rate - https://phabricator.wikimedia.org/T171071#3453087 (Marostegui) Hey, it would be nice to do a test with MariaDB 10.0 and 10.1 if possible, to see if there are any reg...
[05:59:45] (PS1) Marostegui: s2.hosts: Add labsdb1011 to s2 list of hosts [software] - https://gerrit.wikimedia.org/r/367357 (https://phabricator.wikimedia.org/T153743)
[06:07:01] (CR) Marostegui: [C: 2] s2.hosts: Add labsdb1011 to s2 list of hosts [software] - https://gerrit.wikimedia.org/r/367357 (https://phabricator.wikimedia.org/T153743) (owner: Marostegui)
[06:07:40] (Merged) jenkins-bot: s2.hosts: Add labsdb1011 to s2 list of hosts [software] - https://gerrit.wikimedia.org/r/367357 (https://phabricator.wikimedia.org/T153743) (owner: Marostegui)
[06:11:16] Operations, ops-eqiad: Degraded RAID on db1001 - https://phabricator.wikimedia.org/T171232#3464523 (Marostegui) @Cmjohnson remember that there are some hosts totally ready for you to decommission, whose disks could be used to replace this faulty disk if needed: T166486 T164702 T163778 Thanks!
[06:22:40] (PS3) Jcrespo: mariadb.service: Set start/stop timeout to infinity [software] - https://gerrit.wikimedia.org/r/365255
[06:25:53] (CR) Marostegui: [C: 1] mariadb.service: Set start/stop timeout to infinity [software] - https://gerrit.wikimedia.org/r/365255 (owner: Jcrespo)
[06:28:06] (CR) Jcrespo: [C: 2] mariadb.service: Set start/stop timeout to infinity [software] - https://gerrit.wikimedia.org/r/365255 (owner: Jcrespo)
[06:30:53] Operations, Goal: Improve database backups' coverage, monitoring and data recovery time (part 1) (tracking) - https://phabricator.wikimedia.org/T169658#3464553 (jcrespo)
[06:34:52] (PS1) Jcrespo: dblists: Updates to manual database lists for dbstore2002 changes [software] - https://gerrit.wikimedia.org/r/367358 (https://phabricator.wikimedia.org/T171321)
[06:36:48] (PS2) Jcrespo: dblists: Update manual database lists for dbstore2002 changes [software] - https://gerrit.wikimedia.org/r/367358 (https://phabricator.wikimedia.org/T171321)
[06:45:14] !log installing apache security updates on appserver canaries
[06:45:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:51:18] (CR) Jcrespo: [C: 2] "@Krinkle I will merge as is (as the scope of this ticket is limited) and you can do any additional followup later." [mediawiki-config] - https://gerrit.wikimedia.org/r/366837 (https://phabricator.wikimedia.org/T171282) (owner: Reedy)
[06:52:08] (CR) Jcrespo: [C: 2] Move some trailing ] onto newlines to make more balanced [mediawiki-config] - https://gerrit.wikimedia.org/r/366839 (owner: Reedy)
[06:52:21] (PS2) Jcrespo: Move some trailing ] onto newlines to make more balanced [mediawiki-config] - https://gerrit.wikimedia.org/r/366839 (owner: Reedy)
[06:52:41] (Merged) jenkins-bot: Fix up some file indenting broken by my phpcs changes [mediawiki-config] - https://gerrit.wikimedia.org/r/366837 (https://phabricator.wikimedia.org/T171282) (owner: Reedy)
[06:52:53] (CR) jenkins-bot: Fix up some file indenting broken by my phpcs changes [mediawiki-config] - https://gerrit.wikimedia.org/r/366837 (https://phabricator.wikimedia.org/T171282) (owner: Reedy)
[06:57:02] Operations, Data-Services, Patch-For-Review, cloud-services-team (Kanban): Reimage labstore1001 and labstore1002 for DRBD storage setup - https://phabricator.wikimedia.org/T158196#3464610 (MoritzMuehlenhoff) These have been reimaged with jessie, but I'm wondering if it would be better to use stre...
[06:58:35] (PS3) Jcrespo: Move some trailing ] onto newlines to make more balanced [mediawiki-config] - https://gerrit.wikimedia.org/r/366839 (owner: Reedy)
[07:05:38] (PS6) Jcrespo: db-readonly: Change the read only message for something generic [mediawiki-config] - https://gerrit.wikimedia.org/r/356584 (https://phabricator.wikimedia.org/T166345)
[07:06:39] (PS7) Jcrespo: db-readonly: Change the read only message for something generic [mediawiki-config] - https://gerrit.wikimedia.org/r/356584 (https://phabricator.wikimedia.org/T166345)
[07:07:16] Operations, DBA, MediaWiki-extensions-ClickTracking: Drop the tables old_growth, hitcounter, click_tracking, click_tracking_user_properties from enwiki, maybe other schemas - https://phabricator.wikimedia.org/T115982#3464633 (Marostegui)
[07:07:32] (CR) jenkins-bot: Move some trailing ] onto newlines to make more balanced [mediawiki-config] - https://gerrit.wikimedia.org/r/366839 (owner: Reedy)
[07:09:00] Operations, DBA, MediaWiki-extensions-ClickTracking: Drop the tables old_growth, hitcounter, click_tracking, click_tracking_user_properties from enwiki, maybe other schemas - https://phabricator.wikimedia.org/T115982#1737574 (Marostegui) old_growth status: ``` Only exists on s1 ```
[07:09:20] Operations, DBA, MediaWiki-extensions-ClickTracking: Drop the tables old_growth, hitcounter, click_tracking, click_tracking_user_properties from enwiki, maybe other schemas - https://phabricator.wikimedia.org/T115982#3464641 (Marostegui)
[07:10:27] (CR) Jcrespo: [C: -1] db-readonly: Change the read only message for something generic [mediawiki-config] - https://gerrit.wikimedia.org/r/356584 (https://phabricator.wikimedia.org/T166345) (owner: Jcrespo)
[07:12:34] Operations, DBA, MediaWiki-extensions-ClickTracking: Drop the tables old_growth, hitcounter, click_tracking, click_tracking_user_properties from enwiki, maybe other schemas - https://phabricator.wikimedia.org/T115982#3464658 (Marostegui) click_tracking status: ``` Exists on: s1, s2, s3, s4, s5, s6, s...
[07:13:06] Operations, DBA, MediaWiki-extensions-ClickTracking: Drop the tables old_growth, hitcounter, click_tracking, click_tracking_user_properties from enwiki, maybe other schemas - https://phabricator.wikimedia.org/T115982#3464659 (Marostegui)
[07:13:22] !log jynus@tin Synchronized wmf-config/db-codfw.php: Fix indenting (duration: 00m 45s)
[07:13:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:14:33] !log jynus@tin Synchronized wmf-config/StartProfiler.php: Fix indenting (duration: 00m 43s)
[07:14:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:15:19] Operations, DBA, MediaWiki-extensions-ClickTracking: Drop the tables old_growth, hitcounter, click_tracking, click_tracking_user_properties from enwiki, maybe other schemas - https://phabricator.wikimedia.org/T115982#3464660 (Marostegui) click_tracking_user_properties status: ``` Exists on: s1, s2, s...
[07:15:26] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Fix indenting (duration: 00m 43s)
[07:15:32] Operations, DBA, MediaWiki-extensions-ClickTracking: Drop the tables old_growth, hitcounter, click_tracking, click_tracking_user_properties from enwiki, maybe other schemas - https://phabricator.wikimedia.org/T115982#3464661 (Marostegui)
[07:15:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:17:48] !log Rename table old_growth on db1089 - T115982
[07:17:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:17:58] T115982: Drop the tables old_growth, hitcounter, click_tracking, click_tracking_user_properties from enwiki, maybe other schemas - https://phabricator.wikimedia.org/T115982
[07:19:34] Operations, DBA, MediaWiki-extensions-ClickTracking: Drop the tables old_growth, hitcounter, click_tracking, click_tracking_user_properties from enwiki, maybe other schemas - https://phabricator.wikimedia.org/T115982#3464672 (Marostegui) I have renamed old_growth table on db1089 and will leave it lik...
[07:21:44] (CR) Jcrespo: [C: -1] "I broke this on some rebasing- needs to be fixed with the new indents with the original message and comment change." [mediawiki-config] - https://gerrit.wikimedia.org/r/356584 (https://phabricator.wikimedia.org/T166345) (owner: Jcrespo)
[07:28:05] (PS16) Jcrespo: prometheus: Convert mysqld-exporter into multi-instance [puppet] - https://gerrit.wikimedia.org/r/364396 (https://phabricator.wikimedia.org/T170666)
[07:28:07] (PS2) Jcrespo: mariadb: Disable mariadb main instance starting for multisource [puppet] - https://gerrit.wikimedia.org/r/366252 (https://phabricator.wikimedia.org/T169514)
[07:29:05] (CR) Jcrespo: [C: -1] "Not fixed yet" [puppet] - https://gerrit.wikimedia.org/r/364396 (https://phabricator.wikimedia.org/T170666) (owner: Jcrespo)
[07:30:38] (CR) Marostegui: [C: 1] dblists: Update manual database lists for dbstore2002 changes [software] - https://gerrit.wikimedia.org/r/367358 (https://phabricator.wikimedia.org/T171321) (owner: Jcrespo)
[07:31:19] (CR) Jcrespo: [C: 2] mariadb: Disable mariadb main instance starting for multisource [puppet] - https://gerrit.wikimedia.org/r/366252 (https://phabricator.wikimedia.org/T169514) (owner: Jcrespo)
[07:31:27] (PS3) Jcrespo: mariadb: Disable mariadb main instance starting for multisource [puppet] - https://gerrit.wikimedia.org/r/366252 (https://phabricator.wikimedia.org/T169514)
[07:42:36] (PS1) Jcrespo: mariadb-multiinstance: Fix missing header on override [puppet] - https://gerrit.wikimedia.org/r/367361 (https://phabricator.wikimedia.org/T169514)
[07:44:00] (CR) Jcrespo: [C: 2] mariadb-multiinstance: Fix missing header on override [puppet] - https://gerrit.wikimedia.org/r/367361 (https://phabricator.wikimedia.org/T169514) (owner: Jcrespo)
[07:48:31] (PS5) Elukey: role::prometheus::apache_exporter: move to profiles [puppet] - https://gerrit.wikimedia.org/r/366830
[07:50:58] (CR) Elukey: [C: 2] role::prometheus::apache_exporter: move to profiles [puppet] - https://gerrit.wikimedia.org/r/366830 (owner: Elukey)
[07:51:12] (CR) Jcrespo: [C: 2] dblists: Update manual database lists for dbstore2002 changes [software] - https://gerrit.wikimedia.org/r/367358 (https://phabricator.wikimedia.org/T171321) (owner: Jcrespo)
[07:51:38] (PS2) Jcrespo: mariadb: Pool db2072 with low load as s1 main traffic [mediawiki-config] - https://gerrit.wikimedia.org/r/365285 (https://phabricator.wikimedia.org/T170662)
[07:59:01] (PS4) Giuseppe Lavagetto: rake: new rakefile specifically for CI [puppet] - https://gerrit.wikimedia.org/r/366591 (https://phabricator.wikimedia.org/T166888)
[08:00:26] (PS1) Elukey: role::prometheus::hhmv_exporter: move to profile [puppet] - https://gerrit.wikimedia.org/r/367362
[08:04:07] (PS3) Gehel: Fix service dependency name for update service [puppet] - https://gerrit.wikimedia.org/r/366989 (https://phabricator.wikimedia.org/T168918) (owner: Smalyshev)
[08:06:03] (CR) Gehel: [C: 2] Fix service dependency name for update service [puppet] - https://gerrit.wikimedia.org/r/366989 (https://phabricator.wikimedia.org/T168918) (owner: Smalyshev)
[08:06:14] (CR) Elukey: "https://puppet-compiler.wmflabs.org/compiler02/7137/ looks good (mendelevium/bohrium ended up among the test hosts by mistake)" [puppet] - https://gerrit.wikimedia.org/r/367362 (owner: Elukey)
[08:08:13] (CR) Jcrespo: "As a reminder, @Herron, this is blocked on you to provide the right information." [puppet] - https://gerrit.wikimedia.org/r/365035 (https://phabricator.wikimedia.org/T170158) (owner: Jcrespo)
[08:14:16] PROBLEM - puppet last run on mw2228 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[apache2],Package[tzdata]
[08:18:34] (CR) Ladsgroup: "This one increases the number of dispatchers from three to four." [puppet] - https://gerrit.wikimedia.org/r/366887 (https://phabricator.wikimedia.org/T171263) (owner: Ladsgroup)
[08:24:53] Operations, Commons: Backend fetch failed - https://phabricator.wikimedia.org/T171421#3464863 (zhuyifei1999) Works for me though
[08:26:24] Operations, Commons: Backend fetch failed - https://phabricator.wikimedia.org/T171421#3464866 (zhuyifei1999) (not sure if this is #traffic or #media-storage or #thumbor)
[08:27:48] (PS17) Jcrespo: prometheus: Convert mysqld-exporter into multi-instance [puppet] - https://gerrit.wikimedia.org/r/364396 (https://phabricator.wikimedia.org/T170666)
[08:28:59] (CR) jerkins-bot: [V: -1] prometheus: Convert mysqld-exporter into multi-instance [puppet] - https://gerrit.wikimedia.org/r/364396 (https://phabricator.wikimedia.org/T170666) (owner: Jcrespo)
[08:29:04] !log restart thumbor on thumbor1001 temporarily without memory cgroup limitations
[08:29:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:37:07] (PS18) Jcrespo: prometheus: Convert mysqld-exporter into multi-instance [puppet] - https://gerrit.wikimedia.org/r/364396 (https://phabricator.wikimedia.org/T170666)
[08:38:09] (Abandoned) Marostegui: db-eqiad.php: Fix indents [mediawiki-config] - https://gerrit.wikimedia.org/r/366804 (owner: Marostegui)
[08:42:36] RECOVERY - puppet last run on mw2228 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[08:44:57] (CR) Volans: [C: 1] "Indeed" [puppet] - https://gerrit.wikimedia.org/r/366525 (https://phabricator.wikimedia.org/T129222) (owner: Filippo Giunchedi)
[08:47:04] (CR) Volans: [C: 1] "Indeed, thanks!" [puppet] - https://gerrit.wikimedia.org/r/366876 (owner: Jcrespo)
[08:55:43] (CR) Jcrespo: "Not sure if the template path will work: https://puppet-compiler.wmflabs.org/compiler02/7138/dbstore2002.codfw.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/364396 (https://phabricator.wikimedia.org/T170666) (owner: Jcrespo)
[09:04:26] !log restart thumbor on thumbor1004 with MemoryLimit=8G
[09:04:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:13:48] (PS1) Elukey: Reduce the DNS queries for the statsd domain [debs/logster] - https://gerrit.wikimedia.org/r/367370 (https://phabricator.wikimedia.org/T171318)
[09:15:04] Operations, Traffic, Patch-For-Review, User-Elukey: logster should not resolve statsd's IP every time it sends a metric - https://phabricator.wikimedia.org/T171318#3465056 (elukey) Also opened an issue upstream: https://github.com/etsy/logster/issues/103
[09:16:36] PROBLEM - puppet last run on mw2238 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[apache2]
[09:21:50] (Draft2) MarcoAurelio: Allow contentadmin/sysop to configure blocking AbuseFilters [mediawiki-config] - https://gerrit.wikimedia.org/r/367369
[09:27:46] (CR) MarcoAurelio: "Some questions I've got." (3 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/367369 (owner: MarcoAurelio)
[09:27:53] (Abandoned) Elukey: Re-enable persistent connection to Redis for jobrunners [mediawiki-config] - https://gerrit.wikimedia.org/r/351854 (https://phabricator.wikimedia.org/T125735) (owner: Elukey)
[09:30:39] !log uploaded openjdk-8 8u145-b15 to apt.wikimedia.org/jessie-wikimedia
[09:30:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:43:56] RECOVERY - puppet last run on mw2238 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[09:45:02] Operations: Integrate jessie 8.9 point release - https://phabricator.wikimedia.org/T171452#3465236 (MoritzMuehlenhoff)
[09:46:05] Operations: Integrate stretch 9.1 point release - https://phabricator.wikimedia.org/T171453#3465249 (MoritzMuehlenhoff)
[09:47:28] (PS1) Filippo Giunchedi: thumbor: bump MemoryLimit to 15% [puppet] - https://gerrit.wikimedia.org/r/367373 (https://phabricator.wikimedia.org/T121388)
[09:53:41] Operations, LDAP-Access-Requests, Wikidata-Sprint: Add "chrisneuroth" to wmde LDAP group - https://phabricator.wikimedia.org/T170552#3465312 (christophneuroth) a:christophneuroth
[09:53:49] (PS2) Elukey: role::prometheus::hhmv_exporter: move to profile [puppet] - https://gerrit.wikimedia.org/r/367362
[09:55:18] (CR) Elukey: [C: 2] role::prometheus::hhmv_exporter: move to profile [puppet] - https://gerrit.wikimedia.org/r/367362 (owner: Elukey)
[09:57:48] Operations, LDAP-Access-Requests, Wikidata-Sprint: Add "chrisneuroth" to wmde LDAP group - https://phabricator.wikimedia.org/T170552#3465338 (christophneuroth) a:christophneuroth>None
[09:59:15] Operations, LDAP-Access-Requests, Wikidata-Sprint: Add "chrisneuroth" to wmde LDAP group - https://phabricator.wikimedia.org/T170552#3435056 (christophneuroth) @Addshore sorry for the slow response. NDA should be good now :)
[09:59:56] Operations, Beta-Cluster-Infrastructure, VPS-Projects, Release-Engineering-Team (Kanban), and 2 others: a lot of beta cluster instances are not reachable over SSH - https://phabricator.wikimedia.org/T171174#3465349 (hashar)
[10:01:16] (CR) Filippo Giunchedi: "Testing on 1001 and 1004 showed pybal being able to fetch successfully much more often than 1002 and 1003" [puppet] - https://gerrit.wikimedia.org/r/367373 (https://phabricator.wikimedia.org/T121388) (owner: Filippo Giunchedi)
[10:02:52] (CR) Ema: [C: 1] thumbor: bump MemoryLimit to 15% [puppet] - https://gerrit.wikimedia.org/r/367373 (https://phabricator.wikimedia.org/T121388) (owner: Filippo Giunchedi)
[10:06:44] (PS1) Elukey: role::prometheus::memcached_exporter: move to profile [puppet] - https://gerrit.wikimedia.org/r/367375
[10:07:35] (PS2) Filippo Giunchedi: thumbor: bump MemoryLimit to 15% [puppet] - https://gerrit.wikimedia.org/r/367373 (https://phabricator.wikimedia.org/T121388)
[10:09:16] PROBLEM - Host labvirt1015 is DOWN: PING CRITICAL - Packet loss = 100%
[10:09:48] !log installing openjdk security updates on praseodymium/cerium/xenon
[10:09:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:12:13] (CR) Elukey: "pcc looks good: https://puppet-compiler.wmflabs.org/compiler02/7139/" [puppet] - https://gerrit.wikimedia.org/r/367375 (owner: Elukey)
[10:12:29] (CR) Ema: [C: 1] Reduce the DNS queries for the statsd domain [debs/logster] - https://gerrit.wikimedia.org/r/367370 (https://phabricator.wikimedia.org/T171318) (owner: Elukey)
[10:12:48] (CR) Filippo Giunchedi: [C: 2] thumbor: bump MemoryLimit to 15% [puppet] - https://gerrit.wikimedia.org/r/367373 (https://phabricator.wikimedia.org/T121388) (owner: Filippo Giunchedi)
[10:13:13] (CR) Elukey: [C: 2] Reduce the DNS queries for the statsd domain [debs/logster] - https://gerrit.wikimedia.org/r/367370 (https://phabricator.wikimedia.org/T171318) (owner: Elukey)
[10:18:02] (PS1) Elukey: Revert "Reduce the DNS queries for the statsd domain" [debs/logster] - https://gerrit.wikimedia.org/r/367377
[10:18:14] (CR) Elukey: [V: 2 C: 2] Revert "Reduce the DNS queries for the statsd domain" [debs/logster] - https://gerrit.wikimedia.org/r/367377 (owner: Elukey)
[10:20:31] damage done
[10:20:32] sigh
[10:21:46] (PS19) Paladox: Zuul: Add systemd script for zuul [puppet] - https://gerrit.wikimedia.org/r/359016 (https://phabricator.wikimedia.org/T167833)
[10:22:00] (PS6) Paladox: Gerrit: Reveal the author in the title of the email [puppet] - https://gerrit.wikimedia.org/r/356645 (https://phabricator.wikimedia.org/T43608)
[10:22:27] !log roll restart thumbor to apply new memory limits
[10:22:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:23:42] (PS4) Paladox: contint: Make php.pp compatible with stretch [puppet] - https://gerrit.wikimedia.org/r/361680 (https://phabricator.wikimedia.org/T166611)
[10:24:33] (CR) jerkins-bot: [V: -1] contint: Make php.pp compatible with stretch [puppet] - https://gerrit.wikimedia.org/r/361680 (https://phabricator.wikimedia.org/T166611) (owner: Paladox)
[10:25:21] (PS5) Paladox: contint: Make php.pp compatible with stretch [puppet] - https://gerrit.wikimedia.org/r/361680 (https://phabricator.wikimedia.org/T166611)
[10:53:15] (PS1) Elukey: Reduce the DNS queries for the statsd domain [debs/logster] - https://gerrit.wikimedia.org/r/367382 (https://phabricator.wikimedia.org/T171318)
[10:53:31] (CR) Elukey: [C: 2] Reduce the DNS queries for the statsd domain [debs/logster] - https://gerrit.wikimedia.org/r/367382 (https://phabricator.wikimedia.org/T171318) (owner: Elukey)
[10:57:44] Operations, Traffic, Patch-For-Review, User-Elukey: logster should not resolve statsd's IP every time it sends a metric - https://phabricator.wikimedia.org/T171318#3465437 (elukey) Made a mess in the git repo for logster, tried to amend it as best as I could removing unnecessary cruft to avoid fu...
[11:01:21] (PS1) Elukey: Update changelog for version 0.11-2~jessie [debs/logster] - https://gerrit.wikimedia.org/r/367383 (https://phabricator.wikimedia.org/T171318)
[11:02:07] (PS2) Elukey: Update changelog for version 0.11-2~jessie [debs/logster] - https://gerrit.wikimedia.org/r/367383 (https://phabricator.wikimedia.org/T171318)
[11:03:27] (PS3) Elukey: Update changelog for version 0.11-2~jessie [debs/logster] - https://gerrit.wikimedia.org/r/367383 (https://phabricator.wikimedia.org/T171318)
[11:03:58] sorry for the spam
[11:04:02] (PS4) Elukey: Update changelog for version 0.0.11-2~jessie [debs/logster] - https://gerrit.wikimedia.org/r/367383 (https://phabricator.wikimedia.org/T171318)
[11:04:26] PROBLEM - DPKG on stat1006 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[11:04:58] ^ stat1006 is me
[11:05:26] RECOVERY - DPKG on stat1006 is OK: All packages OK
[11:10:25] Operations, Beta-Cluster-Infrastructure, VPS-Projects, Release-Engineering-Team (Kanban), and 2 others: a lot of beta cluster instances are not reachable over SSH - https://phabricator.wikimedia.org/T171174#3465468 (hashar) Open>Resolved a:hashar I have removed faulty puppet classes,...
[11:13:10] !log updates for jessie 8.8 and stretch 9.1 point updates
[11:13:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:23:22] Operations, Traffic, Patch-For-Review, User-Elukey: logster should not resolve statsd's IP every time it sends a metric - https://phabricator.wikimedia.org/T171318#3465539 (elukey) Diff between the current version on cp1008 and the new one: ``` elukey@copper:~/logster$ debdiff logster_0.0.10-1~j...
[11:26:53] jouncebot: next
[11:26:55] In 1 hour(s) and 33 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170724T1300)
[11:27:41] phuedx: want to deploy your commit yourself today? just asking if you need some training :) hasharLunch
[11:28:23] ... hasharLunch and I will be away next week, so you can deploy yourself if needed
[11:28:59] asking since I have noticed "I can get the above deployed on Monday..." https://phabricator.wikimedia.org/T171325#3461428
[11:31:00] hasharLunch: I can do EU SWAT today, but it would be great if you could take a quick look at the commits and +1 them, you have way more experience than I do
[12:02:56] Operations, LDAP-Access-Requests, Wikidata-Sprint: Add "chrisneuroth" to wmde LDAP group - https://phabricator.wikimedia.org/T170552#3465589 (Tobi_WMDE_SW) stalled>Open NDA has been signed (see T170616#3465336), so this is good to go now!
[12:06:31] Operations, LDAP-Access-Requests, Wikidata-Sprint: Add "chrisneuroth" to wmde LDAP group - https://phabricator.wikimedia.org/T170552#3465598 (MoritzMuehlenhoff) a:MoritzMuehlenhoff I'll check the NDA status with the WMF Legal department tonight and add you to LDAP tomorrow.
[12:22:28] (PS1) Aude: Bump cache epoch for Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/367391
[12:35:02] (CR) Gehel: [C: -1] "This looks good (minor comment inline)." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/366170 (https://phabricator.wikimedia.org/T153856) (owner: Bearloga)
[12:40:11] zeljkof: can i deploy my swat stuff first? (maybe a little early)
[12:40:17] i can't stay around the entire hour
[12:41:53] aude: sure
[12:42:08] ok
[12:42:42] i think nothing else is happening now...
[12:43:20] jouncebot: next
[12:43:20] In 0 hour(s) and 16 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170724T1300)
[12:49:11] (PS1) Ladsgroup: Turn on reading from the term_full_entity_id in testwikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/367393 (https://phabricator.wikimedia.org/T165197)
[12:51:51] Operations: Integrate stretch 9.1 point release - https://phabricator.wikimedia.org/T171453#3465731 (MoritzMuehlenhoff) None of the packages removed for 9.1 were present in our environment. These are fully rolled out: adwaita-icon-theme apt c-ares phpunit pulseaudio
[12:52:10] Operations: Integrate jessie 8.9 point release - https://phabricator.wikimedia.org/T171452#3465732 (MoritzMuehlenhoff) None of the packages removed for 8.9 were present in our environment. These are fully rolled out: c-ares cfitsio debconf debootstrap w3m
[12:55:27] (CR) Ladsgroup: [C: -2] "We need to rebuild the table first" [mediawiki-config] - https://gerrit.wikimedia.org/r/367393 (https://phabricator.wikimedia.org/T165197) (owner: Ladsgroup)
[12:56:48] Operations: Replace nrpe 2.15 (& evaluate alternatives) - https://phabricator.wikimedia.org/T157853#3465748 (MoritzMuehlenhoff) The stretch 9.1 point update now provides a version of nagios-nrpe which is compatible with older versions: https://packages.qa.debian.org/n/nagios-nrpe/news/20170715T221717Z.html...
[12:57:48] jouncebot: next
[12:57:48] In 0 hour(s) and 2 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170724T1300)
[12:57:54] (y)
[12:58:10] * Sagan was too lazy to search for the link
[13:00:06] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Respected human, time to deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170724T1300). Please do the needful.
[13:00:06] phuedx, Sagan, MatmaRex, and aude: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be available during the process.
[13:00:13] * Sagan is still here
[13:00:14] please leave my stuff for last, i have to go run an errand. i'll be back in 30 minutes. sorry
[13:00:28] MatmaRex: ok
[13:00:29] k
[13:00:30] o/
[13:00:41] I'm here, so if you want we can start with my small patches
[13:00:41] hi
[13:00:46] i'm doing mine first
[13:00:49] hasharLunch: I can do the swat, unless you insist ;)
[13:00:50] ok
[13:00:58] testing on mwdebug
[13:01:07] then I have time to remove one from the list :o
[13:01:08] aude: ok, want to do the rest, or should I? ;)
[13:01:12] my 2FA app was too slow
[13:01:13] zeljkof: please do :]
[13:01:30] o/
[13:01:50] zeljkof: i can't stay the entire swat so you can do the rest?
[13:01:51] (CR) Hashar: [C: 1] Allow flooders to remove themselves from the flood group on zhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/367025 (https://phabricator.wikimedia.org/T171379) (owner: Urbanecm)
[13:01:57] almost done
[13:01:58] aude, MatmaRex In case one of you wants to deploy one more patch: I removed one, so we only have 7 now
[13:02:00] with mine
[13:02:12] phuedx: no reverts of reverts today?! I am disappoint...
[13:02:21] aude: sure, just joking :)
[13:03:19] :)
[13:03:46] zeljkof: sec
[13:03:59] i can revert revert all of today's changes if you'd like? ;)
[13:04:49] !log aude@tin Synchronized php-1.30.0-wmf.10/extensions/Wikidata: Fix several Wikidata bugs (duration: 02m 10s)
[13:04:57] checking again
[13:05:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:06:08] phuedx: stack too deep... o.O
[13:06:33] (PS2) Aude: Bump cache epoch for Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/367391
[13:06:53] phuedx: I don't see you in
[13:06:54] (CR) Aude: [C: 2] Bump cache epoch for Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/367391 (owner: Aude)
[13:07:29] phuedx: in #wikimedia-releng, so asking here, now that CI is ready, if you (or anybody from the team) wants to pair on selenium/node, let me know
[13:07:30] zeljkof: see me in what?
[13:07:32] the cache epoch thing should be ok... we've done this before for wikidata
[13:07:42] oh mibad!
[13:07:47] changed irc client, i'll rejoin now
[13:08:09] phuedx: no problemo, did not want to spam here, but this is the only channel you are in :)
[13:08:14] aude: can I take over?
[13:08:30] once i'm done with the config patch
[13:08:44] aude: sure, just ping me, will start reviewing patches...
[13:08:47] should be quick
[13:09:01] no problem, I had understood your comments to mean you were done
[13:09:18] (Merged) jenkins-bot: Bump cache epoch for Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/367391 (owner: Aude)
[13:09:23] phuedx: reviewing 366882
[13:09:30] (CR) jenkins-bot: Bump cache epoch for Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/367391 (owner: Aude)
[13:10:14] phuedx: can it be tested at mwdebug? (once deployed there)
[13:10:35] yes, i can double check the rate makes sense and nothing falls over
[13:10:40] zeljkof: ^
[13:10:54] checking mine on mwdebug
[13:11:02] phuedx: cool, will ping you as soon as it is there, the commit looks ok to me
[13:11:08] note that the "default" rate has been in production for months and this is a deviation from the norm
[13:12:22] !log aude@tin Synchronized wmf-config/Wikibase.php: Bump cache epoch for wikidata (duration: 00m 43s)
[13:12:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:12:42] zeljkof: i'm done
[13:12:45] all looks good
[13:12:56] aude: great, I am taking over the SWAT then
[13:13:01] thanks :)
[13:13:25] (CR) Zfilipin: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/366882 (https://phabricator.wikimedia.org/T171325) (owner: Phuedx)
[13:13:37] zeljkof: when deploying the patch with the import sources, I'd check at mwdebug if the wiki is still up. once it's merged to prod a steward will check it for me
[13:14:06] Sagan: ok, will ping you when ready, merging phuedx's commit now
[13:14:10] ok :)
[13:14:25] !log restarting elasticsearch on relforge for JVM upgrade
[13:14:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:14:45] (Merged) jenkins-bot: pagePreviews: Increase instrumentation sampling rate [mediawiki-config] - https://gerrit.wikimedia.org/r/366882 (https://phabricator.wikimedia.org/T171325) (owner: Phuedx)
[13:16:12] (CR) jenkins-bot: pagePreviews: Increase instrumentation sampling rate [mediawiki-config] - https://gerrit.wikimedia.org/r/366882 (https://phabricator.wikimedia.org/T171325) (owner: Phuedx)
[13:16:44] phuedx: 366882 is at mwdebug1002, please test and let me know if I can push further
[13:17:18] zeljkof: will do!
[13:21:28] i'm here now
[13:22:29] zeljkof: go, but i'll need to submit one for ruwiki too as i missed it :/
[13:22:32] MatmaRex: we are on the first commit, yours is fourth :) will ping you
[13:22:38] as in lgtm
[13:22:43] ^ zeljkof
[13:22:46] phuedx: ok, want to submit it now?
[13:22:52] or for the next swat?
[13:22:59] phuedx: deploying...
[13:23:01] zeljkof: on it and i'll put it on the back of the queue
[13:23:11] phuedx: ok
[13:23:12] if we miss it, then i'll get it deployed in the next window
[13:23:21] don't want to delay other folk!
[13:23:45] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:366882|pagePreviews: Increase instrumentation sampling rate (T171325)]] (duration: 00m 43s)
[13:23:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:23:57] T171325: Increase sampling rate of page previews to 1% on ruwiki, huwiki, itwiki - https://phabricator.wikimedia.org/T171325
[13:24:06] phuedx: deployed, please check
[13:24:11] on it
[13:24:38] Sagan: reviewing 367025...
[13:24:53] (PS2) Zfilipin: Allow flooders to remove themselves from the flood group on zhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/367025 (https://phabricator.wikimedia.org/T171379) (owner: Urbanecm)
[13:25:19] zeljkof: :thumbsup:
[13:25:30] Sagan: can you test it at mwdebug1002 (once there, in a minute or so)
[13:25:39] zeljkof: yeah
[13:25:47] phuedx: ?
[13:25:58] change is fine in prod
[13:26:02] oh, saw the emoji now 😅
[13:26:03] I will take a look if it's at Special:ListGroupRights, it should appear there
[13:27:32] (CR) Zfilipin: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/367025 (https://phabricator.wikimedia.org/T171379) (owner: Urbanecm)
[13:28:00] !log upgrading nagios-nrpe-server to 3.0.1-3+deb9u1 on all stretch hosts
[13:28:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:29:20] (PS1) Phuedx: pagePreviews: Increase i13n sampling rate for ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/367398 (https://phabricator.wikimedia.org/T171325)
[13:29:44] (Merged) jenkins-bot: Allow flooders to remove themselves from the flood group on zhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/367025 (https://phabricator.wikimedia.org/T171379) (owner: Urbanecm)
[13:29:57] (CR) jenkins-bot: Allow flooders to remove themselves from the flood group on zhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/367025 (https://phabricator.wikimedia.org/T171379) (owner: Urbanecm)
[13:31:23] Sagan: 367025 is at mwdebug1002, please test and let me know if I can continue
[13:31:41] zeljkof: looks good for me, you can proceed
[13:31:55] Sagan: deploying...
[13:31:57] reloading a already loaded special page is a good way to test fast :)
[13:32:41] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:367025|Allow flooders to remove themselves from the flood group on zhwiki (171379)]] (duration: 00m 43s)
[13:32:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:32:56] zeljkof: i've scheduled the follow-on change, ping me if we've got time
[13:33:10] Operations: Replace nrpe 2.15 (& evaluate alternatives) - https://phabricator.wikimedia.org/T157853#3465912 (faidon) Nice find! It tested it and indeed it works out of the box now.
I removed 2.15-1+stretch1 from stretch-wikimedia and upgraded all stretch systems (with the exception of ms-be2024 which is down...
[13:33:18] Sagan, phuedx, aude: a new error in logs "Warning: Invalid Font Weight", anything related to your patches?
[13:33:27] phuedx: ok
[13:33:30] not to mine
[13:33:42] the error is dropping, so might be unrelated...
[13:33:48] (dropping in numbers)
[13:33:52] zeljkof: unrelated -- link to the error?
[13:34:18] Operations, monitoring: Replace nrpe 2.15 (& evaluate alternatives) - https://phabricator.wikimedia.org/T157853#3465916 (faidon)
[13:35:33] phuedx: at the top here https://logstash.wikimedia.org/app/kibana#/dashboard/Fatal-Monitor
[13:35:49] I mean, the top of the list of messages
[13:36:14] it is falling and rising... might not be related to swat
[13:36:42] no stack trace :/
[13:36:47] yeah
[13:36:57] much helpful
[13:37:32] Operations, ops-esams, DNS, Traffic, netops: eeden ethernet outage - https://phabricator.wikimedia.org/T146391#3465921 (faidon) This hasn't happened in a long time, should we just resolve?
[13:37:33] Sagan: reviewing 367334...
[13:37:52] (CR) Zfilipin: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/367334 (https://phabricator.wikimedia.org/T171395) (owner: Urbanecm)
[13:38:36] btw, why is such a change listed at gate-and-submit, and not at gate-and-submit-swat?
[13:38:44] at zuul
[13:39:19] Sagan: do we have such pipeline?
[13:39:23] * zeljkof is looking
[13:39:34] look at that...
[13:39:34] I think the swat one is new
[13:39:38] never noticed it before
[13:39:41] Operations, DNS, Traffic: Monitor DNS delegations - https://phabricator.wikimedia.org/T171470#3465927 (faidon)
[13:39:45] not sure how to make a commit go there
[13:39:54] hashar: I guess you know more?
[13:40:07] the convention is to +2 the commit is "SWAT", but maybe there is something more
[13:40:14] test-prio is also new
[13:40:21] (PS2) Zfilipin: Add some import sources for tawikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/367334 (https://phabricator.wikimedia.org/T171395) (owner: Urbanecm)
[13:40:28] (CR) Zfilipin: Add some import sources for tawikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/367334 (https://phabricator.wikimedia.org/T171395) (owner: Urbanecm)
[13:40:34] (CR) Zfilipin: [C: 2] Add some import sources for tawikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/367334 (https://phabricator.wikimedia.org/T171395) (owner: Urbanecm)
[13:40:39] I guess some important things will automatically go to the cue, depending on repos etc
[13:41:14] Sagan: did not notice 367334 needed to be reabased before doing +2, rebased now, a few minutes of delay, sorry...
[13:41:23] np
[13:41:59] well, the commit did end up in test-prio pipeline, at least something... :)
[13:42:04] yeah :D
[13:42:12] gate-and-submit-swat is apparently only for wmf.N branches
[13:42:28] hm, ok, and why not for operations config?
[13:42:32] (PS5) Giuseppe Lavagetto: rake: new rakefile specifically for CI [puppet] - https://gerrit.wikimedia.org/r/366591 (https://phabricator.wikimedia.org/T166888)
[13:42:36] and presumably it exists so that if someone merges a ton of normal patches at the same time as SWAT, those tests don't block tests for SWAT patches
[13:42:38] MatmaRex: oh, then your commit should end up there
[13:42:49] (Merged) jenkins-bot: Add some import sources for tawikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/367334 (https://phabricator.wikimedia.org/T171395) (owner: Urbanecm)
[13:42:57] Sagan: no one should be merging anything else in operations/mediawiki-config during SWAT, i guess
[13:43:00] (CR) jenkins-bot: Add some import sources for tawikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/367334 (https://phabricator.wikimedia.org/T171395) (owner: Urbanecm)
[13:43:04] (PS19) Jcrespo: prometheus: Convert mysqld-exporter into multi-instance [puppet] - https://gerrit.wikimedia.org/r/364396 (https://phabricator.wikimedia.org/T170666)
[13:43:38] MatmaRex: I don't know if that problem still exists, but once there were changing from another repo (mediawiki) blocking a SWAT. but was long time ago, IIRC
[13:43:53] Sagan: 367334 is at mwdebug1002, please test
[13:44:34] zeljkof: ok, the wiki itself looks fine, no errors.
please proceed, I will ask a steward to test it once it's fully there
[13:44:43] Sagan: deploying
[13:45:33] (PS4) Filippo Giunchedi: puppetmaster: stop serving private via fileserver [puppet] - https://gerrit.wikimedia.org/r/366808 (https://phabricator.wikimedia.org/T79881)
[13:45:35] (PS3) Filippo Giunchedi: Don't show diffs for files with secret content [puppet] - https://gerrit.wikimedia.org/r/366806 (https://phabricator.wikimedia.org/T79881)
[13:45:37] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:367334|Add some import sources for tawikisource (T171395)]] (duration: 00m 43s)
[13:45:46] Shanmugamp7: can you take a look at the list of import sources, if wikis like itwikisource are listed there?
[13:45:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:45:46] T171395: Enabling options to import from other language wikisource to Tamil wikisource - https://phabricator.wikimedia.org/T171395
[13:46:03] (at tawikisource)
[13:46:20] Sagan: yesen, fr, bn and it is listed
[13:46:25] Sagan: yeah, deployed
[13:46:38] Shanmugamp7: thanks for testing :)
[13:46:39] are*
[13:46:40] (CR) Filippo Giunchedi: [C: 2] puppetmaster: stop serving private via fileserver [puppet] - https://gerrit.wikimedia.org/r/366808 (https://phabricator.wikimedia.org/T79881) (owner: Filippo Giunchedi)
[13:46:42] Sagan, Shanmugamp7: all good?
[13:46:47] zeljkof: and thx for deploying
[13:46:52] zeljkof: yeah
[13:46:53] np
[13:46:59] this "Warning: Invalid Font Weight" error is coming and going...
not like
[13:47:12] Sagan: thanks for deploying with #releng :)
[13:47:17] (PS6) Giuseppe Lavagetto: rake: new rakefile specifically for CI [puppet] - https://gerrit.wikimedia.org/r/366591 (https://phabricator.wikimedia.org/T166888)
[13:48:27] MatmaRex: I will leave your commit for the end, since it is the only one for core
[13:49:00] aude's commits are deployed, so phuedx's 367398 is next
[13:49:15] (PS2) Zfilipin: pagePreviews: Increase i13n sampling rate for ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/367398 (https://phabricator.wikimedia.org/T171325) (owner: Phuedx)
[13:50:17] ok
[13:50:17] (CR) Zfilipin: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/367398 (https://phabricator.wikimedia.org/T171325) (owner: Phuedx)
[13:51:38] MatmaRex: different workflow for your commit, sorry, I rarely do core in swat, will have to find my notes :)
[13:52:18] zeljkof: no problem :)
[13:52:27] (Merged) jenkins-bot: pagePreviews: Increase i13n sampling rate for ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/367398 (https://phabricator.wikimedia.org/T171325) (owner: Phuedx)
[13:52:34] it's good that CI is not busy, no delays there, at least
[13:52:38] Operations, ops-eqiad, DC-Ops, cloud-services-team: labvirt1015 crashes - https://phabricator.wikimedia.org/T171473#3466006 (Andrew)
[13:52:40] (CR) jenkins-bot: pagePreviews: Increase i13n sampling rate for ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/367398 (https://phabricator.wikimedia.org/T171325) (owner: Phuedx)
[13:52:51] Operations, DBA, Wikimedia-Site-requests: Global rename of Carrotkit → 胡蘿蔔: supervision needed - https://phabricator.wikimedia.org/T171474#3466021 (Wong128hk)
[13:53:20] (CR) Daniel Kinzler: [C: 1] "Yes, we want to have more units, and the data looks fine."
[mediawiki-config] - https://gerrit.wikimedia.org/r/362606 (https://phabricator.wikimedia.org/T168582) (owner: Smalyshev)
[13:53:26] !log installing bind security updates (we're using client-side libs/tools only)
[13:53:26] ACKNOWLEDGEMENT - Host labvirt1015 is DOWN: PING CRITICAL - Packet loss = 100% andrew bogott This box is very sick :( https://phabricator.wikimedia.org/T171473
[13:53:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:53:42] Operations, DBA, Wikimedia-Site-requests: Global rename of Carrotkit → 胡蘿蔔: supervision needed - https://phabricator.wikimedia.org/T171474#3466035 (Wong128hk)
[13:54:00] phuedx: 367398 is at mwdebug, please test
[13:54:10] zeljkof: which mwdebug?
[13:54:31] phuedx: oh, sorry, it's mwdebug1002 always, forgot to make it explicit
[13:54:48] phuedx: https://wikitech.wikimedia.org/wiki/SWAT_deploys/Deployers#Test_Canary
[13:54:51] (docs)
[13:55:39] Operations, DBA, Wikimedia-Site-requests: Global rename of Carrotkit → 胡蘿蔔: supervision needed - https://phabricator.wikimedia.org/T171474#3466021 (Marostegui) When do you want to attempt to rename this user? I am happy to monitor the DBs for you while this happens. Sometimes we have to ease some dur...
[13:55:41] (CR) Faidon Liambotis: [C: 1] librenms: enable graphite extension [puppet] - https://gerrit.wikimedia.org/r/366836 (https://phabricator.wikimedia.org/T171167) (owner: Filippo Giunchedi)
[13:55:53] zeljkof: yeah. there are two ;) 1001 & 2
[13:55:55] iirc i always used 1001!
[13:55:57] :)
[13:56:00] zeljkof: lgtm
[13:56:05] phuedx: deploying
[13:56:16] phuedx: I always follow the docs :)
[13:56:16] Operations, ops-eqiad, DC-Ops, cloud-services-team: labvirt1015 crashes - https://phabricator.wikimedia.org/T171473#3466042 (Andrew) (I should note that there's no data of interest on that box -- reimaging is just fine)
[13:57:08] !log zfilipin@tin scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/2cc7028226a539553178454fc2f14459 for details)
[13:57:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:57:31] phuedx: uh oh 13:57:08 sync-file failed: scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/2cc7028226a539553178454fc2f14459 for details)
[13:57:43] (PS2) Filippo Giunchedi: base: check and alert on free filesystem inodes [puppet] - https://gerrit.wikimedia.org/r/366525 (https://phabricator.wikimedia.org/T129222)
[13:57:56] it's just one canary, so it might not be a real problem, looking at the logs
[13:58:23] 13:57:08 Check 'Logstash Error rate for mwdebug1002.eqiad.wmnet' failed: ERROR: 22% OVER_THRESHOLD (Avg.
Error rate: Before: 0.78, After: 10.00, Threshold: 7.79)
[13:58:26] Operations, DBA, Wikimedia-Site-requests: Global rename of Carrotkit → 胡蘿蔔: supervision needed - https://phabricator.wikimedia.org/T171474#3466047 (Marostegui) p:Triage>Normal
[13:58:36] php-1.30.0-wmf.10/includes/utils/FileContentsHasher.php","line":57,"message":"PHP Warning: filemtime(): No such file or directory
[13:58:39] bah
[13:58:46] from /w/load.php
[13:58:47] (CR) Faidon Liambotis: [C: -1] Remove DNS records for unused IPs (2 comments) [dns] - https://gerrit.wikimedia.org/r/366871 (owner: Ayounsi)
[13:59:01] zeljkof: phuedx: looks like some file is not available to the resource loader
[13:59:05] (CR) Faidon Liambotis: [C: 1] base: check and alert on free filesystem inodes [puppet] - https://gerrit.wikimedia.org/r/366525 (https://phabricator.wikimedia.org/T129222) (owner: Filippo Giunchedi)
[13:59:26] hashar: but how come it failed only on one canary?
[13:59:34] it fails on mwdebug1002
[13:59:38] wat?
[13:59:40] (CR) Faidon Liambotis: [C: 1] "Nice!" [puppet] - https://gerrit.wikimedia.org/r/366806 (https://phabricator.wikimedia.org/T79881) (owner: Filippo Giunchedi)
[13:59:45] (CR) Giuseppe Lavagetto: [C: 1] pybal: bind instrumentation TCP port to private addresses (1 comment) [puppet] - https://gerrit.wikimedia.org/r/348074 (https://phabricator.wikimedia.org/T103882) (owner: Ema)
[13:59:49] Operations, DBA, Wikimedia-Site-requests: Global rename of Carrotkit → 胡蘿蔔: supervision needed - https://phabricator.wikimedia.org/T171474#3466050 (Wong128hk) It is ready for now. Thx.
[14:00:07] (CR) Filippo Giunchedi: [C: 2] base: check and alert on free filesystem inodes [puppet] - https://gerrit.wikimedia.org/r/366525 (https://phabricator.wikimedia.org/T129222) (owner: Filippo Giunchedi)
[14:00:07] try to scap pull mwdebug1002 and browse some content on it?
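Editorial aside on the canary check reported at 13:57:08: the figures in that message (Before: 0.78, After: 10.00, Threshold: 7.79, 22% over) are consistent with a threshold of ten times the pre-deploy error rate and an overshoot measured relative to the post-deploy rate. The sketch below only reproduces that arithmetic. The function name, the default multiplier, and the overshoot formula are assumptions inferred from the logged numbers, not scap's actual code.

```python
def canary_error_check(before, after, multiplier=10.0):
    """Hypothetical re-creation of the arithmetic in the logged canary message.

    before / after: average error rates around the deploy.
    multiplier: allowed increase factor (assumed from "increased by 10x").
    Returns (failed, threshold, percent_over).
    """
    threshold = before * multiplier
    failed = after > threshold
    # Overshoot relative to the post-deploy rate; this formula is inferred
    # from the logged "22% OVER_THRESHOLD", not taken from scap itself.
    percent_over = (after - threshold) / after * 100 if failed else 0.0
    return failed, threshold, percent_over


# Values copied from the log line above.
failed, threshold, percent_over = canary_error_check(0.78, 10.00)
print(failed, round(threshold, 2), round(percent_over))
```

With the rounded input 0.78 the computed threshold is 7.8 rather than the logged 7.79, which suggests the logged "Before" value was itself rounded for display.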
[14:00:17] hashar: it's already there
[14:00:36] or maybe it is a false alarm
[14:01:00] Operations, ops-eqiad, DC-Ops, cloud-services-team: labvirt1015 crashes - https://phabricator.wikimedia.org/T171473#3466006 (Cmjohnson) This could be a h/w issue. The h/w system event log shows this ecord: 1 Date/Time: 04/28/2017 19:37:34 Source: system Severity: Ok Description:...
[14:01:22] hashar, phuedx: wee already deployed a very similar change earlier today, no problems https://gerrit.wikimedia.org/r/#/c/366882/
[14:01:53] hashar: this one just fixes a typo that had itwiki twice instead or ruwiki https://gerrit.wikimedia.org/r/#/c/367398/
[14:02:05] hasharLunch, zeljkof: browsing on mwdebug1002
[14:02:14] ahh
[14:02:17] so yeah looks safe
[14:02:31] hashar: should I just re-try the deploy?
[14:02:41] looks like some unrelated errors spiked right during the swat canary check
[14:02:45] which triggered the error rate
[14:02:51] yeah I would rety
[14:02:56] thcipriani|afk: around for one of those "one canary complains" situations? ;)
[14:03:07] hashar: ok, re-trying the deploy
[14:03:23] !log mwdebug1002 ran scap pull
[14:03:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:03:57] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:367398|pagePreviews: Increase i13n sampling rate for ruwiki (T171325)]] (duration: 00m 43s)
[14:03:57] hashar: what is more scary is this in logs "966 Warning: Invalid Font Weight"
[14:04:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:04:08] T171325: Increase sampling rate of page previews to 1% on ruwiki, huwiki, itwiki - https://phabricator.wikimedia.org/T171325
[14:04:08] Operations, Commons, media-storage, monitoring: Monitor [[Special:ListFiles]] for non 200 HTTP statuses in thumbnails - https://phabricator.wikimedia.org/T106937#3466092 (chasemp) @fgiunchedi it depends on what we want to watch move.
We already have a number of emulated/chrome checks that could...
[14:04:14] phuedx, hashar: no trouble this time
[14:04:14] those come from pdf rendering iirc
[14:04:23] phuedx: please check
[14:04:51] MatmaRex: we are out of time, can you stay longer? or would you prefer to move the commit to another window?
[14:05:42] zeljkof: i'm fine with either
[14:05:43] Operations, Discovery, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): rack/setup/install wdqs100[45].eqiad.wmnet - https://phabricator.wikimedia.org/T171210#3466093 (Gehel) @Cmjohnson now that I am back, do you need anything from me to move forward on this?
[14:05:48] Puppet: Set puppet config_version to something referring to git - https://phabricator.wikimedia.org/T171477#3466094 (fgiunchedi)
[14:06:04] Operations, DBA, Wikimedia-Site-requests: Global rename of Carrotkit → 胡蘿蔔: supervision needed - https://phabricator.wikimedia.org/T171474#3466106 (Marostegui) >>! In T171474#3466050, @Wong128hk wrote: > It is ready for now. Thx. There is a deployment going on right now - so let's wait a few minutes...
[14:06:05] zeljkof: lgtm
[14:06:09] zeljkof: it's not very high priority. but i'm around if you have time and want to finish it
[14:06:25] phuedx: great, thanks for deploying with #releng ;)
[14:06:50] MatmaRex: ok, will try to deploy, if I get stuck let's move to another window :)
[14:07:23] !log extending EU SWAT until https://gerrit.wikimedia.org/r/#/c/367384/ is deployed
[14:07:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:07:35] Operations, Discovery, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): rack/setup/install wdqs100[45].eqiad.wmnet - https://phabricator.wikimedia.org/T171210#3466107 (Cmjohnson) @gehel no, not right now. Once they're racked and installed they will be turned over. They're i...
[14:07:38] zeljkof: can you ping me here once the swat is done?
[14:08:05] marostegui: will do!
is there something urgent? MatmaRex says his commit is not urgent
[14:08:23] zeljkof: Nope, not urgent, just an user rename, we can wait until you guys are done
[14:08:35] marostegui: ok, thanks
[14:08:41] zeljkof: hmm actually, i'd rather reschedule it
[14:10:27] Operations, ops-eqiad, DC-Ops, cloud-services-team: labvirt1015 crashes - https://phabricator.wikimedia.org/T171473#3466109 (Andrew) Thank you, Chris! This is new hardware and we can live without it... can we leave this in your hands to follow up with Dell? Is there any additional info you need?
[14:10:58] MatmaRex: ok, sorry, I am a slow deployer, so your commit did not make it...
[14:11:23] !log EU SWAT finished
[14:11:24] zeljkof: no problem. i'll get it swatted in the evening
[14:11:28] thanks :)
[14:11:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:11:33] marostegui: we are done, you can take over
[14:11:38] thanks!
[14:12:00] MatmaRex: I have found my notes, so it would get done, sooner or later... ;)
[14:12:09] Operations, DBA, Wikimedia-Site-requests: Global rename of Carrotkit → 胡蘿蔔: supervision needed - https://phabricator.wikimedia.org/T171474#3466116 (Marostegui) You can now go ahead. Once you start the process, please paste here the meta url so I can follow the progress per wiki Thanks
[14:12:16] PROBLEM - Check Varnish expiry mailbox lag on cp1099 is CRITICAL: CRITICAL: expiry mailbox lag is 2142030
[14:12:30] (CR) Filippo Giunchedi: [C: 1] role::prometheus::memcached_exporter: move to profile [puppet] - https://gerrit.wikimedia.org/r/367375 (owner: Elukey)
[14:12:54] MatmaRex: the commit did end up in gate-and-submit-swat
[14:13:02] still running...
[14:13:19] zeljkof: if you removed the C+2, it won't get merged
[14:13:30] MatmaRex: I did
[14:13:42] I am surprised the jobs did not get aborted
[14:13:48] zeljkof: yeah. no problem then.
the job will finish, jenkins will give it a V+1 but it won't merge without a C+2
[14:14:27] Operations, DBA, Wikimedia-Site-requests: Global rename of Carrotkit → 胡蘿蔔: supervision needed - https://phabricator.wikimedia.org/T171474#3466146 (Wong128hk) Thx a lot. The rename is started. https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/%E8%83%A1%E8%98%BF%E8%94%94
[14:14:57] (CR) Hashar: "One could extend puppet-lint to ensure that whenever a file has content => secret, the show_diff is set to false." [puppet] - https://gerrit.wikimedia.org/r/366806 (https://phabricator.wikimedia.org/T79881) (owner: Filippo Giunchedi)
[14:15:03] !log Global rename of Carrotkit - T171474
[14:15:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:15:21] T171474: Global rename of Carrotkit → 胡蘿蔔: supervision needed - https://phabricator.wikimedia.org/T171474
[14:15:28] Operations, DBA, Wikimedia-Site-requests: Global rename of Carrotkit → 胡蘿蔔: supervision needed - https://phabricator.wikimedia.org/T171474#3466156 (Marostegui) >>! In T171474#3466146, @Wong128hk wrote: > Thx a lot. The rename is started. > > https://meta.wikimedia.org/wiki/Special:GlobalRenameProgre...
[14:16:19] (PS1) Ottomata: Use different consumer group for the mysql eventbus eventlogging consumer [puppet] - https://gerrit.wikimedia.org/r/367403
[14:17:17] (CR) jerkins-bot: [V: -1] Use different consumer group for the mysql eventbus eventlogging consumer [puppet] - https://gerrit.wikimedia.org/r/367403 (owner: Ottomata)
[14:17:55] Operations, Traffic, Patch-For-Review: Investigate nginx reload behavior - https://phabricator.wikimedia.org/T164579#3466168 (ema) Open>Resolved a:ema Closing, the problem is known and there's no perfect solution (but one nginx reload a day is much better than one every hour!).
[14:18:21] (PS4) Ottomata: statistics::packages: Add libssl-dev and comments [puppet] - https://gerrit.wikimedia.org/r/366107 (https://phabricator.wikimedia.org/T152712) (owner: Bearloga)
[14:18:52] (PS2) Ottomata: Use different consumer group for the mysql eventbus eventlogging consumer [puppet] - https://gerrit.wikimedia.org/r/367403
[14:19:01] (CR) Ottomata: [V: 2 C: 2] statistics::packages: Add libssl-dev and comments [puppet] - https://gerrit.wikimedia.org/r/366107 (https://phabricator.wikimedia.org/T152712) (owner: Bearloga)
[14:19:46] (PS3) Ottomata: Use different consumer group for the mysql eventbus eventlogging consumer [puppet] - https://gerrit.wikimedia.org/r/367403
[14:19:48] (CR) Jcrespo: [C: 2] prometheus: Convert mysqld-exporter into multi-instance [puppet] - https://gerrit.wikimedia.org/r/364396 (https://phabricator.wikimedia.org/T170666) (owner: Jcrespo)
[14:20:05] (PS20) Jcrespo: prometheus: Convert mysqld-exporter into multi-instance [puppet] - https://gerrit.wikimedia.org/r/364396 (https://phabricator.wikimedia.org/T170666)
[14:20:46] (CR) Ottomata: [C: 2] Use different consumer group for the mysql eventbus eventlogging consumer [puppet] - https://gerrit.wikimedia.org/r/367403 (owner: Ottomata)
[14:20:48] (CR) Ottomata: [V: 2 C: 2] Use different consumer group for the mysql eventbus eventlogging consumer [puppet] - https://gerrit.wikimedia.org/r/367403 (owner: Ottomata)
[14:20:50] (PS4) Hashar: contint: role and packages for R language [puppet] - https://gerrit.wikimedia.org/r/363337 (https://phabricator.wikimedia.org/T153856)
[14:21:16] (CR) Hashar: "Nice!
I have rebased on top of Ibd8b76f2ffd1cfaab6fdcc84117042eb668ed598 and thus change boils down to:" [puppet] - https://gerrit.wikimedia.org/r/363337 (https://phabricator.wikimedia.org/T153856) (owner: Hashar)
[14:21:27] (PS21) Jcrespo: prometheus: Convert mysqld-exporter into multi-instance [puppet] - https://gerrit.wikimedia.org/r/364396 (https://phabricator.wikimedia.org/T170666)
[14:22:09] Operations, Wikimedia-Language-setup, Wikimedia-Site-requests, Hindi-Sites: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3466183 (MF-Warburg) stalled>Open p:Lowest>Normal Langcom has reviewed the concerns and found them unfounded. So please continue with th...
[14:26:06] PROBLEM - Check systemd state on dbstore2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[14:26:56] PROBLEM - puppet last run on dbstore2002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 1 minute ago with 1 failures. Failed resources (up to 3 shown): Service[prometheus-mysqld-exporter@]
[14:27:35] (PS1) Jforrester: Enable response reference lists on all Wikiquotes [mediawiki-config] - https://gerrit.wikimedia.org/r/367404 (https://phabricator.wikimedia.org/T159895)
[14:29:50] !log Run maintain-views on labsdb1009, labsdb1010 and labsdb1011 for s2 wikis - T153743
[14:30:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:30:01] T153743: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743
[14:30:09] Operations, LDAP-Access-Requests, Wikidata-Sprint: Add "chrisneuroth" to wmde LDAP group - https://phabricator.wikimedia.org/T170552#3466217 (MoritzMuehlenhoff) @christophneuroth : The Legal department of WMF doesn't have an NDA for you on record (the L3 you signed at T170616 is only sufficient for P...
[14:31:11] Operations, netops: Merge AS14907 with AS43281 - https://phabricator.wikimedia.org/T167840#3346480 (mark) I guess there's something to be said for using different ASNs for core vs CDN in the case of losing our transport connectivity to (one of) the CDN sites. We could then still tunnel this over the Inte...
[14:32:21] (PS1) Filippo Giunchedi: profile: check and alert on free filesystem inodes [puppet] - https://gerrit.wikimedia.org/r/367406 (https://phabricator.wikimedia.org/T129222)
[14:32:23] (PS1) Filippo Giunchedi: Update check_disk_options to check/alert on free inodes [puppet] - https://gerrit.wikimedia.org/r/367407 (https://phabricator.wikimedia.org/T129222)
[14:35:56] (PS1) Jcrespo: prometheus-mysqld-exporter: Fix dependencies and not auto-start [puppet] - https://gerrit.wikimedia.org/r/367408 (https://phabricator.wikimedia.org/T170666)
[14:36:18] (CR) Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/compiler02/7140/tin.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/367406 (https://phabricator.wikimedia.org/T129222) (owner: Filippo Giunchedi)
[14:36:24] (PS2) Filippo Giunchedi: profile: check and alert on free filesystem inodes [puppet] - https://gerrit.wikimedia.org/r/367406 (https://phabricator.wikimedia.org/T129222)
[14:36:55] (PS2) Jcrespo: prometheus-mysqld-exporter: Fix dependencies and not auto-start [puppet] - https://gerrit.wikimedia.org/r/367408 (https://phabricator.wikimedia.org/T170666)
[14:37:41] (CR) Filippo Giunchedi: [C: 2] profile: check and alert on free filesystem inodes [puppet] - https://gerrit.wikimedia.org/r/367406 (https://phabricator.wikimedia.org/T129222) (owner: Filippo Giunchedi)
[14:38:48] (PS3) Jcrespo: prometheus-mysqld-exporter: Fix dependencies and not auto-start [puppet] - https://gerrit.wikimedia.org/r/367408 (https://phabricator.wikimedia.org/T170666)
[14:38:55] (PS5) Giuseppe Lavagetto: PDF Render: Check hourly if the service
is running via cron [puppet] - https://gerrit.wikimedia.org/r/359967 (https://phabricator.wikimedia.org/T159922) (owner: GWicke)
[14:41:07] (CR) Giuseppe Lavagetto: [C: 2] "This is introducing a sizeable debt to be paid as soon as possible; I agreed to merge this change with the agreement this won't stay aroun" [puppet] - https://gerrit.wikimedia.org/r/359967 (https://phabricator.wikimedia.org/T159922) (owner: GWicke)
[14:41:16] (PS1) Giuseppe Lavagetto: Revert "PDF Render: Check hourly if the service is running via cron" [puppet] - https://gerrit.wikimedia.org/r/367409
[14:41:29] (CR) Jcrespo: [C: 2] prometheus-mysqld-exporter: Fix dependencies and not auto-start [puppet] - https://gerrit.wikimedia.org/r/367408 (https://phabricator.wikimedia.org/T170666) (owner: Jcrespo)
[14:41:37] (CR) Giuseppe Lavagetto: [C: -2] "To be merged on September 15th, 2017" [puppet] - https://gerrit.wikimedia.org/r/367409 (owner: Giuseppe Lavagetto)
[14:41:39] (PS4) Jcrespo: prometheus-mysqld-exporter: Fix dependencies and not auto-start [puppet] - https://gerrit.wikimedia.org/r/367408 (https://phabricator.wikimedia.org/T170666)
[14:43:11] Operations, ops-eqiad, OCG-General, Reading-Web-Backlog (Tracking), User-Joe: ocg1001 is broken - https://phabricator.wikimedia.org/T170886#3466241 (Joe) Open>Resolved
[14:45:25] Operations, DBA, Wikimedia-Site-requests: Global rename of Carrotkit → 胡蘿蔔: supervision needed - https://phabricator.wikimedia.org/T171474#3466244 (Marostegui) This is now done, not big delays happened on the slowest slaves (10-20 seconds tops). Feel free to close this task if you think it is done fr...
[14:46:49] Operations, DBA, Wikimedia-Site-requests: Global rename of Carrotkit → 胡蘿蔔: supervision needed - https://phabricator.wikimedia.org/T171474#3466247 (Wong128hk) Open>Resolved a:Wong128hk Rename was completed. Thank you for supervision.
[14:47:16] RECOVERY - puppet last run on dbstore2002 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[14:49:17] RECOVERY - Check systemd state on dbstore2002 is OK: OK - running: The system is fully operational
[14:52:48] (PS1) Hashar: contint: webperf Jenkins slave [puppet] - https://gerrit.wikimedia.org/r/367411 (https://phabricator.wikimedia.org/T166756)
[14:54:26] PROBLEM - Check systemd state on dbstore2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[14:59:18] (PS2) Hashar: contint: webperformance Jenkins slave [puppet] - https://gerrit.wikimedia.org/r/367411 (https://phabricator.wikimedia.org/T166756)
[15:06:58] (PS1) Gehel: maps - initial data import needs to be done with 900913 geometry [puppet] - https://gerrit.wikimedia.org/r/367415 (https://phabricator.wikimedia.org/T169011)
[15:07:31] Operations, ops-codfw, Cloud-VPS, Patch-For-Review: rack/setup/install labtestservices2003.wikimedia.org - https://phabricator.wikimedia.org/T168893#3466298 (chasemp) Open>Resolved closing this as further implementation will be tracked in other tasks
[15:07:35] Operations, ops-codfw, Cloud-VPS, Patch-For-Review: rack/setup/install labtestcontrol2003.wikimedia.org - https://phabricator.wikimedia.org/T168894#3466301 (chasemp) Open>Resolved closing this as further implementation will be tracked in other tasks
[15:09:47] Operations, monitoring, Patch-For-Review: Check for an oversized exim4 queue indicating mail delivery failures - https://phabricator.wikimedia.org/T133110#2220489 (fgiunchedi) While are at it I'd suggest removing the disk i/o check which hasn't yield good result
[15:15:26] PROBLEM - Apache HTTP on mw1283 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time
[15:15:36] PROBLEM - HHVM rendering on mw1283 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in
0.001 second response time [15:16:06] PROBLEM - Nginx local proxy to apache on mw1283 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.007 second response time [15:16:20] (03CR) 10Hashar: [C: 031] "cherry picked on applied to provision webperformance.integration.eqiad.wmflabs" [puppet] - 10https://gerrit.wikimedia.org/r/367411 (https://phabricator.wikimedia.org/T166756) (owner: 10Hashar) [15:16:37] RECOVERY - HHVM rendering on mw1283 is OK: HTTP OK: HTTP/1.1 200 OK - 79124 bytes in 0.260 second response time [15:17:06] RECOVERY - Nginx local proxy to apache on mw1283 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.034 second response time [15:17:26] RECOVERY - Apache HTTP on mw1283 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.024 second response time [15:17:27] 10Operations, 10monitoring: Monitor internal CA expirations - https://phabricator.wikimedia.org/T171157#3466343 (10faidon) a:03Dzahn [15:18:16] 10Operations, 10netops: Evaluate NetBox as a Racktables replacement & IPAM - https://phabricator.wikimedia.org/T170144#3466346 (10faidon) [15:18:25] (03CR) 10Gehel: [C: 032] maps - initial data import needs to be done with 900913 geometry [puppet] - 10https://gerrit.wikimedia.org/r/367415 (https://phabricator.wikimedia.org/T169011) (owner: 10Gehel) [15:19:16] PROBLEM - pdfrender on scb1003 is CRITICAL: connect to address 10.64.32.153 and port 5252: Connection refused [15:19:31] PROBLEM - LVS HTTP IPv4 on pdfrender.svc.codfw.wmnet is CRITICAL: connect to address 10.2.1.35 and port 5252: Connection refused [15:19:31] PROBLEM - pdfrender on scb1004 is CRITICAL: connect to address 10.64.48.29 and port 5252: Connection refused [15:19:36] PROBLEM - pdfrender on scb2001 is CRITICAL: connect to address 10.192.32.132 and port 5252: Connection refused [15:19:36] PROBLEM - pdfrender on scb2003 is CRITICAL: connect to address 10.192.0.33 and port 5252: Connection refused [15:19:36] PROBLEM - pdfrender on scb2002 is 
CRITICAL: connect to address 10.192.48.43 and port 5252: Connection refused [15:19:36] PROBLEM - pdfrender on scb2005 is CRITICAL: connect to address 10.192.0.34 and port 5252: Connection refused [15:19:43] <_joe_> oh wow [15:19:46] PROBLEM - pdfrender on scb1002 is CRITICAL: connect to address 10.64.16.21 and port 5252: Connection refused [15:19:51] PROBLEM - LVS HTTP IPv4 on pdfrender.svc.eqiad.wmnet is CRITICAL: connect to address 10.2.2.35 and port 5252: Connection refused [15:19:52] <_joe_> ok I'm fixing it [15:19:54] no maintenance ongoing? [15:19:56] PROBLEM - pdfrender on scb2004 is CRITICAL: connect to address 10.192.16.36 and port 5252: Connection refused [15:19:59] that patch worked as expected [15:20:00] <_joe_> no, the change I merged [15:20:03] oh [15:20:08] kk [15:20:16] PROBLEM - pdfrender on scb2006 is CRITICAL: connect to address 10.192.32.20 and port 5252: Connection refused [15:20:21] <_joe_> volans: can you merge my revert while I bring the shit back up? [15:20:26] let me know if I can help [15:20:26] PROBLEM - PyBal backends health check on lvs2003 is CRITICAL: PYBAL CRITICAL - pdfrender_5252 - Could not depool server scb2004.codfw.wmnet because of too many down! 
[15:20:32] sure [15:20:53] _joe_: there is a -2 from you [15:20:55] 10Operations, 10monitoring: Fix Icinga checks for test/decom servers - https://phabricator.wikimedia.org/T151632#3466354 (10faidon) a:03Dzahn [15:20:58] if it's https://gerrit.wikimedia.org/r/#/c/367409/ [15:21:00] <_joe_> volans: yeah ignore it [15:21:01] :-) [15:21:20] (03CR) 10Volans: [C: 032] "Fixing outage" [puppet] - 10https://gerrit.wikimedia.org/r/367409 (owner: 10Giuseppe Lavagetto) [15:21:26] RECOVERY - pdfrender on scb1001 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.002 second response time [15:21:26] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.003 second response time [15:21:27] RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.003 second response time [15:21:36] RECOVERY - pdfrender on scb2001 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.075 second response time [15:21:46] RECOVERY - pdfrender on scb2003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.074 second response time [15:21:46] RECOVERY - pdfrender on scb2002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.074 second response time [15:21:46] RECOVERY - pdfrender on scb2005 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.075 second response time [15:21:56] <_joe_> Interestingly, it worked when I tested it before [15:21:56] RECOVERY - pdfrender on scb2004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.074 second response time [15:21:59] <_joe_> uhm [15:22:09] (03PS2) 10Volans: Revert "PDF Render: Check hourly if the service is running via cron" [puppet] - 10https://gerrit.wikimedia.org/r/367409 (owner: 10Giuseppe Lavagetto) [15:22:13] (03CR) 10Volans: [V: 032 C: 032] Revert "PDF Render: Check hourly if the service is running via cron" [puppet] - 10https://gerrit.wikimedia.org/r/367409 (owner: 10Giuseppe Lavagetto) [15:22:28] _joe_: that is one euro for the "it worked on my workstation" jar :-) [15:22:37] <_joe_> jynus: in production [15:22:47] 
_joe_: I cannot submit in gerrit [15:22:48] ok, I was just kidding [15:22:57] 10Operations, 10monitoring: certspotter on einsteinium has issues talking to external - https://phabricator.wikimedia.org/T162327#3466359 (10faidon) a:03faidon [15:23:18] volans: now you can [15:23:32] thanks [15:23:53] 10Operations, 10monitoring: Replace nrpe 2.15 (& evaluate alternatives) - https://phabricator.wikimedia.org/T157853#3466367 (10faidon) a:03faidon [15:24:02] negative votes take precedence over positive ones on gerrit [15:24:05] <_joe_> ook, now I see the problem is non-deterministic [15:24:18] _joe_: merged [15:24:41] 10Operations, 10monitoring, 10User-fgiunchedi: update diamond to latest upstream version - https://phabricator.wikimedia.org/T97635#3466372 (10faidon) a:03fgiunchedi [15:24:47] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.003 second response time [15:26:10] <_joe_> uhm, meh, my bad [15:26:35] 10Operations, 10monitoring: Icinga: timeseries checks should have the link to a graph with the data - https://phabricator.wikimedia.org/T170353#3428036 (10faidon) a:03Volans [15:27:35] <_joe_> don't know why the recovery doesn't come [15:27:38] <_joe_> it should [15:28:34] <_joe_> ok this is an error in pybal [15:29:11] !log oblivian@puppetmaster1001 conftool action : set/pooled=yes; selector: service=pdfrender [15:29:21] <_joe_> ok now the recovery should come [15:29:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:29:22] <_joe_> sigh [15:29:41] RECOVERY - LVS HTTP IPv4 on pdfrender.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.074 second response time [15:30:01] RECOVERY - LVS HTTP IPv4 on pdfrender.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.002 second response time [15:32:40] (03CR) 10Giuseppe Lavagetto: "For the record, there were several issues with the script. 
It failed because of a quoting failure, but it exposed how easy it is to cause " [puppet] - 10https://gerrit.wikimedia.org/r/367409 (owner: 10Giuseppe Lavagetto) [15:35:04] 10Operations, 10monitoring: Programmatic generation of grafana dashboards - https://phabricator.wikimedia.org/T171482#3466440 (10fgiunchedi) [15:39:08] 10Operations, 10monitoring, 10Patch-For-Review: Icinga disk space check should also check inode usage - https://phabricator.wikimedia.org/T129222#3466483 (10fgiunchedi) a:03fgiunchedi [15:39:41] 10Operations, 10monitoring: certspotter on einsteinium has issues talking to external - https://phabricator.wikimedia.org/T162327#3466486 (10faidon) 05Open>03declined This is basically an artifact of the CT logs failing to respond every now and then, which certspotter complains about. It doesn't happen oft... [15:39:45] 10Operations, 10Patch-For-Review: Tracking and Reducing cron-spam from root@ - https://phabricator.wikimedia.org/T132324#3466488 (10faidon) [15:43:52] 10Operations, 10monitoring: Programmatic generation of grafana dashboards - https://phabricator.wikimedia.org/T171482#3466440 (10hashar) OpenStack infrastructure has a python based utility to generate Grafana board based on a YAML DSL. That is similar to their Jenkins Job Builder used to generate jobs. Basic... [15:46:30] (03PS1) 10Jcrespo: prometheus-mysqld-exporter: Fix parameter problems [puppet] - 10https://gerrit.wikimedia.org/r/367419 (https://phabricator.wikimedia.org/T170666) [15:49:59] 10Operations, 10Puppet, 10Cloud-VPS: Move the main WMCS puppetmaster into the Labs realm - https://phabricator.wikimedia.org/T171188#3466547 (10bd808) [15:52:22] 10Operations, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests, 10Hindi-Sites: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3466554 (10Jayprakash12345) @Dereckson Sir, please create the wiki this week. It has already taken a long time. 
[15:54:42] (03PS2) 10Jcrespo: prometheus-mysqld-exporter: Fix parameter problems [puppet] - 10https://gerrit.wikimedia.org/r/367419 (https://phabricator.wikimedia.org/T170666) [15:55:22] (03CR) 10Jcrespo: [V: 032 C: 032] prometheus-mysqld-exporter: Fix parameter problems [puppet] - 10https://gerrit.wikimedia.org/r/367419 (https://phabricator.wikimedia.org/T170666) (owner: 10Jcrespo) [15:56:05] 10Operations, 10Puppet, 10Cloud-VPS: Move the main WMCS puppetmaster into the Labs realm - https://phabricator.wikimedia.org/T171188#3456889 (10Andrew) I'm pretty sure that #1 is moot -- at least, anytime we discuss it we conclude that the 'labs-support' vlan isn't really a useful concept and should be elimi... [15:57:06] RECOVERY - Check systemd state on dbstore2002 is OK: OK - running: The system is fully operational [15:58:17] 10Operations, 10Traffic, 10Patch-For-Review, 10User-Elukey: logster should not resolve statsd's IP every time it sends a metric - https://phabricator.wikimedia.org/T171318#3466572 (10elukey) Pull request merged from upstream: https://github.com/etsy/logster/commit/a24249886391a8b885d21c882f6fcaa95e29b015... [15:58:57] 10Operations, 10Puppet, 10Cloud-VPS: Move the main WMCS puppetmaster into the Labs realm - https://phabricator.wikimedia.org/T171188#3466573 (10Andrew) Here are some things that need to be thought about/figured out before we can go forward: - Security model: Having a labs VM that is a Ops-only and critical... 
[15:59:02] 10Operations, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests, 10Hindi-Sites: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3376126 (10Jayprakash12345) p:05Normal>03High [16:02:58] 10Operations, 10Analytics-Kanban, 10EventBus, 10Patch-For-Review, 10User-Elukey: Eventbus does not handle gracefully changes in DNS recursors - https://phabricator.wikimedia.org/T171048#3466588 (10fdans) [16:05:23] 10Operations, 10Analytics-Kanban, 10EventBus, 10Patch-For-Review, 10User-Elukey: Eventbus does not handle gracefully changes in DNS recursors - https://phabricator.wikimedia.org/T171048#3466596 (10fdans) a:03elukey [16:11:08] (03PS1) 10Ema: varnish: reject phabricator uploads from WP0 users [puppet] - 10https://gerrit.wikimedia.org/r/367422 (https://phabricator.wikimedia.org/T168142) [16:13:14] (03PS1) 10BryanDavis: wmcs: Use yaml.safe_load in maintain-{meta_p, views}.py [puppet] - 10https://gerrit.wikimedia.org/r/367423 [16:13:16] (03PS1) 10BryanDavis: wmcs: Use yaml.safe_load in nova_fullstack_test.py [puppet] - 10https://gerrit.wikimedia.org/r/367424 [16:13:18] (03PS1) 10BryanDavis: wmcs: Use yaml.safe_load in archive-project-volumes [puppet] - 10https://gerrit.wikimedia.org/r/367425 [16:16:53] (03PS1) 10BryanDavis: wmcs: Use yaml.safe_load in shinkengen [puppet] - 10https://gerrit.wikimedia.org/r/367426 [16:17:59] (03CR) 10jerkins-bot: [V: 04-1] wmcs: Use yaml.safe_load in shinkengen [puppet] - 10https://gerrit.wikimedia.org/r/367426 (owner: 10BryanDavis) [16:18:15] (03CR) 10Volans: [C: 031] "Indeed, safe_load should be the default ;)" [puppet] - 10https://gerrit.wikimedia.org/r/367423 (owner: 10BryanDavis) [16:19:16] (03CR) 10Volans: [C: 031] "Indeed!" 
[puppet] - 10https://gerrit.wikimedia.org/r/367424 (owner: 10BryanDavis) [16:19:21] PROBLEM - LVS HTTP IPv4 on pdfrender.svc.eqiad.wmnet is CRITICAL: connect to address 10.2.2.35 and port 5252: Connection refused [16:19:26] PROBLEM - pdfrender on scb2004 is CRITICAL: connect to address 10.192.16.36 and port 5252: Connection refused [16:19:37] _joe_ ? ^^^ [16:19:46] PROBLEM - pdfrender on scb1001 is CRITICAL: connect to address 10.64.0.16 and port 5252: Connection refused [16:19:48] <_joe_> what? [16:19:56] PROBLEM - pdfrender on scb1003 is CRITICAL: connect to address 10.64.32.153 and port 5252: Connection refused [16:19:58] PROBLEM - pdfrender on scb1004 is CRITICAL: connect to address 10.64.48.29 and port 5252: Connection refused [16:20:01] the revert didn't remove the cron [16:20:02] PROBLEM - LVS HTTP IPv4 on pdfrender.svc.codfw.wmnet is CRITICAL: connect to address 10.2.1.35 and port 5252: Connection refused [16:20:08] I assumed you were cleaning it [16:20:08] <_joe_> shit [16:20:09] <_joe_> yes [16:20:16] PROBLEM - pdfrender on scb2001 is CRITICAL: connect to address 10.192.32.132 and port 5252: Connection refused [16:20:16] PROBLEM - pdfrender on scb2003 is CRITICAL: connect to address 10.192.0.33 and port 5252: Connection refused [16:20:17] PROBLEM - pdfrender on scb2002 is CRITICAL: connect to address 10.192.48.43 and port 5252: Connection refused [16:20:17] sorry didn't ask explicitly [16:20:17] PROBLEM - pdfrender on scb2005 is CRITICAL: connect to address 10.192.0.34 and port 5252: Connection refused [16:20:17] PROBLEM - pdfrender on scb1002 is CRITICAL: connect to address 10.64.16.21 and port 5252: Connection refused [16:20:17] <_joe_> sorry, nope [16:20:37] I was also in a meeting when merging the revert [16:20:45] !log oblivian@puppetmaster1001 conftool action : set/pooled=yes; selector: service=pdfrender [16:20:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:21:02] <_joe_> me too [16:21:03] <_joe_> sigh [16:21:19] 
RECOVERY - pdfrender on scb2001 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.075 second response time [16:21:19] RECOVERY - pdfrender on scb2003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.077 second response time [16:21:19] RECOVERY - pdfrender on scb2002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.074 second response time [16:21:19] RECOVERY - pdfrender on scb2005 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.075 second response time [16:21:19] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.004 second response time [16:21:46] RECOVERY - pdfrender on scb2006 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.074 second response time [16:21:47] RECOVERY - pdfrender on scb1001 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.002 second response time [16:21:56] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.004 second response time [16:21:57] RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.004 second response time [16:22:11] RECOVERY - LVS HTTP IPv4 on pdfrender.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.073 second response time [16:22:32] RECOVERY - LVS HTTP IPv4 on pdfrender.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.005 second response time [16:23:01] <_joe_> *fixed* [16:23:11] thanks! 
[16:23:14] <_joe_> sigh [16:23:16] indeed [16:23:29] <_joe_> sorry, I should've dropped everything else [16:23:29] sorry for not asking explicitly about it [16:23:36] <_joe_> nah it was my bad [16:23:44] <_joe_> I was looking into pybal and got distracted [16:23:47] (03PS2) 10BryanDavis: wmcs: Use yaml.safe_load in shinkengen [puppet] - 10https://gerrit.wikimedia.org/r/367426 [16:23:57] it's just that awesome [16:24:55] (03PS1) 10BryanDavis: wmcs: Use yaml.safe_load in kube2proxy.py [puppet] - 10https://gerrit.wikimedia.org/r/367428 [16:25:27] RECOVERY - PyBal backends health check on lvs2003 is OK: PYBAL OK - All pools are healthy [16:44:20] 10Operations, 10Continuous-Integration-Config, 10Patch-For-Review, 10Release-Engineering-Team (Kanban), and 2 others: Create a basic RSpec unit test for operations/puppet - https://phabricator.wikimedia.org/T78342#3466736 (10hashar) @Joe proposed a rewriting of the Puppet Rakefile as part of T166888 Patch... [16:49:00] 10Operations, 10Commons, 10MediaWiki-extensions-Scribunto, 10Wikimedia-log-errors: Some Commons pages transcluding Template:Countries_of_Europe HTTP 500/503 when accessed from non-English languages specified in the template - https://phabricator.wikimedia.org/T171392#3466772 (10Anomie) [16:49:01] jouncebot: next [16:49:01] In 1 hour(s) and 10 minute(s): Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170724T1800) [16:49:04] jouncebot: now [16:49:04] No deployments scheduled for the next 1 hour(s) and 10 minute(s) [16:49:04] 10Operations, 10Commons, 10MediaWiki-extensions-Scribunto, 10Wikimedia-log-errors: Some Commons pages transcluding Template:Countries_of_Europe HTTP 500/503 when accessed from non-English languages specified in the template - https://phabricator.wikimedia.org/T171392#3463322 (10Anomie) This reminds me of {... 
[17:02:32] 10Operations, 10OTRS: mendelevium (otrs) running out of inodes - https://phabricator.wikimedia.org/T171490#3466797 (10fgiunchedi) [17:12:50] (03PS2) 10Andrew Bogott: openstack: Remove stray pmtpa references [puppet] - 10https://gerrit.wikimedia.org/r/367004 (owner: 10Krinkle) [17:15:33] 10Operations, 10OTRS: mendelevium (otrs) running out of inodes - https://phabricator.wikimedia.org/T171490#3466832 (10herron) It looks like previous otrs versions in /opt are using the inodes mendelevium:/opt$ df -i / Filesystem Inodes IUsed IFree IUse% Mounted on /dev/vda1 1577968 1498012 7... [17:19:28] (03CR) 10Andrew Bogott: [C: 032] "Thanks for the cleanup!" [puppet] - 10https://gerrit.wikimedia.org/r/367004 (owner: 10Krinkle) [17:21:00] (03PS3) 10Reedy: Mostly re-enable Generic.Arrays.DisallowLongArraySyntax.Found [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366776 [17:23:22] (03PS2) 10Rush: wmcs: Use yaml.safe_load in maintain-{meta_p, views}.py [puppet] - 10https://gerrit.wikimedia.org/r/367423 (owner: 10BryanDavis) [17:32:12] 10Operations, 10Puppet, 10Cloud-VPS: Move the main WMCS puppetmaster into the Labs realm - https://phabricator.wikimedia.org/T171188#3456889 (10chasemp) My understanding of this is we are looking at #1 as the current compromise short of moving services into the the Labs realm directly, though I believe in th... [17:33:29] 10Operations, 10Traffic: Investigate better DNS cache/lookup solutions - https://phabricator.wikimedia.org/T104442#3466847 (10BBlack) Add T171318 to the list too. There's doubtless a long tail of issues we'll never fully realize that would be helped by work here. Part of the reason this ticket's still idling... 
[17:34:45] 10Operations, 10ops-ulsfo, 10Traffic, 10Patch-For-Review: replace ulsfo aging servers - https://phabricator.wikimedia.org/T164327#3229950 (10BBlack) [17:34:48] 10Operations, 10Traffic, 10netops, 10Patch-For-Review: Anycast (Auth)DNS - https://phabricator.wikimedia.org/T98006#1256936 (10BBlack) [17:34:51] 10Operations, 10Traffic: Investigate better DNS cache/lookup solutions - https://phabricator.wikimedia.org/T104442#3466866 (10BBlack) [17:37:20] 10Operations, 10ops-codfw, 10Cloud-VPS, 10Patch-For-Review: rack/setup/install labtestservices2002.wikimedia.org - https://phabricator.wikimedia.org/T168892#3466870 (10chasemp) 05Open>03Resolved ongoing implementation tracked in T167559 [17:38:01] ACKNOWLEDGEMENT - HP RAID on ms-be1016 is CRITICAL: CRITICAL: Slot 1: OK: 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, 2I:4:1, 2I:4:2 - Controller: OK - Cache: Permanently Disabled - Cable Error - Battery/Capacitor: Recharging nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T171492 [17:38:05] 10Operations, 10ops-eqiad: Degraded RAID on ms-be1016 - https://phabricator.wikimedia.org/T171492#3466875 (10ops-monitoring-bot) [17:38:43] !log gehel@tin Started deploy [wdqs/wdqs@c1b5c27]: (no justification provided) [17:38:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:40:41] !log gehel@tin Finished deploy [wdqs/wdqs@c1b5c27]: (no justification provided) (duration: 01m 58s) [17:40:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:41:08] 10Operations, 10OTRS: mendelevium (otrs) running out of inodes - https://phabricator.wikimedia.org/T171490#3466917 (10herron) tar/gzipping a couple previous versions has brought inode utilization down to 80% mendelevium:/opt$ df -i / Filesystem Inodes IUsed IFree IUse% Mounted on /dev/vda1... 
[17:41:12] SMalyshev: ^ wdqs deployment done, tests are green [17:41:31] (03PS2) 10Smalyshev: wdqs - send ldf traffic to wdqs1003.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/363596 (https://phabricator.wikimedia.org/T166244) (owner: 10Gehel) [17:41:45] (03CR) 10Smalyshev: [C: 031] wdqs - send ldf traffic to wdqs1003.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/363596 (https://phabricator.wikimedia.org/T166244) (owner: 10Gehel) [17:44:06] gehel: thank you! [17:45:02] (03CR) 10Reedy: [C: 032] Mostly re-enable Generic.Arrays.DisallowLongArraySyntax.Found [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366776 (owner: 10Reedy) [17:45:04] (03CR) 10Rush: [C: 032] wmcs: Use yaml.safe_load in maintain-{meta_p, views}.py [puppet] - 10https://gerrit.wikimedia.org/r/367423 (owner: 10BryanDavis) [17:45:56] (03PS2) 10Rush: wmcs: Use yaml.safe_load in nova_fullstack_test.py [puppet] - 10https://gerrit.wikimedia.org/r/367424 (owner: 10BryanDavis) [17:50:24] 10Operations, 10Traffic: Implement machine-local forwarding DNS caches - https://phabricator.wikimedia.org/T171498#3466986 (10BBlack) [17:52:33] (03PS5) 10MacFan4000: Update ExtensionDistributer versions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/365137 [17:55:27] (03Merged) 10jenkins-bot: Mostly re-enable Generic.Arrays.DisallowLongArraySyntax.Found [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366776 (owner: 10Reedy) [17:55:43] (03PS5) 10Reedy: Function comments, parameters and stuffs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366771 [17:55:45] (03CR) 10Reedy: [C: 032] Function comments, parameters and stuffs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366771 (owner: 10Reedy) [17:56:05] (03CR) 10Rush: [C: 032] wmcs: Use yaml.safe_load in nova_fullstack_test.py [puppet] - 10https://gerrit.wikimedia.org/r/367424 (owner: 10BryanDavis) [17:56:16] (03CR) 10jenkins-bot: Mostly re-enable Generic.Arrays.DisallowLongArraySyntax.Found [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/366776 (owner: 10Reedy) [17:57:05] 10Operations, 10Multimedia, 10Performance-Team, 10RESTBase-API, and 3 others: Thumb API: Varnish / CDN questions - https://phabricator.wikimedia.org/T150673#3467048 (10GWicke) [17:58:04] (03PS2) 10Rush: wmcs: Use yaml.safe_load in archive-project-volumes [puppet] - 10https://gerrit.wikimedia.org/r/367425 (owner: 10BryanDavis) [17:58:10] (03CR) 10BryanDavis: [C: 04-1] "$wmfRealm guard is probably not what is wanted/intended." (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367369 (owner: 10MarcoAurelio) [17:58:28] (03Merged) 10jenkins-bot: Function comments, parameters and stuffs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366771 (owner: 10Reedy) [17:58:48] (03CR) 10jenkins-bot: Function comments, parameters and stuffs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366771 (owner: 10Reedy) [17:59:42] !log reedy@tin Synchronized wmf-config/: phpcs (duration: 00m 45s) [17:59:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:00:05] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170724T1800). [18:00:05] RoanKattouw, Smalyshev, and James_F: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [18:00:15] (03PS6) 10MacFan4000: Update ExtensionDistributer versions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/365137 [18:00:32] am here [18:00:33] * James_F waves. [18:00:41] !log reedy@tin Synchronized docroot/: phpcs (duration: 00m 44s) [18:00:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:00:51] I'm here [18:00:54] * Reedy finishes syncing all the things [18:01:25] What's to deploy? 
[18:01:32] !log reedy@tin Synchronized w: phpcs (duration: 00m 43s) [18:01:34] Quite a bit [18:01:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:01:50] (03CR) 10Rush: [C: 032] wmcs: Use yaml.safe_load in archive-project-volumes [puppet] - 10https://gerrit.wikimedia.org/r/367425 (owner: 10BryanDavis) [18:02:18] (03PS3) 10Rush: wmcs: Use yaml.safe_load in shinkengen [puppet] - 10https://gerrit.wikimedia.org/r/367426 (owner: 10BryanDavis) [18:02:25] !log reedy@tin Synchronized tests: phpcs (duration: 00m 43s) [18:02:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:03:13] (03PS4) 10Reedy: Add more units for conversion [mediawiki-config] - 10https://gerrit.wikimedia.org/r/362606 (https://phabricator.wikimedia.org/T168582) (owner: 10Smalyshev) [18:03:17] !log reedy@tin Synchronized phpcs.xml: phpcs (duration: 00m 43s) [18:03:17] (03CR) 10Reedy: [C: 032] Add more units for conversion [mediawiki-config] - 10https://gerrit.wikimedia.org/r/362606 (https://phabricator.wikimedia.org/T168582) (owner: 10Smalyshev) [18:03:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:05:19] !log otto@tin Started deploy [eventlogging/eventbus@c1c2c39]: test deploy with scap depool on kafka2001 [18:05:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:06:38] Unfortunately, Jenkins is busy [18:06:58] !log otto@tin Finished deploy [eventlogging/eventbus@c1c2c39]: test deploy with scap depool on kafka2001 (duration: 01m 39s) [18:07:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:09:48] (03Merged) 10jenkins-bot: Add more units for conversion [mediawiki-config] - 10https://gerrit.wikimedia.org/r/362606 (https://phabricator.wikimedia.org/T168582) (owner: 10Smalyshev) [18:09:50] As ever. 
[18:09:57] (03CR) 10jenkins-bot: Add more units for conversion [mediawiki-config] - 10https://gerrit.wikimedia.org/r/362606 (https://phabricator.wikimedia.org/T168582) (owner: 10Smalyshev) [18:10:10] 10Operations, 10OTRS: mendelevium (otrs) running out of inodes - https://phabricator.wikimedia.org/T171490#3467124 (10herron) p:05Triage>03Normal [18:10:29] (03PS3) 10Reedy: Enable Cirrus search of wbsearchentities when using useCirrus=1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366788 (https://phabricator.wikimedia.org/T125500) (owner: 10Smalyshev) [18:10:38] (03CR) 10Reedy: [C: 032] Enable Cirrus search of wbsearchentities when using useCirrus=1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366788 (https://phabricator.wikimedia.org/T125500) (owner: 10Smalyshev) [18:11:08] !log reedy@tin Synchronized wmf-config/unitConversionConfig.json: T168582 (duration: 00m 43s) [18:11:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:11:19] T168582: Configure conversion for all units that have a conversion factor to SI units defined - https://phabricator.wikimedia.org/T168582 [18:12:22] 10Puppet, 10Mobile, 10Need-volunteer, 10Reading-Web-Backlog (Tracking), 10Reading-Web-Kanban-Board: URLs with title query string parameter and additional query string parameters do not redirect to mobile site - https://phabricator.wikimedia.org/T154227#2904582 (10Jdlrobson) [18:12:31] 10Puppet, 10Mobile, 10Need-volunteer, 10Reading-Web-Backlog (Tracking), 10Reading-Web-Kanban-Board: URLs with title query string parameter and additional query string parameters do not redirect to mobile site - https://phabricator.wikimedia.org/T154227#2904582 (10Jdlrobson) Looks like this regex is to bl... 
[18:13:40] !log reedy@tin Synchronized php-1.30.0-wmf.10/extensions/Thanks/extension.json: T170917 (duration: 00m 43s) [18:13:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:13:50] T170917: Thank button broken on mobilefrontend - displays "{{GENDER:[object Object]|{{GENDER:unknown|Thank}}}}" - https://phabricator.wikimedia.org/T170917 [18:13:54] (03Merged) 10jenkins-bot: Enable Cirrus search of wbsearchentities when using useCirrus=1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366788 (https://phabricator.wikimedia.org/T125500) (owner: 10Smalyshev) [18:14:57] !log otto@tin Started deploy [eventlogging/eventbus@c1c2c39]: test deploy with scap depool on kafka2001 [18:15:02] !log otto@tin Finished deploy [eventlogging/eventbus@c1c2c39]: test deploy with scap depool on kafka2001 (duration: 00m 04s) [18:15:06] !log reedy@tin Synchronized php-1.30.0-wmf.10/extensions/Echo/modules/styles/mw.echo.ui.NotificationBadgeWidget.less: T171302 (duration: 00m 45s) [18:15:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:15:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:15:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:15:28] T171302: Notifications flyout footer has an unwanted margin - https://phabricator.wikimedia.org/T171302 [18:16:09] (03PS2) 10Reedy: Enable response reference lists on all Wikiquotes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367404 (https://phabricator.wikimedia.org/T159895) (owner: 10Jforrester) [18:16:11] (03CR) 10jenkins-bot: Enable Cirrus search of wbsearchentities when using useCirrus=1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366788 (https://phabricator.wikimedia.org/T125500) (owner: 10Smalyshev) [18:16:13] (03CR) 10Reedy: [C: 032] Enable response reference lists on all Wikiquotes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367404 (https://phabricator.wikimedia.org/T159895) 
(owner: 10Jforrester) [18:16:41] !log reedy@tin Synchronized wmf-config/Wikibase.php: T125500 (duration: 00m 43s) [18:16:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:16:50] T125500: Index Wikidata labels and descriptions as separate fields in ElasticSearch - https://phabricator.wikimedia.org/T125500 [18:17:14] !log otto@tin Started deploy [eventlogging/eventbus@c1c2c39]: test deploy with scap depool on kafka2002 [18:17:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:18:15] (03Merged) 10jenkins-bot: Enable response reference lists on all Wikiquotes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367404 (https://phabricator.wikimedia.org/T159895) (owner: 10Jforrester) [18:18:20] !log otto@tin Finished deploy [eventlogging/eventbus@c1c2c39]: test deploy with scap depool on kafka2002 (duration: 01m 06s) [18:18:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:18:31] (03CR) 10jenkins-bot: Enable response reference lists on all Wikiquotes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367404 (https://phabricator.wikimedia.org/T159895) (owner: 10Jforrester) [18:18:47] (03CR) 10Legoktm: [C: 031] "Thanks" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/365137 (owner: 10MacFan4000) [18:18:50] (03PS2) 10Ayounsi: Remove DNS records for unused IPs [dns] - 10https://gerrit.wikimedia.org/r/366871 [18:18:55] Reedy: also wanna sync out https://gerrit.wikimedia.org/r/#/c/365137/ ? [18:19:15] Can do [18:19:26] Ha, was about to suggest that. 
[18:19:29] !log reedy@tin Synchronized php-1.30.0-wmf.10/includes/specials/pagers/UsersPager.php: T171332 (duration: 00m 43s) [18:19:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:19:39] T171332: Checkboxes on Special:ListUsers don't work - https://phabricator.wikimedia.org/T171332 [18:19:46] Operations, Puppet, Mobile, Need-volunteer, and 2 others: URLs with title query string parameter and additional query string parameters do not redirect to mobile site - https://phabricator.wikimedia.org/T154227#3467189 (Jdlrobson) Background: Looks like this was introduced by T103592. The regex s... [18:20:26] legoktm: Why are we not switching the snapshot variable to 1.30? [18:20:28] (CR) jerkins-bot: [V: -1] Update ExtensionDistributer versions [mediawiki-config] - https://gerrit.wikimedia.org/r/365137 (owner: MacFan4000) [18:20:37] !log reedy@tin Synchronized wmf-config/InitialiseSettings.php: T159895 (duration: 00m 43s) [18:20:41] Also, comment shouldn't be start of line [18:20:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:20:47] T159895: Support wikis in converting reference lists over to `responsive` - https://phabricator.wikimedia.org/T159895 [18:20:52] (PS4) Reedy: Set proofreadpage-showheaders = 1 for tawikisource bnwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/366323 (https://phabricator.wikimedia.org/T169478) [18:21:07] Reedy: we can't do that until there are REL1_30 branches [18:21:16] Meh [18:21:26] Anyway, needs fixing because of PHPCS [18:21:32] Leave a comment to that extent, and bump to 1.30? 
;) [18:21:41] (CR) Reedy: [C: 2] Set proofreadpage-showheaders = 1 for tawikisource bnwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/366323 (https://phabricator.wikimedia.org/T169478) (owner: Reedy) [18:22:26] (PS7) Legoktm: Update ExtensionDistributer versions [mediawiki-config] - https://gerrit.wikimedia.org/r/365137 (owner: MacFan4000) [18:22:59] (CR) 20after4: [C: 1] varnish: reject phabricator uploads from WP0 users [puppet] - https://gerrit.wikimedia.org/r/367422 (https://phabricator.wikimedia.org/T168142) (owner: Ema) [18:23:13] (Merged) jenkins-bot: Set proofreadpage-showheaders = 1 for tawikisource bnwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/366323 (https://phabricator.wikimedia.org/T169478) (owner: Reedy) [18:23:15] Operations, Commons, MediaWiki-extensions-Scribunto, Wikimedia-log-errors: Some Commons pages transcluding Template:Countries_of_Europe HTTP 500/503 when accessed from non-English languages specified in the template - https://phabricator.wikimedia.org/T171392#3467215 (Anomie) It looks like my sup... 
[18:23:22] (CR) jenkins-bot: Set proofreadpage-showheaders = 1 for tawikisource bnwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/366323 (https://phabricator.wikimedia.org/T169478) (owner: Reedy) [18:24:30] !log reedy@tin Synchronized wmf-config/InitialiseSettings.php: T169478 T169481 (duration: 00m 43s) [18:24:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:24:41] T169481: Enable "Show header and footer fields when editing in the পাতা namespace" setting by default for Bengali wikisource - https://phabricator.wikimedia.org/T169481 [18:24:41] T169478: Enable "Show header and footer fields when editing in the Proofread namespace" setting by default for Tamil wikisource - https://phabricator.wikimedia.org/T169478 [18:25:20] !log reedy@tin Synchronized wmf-config/CommonSettings.php: T169478 T169481 (duration: 00m 42s) [18:25:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:25:36] !log otto@tin Started deploy [eventlogging/eventbus@c1c2c39]: test deploy with scap depool on kafka2003 [18:25:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:25:53] !log otto@tin Finished deploy [eventlogging/eventbus@c1c2c39]: test deploy with scap depool on kafka2003 (duration: 00m 17s) [18:25:53] (PS8) Reedy: Update ExtensionDistributer versions [mediawiki-config] - https://gerrit.wikimedia.org/r/365137 (owner: MacFan4000) [18:25:58] (CR) Reedy: [C: 2] Update ExtensionDistributer versions [mediawiki-config] - https://gerrit.wikimedia.org/r/365137 (owner: MacFan4000) [18:26:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:26:49] !log otto@tin Started deploy [eventlogging/eventbus@c1c2c39]: statsd dns fixes [18:26:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:27:28] !log otto@tin Finished deploy [eventlogging/eventbus@c1c2c39]: statsd dns fixes (duration: 00m 39s) [18:27:35] !log 
otto@tin Started deploy [eventlogging/eventbus@c1c2c39]: statsd dns fixes [18:27:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:27:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:27:50] !log otto@tin Finished deploy [eventlogging/eventbus@c1c2c39]: statsd dns fixes (duration: 00m 14s) [18:27:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:27:59] (Merged) jenkins-bot: Update ExtensionDistributer versions [mediawiki-config] - https://gerrit.wikimedia.org/r/365137 (owner: MacFan4000) [18:28:11] (CR) jenkins-bot: Update ExtensionDistributer versions [mediawiki-config] - https://gerrit.wikimedia.org/r/365137 (owner: MacFan4000) [18:28:33] !log otto@tin Started deploy [eventlogging/eventbus@c1c2c39]: statsd dns fixes [18:28:38] (CR) Rush: [C: 2] wmcs: Use yaml.safe_load in shinkengen [puppet] - https://gerrit.wikimedia.org/r/367426 (owner: BryanDavis) [18:28:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:28:43] !log otto@tin Finished deploy [eventlogging/eventbus@c1c2c39]: statsd dns fixes (duration: 00m 10s) [18:28:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:29:03] !log reedy@tin Synchronized wmf-config/CommonSettings.php: T153271 (duration: 00m 43s) [18:29:12] (PS2) Rush: wmcs: Use yaml.safe_load in kube2proxy.py [puppet] - https://gerrit.wikimedia.org/r/367428 (owner: BryanDavis) [18:29:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:29:12] T153271: Release MediaWiki 1.29 - https://phabricator.wikimedia.org/T153271 [18:29:21] (PS1) Reedy: Update phpunit to 4.8.36 [mediawiki-config] - https://gerrit.wikimedia.org/r/367453 [18:29:23] (PS1) Reedy: Update mediawiki-codesniffer to 0.10.1 [mediawiki-config] - https://gerrit.wikimedia.org/r/367454 [18:30:08] !log otto@tin Started deploy 
[eventlogging/eventbus@c1c2c39]: statsd dns fixes [18:30:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:30:20] !log otto@tin Finished deploy [eventlogging/eventbus@c1c2c39]: statsd dns fixes (duration: 00m 12s) [18:30:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:31:54] Operations, Analytics-Kanban, EventBus, Patch-For-Review, User-Elukey: Eventbus does not handle gracefully changes in DNS recursors - https://phabricator.wikimedia.org/T171048#3467266 (Ottomata) Fix is deployed to eventbus. [18:32:22] (CR) jerkins-bot: [V: -1] Update mediawiki-codesniffer to 0.10.1 [mediawiki-config] - https://gerrit.wikimedia.org/r/367454 (owner: Reedy) [18:32:41] (CR) Reedy: [C: 2] Update phpunit to 4.8.36 [mediawiki-config] - https://gerrit.wikimedia.org/r/367453 (owner: Reedy) [18:32:51] legoktm: ^ you broke it : [18:32:51] :P [18:33:30] Some interesting warnings there.. [18:33:56] (CR) Rush: [C: 2] wmcs: Use yaml.safe_load in kube2proxy.py [puppet] - https://gerrit.wikimedia.org/r/367428 (owner: BryanDavis) [18:33:59] hmm [18:34:03] it's linting non php files [18:34:05] (Merged) jenkins-bot: Update phpunit to 4.8.36 [mediawiki-config] - https://gerrit.wikimedia.org/r/367453 (owner: Reedy) [18:35:26] !log reedy@tin Synchronized composer.json: phpunit (duration: 00m 43s) [18:35:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:35:39] PROBLEM - puppet last run on kafka2002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): User[eventlogging] [18:36:09] (CR) jenkins-bot: Update phpunit to 4.8.36 [mediawiki-config] - https://gerrit.wikimedia.org/r/367453 (owner: Reedy) [18:36:21] !log reedy@tin Synchronized composer.lock: phpunit (duration: 00m 43s) [18:36:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:44:33] legoktm: has MediaWiki.ExtraCharacters.CharacterBeforePHPOpeningTag.Found been upstreamed to Generic.PHP.CharacterBeforePHPOpeningTag.Found ? [18:44:33] (PS1) Ottomata: Remove statistics::private role from stat1002 [puppet] - https://gerrit.wikimedia.org/r/367457 (https://phabricator.wikimedia.org/T171373) [18:45:10] it was replaced with the upstream one yeah [18:45:42] they independently? included our modifications [18:45:49] lol [18:48:47] (PS2) Ottomata: Remove statistics::private role from stat1002 [puppet] - https://gerrit.wikimedia.org/r/367457 (https://phabricator.wikimedia.org/T171373) [18:53:15] (CR) Ottomata: [V: 2 C: 2] "https://puppet-compiler.wmflabs.org/compiler02/7142/stat1002.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/367457 (https://phabricator.wikimedia.org/T171373) (owner: Ottomata) [18:53:21] (CR) Ottomata: [C: 2] Remove statistics::private role from stat1002 [puppet] - https://gerrit.wikimedia.org/r/367457 (https://phabricator.wikimedia.org/T171373) (owner: Ottomata) [18:53:43] (PS2) Reedy: Update mediawiki-codesniffer to 0.10.1 [mediawiki-config] - https://gerrit.wikimedia.org/r/367454 [18:55:47] (CR) jerkins-bot: [V: -1] Update mediawiki-codesniffer to 0.10.1 [mediawiki-config] - https://gerrit.wikimedia.org/r/367454 (owner: Reedy) [18:55:50] (PS1) Ottomata: Remove rsyncd frag from stat1002 for analytics cluster hdfs rsync [puppet] - https://gerrit.wikimedia.org/r/367462 (https://phabricator.wikimedia.org/T171373) [18:56:13] (CR) Ottomata: [V: 2 C: 2] Remove rsyncd frag from stat1002 for analytics cluster 
hdfs rsync [puppet] - https://gerrit.wikimedia.org/r/367462 (https://phabricator.wikimedia.org/T171373) (owner: Ottomata) [18:56:19] PROBLEM - puppet last run on stat1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:58:19] RECOVERY - puppet last run on stat1002 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [18:59:04] (PS1) Ottomata: Since private role is no longer on stat1002, we need host specific admin::groups [puppet] - https://gerrit.wikimedia.org/r/367464 (https://phabricator.wikimedia.org/T171373) [18:59:51] (PS3) Reedy: Update mediawiki-codesniffer to 0.10.1 [mediawiki-config] - https://gerrit.wikimedia.org/r/367454 [19:00:21] (CR) Ottomata: [C: 2] Since private role is no longer on stat1002, we need host specific admin::groups [puppet] - https://gerrit.wikimedia.org/r/367464 (https://phabricator.wikimedia.org/T171373) (owner: Ottomata) [19:00:24] legoktm: Changing the exclude to be docroot/ and it still scans them... : [19:01:04] hmm [19:01:05] [19:01:18] why is that part not working? [19:02:08] (CR) jerkins-bot: [V: -1] Update mediawiki-codesniffer to 0.10.1 [mediawiki-config] - https://gerrit.wikimedia.org/r/367454 (owner: Reedy) [19:02:40] We could make it ignore Internal.NoCodeFound [19:02:42] But that's meh [19:03:10] Doesn't fix some of the other errors though either [19:03:13] can you reproduce it locally? [19:03:23] Unfortunately, yes [19:05:26] can you file a bug? [19:06:10] phpcs sucks? 
:P [19:09:48] jouncebot: now [19:09:48] No deployments scheduled for the next 0 hour(s) and 50 minute(s) [19:14:50] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0] [19:15:10] (PS4) Reedy: phpcs changes for mediawiki-codesniffer 0.10.1 [mediawiki-config] - https://gerrit.wikimedia.org/r/367454 [19:21:58] (PS1) Reedy: Update mediawiki-codesniffer to 0.10.1 [mediawiki-config] - https://gerrit.wikimedia.org/r/367465 [19:22:57] (CR) Reedy: [C: 2] phpcs changes for mediawiki-codesniffer 0.10.1 [mediawiki-config] - https://gerrit.wikimedia.org/r/367454 (owner: Reedy) [19:24:49] (CR) jerkins-bot: [V: -1] Update mediawiki-codesniffer to 0.10.1 [mediawiki-config] - https://gerrit.wikimedia.org/r/367465 (owner: Reedy) [19:25:14] (Merged) jenkins-bot: phpcs changes for mediawiki-codesniffer 0.10.1 [mediawiki-config] - https://gerrit.wikimedia.org/r/367454 (owner: Reedy) [19:26:06] !log reedy@tin Synchronized wmf-config/: phpcs (duration: 00m 44s) [19:26:13] (CR) jenkins-bot: phpcs changes for mediawiki-codesniffer 0.10.1 [mediawiki-config] - https://gerrit.wikimedia.org/r/367454 (owner: Reedy) [19:26:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:26:59] !log reedy@tin Synchronized tests/: phpcs (duration: 00m 43s) [19:27:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:28:29] Operations, DBA, Mail, Patch-For-Review: Setup database for dmarc service - https://phabricator.wikimedia.org/T170158#3467519 (herron) The mysql client IP addresses are: diadem 208.80.153.17 dysprosium 208.80.154.24 [19:30:49] !log otto@tin Started deploy [eventlogging/analytics@41e3418]: unique index only for id columns [19:30:52] !log otto@tin Finished deploy [eventlogging/analytics@41e3418]: unique index only for id columns (duration: 00m 02s) [19:31:00] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:31:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:45:00] PROBLEM - Number of backend failures per minute from CirrusSearch on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [600.0] [19:45:47] (CR) Zhuyifei1999: "Do we have some statistics on the zero uploads on phab? From my experience on Commons those who upload from zero ranges (abuse filter has " [puppet] - https://gerrit.wikimedia.org/r/367422 (https://phabricator.wikimedia.org/T168142) (owner: Ema) [19:49:09] PROBLEM - Number of backend failures per minute from CirrusSearch on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [600.0] [19:52:09] PROBLEM - Number of backend failures per minute from CirrusSearch on graphite1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [600.0] [19:55:23] !log ban elastic1031 from elasticsearch cluster, it's overloaded [19:55:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:57:34] (CR) Zhuyifei1999: "Clarification: The intersection of https://commons.wikimedia.org/wiki/Special:Log/delete/Embedded_Data_Bot (uploads containing embedded ra" [puppet] - https://gerrit.wikimedia.org/r/367422 (https://phabricator.wikimedia.org/T168142) (owner: Ema) [20:00:04] gwicke, cscott, arlolra, subbu, bearND, halfak, and Amir1: Respected human, time to deploy Services – Parsoid / OCG / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170724T2000). Please do the needful. [20:02:05] banning 1031 seems to have done the trick, p50 is still way above normal but dropped from 300+ to 100 [20:03:09] RECOVERY - Number of backend failures per minute from CirrusSearch on graphite1001 is OK: OK: Less than 20.00% above the threshold [300.0] [20:03:37] p95 is still unreasonably high though ... 
[20:12:59] PROBLEM - recommendation_api endpoints health on scb2005 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) timed out before a response was received [20:13:12] Operations, MediaWiki-JobRunner, Performance-Team: Investigate 30x increase in Jobrunner errors - https://phabricator.wikimedia.org/T171371#3467777 (Krinkle) >>! In T171371#3464398, @bd808 wrote: > It was [[https://tools.wmflabs.org/sal/log/AVzrzZtHU4b8yJAIAZtx|deployed by @demon]]: > ``` > 2017-06-2... [20:13:49] RECOVERY - recommendation_api endpoints health on scb2005 is OK: All endpoints are healthy [20:19:10] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 70.00% of data above the critical threshold [50.0] [20:22:10] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 70.00% above the threshold [25.0] [20:22:43] !log bsitzmann@tin Started deploy [mobileapps/deploy@2b4ca3b]: Update mobileapps to b608ec8 [20:22:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:26:48] !log bsitzmann@tin Finished deploy [mobileapps/deploy@2b4ca3b]: Update mobileapps to b608ec8 (duration: 04m 06s) [20:26:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:36:19] (CR) Krinkle: varnish: Avoid std.fileread() and use new errorpage template (1 comment) [puppet] - https://gerrit.wikimedia.org/r/350966 (https://phabricator.wikimedia.org/T113114) (owner: Krinkle) [20:37:06] (PS9) Krinkle: varnish: Switch browsersec to use errorpage template [puppet] - https://gerrit.wikimedia.org/r/355338 (https://phabricator.wikimedia.org/T113114) [20:40:24] Operations, ops-eqiad, Analytics-Kanban: Smartctl errors for one kafka1012 disk - https://phabricator.wikimedia.org/T168927#3467891 (Nuria) [20:40:31] Operations, ops-eqiad, Analytics-Kanban: Smartctl errors for one kafka1012 disk - 
https://phabricator.wikimedia.org/T168927#3381297 (Nuria) Open>Resolved [20:44:23] i gotta run. p95 still hasn't recovered ... but 50 and 75 are reasonable at least. 95 likely wont recover until none of the machines are overloaded.. [20:46:16] (PS7) Andrew Bogott: Puppetmaster: Fix apache config ssldir [puppet] - https://gerrit.wikimedia.org/r/365053 [20:49:46] Operations, ops-eqiad, DC-Ops, Discovery-Search (Current work): some elasticsearch servers in eqiad have CPU overheating - https://phabricator.wikimedia.org/T168816#3467918 (dcausse) We think that temp issues may exacerbate the load issues we see on the elasticsearch cluster in eqiad. Looking at... [20:59:10] !log banning elastic1027 after elastic1017 to move shards around [20:59:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:59:54] (PS1) Reedy: Add a global email blacklist [mediawiki-config] - https://gerrit.wikimedia.org/r/367537 [21:00:04] dapatrick, bawolff, and Reedy: Dear anthropoid, the time has come. Please deploy Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170724T2100). 
[21:01:41] (CR) Reedy: [C: 2] Add a global email blacklist [mediawiki-config] - https://gerrit.wikimedia.org/r/367537 (owner: Reedy) [21:03:34] (Merged) jenkins-bot: Add a global email blacklist [mediawiki-config] - https://gerrit.wikimedia.org/r/367537 (owner: Reedy) [21:03:37] (PS8) Andrew Bogott: Puppetmaster: Fix apache config ssldir [puppet] - https://gerrit.wikimedia.org/r/365053 [21:03:44] (CR) jenkins-bot: Add a global email blacklist [mediawiki-config] - https://gerrit.wikimedia.org/r/367537 (owner: Reedy) [21:04:44] !log reedy@tin Synchronized wmf-config/CommonSettings.php: Add a global email blacklist (duration: 00m 43s) [21:04:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:11:21] !log unbanning elastic1027/elastic1017 [21:11:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:11:35] (CR) Greg Grossmeier: "Missed today, how about adding it to Tuesday's puppet swat?" [puppet] - https://gerrit.wikimedia.org/r/364148 (https://phabricator.wikimedia.org/T103886) (owner: Giuseppe Lavagetto) [21:19:29] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [50.0] [21:20:10] Looks to be a lot of lua noise [21:20:30] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 70.00% above the threshold [25.0] [21:25:05] Operations, Services (doing), User-mobrovac: nodejs 6.11 - https://phabricator.wikimedia.org/T170548#3468042 (mobrovac) [21:26:07] Operations, Services (doing), User-mobrovac: nodejs 6.11 - https://phabricator.wikimedia.org/T170548#3434983 (mobrovac) I have managed to test all of the services running on SCB and they all work under NodeJS v6.11. We are ready to move on SCB. [21:26:38] Reedy: this one? 
https://phabricator.wikimedia.org/T166348 [21:27:01] Nope [21:27:01] 363 LuaSandboxFunction::call(): recursion detected in /srv/mediawiki/php-1.30.0-wmf.10/extensions/Scribunto/engines/LuaSandbox/Engine.php on line 312 [21:28:47] Reedy: twentyafterfour reported https://phabricator.wikimedia.org/T168898 before, which brad dupe'd into https://phabricator.wikimedia.org/T166348 :) [21:29:19] lol [21:41:01] (PS1) Catrope: Enable ORES on sqwiki and rowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/367613 [21:41:03] (PS1) Catrope: Remove wgOresDamagingThresholds settings [mediawiki-config] - https://gerrit.wikimedia.org/r/367614 [21:46:13] (PS2) Catrope: Enable ORES on sqwiki and rowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/367613 (https://phabricator.wikimedia.org/T170723) [21:49:41] (PS8) Jforrester: Remove compact language links dblist for simplicity (no-op) [mediawiki-config] - https://gerrit.wikimedia.org/r/364428 (owner: Amire80) [21:50:08] (PS9) Jforrester: Remove compact language links dblist for simplicity (no-op) [mediawiki-config] - https://gerrit.wikimedia.org/r/364428 (owner: Amire80) [21:50:35] (CR) Jforrester: "PS8: Simplified a little more, clarified title; PS9: Rebase." [mediawiki-config] - https://gerrit.wikimedia.org/r/364428 (owner: Amire80) [21:53:16] (CR) Jforrester: [C: -1] Remove compact language links dblist for simplicity (no-op) (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/364428 (owner: Amire80) [21:54:59] herron: around? 
[21:59:16] (PS3) Ayounsi: Remove DNS records for unused IPs [dns] - https://gerrit.wikimedia.org/r/366871 [21:59:49] RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0] [22:00:18] (CR) Ayounsi: [C: 2] Remove DNS records for unused IPs (2 comments) [dns] - https://gerrit.wikimedia.org/r/366871 (owner: Ayounsi) [22:08:28] (PS1) Ayounsi: Add pfw3a/b-codfw mgmt interfaces to DNS [dns] - https://gerrit.wikimedia.org/r/367616 (https://phabricator.wikimedia.org/T169643) [22:09:17] (CR) Ayounsi: [C: 2] Add pfw3a/b-codfw mgmt interfaces to DNS [dns] - https://gerrit.wikimedia.org/r/367616 (https://phabricator.wikimedia.org/T169643) (owner: Ayounsi) [22:18:19] PROBLEM - MegaRAID on labsdb1001 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) [22:18:21] ACKNOWLEDGEMENT - MegaRAID on labsdb1001 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T171538 [22:18:24] Operations, ops-eqiad: Degraded RAID on labsdb1001 - https://phabricator.wikimedia.org/T171538#3468270 (ops-monitoring-bot) [22:29:14] Operations, ops-eqiad, Data-Services: Degraded RAID on labsdb1001 - https://phabricator.wikimedia.org/T171538#3468303 (bd808) [22:32:41] Operations, ops-eqiad, Data-Services: Degraded RAID on labsdb1001 - https://phabricator.wikimedia.org/T171538#3468270 (jcrespo) I have no idea why things didn't explode here. [22:34:32] Operations, ops-eqiad, Data-Services: Degraded RAID on labsdb1001 - https://phabricator.wikimedia.org/T171538#3468330 (jcrespo) I would have ready the labsdb1001 depool patch just in case. 
[22:36:47] (PS1) Andrew Bogott: puppetmaster profiles: Fix some really extreme typos [puppet] - https://gerrit.wikimedia.org/r/367619 [22:38:12] (CR) Andrew Bogott: [C: 2] puppetmaster profiles: Fix some really extreme typos [puppet] - https://gerrit.wikimedia.org/r/367619 (owner: Andrew Bogott) [23:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170724T2300). Please do the needful. [23:00:04] RoanKattouw: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [23:00:47] I'll do the SWAT myself [23:00:57] (PS3) Catrope: Enable ORES on sqwiki and rowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/367613 (https://phabricator.wikimedia.org/T170723) [23:01:01] (CR) Catrope: [C: 2] Enable ORES on sqwiki and rowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/367613 (https://phabricator.wikimedia.org/T170723) (owner: Catrope) [23:01:04] (PS2) Catrope: Remove wgOresDamagingThresholds settings [mediawiki-config] - https://gerrit.wikimedia.org/r/367614 [23:01:08] (CR) Catrope: [C: 2] Remove wgOresDamagingThresholds settings [mediawiki-config] - https://gerrit.wikimedia.org/r/367614 (owner: Catrope) [23:02:48] mooeypoo: When poking around in the RC code to deal with days/hours, I randomly discovered this: https://github.com/wikimedia/mediawiki/blob/master/includes/DefaultSettings.php#L6678 [23:02:51] (Merged) jenkins-bot: Enable ORES on sqwiki and rowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/367613 (https://phabricator.wikimedia.org/T170723) (owner: Catrope) [23:02:53] (Merged) jenkins-bot: Remove wgOresDamagingThresholds settings [mediawiki-config] - https://gerrit.wikimedia.org/r/367614 (owner: 
Catrope) [23:03:05] (CR) jenkins-bot: Enable ORES on sqwiki and rowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/367613 (https://phabricator.wikimedia.org/T170723) (owner: Catrope) [23:03:07] (CR) jenkins-bot: Remove wgOresDamagingThresholds settings [mediawiki-config] - https://gerrit.wikimedia.org/r/367614 (owner: Catrope) [23:08:30] (PS1) Andrew Bogott: puppetmaster frontend profile: Allow hiera to configure the hostname [puppet] - https://gerrit.wikimedia.org/r/367621 [23:11:53] (PS1) Catrope: Add missing max value to rowiki ORES config [mediawiki-config] - https://gerrit.wikimedia.org/r/367622 [23:12:06] (PS2) Catrope: Add missing max value to rowiki ORES config [mediawiki-config] - https://gerrit.wikimedia.org/r/367622 [23:12:14] (CR) Catrope: [C: 2] Add missing max value to rowiki ORES config [mediawiki-config] - https://gerrit.wikimedia.org/r/367622 (owner: Catrope) [23:13:29] Operations, ops-eqiad, Data-Services: Degraded RAID on labsdb1001 - https://phabricator.wikimedia.org/T171538#3468441 (chasemp) p:Triage>Unbreak! a:Cmjohnson I think this must be one of the two RAID1 drives for the OS itself rather than a drive in the RAID0 data array. We should really g... 
[23:13:41] (Merged) jenkins-bot: Add missing max value to rowiki ORES config [mediawiki-config] - https://gerrit.wikimedia.org/r/367622 (owner: Catrope) [23:15:09] Operations, ops-eqiad, Data-Services, cloud-services-team (Kanban): Degraded RAID on labsdb1001 - https://phabricator.wikimedia.org/T171538#3468445 (chasemp) [23:16:01] (CR) jenkins-bot: Add missing max value to rowiki ORES config [mediawiki-config] - https://gerrit.wikimedia.org/r/367622 (owner: Catrope) [23:22:36] Operations, ops-eqiad, Data-Services, cloud-services-team (Kanban): Degraded RAID on labsdb1001 - https://phabricator.wikimedia.org/T171538#3468451 (chasemp) hmm ```# cat /proc/mdstat Personalities : unused devices: ``` [23:22:51] !log catrope@tin Synchronized wmf-config/InitialiseSettings.php: Enable ORES on sqwiki and rowiki (T170723) (duration: 00m 44s) [23:23:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:23:00] T170723: Deploy ORES Review Tool & ORES-based RCFilters for Romanian & Albanian Wikipedia - https://phabricator.wikimedia.org/T170723 [23:37:35] (PS1) Rush: DON'T MERGE: labsdb: in case labsdb1001 falls over [puppet] - https://gerrit.wikimedia.org/r/367625 (https://phabricator.wikimedia.org/T171538) [23:42:30] RECOVERY - Check Varnish expiry mailbox lag on cp1099 is OK: OK: expiry mailbox lag is 7988 [23:49:59] (PS1) Ayounsi: Add fasw-c-codfw mgmt interfaces to DNS [dns] - https://gerrit.wikimedia.org/r/367629 (https://phabricator.wikimedia.org/T169643) [23:52:29] (CR) Ayounsi: [C: 2] Add fasw-c-codfw mgmt interfaces to DNS [dns] - https://gerrit.wikimedia.org/r/367629 (https://phabricator.wikimedia.org/T169643) (owner: Ayounsi)