[00:07:01] (03PS1) 10Chad: Rolling reviewers back to 411a516 [software/gerrit/gerrit] (wmf/stable-2.14) - 10https://gerrit.wikimedia.org/r/421449 [00:07:10] (03CR) 10Chad: [V: 032 C: 032] Rolling reviewers back to 411a516 [software/gerrit/gerrit] (wmf/stable-2.14) - 10https://gerrit.wikimedia.org/r/421449 (owner: 10Chad) [00:07:38] RECOVERY - Disk space on elastic1028 is OK: DISK OK [00:24:09] (03PS2) 10Dzahn: Abstract Gerrit's public key out of gerrit::jetty [puppet] - 10https://gerrit.wikimedia.org/r/416201 (owner: 10Chad) [00:25:22] (03CR) 10Dzahn: [C: 032] "https://puppet-compiler.wmflabs.org/compiler03/10602/cobalt.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/416201 (owner: 10Chad) [00:29:35] (03PS2) 10Dzahn: Gerrit: Make project deletion less destructive [puppet] - 10https://gerrit.wikimedia.org/r/417183 (owner: 10Chad) [00:56:28] 10Operations, 10HHVM, 10User-Elukey, 10User-notice: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295#4074700 (10Liuxinyu970226) [00:58:37] Krinkle: hey, around? [00:59:39] Krinkle: just wanted to say, there is no need to get data and check it against revision table because the "auto" part exists in log_params [00:59:53] I can simply add it to deletePatrolLogs [01:13:29] PROBLEM - Disk space on elastic1018 is CRITICAL: DISK CRITICAL - free space: /srv 62107 MB (12% inode=99%) [01:14:56] Amir1: Hey, not sure how patrol relates to revision. Yeah, there's no patrol info there. [01:15:35] for recent things, it's in recent_changes (if you figure out same oldid and user and user has autopatrol), but that seems fragile. [01:15:40] The data is in log_params indeed.
[01:15:48] Special:Log also knows how to format the old data [01:15:57] and we have auto:1/0 in the new fields still as well, [01:15:58] Krinkle: no, my plan was to write a maintenance script to get curid from log_params and check it against revision table and if the users are the same, I consider them auto patrol [01:16:08] but the data is already there [01:16:08] https://github.com/wikimedia/mediawiki/blob/master/includes/logging/PatrolLog.php#L87 [01:16:12] Amir1: Oh, yeah, that isn't needed. [01:16:32] patrol/patrol < 2017 with unserialise(log_params)['auto']==true [01:16:46] I'm updating my patch to reflect that [01:19:02] Cool [01:22:07] Krinkle: https://gerrit.wikimedia.org/r/#/c/420214/10..11/maintenance/deleteAutoPatrolLogs.php,unified [01:26:15] (03PS1) 10EBernhardson: Add home dir files for ebernhardson [puppet] - 10https://gerrit.wikimedia.org/r/421461 [01:27:16] (03PS1) 10Dzahn: icinga/screen_monitoring: display user name [puppet] - 10https://gerrit.wikimedia.org/r/421462 (https://phabricator.wikimedia.org/T181409) [01:30:00] (03PS2) 10Dzahn: icinga/screen_monitoring: display user name [puppet] - 10https://gerrit.wikimedia.org/r/421462 (https://phabricator.wikimedia.org/T181409) [01:32:28] (03CR) 10Dzahn: [C: 032] Add home dir files for ebernhardson [puppet] - 10https://gerrit.wikimedia.org/r/421461 (owner: 10EBernhardson) [01:33:54] (03PS3) 10Dzahn: icinga/screen_monitoring: display user name [puppet] - 10https://gerrit.wikimedia.org/r/421462 (https://phabricator.wikimedia.org/T181409) [01:35:37] (03CR) 10Dzahn: [C: 032] icinga/screen_monitoring: display user name [puppet] - 10https://gerrit.wikimedia.org/r/421462 (https://phabricator.wikimedia.org/T181409) (owner: 10Dzahn) [01:36:40] (03PS1) 10Chad: Gerrit 2.14.7-9-g0f04397dbd, plus some plugins [software/gerrit] (stable-2.14) - 10https://gerrit.wikimedia.org/r/421463 [01:37:20] (03CR) 10Chad: "Still need to archiva these." 
[software/gerrit] (stable-2.14) - 10https://gerrit.wikimedia.org/r/421463 (owner: 10Chad) [01:38:49] PROBLEM - puppet last run on mw2214 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/home/ebernhardson/.vimrc] [01:39:05] (03PS3) 10Dzahn: Gerrit: Make project deletion less destructive [puppet] - 10https://gerrit.wikimedia.org/r/417183 (owner: 10Chad) [01:39:15] (03PS4) 10Dzahn: Gerrit: Make project deletion less destructive [puppet] - 10https://gerrit.wikimedia.org/r/417183 (owner: 10Chad) [01:39:19] (03CR) 10Dzahn: [C: 032] Gerrit: Make project deletion less destructive [puppet] - 10https://gerrit.wikimedia.org/r/417183 (owner: 10Chad) [01:40:38] PROBLEM - puppet last run on mw2244 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 6 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/home/ebernhardson/.bashrc],File[/home/ebernhardson/.vimrc] [01:40:50] aaww.wut [01:42:16] ok, it works fine [01:43:01] (03PS3) 10Tim Starling: mediawiki: Fix preg_match bug in furl causing bad redirect and E_NOTICE [puppet] - 10https://gerrit.wikimedia.org/r/420601 (owner: 10Krinkle) [01:43:04] just an unlucky one that fixes itself on next run [01:43:15] (03CR) 10Tim Starling: [C: 032] mediawiki: Fix preg_match bug in furl causing bad redirect and E_NOTICE [puppet] - 10https://gerrit.wikimedia.org/r/420601 (owner: 10Krinkle) [01:43:27] no_justification: the plugin change is also on cobalt now but i'm not restarting [01:43:40] Oh, I said let's wait on that one.... [01:43:43] But ok [01:43:56] oh, i never saw that [01:44:56] i think that got lost in netsplit or something? 
[01:45:29] RECOVERY - puppet last run on mw2244 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [01:46:03] (03PS2) 10Tim Starling: In furl use /usr/bin/php instead of php5 [puppet] - 10https://gerrit.wikimedia.org/r/391748 [01:46:35] (03CR) 10Tim Starling: [C: 032] In furl use /usr/bin/php instead of php5 [puppet] - 10https://gerrit.wikimedia.org/r/391748 (owner: 10Tim Starling) [01:48:17] (03PS1) 10Dzahn: Revert "Gerrit: Make project deletion less destructive" [puppet] - 10https://gerrit.wikimedia.org/r/421465 [01:48:36] (03PS2) 10Dzahn: Revert "Gerrit: Make project deletion less destructive" [puppet] - 10https://gerrit.wikimedia.org/r/421465 [01:49:43] (03CR) 10Dzahn: [C: 032] Revert "Gerrit: Make project deletion less destructive" [puppet] - 10https://gerrit.wikimedia.org/r/421465 (owner: 10Dzahn) [01:54:48] (03PS3) 10Tim Starling: In furl use /usr/bin/php instead of php5 [puppet] - 10https://gerrit.wikimedia.org/r/391748 [01:57:21] (03CR) 10Krinkle: [C: 031] "Assuming the same config is applied in Beta, we'll also be able to test it there. 
(Or cherry-pick ahead of time)" [puppet] - 10https://gerrit.wikimedia.org/r/411522 (owner: 10Tim Starling) [01:58:09] PROBLEM - Disk space on elastic1030 is CRITICAL: DISK CRITICAL - free space: /srv 61974 MB (12% inode=99%) [02:00:38] PROBLEM - Disk space on elastic1018 is CRITICAL: DISK CRITICAL - free space: /srv 62278 MB (12% inode=99%) [02:03:48] RECOVERY - puppet last run on mw2214 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [02:11:18] RECOVERY - Disk space on elastic1030 is OK: DISK OK [02:13:13] (03CR) 10Chad: "Cherry pick to beta, never merge to prod ;-)" [puppet] - 10https://gerrit.wikimedia.org/r/411522 (owner: 10Tim Starling) [02:19:58] PROBLEM - Disk space on elastic1044 is CRITICAL: DISK CRITICAL - free space: /srv 85782 MB (12% inode=99%) [02:31:48] PROBLEM - Disk space on elastic1038 is CRITICAL: DISK CRITICAL - free space: /srv 85746 MB (12% inode=99%) [02:51:48] RECOVERY - Disk space on elastic1038 is OK: DISK OK [02:56:58] RECOVERY - Disk space on elastic1044 is OK: DISK OK [03:05:09] PROBLEM - Disk space on elastic1017 is CRITICAL: DISK CRITICAL - free space: /srv 62069 MB (12% inode=99%) [03:11:09] PROBLEM - Disk space on elastic1017 is CRITICAL: DISK CRITICAL - free space: /srv 61706 MB (12% inode=99%) [03:11:48] PROBLEM - Disk space on elastic1045 is CRITICAL: DISK CRITICAL - free space: /srv 85904 MB (12% inode=99%) [03:19:09] PROBLEM - Disk space on elastic1017 is CRITICAL: DISK CRITICAL - free space: /srv 61899 MB (12% inode=99%) [03:24:38] (03CR) 10Tim Starling: "It turned out to be complicated to test on beta, since there are beta-specific conflicting rewrite rules. Beta's sections al" [puppet] - 10https://gerrit.wikimedia.org/r/411522 (owner: 10Tim Starling) [03:27:08] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 824.98 seconds [03:34:12] (03CR) 10Tim Starling: "Not much point testing on beta when hardly any of the apache configuration is shared." 
[puppet] - 10https://gerrit.wikimedia.org/r/411522 (owner: 10Tim Starling) [03:52:18] RECOVERY - Disk space on elastic1017 is OK: DISK OK [03:54:38] PROBLEM - Disk space on elastic1035 is CRITICAL: DISK CRITICAL - free space: /srv 86211 MB (12% inode=99%) [03:56:39] PROBLEM - Disk space on elastic1032 is CRITICAL: DISK CRITICAL - free space: /srv 86728 MB (12% inode=99%) [03:56:48] RECOVERY - Disk space on elastic1018 is OK: DISK OK [04:01:39] RECOVERY - Disk space on elastic1032 is OK: DISK OK [04:02:09] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 267.01 seconds [04:23:58] RECOVERY - Disk space on elastic1045 is OK: DISK OK [04:38:28] PROBLEM - Disk space on elastic1019 is CRITICAL: DISK CRITICAL - free space: /srv 62095 MB (12% inode=99%) [04:38:39] PROBLEM - Disk space on elastic1022 is CRITICAL: DISK CRITICAL - free space: /srv 61204 MB (12% inode=99%) [04:44:29] RECOVERY - Disk space on elastic1019 is OK: DISK OK [04:45:48] RECOVERY - Disk space on elastic1035 is OK: DISK OK [04:49:48] RECOVERY - Disk space on elastic1022 is OK: DISK OK [04:58:16] (03PS1) 10BryanDavis: toolforge: Add missing *.wikipedia.org to CSP policy [puppet] - 10https://gerrit.wikimedia.org/r/421472 (https://phabricator.wikimedia.org/T130748) [05:15:55] bd808: Don't forget things like blob: and data: [05:18:39] err, you already had data: i missed it [05:27:40] 10Operations, 10Ops-Access-Requests, 10Ops-Access-Reviews, 10Research, and 3 others: Request access to data for Wikimedia Donation Patterns research - https://phabricator.wikimedia.org/T188945#4074900 (10DYNKM) Nope, happening again. Query draft: ``` hive -e " USE wmf; SELECT dt AS timestamp, md5(conc... 
[05:33:16] (03CR) 10Brian Wolff: "You maybe also want things like blob: filesystem: mediastream: wss://tools.wmflabs.org (I think wss: is not included with 'self' in some v" [puppet] - 10https://gerrit.wikimedia.org/r/421472 (https://phabricator.wikimedia.org/T130748) (owner: 10BryanDavis) [06:05:26] (03PS1) 10Krinkle: nagios: Remove 'krinkle' from cloud/cvn contact group [puppet] - 10https://gerrit.wikimedia.org/r/421475 [06:06:59] (03CR) 10Krinkle: [C: 031] "Hm.. I thought we consolidated most of it by now. I certainly wouldn't expect that kind of variance which seems rather significant and als" [puppet] - 10https://gerrit.wikimedia.org/r/411522 (owner: 10Tim Starling) [06:09:45] 10Operations, 10ops-eqiad, 10DBA: db1052 (s1 master) disks with lots of predictive failure errors - https://phabricator.wikimedia.org/T190301#4074910 (10Marostegui) 05Open>03Resolved a:03Cmjohnson All good now! ``` root@db1052:~# megacli -LDPDInfo -aAll | egrep -i "slot|error|failure count|s.m.a.r.t"... [06:10:21] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1052 - https://phabricator.wikimedia.org/T190446#4074914 (10Marostegui) 05Open>03Resolved a:03Marostegui This was part of a controlled replacement of disks with predictive failure. It is all good now ``` root@db1052:~# megacli -LDPDInfo -aAll Adap... 
[06:59:13] 10Operations, 10ops-eqiad: install conf1004-6 ssd upgrades - https://phabricator.wikimedia.org/T190230#4074943 (10Joe) [06:59:16] 10Operations, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey, 10User-Joe: rack/setup/install conf1004-conf1006 - https://phabricator.wikimedia.org/T166081#4074944 (10Joe) [07:33:43] (03CR) 10Giuseppe Lavagetto: [C: 031] "https://puppet-compiler.wmflabs.org/compiler03/10603/" [puppet] - 10https://gerrit.wikimedia.org/r/394977 (owner: 10ArielGlenn) [07:58:58] 10Operations, 10Analytics, 10DBA, 10EventBus, and 7 others: High (2-3x) write and connection load on enwiki databases - https://phabricator.wikimedia.org/T189204#4075020 (10jcrespo) From my point of view, the problem has gone: https://grafana-admin.wikimedia.org/dashboard/db/mysql-aggregated?panelId=9&full... [08:09:53] 10Operations, 10Analytics, 10DBA, 10EventBus, and 7 others: High (2-3x) write and connection load on enwiki databases - https://phabricator.wikimedia.org/T189204#4075026 (10jcrespo) I would even dare to say the baseline is lower: https://grafana.wikimedia.org/dashboard/db/jobqueue-eventbus?orgId=1&from=152... 
[08:17:57] (03PS37) 10ArielGlenn: php7 manifests for mediawiki on stretch [puppet] - 10https://gerrit.wikimedia.org/r/394977 [08:18:33] <_joe_> I'll disable puppet fleet-wide on servers potentially impacted [08:19:25] ack [08:19:34] !log reboot eventlog1001 for kernel upgrades [08:19:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:21:06] (03CR) 10ArielGlenn: [C: 032] php7 manifests for mediawiki on stretch [puppet] - 10https://gerrit.wikimedia.org/r/394977 (owner: 10ArielGlenn) [08:21:45] 10Operations, 10HHVM, 10Patch-For-Review: Long running mediawiki web requests impacts service availability, specially databases - https://phabricator.wikimedia.org/T149421#4075053 (10jcrespo) [08:23:14] (03PS1) 10Marostegui: mysql-core_eqiad: Move db1067 to s1 [puppet] - 10https://gerrit.wikimedia.org/r/421477 [08:23:48] (03PS2) 10Marostegui: mysql-core_eqiad: Move db1067 to s1 [puppet] - 10https://gerrit.wikimedia.org/r/421477 [08:24:12] (03CR) 10Jcrespo: [C: 031] mysql-core_eqiad: Move db1067 to s1 [puppet] - 10https://gerrit.wikimedia.org/r/421477 (owner: 10Marostegui) [08:24:37] (03CR) 10Marostegui: [C: 032] mysql-core_eqiad: Move db1067 to s1 [puppet] - 10https://gerrit.wikimedia.org/r/421477 (owner: 10Marostegui) [08:30:29] (03PS1) 10Urbanecm: Allow to import from the French Wiktionary to Incubator [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421480 (https://phabricator.wikimedia.org/T190445) [08:31:38] PROBLEM - puppet last run on mw1261 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [08:32:01] (03CR) 10Vgutierrez: [C: 031] "pcc is happy & old dashboards are gone. LGTM!" 
[puppet] - 10https://gerrit.wikimedia.org/r/421338 (https://phabricator.wikimedia.org/T184942) (owner: 10Ema) [08:32:43] (03PS1) 10Giuseppe Lavagetto: mediawiki::php: re-add mail.ini to jessie [puppet] - 10https://gerrit.wikimedia.org/r/421481 [08:32:47] <_joe_> apergos: ^^ [08:33:58] (03PS8) 10Urbanecm: Initial configuration for lfnwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/400234 (https://phabricator.wikimedia.org/T183561) [08:34:00] (03PS9) 10Urbanecm: Initial configuration for inhwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/402658 (https://phabricator.wikimedia.org/T184374) [08:34:02] (03PS9) 10Urbanecm: Initial configuration for romdwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/412902 (https://phabricator.wikimedia.org/T187184) [08:34:04] (03PS5) 10Urbanecm: Initial configuration for gorwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416930 (https://phabricator.wikimedia.org/T189109) [08:34:05] do you mind putting it at the top right after the php_enmod line? [08:34:06] (03PS4) 10Urbanecm: Initial configuration for hiwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417201 (https://phabricator.wikimedia.org/T188366) [08:34:08] (03PS5) 10Urbanecm: Initial configuration for euwikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/419171 [08:34:10] (03CR) 10Muehlenhoff: [C: 031] "Agreed, that should fix it." [puppet] - 10https://gerrit.wikimedia.org/r/421481 (owner: 10Giuseppe Lavagetto) [08:34:12] _joe_: [08:34:47] * apergos closes their version of the file with this exact change [08:34:47] <_joe_> apergos: yes? [08:34:51] (10:34:05 πμ) apergos: do you mind putting it at the top right after the php_enmod line? 
[08:34:52] <_joe_> ahah ok [08:35:09] <_joe_> well, actually [08:35:39] <_joe_> we have to distinguish between php5 and php7 there [08:36:05] puppet is still failing on deploy1001, but it's probably unrelated and specific to the usage on deployment servers: [08:36:08] ah yes, so a new stanza in the php7 one [08:36:09] meh [08:36:16] Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Duplicate declaration: Package[php-luasandbox] is already declared in file /etc/puppet/modules/mediawiki/manifests/packages/php7.pp:31; cannot redeclare at /etc/puppet/modules/mediawiki/manifests/packages/php5.pp:23 at /etc/puppet/modules/mediawiki/manifests/packages/php5. [08:36:17] pp:23:5 on node deploy1001.eqiad.wmnet [08:36:19] Warning: Not using cache on failed catalog [08:36:29] although, no [08:36:36] <_joe_> moritzm: uhm that looks... broken [08:36:40] <_joe_> but in a different way [08:36:46] <_joe_> let's first fix the rest [08:36:49] yeah, looking into it [08:36:51] <_joe_> then we can think of deploy1001 [08:36:55] ack [08:36:56] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki::php: re-add mail.ini to jessie [puppet] - 10https://gerrit.wikimedia.org/r/421481 (owner: 10Giuseppe Lavagetto) [08:39:48] <_joe_> labweb works [08:40:01] <_joe_> appservers with jessie are now ok [08:40:42] (03PS1) 10ArielGlenn: make sure mail.ini is gone for php7 as well as php5 [puppet] - 10https://gerrit.wikimedia.org/r/421482 [08:40:53] deploy is a problem in role::deployment::mediawiki, will fix later (it was broken before as well, now it's at least less broken) [08:40:54] what do folks think of this? 
[08:40:59] (03CR) 10jerkins-bot: [V: 04-1] make sure mail.ini is gone for php7 as well as php5 [puppet] - 10https://gerrit.wikimedia.org/r/421482 (owner: 10ArielGlenn) [08:41:06] (deploy1001 I meant) [08:41:07] well thank you jenkins [08:41:27] <_joe_> apergos: that would fail [08:41:37] <_joe_> it's a define, so can be called multiple times [08:41:38] RECOVERY - puppet last run on mw1261 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [08:41:43] ah of course [08:41:44] <_joe_> that would fail with a duplicate definition [08:41:52] <_joe_> it's ok the way it is for now [08:41:53] so we really have to have them in the manifest above [08:41:54] meh [08:42:14] <_joe_> I'm reenabling puppet on all servers with jessie [08:42:22] <_joe_> well, the appservers [08:42:49] oh you have already merged a patch [08:43:00] <_joe_> moritzm: are the videoscalers validated too? [08:43:30] (03Abandoned) 10ArielGlenn: make sure mail.ini is gone for php7 as well as php5 [puppet] - 10https://gerrit.wikimedia.org/r/421482 (owner: 10ArielGlenn) [08:43:31] it seems good to me, but since 8:22 there haven't been any new scaling jobs [08:43:43] that's not uncommon, though [08:44:00] <_joe_> yeah [08:44:01] I'll keep an eye on it, but I think we can re-enable puppet [08:44:10] <_joe_> did you restart the jobrunner service as a test? [08:44:17] <_joe_> but anyways, not critical [08:44:31] <_joe_> I'm reenabling puppet everywhere but on the few hosts I didn't check [08:44:50] <_joe_> apergos: can you check the snapshots? 
[08:44:59] I'm looking at newly uploaded videos to commons and yeah there ain't much going on is the problem [08:45:07] sure I'll take care of those [08:45:56] I just restarted jobrunner and jobchron on mw1307, that seems fine as well [08:46:34] <_joe_> ok cool [08:46:49] <_joe_> I'll look at tin/naos, wasat/terbium now [08:47:53] I'm enabling puppet on the video scalers [08:48:04] <_joe_> moritzm: already did :) [08:48:10] ah, thanks :-) [08:48:12] <_joe_> 08:45 <@moritzm> I just restarted jobrunner and jobchron on mw1307, that seems fine as well [08:48:15] the one snap is fine, doing the others [08:48:16] <_joe_> here :P [08:49:01] good :-) I'll make a fix to fix deploy1001 next [08:49:28] <_joe_> on the long run, I want to create a "php" module, much like we have a hhvm module [08:50:03] <_joe_> I also peeked at the voxpopuli/php puppet module, it seems fairly well written, but I'm not sure it will adapt 1:1 to our needs [08:50:21] <_joe_> but I would love to be able to use an upstream module and contribute to it maybe [08:50:54] sure [08:51:26] <_joe_> that's btw what the whole module/profile thing is all about [08:51:59] <_joe_> create modules in a structure that's less entangled, and more usable [08:52:06] <_joe_> by third parties [08:52:17] <_joe_> and be able to reuse third-party modules [08:52:27] (03PS1) 10Muehlenhoff: Use PHP 7 for deployment servers on stretch [puppet] - 10https://gerrit.wikimedia.org/r/421483 (https://phabricator.wikimedia.org/T175288) [08:52:55] (03CR) 10jerkins-bot: [V: 04-1] Use PHP 7 for deployment servers on stretch [puppet] - 10https://gerrit.wikimedia.org/r/421483 (https://phabricator.wikimedia.org/T175288) (owner: 10Muehlenhoff) [08:52:59] oh my, so that's how it included both [08:53:05] I was wonering [08:53:08] +d [08:53:19] mw1307 now also correctly transcoded a file [08:53:24] <_joe_> cool [08:53:30] <_joe_> I didn't have many doubts [09:01:16] (03PS2) 10Muehlenhoff: Use PHP 7 for deployment servers on stretch 
[puppet] - 10https://gerrit.wikimedia.org/r/421483 (https://phabricator.wikimedia.org/T175288) [09:01:34] hahahahahaha [09:01:53] there's several includes there the just need to be in a profile [09:01:59] *that just [09:02:38] <_joe_> generally, our puppet code around mediawiki needs a refactor [09:02:48] preaching to the chooooiiiiiir [09:02:51] <_joe_> esp the uses outside the mw appservers are sometimes outright evil [09:04:25] <_joe_> ok, I think we're done here [09:04:50] well as soon as the deployment1001 patch goes out and we know everything is well [09:05:00] yeah, it all looks fine [09:05:10] great [09:05:23] apergos: mind a quick review for ^ ? [09:06:28] (03CR) 10ArielGlenn: [C: 031] Use PHP 7 for deployment servers on stretch [puppet] - 10https://gerrit.wikimedia.org/r/421483 (https://phabricator.wikimedia.org/T175288) (owner: 10Muehlenhoff) [09:08:49] I kind of think you might not have to have that stanza at all [09:10:10] that it's picked up from include ::mediawiki -> ::mediawiki::php -> ::mediawiki::packages::php5 or ::mediawiki::packages::php7 [09:10:13] moritzm: [09:10:25] and that's where we got the conflict from [09:10:49] ah, let me check [09:15:11] well, not quite. 
the deployment servers need the full set of PHP packages, so they explicitly include it, what's being installed by mediawiki::php only pulls them in for trusty and stretch [09:15:38] (03PS1) 10Alexandros Kosiaris: Give tiller the right to manage network policies [deployment-charts] - 10https://gerrit.wikimedia.org/r/421484 (https://phabricator.wikimedia.org/T184923) [09:15:40] (03PS1) 10Alexandros Kosiaris: Annotate namespace with a default deny policy [deployment-charts] - 10https://gerrit.wikimedia.org/r/421485 (https://phabricator.wikimedia.org/T184923) [09:15:41] so I'll limit the os_version check to jessie to retain the status quo [09:15:42] (03PS1) 10Alexandros Kosiaris: WIP Add network policy objects to the helm charts [deployment-charts] - 10https://gerrit.wikimedia.org/r/421486 (https://phabricator.wikimedia.org/T184923) [09:16:51] ah because jessie *groan* [09:17:26] yeah, we've had some non-intuitive OS assumptions in our classes :-) [09:17:41] s/OS// [09:17:56] (03PS2) 10Elukey: Recent versions of librdkafka allow to negociate API versions [software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/318068 (owner: 10R4q3NWnUx2CEhVyr) [09:19:31] (03CR) 10Giuseppe Lavagetto: [C: 031] Give tiller the right to manage network policies [deployment-charts] - 10https://gerrit.wikimedia.org/r/421484 (https://phabricator.wikimedia.org/T184923) (owner: 10Alexandros Kosiaris) [09:20:58] (03PS3) 10Muehlenhoff: Restrict inclusion of full PHP packages on deployment servers to jessie [puppet] - 10https://gerrit.wikimedia.org/r/421483 (https://phabricator.wikimedia.org/T175288) [09:22:39] want my +1 on that or are you fine to just merge? [09:25:20] otI'll merge [09:25:32] it's okay, I'll merge after doublechecking with PCC [09:26:29] (03CR) 10Giuseppe Lavagetto: "one doubt: we actually want to block outgoing connections to unauthorized places more than we want to block incoming connections, right? 
I" (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/421485 (https://phabricator.wikimedia.org/T184923) (owner: 10Alexandros Kosiaris) [09:31:16] (03PS4) 10Muehlenhoff: Restrict inclusion of full PHP packages on deployment servers to jessie [puppet] - 10https://gerrit.wikimedia.org/r/421483 (https://phabricator.wikimedia.org/T175288) [09:34:02] (03CR) 10Elukey: "Hi! While working on https://phabricator.wikimedia.org/T166833 I noticed that this api would be needed in order to add a custom timestamp " [software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/318068 (owner: 10R4q3NWnUx2CEhVyr) [09:34:31] (03Abandoned) 10Elukey: Recent versions of librdkafka allow to negociate API versions [software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/318068 (owner: 10R4q3NWnUx2CEhVyr) [09:40:27] (03CR) 10Muehlenhoff: [C: 032] Restrict inclusion of full PHP packages on deployment servers to jessie [puppet] - 10https://gerrit.wikimedia.org/r/421483 (https://phabricator.wikimedia.org/T175288) (owner: 10Muehlenhoff) [09:49:23] !log armed keyholder on deploy1001 [09:49:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:00:34] (03PS1) 10ArielGlenn: Remove the slow-parse logs dataset cleanup job [puppet] - 10https://gerrit.wikimedia.org/r/421489 (https://phabricator.wikimedia.org/T189284) [10:00:44] !log installing plexus-utils2 security updates [10:00:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:06:21] (03PS1) 10Elukey: Apply some consistency to source code formatting [software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/421490 [10:06:23] (03CR) 10ArielGlenn: [C: 032] Remove the slow-parse logs dataset cleanup job [puppet] - 10https://gerrit.wikimedia.org/r/421489 (https://phabricator.wikimedia.org/T189284) (owner: 10ArielGlenn) [10:10:35] (03PS2) 10Elukey: Apply some consistency to source code formatting [software/varnish/varnishkafka] - 
10https://gerrit.wikimedia.org/r/421490 [10:12:05] (03PS1) 10Muehlenhoff: Various updates to Cumin aliases [puppet] - 10https://gerrit.wikimedia.org/r/421493 [10:12:43] (03CR) 10jerkins-bot: [V: 04-1] Various updates to Cumin aliases [puppet] - 10https://gerrit.wikimedia.org/r/421493 (owner: 10Muehlenhoff) [10:14:38] (03PS2) 10Muehlenhoff: Various updates to Cumin aliases [puppet] - 10https://gerrit.wikimedia.org/r/421493 [10:16:58] (03CR) 10Muehlenhoff: [C: 032] Various updates to Cumin aliases [puppet] - 10https://gerrit.wikimedia.org/r/421493 (owner: 10Muehlenhoff) [10:17:04] (03PS3) 10Muehlenhoff: Various updates to Cumin aliases [puppet] - 10https://gerrit.wikimedia.org/r/421493 [10:17:43] (03PS5) 10ArielGlenn: Store all dataset/dumps mirrors info in one hiera structure, and use it [puppet] - 10https://gerrit.wikimedia.org/r/419390 (https://phabricator.wikimedia.org/T189657) [10:18:31] (03CR) 10jerkins-bot: [V: 04-1] Store all dataset/dumps mirrors info in one hiera structure, and use it [puppet] - 10https://gerrit.wikimedia.org/r/419390 (https://phabricator.wikimedia.org/T189657) (owner: 10ArielGlenn) [10:22:14] !log restarting apache on krypton to pick up curl security update [10:22:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:22:43] (03PS6) 10ArielGlenn: Store all dataset/dumps mirrors info in one hiera structure, and use it [puppet] - 10https://gerrit.wikimedia.org/r/419390 (https://phabricator.wikimedia.org/T189657) [10:23:20] (03CR) 10jerkins-bot: [V: 04-1] Store all dataset/dumps mirrors info in one hiera structure, and use it [puppet] - 10https://gerrit.wikimedia.org/r/419390 (https://phabricator.wikimedia.org/T189657) (owner: 10ArielGlenn) [10:25:31] (03PS7) 10ArielGlenn: Store all dataset/dumps mirrors info in one hiera structure, and use it [puppet] - 10https://gerrit.wikimedia.org/r/419390 (https://phabricator.wikimedia.org/T189657) [10:28:42] (03PS1) 10Jcrespo: mariadb: Increase db1072 weight to compensate for 
higher s3 load [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421498 [10:36:10] !log upload cassandra2.2.6-wmf3 to jessie/stretch-wikimedia -C component/cassandra22 - T189529 [10:36:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:36:15] T189529: Test/upload new cassandra 2.2.6 package (wmf3) - https://phabricator.wikimedia.org/T189529 [10:43:32] (03CR) 10Jcrespo: [C: 032] mariadb: Increase db1072 weight to compensate for higher s3 load [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421498 (owner: 10Jcrespo) [10:44:50] (03Merged) 10jenkins-bot: mariadb: Increase db1072 weight to compensate for higher s3 load [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421498 (owner: 10Jcrespo) [10:48:37] (03CR) 10jenkins-bot: mariadb: Increase db1072 weight to compensate for higher s3 load [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421498 (owner: 10Jcrespo) [10:49:21] !log ladsgroup@tin Synchronized wmf-config/Wikibase-production.php: [[gerrit:421333|Disable reading wb_terms search fields on wikidata (T189777)]] (duration: 00m 59s) [10:49:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:49:27] T189777: Disable reading from term_search_key from wb_terms table in wikidata - https://phabricator.wikimedia.org/T189777 [10:49:41] ^ This was a leftover from last night SWAT (I forgot to rebase and just deployed) [10:49:49] jynus: Can you double check? 
[10:50:17] yeah, I can see it now [10:50:28] I use https://noc.wikimedia.org/conf/highlight.php?file=Wikibase-production.php&1 to double check [10:50:44] Thank you, let's monitor DB IO too, this might affect performance but that's very unlikely [10:50:46] the &1 is to get rid of cache [10:52:01] I am actually going to wait for deployment of mine- I was compensating for higher s3 load, and I don't want to affect monitoring [10:53:45] oh, sorry [10:55:17] (03PS1) 10Vgutierrez: prometheus: calculate varnish requests daily/weekly averages [puppet] - 10https://gerrit.wikimedia.org/r/421505 (https://phabricator.wikimedia.org/T184942) [10:59:02] 10Operations, 10Performance-Team (Radar), 10User-Elukey: Update memcached package and configuration options - https://phabricator.wikimedia.org/T129963#4075339 (10elukey) a:05elukey>03None [10:59:09] !log deployed new replication filter for labsdb1004 on u2815__p.all_articles T190488 [10:59:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:59:15] T190488: Possibly a big update going to: u2815__p`.`all_articles - https://phabricator.wikimedia.org/T190488 [11:07:33] (03PS1) 10Jcrespo: toolsdb: Deploy replication filter to avoid replica lag [puppet] - 10https://gerrit.wikimedia.org/r/421509 (https://phabricator.wikimedia.org/T190488) [11:08:04] (03CR) 10Jcrespo: [C: 032] toolsdb: Deploy replication filter to avoid replica lag [puppet] - 10https://gerrit.wikimedia.org/r/421509 (https://phabricator.wikimedia.org/T190488) (owner: 10Jcrespo) [11:09:23] !log restarting jvm daemons on analytics100[12] (Hadoop Masters) for openjdk-8 upgrade [11:09:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:13:18] all daemons restarted on 1001 [11:15:28] (should have been for analytics :) [11:19:20] !log installing libvorbis security updates on trusty (Debian already fixed) [11:19:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:46:54] !log jynus@tin 
Synchronized wmf-config/db-eqiad.php: Increase db1072 weight (duration: 00m 59s) [11:46:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:47:44] 10Operations, 10MediaWiki-API, 10Traffic: Query API for rev props times out with an error message, but status is 200 OK - https://phabricator.wikimedia.org/T190410#4075420 (10BBlack) >>! In T190410#4074369, @Anomie wrote: >>>! In T190410#4073920, @BBlack wrote: >> What exactly is the client going to do diffe... [11:52:08] PROBLEM - Disk space on elastic1027 is CRITICAL: DISK CRITICAL - free space: /srv 61798 MB (12% inode=99%) [11:55:14] (03CR) 10Mark Bergsma: Fix Attribute.__eq__ and .__ne__ (031 comment) [debs/pybal] - 10https://gerrit.wikimedia.org/r/420119 (owner: 10Mark Bergsma) [11:57:55] (03PS1) 10Muehlenhoff: Don't include mediawiki fonts list in toollabs::exec_environ [puppet] - 10https://gerrit.wikimedia.org/r/421515 (https://phabricator.wikimedia.org/T190135) [12:01:05] (03PS4) 10Mark Bergsma: Fix MPReachNLRIAttribute AFI_INET construction from tuple [debs/pybal] - 10https://gerrit.wikimedia.org/r/420120 [12:01:48] (03CR) 10Vgutierrez: [C: 031] Fix Attribute.__eq__ and .__ne__ (031 comment) [debs/pybal] - 10https://gerrit.wikimedia.org/r/420119 (owner: 10Mark Bergsma) [12:02:19] (03CR) 10Mark Bergsma: [C: 032] Fix Attribute.__eq__ and .__ne__ [debs/pybal] - 10https://gerrit.wikimedia.org/r/420119 (owner: 10Mark Bergsma) [12:02:42] (03CR) 10Mark Bergsma: Fix MPReachNLRIAttribute AFI_INET construction from tuple (031 comment) [debs/pybal] - 10https://gerrit.wikimedia.org/r/420120 (owner: 10Mark Bergsma) [12:02:45] (03CR) 10Vgutierrez: [C: 031] Fix MPReachNLRIAttribute AFI_INET construction from tuple [debs/pybal] - 10https://gerrit.wikimedia.org/r/420120 (owner: 10Mark Bergsma) [12:02:47] (03Merged) 10jenkins-bot: Fix Attribute.__eq__ and .__ne__ [debs/pybal] - 10https://gerrit.wikimedia.org/r/420119 (owner: 10Mark Bergsma) [12:02:54] (03CR) 10Mark Bergsma: [C: 032] Fix 
MPReachNLRIAttribute AFI_INET construction from tuple [debs/pybal] - 10https://gerrit.wikimedia.org/r/420120 (owner: 10Mark Bergsma) [12:03:01] (03CR) 10Arturo Borrero Gonzalez: [C: 031] Don't include mediawiki fonts list in toollabs::exec_environ [puppet] - 10https://gerrit.wikimedia.org/r/421515 (https://phabricator.wikimedia.org/T190135) (owner: 10Muehlenhoff) [12:03:22] (03Merged) 10jenkins-bot: Fix MPReachNLRIAttribute AFI_INET construction from tuple [debs/pybal] - 10https://gerrit.wikimedia.org/r/420120 (owner: 10Mark Bergsma) [12:03:49] 10Operations, 10HHVM, 10User-Elukey, 10User-notice: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295#4037934 (10Ladsgroup) >>! In T189295#4074637, @Bawolff wrote: > Another thing to watch out for, is that farsi wikis are using a hack to work around a bug in... [12:09:09] (03PS1) 10Arturo Borrero Gonzalez: labs: monitoring: include openstack client libs [puppet] - 10https://gerrit.wikimedia.org/r/421516 (https://phabricator.wikimedia.org/T190312) [12:12:52] (03CR) 10Arturo Borrero Gonzalez: [V: 04-1] "This produces some errors when doing compilation:" [puppet] - 10https://gerrit.wikimedia.org/r/421516 (https://phabricator.wikimedia.org/T190312) (owner: 10Arturo Borrero Gonzalez) [12:13:20] (03PS1) 10Mark Bergsma: Update .gitignore [debs/pybal] - 10https://gerrit.wikimedia.org/r/421517 [12:14:08] RECOVERY - Disk space on elastic1027 is OK: DISK OK [12:14:37] (03CR) 10Mark Bergsma: [C: 032] Update .gitignore [debs/pybal] - 10https://gerrit.wikimedia.org/r/421517 (owner: 10Mark Bergsma) [12:15:12] (03Merged) 10jenkins-bot: Update .gitignore [debs/pybal] - 10https://gerrit.wikimedia.org/r/421517 (owner: 10Mark Bergsma) [12:20:02] (03Abandoned) 10Mark Bergsma: Add no-gravity configuration option [debs/pybal] - 10https://gerrit.wikimedia.org/r/187346 (https://phabricator.wikimedia.org/T86650) (owner: 10Mark Bergsma) [12:24:19] (03CR) 10Vgutierrez: [C: 031] Fix StubLVSService to use 
a set instead of a dict for .servers [debs/pybal] - 10https://gerrit.wikimedia.org/r/421052 (owner: 10Mark Bergsma) [12:29:31] (03PS2) 10Arturo Borrero Gonzalez: labs: monitoring: include openstack client libs [puppet] - 10https://gerrit.wikimedia.org/r/421516 (https://phabricator.wikimedia.org/T190312) [12:30:00] (03CR) 10jerkins-bot: [V: 04-1] labs: monitoring: include openstack client libs [puppet] - 10https://gerrit.wikimedia.org/r/421516 (https://phabricator.wikimedia.org/T190312) (owner: 10Arturo Borrero Gonzalez) [12:31:27] (03PS3) 10Arturo Borrero Gonzalez: labs: monitoring: include openstack client libs [puppet] - 10https://gerrit.wikimedia.org/r/421516 (https://phabricator.wikimedia.org/T190312) [12:31:54] (03CR) 10jerkins-bot: [V: 04-1] labs: monitoring: include openstack client libs [puppet] - 10https://gerrit.wikimedia.org/r/421516 (https://phabricator.wikimedia.org/T190312) (owner: 10Arturo Borrero Gonzalez) [12:32:08] PROBLEM - Disk space on elastic1020 is CRITICAL: DISK CRITICAL - free space: /srv 61163 MB (12% inode=99%) [12:34:16] (03PS4) 10Arturo Borrero Gonzalez: labs: monitoring: include openstack client libs [puppet] - 10https://gerrit.wikimedia.org/r/421516 (https://phabricator.wikimedia.org/T190312) [12:34:29] (03CR) 10Jcrespo: "ddl method seems to work, but pt-online-schema change doesn't. It also seems to do a test alter (with pt-online-schema-change?) 
even on dd" [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/419725 (owner: 10Rduran) [12:34:41] (03CR) 10jerkins-bot: [V: 04-1] labs: monitoring: include openstack client libs [puppet] - 10https://gerrit.wikimedia.org/r/421516 (https://phabricator.wikimedia.org/T190312) (owner: 10Arturo Borrero Gonzalez) [12:36:29] (03PS5) 10Arturo Borrero Gonzalez: labs: monitoring: include openstack client libs [puppet] - 10https://gerrit.wikimedia.org/r/421516 (https://phabricator.wikimedia.org/T190312) [12:36:58] (03CR) 10jerkins-bot: [V: 04-1] labs: monitoring: include openstack client libs [puppet] - 10https://gerrit.wikimedia.org/r/421516 (https://phabricator.wikimedia.org/T190312) (owner: 10Arturo Borrero Gonzalez) [12:41:02] (03CR) 10Jcrespo: [C: 04-1] "While forcing the debug mode, I think it is clear where the error is:" [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/419725 (owner: 10Rduran) [12:41:14] (03PS1) 10Odder: Correct high-density logos for the Dutch Low Saxon Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421521 (https://phabricator.wikimedia.org/T190051) [12:42:55] (03CR) 10Jcrespo: "The changes to osc_host are pure refactoring and no functionality change? Should I have a look at it?" [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/421340 (owner: 10Rduran) [12:43:28] (03CR) 10Jcrespo: [V: 032 C: 032] Add requirements.txt with pymysql and tabulate [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/419365 (owner: 10Rduran) [12:44:36] (03CR) 10Jcrespo: [C: 04-1] Add flake8 config and requirement (031 comment) [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/420015 (owner: 10Rduran) [12:45:24] (03CR) 10Jcrespo: [C: 04-1] "The rest can go as is." 
[software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/420015 (owner: 10Rduran) [12:45:29] (03PS6) 10Arturo Borrero Gonzalez: labs: monitoring: include openstack client libs [puppet] - 10https://gerrit.wikimedia.org/r/421516 (https://phabricator.wikimedia.org/T190312) [12:45:58] (03CR) 10jerkins-bot: [V: 04-1] labs: monitoring: include openstack client libs [puppet] - 10https://gerrit.wikimedia.org/r/421516 (https://phabricator.wikimedia.org/T190312) (owner: 10Arturo Borrero Gonzalez) [12:47:17] (03CR) 10Jcrespo: [C: 04-1] "I see it is already fixed on the next one, I was reviewing in chronological order." [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/420015 (owner: 10Rduran) [12:47:59] PROBLEM - High CPU load on API appserver on mw1227 is CRITICAL: CRITICAL - load average: 51.01, 35.74, 28.73 [12:48:33] !log oblivian@puppetmaster2001 conftool action : set/pooled=yes; selector: dc=eqiad,cluster=api_appserver,service=apache2 [12:48:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:49:45] (03CR) 10Jcrespo: "This is ok, the problem is that all further changes are dependent on osc_host.py merge, and we cannot merge that to HEAD yet, because it w" [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/420746 (owner: 10Rduran) [12:49:53] what is happening on mw1227? 
[12:50:08] (03PS7) 10Arturo Borrero Gonzalez: labs: monitoring: include openstack client libs [puppet] - 10https://gerrit.wikimedia.org/r/421516 (https://phabricator.wikimedia.org/T190312) [12:50:35] (03CR) 10jerkins-bot: [V: 04-1] labs: monitoring: include openstack client libs [puppet] - 10https://gerrit.wikimedia.org/r/421516 (https://phabricator.wikimedia.org/T190312) (owner: 10Arturo Borrero Gonzalez) [12:51:12] !log oblivian@puppetmaster2001 conftool action : set/pooled=yes; selector: dc=eqiad,cluster=appserver,service=apache2 [12:51:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:52:08] RECOVERY - Disk space on elastic1020 is OK: DISK OK [12:54:09] (03PS1) 10Sbisson: kartotherian/tilerator: set Last-Modified header [puppet] - 10https://gerrit.wikimedia.org/r/421522 (https://phabricator.wikimedia.org/T187300) [12:56:08] RECOVERY - High CPU load on API appserver on mw1227 is OK: OK - load average: 14.08, 19.05, 23.59 [12:59:10] (03PS1) 10BBlack: remove lvs1007-10 from lvs_class_hosts [puppet] - 10https://gerrit.wikimedia.org/r/421526 [12:59:36] (03PS3) 10KartikMistry: apertium: New upstream release [debs/contenttranslation/apertium] - 10https://gerrit.wikimedia.org/r/419351 (https://phabricator.wikimedia.org/T189075) [13:00:00] (03CR) 10BBlack: [C: 032] remove lvs1007-10 from lvs_class_hosts [puppet] - 10https://gerrit.wikimedia.org/r/421526 (owner: 10BBlack) [13:00:09] (03CR) 10jerkins-bot: [V: 04-1] apertium: New upstream release [debs/contenttranslation/apertium] - 10https://gerrit.wikimedia.org/r/419351 (https://phabricator.wikimedia.org/T189075) (owner: 10KartikMistry) [13:04:43] PROBLEM - Request latencies on argon is CRITICAL: CRITICAL - apiserver_request_latencies is 63072362 https://grafana.wikimedia.org/dashboard/db/kubernetes-api [13:05:30] (03PS9) 10Jcrespo: Add port of osc_host.sh [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/419725 (owner: 10Rduran) [13:06:45] (03PS8) 10Arturo Borrero Gonzalez: 
labs: monitoring: include openstack client libs [puppet] - 10https://gerrit.wikimedia.org/r/421516 (https://phabricator.wikimedia.org/T190312) [13:07:36] (03PS1) 10Giuseppe Lavagetto: pooler_loop: manage case where pybal is turned off on the load-balancer [puppet] - 10https://gerrit.wikimedia.org/r/421527 [13:07:43] RECOVERY - Request latencies on argon is OK: OK - apiserver_request_latencies is 6386 https://grafana.wikimedia.org/dashboard/db/kubernetes-api [13:08:53] (03CR) 10Arturo Borrero Gonzalez: [C: 032] labs: monitoring: include openstack client libs [puppet] - 10https://gerrit.wikimedia.org/r/421516 (https://phabricator.wikimedia.org/T190312) (owner: 10Arturo Borrero Gonzalez) [13:11:08] 10Operations, 10Traffic, 10Beta-Cluster-reproducible, 10Patch-For-Review, 10Performance-Team (Radar): PHP fatal errors causing Varnish to return 503 - "Junk after gzip data" - https://phabricator.wikimedia.org/T125938#4075622 (10BBlack) 05Open>03Resolved a:03BBlack The above took care of it from th... [13:14:01] (03PS1) 10Arturo Borrero Gonzalez: labs: monitoring: install python-novaclient [puppet] - 10https://gerrit.wikimedia.org/r/421528 (https://phabricator.wikimedia.org/T190312) [13:14:12] PROBLEM - Request latencies on chlorine is CRITICAL: CRITICAL - apiserver_request_latencies is 24536291 https://grafana.wikimedia.org/dashboard/db/kubernetes-api [13:14:22] (03PS10) 10Jcrespo: Add port of osc_host.sh [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/419725 (owner: 10Rduran) [13:14:44] (03CR) 10Arturo Borrero Gonzalez: [C: 032] labs: monitoring: install python-novaclient [puppet] - 10https://gerrit.wikimedia.org/r/421528 (https://phabricator.wikimedia.org/T190312) (owner: 10Arturo Borrero Gonzalez) [13:16:12] RECOVERY - Request latencies on chlorine is OK: OK - apiserver_request_latencies is 4541 https://grafana.wikimedia.org/dashboard/db/kubernetes-api [13:16:16] (03CR) 10Jcrespo: "This is closer to what I want. 
Give it a look https://gerrit.wikimedia.org/r/#/c/419725/8..10/wmfmariadbpy/osc_host.py I think pt-osc exec" [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/419725 (owner: 10Rduran) [13:18:01] (03CR) 10Jcrespo: "maybe we should rename warn to 2 options "interactive" (ask for confirmation) + "ignore errors", which is the whole point of that." [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/419725 (owner: 10Rduran) [13:18:37] (03PS2) 10Giuseppe Lavagetto: pooler_loop: manage case where pybal is turned off on the load-balancer [puppet] - 10https://gerrit.wikimedia.org/r/421527 [13:21:12] PROBLEM - Request latencies on chlorine is CRITICAL: CRITICAL - apiserver_request_latencies is 24411586 https://grafana.wikimedia.org/dashboard/db/kubernetes-api [13:24:12] RECOVERY - Request latencies on chlorine is OK: OK - apiserver_request_latencies is 4535 https://grafana.wikimedia.org/dashboard/db/kubernetes-api [13:24:29] !log installing postgres security updates on rhenium [13:24:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:32:21] (03Abandoned) 10Muehlenhoff: Provide a systemd override unit for memcached [puppet] - 10https://gerrit.wikimedia.org/r/319820 (owner: 10Muehlenhoff) [13:32:29] (03PS1) 10Arturo Borrero Gonzalez: lab: monitoring: include apache mod rewrite [puppet] - 10https://gerrit.wikimedia.org/r/421531 (https://phabricator.wikimedia.org/T190515) [13:33:05] (03CR) 10jerkins-bot: [V: 04-1] lab: monitoring: include apache mod rewrite [puppet] - 10https://gerrit.wikimedia.org/r/421531 (https://phabricator.wikimedia.org/T190515) (owner: 10Arturo Borrero Gonzalez) [13:34:39] (03PS1) 10Herron: upgrade eqiad puppet masters to puppetdb4 [puppet] - 10https://gerrit.wikimedia.org/r/421532 (https://phabricator.wikimedia.org/T177253) [13:39:47] (03PS2) 10Arturo Borrero Gonzalez: lab: monitoring: include apache mod rewrite [puppet] - 10https://gerrit.wikimedia.org/r/421531 (https://phabricator.wikimedia.org/T190515) 
[13:41:52] (03PS3) 10Daimona Eaytoy: Enable $wgAbuseFilterProfile on itwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420687 (https://phabricator.wikimedia.org/T190137) [13:42:49] (03PS3) 10Arturo Borrero Gonzalez: lab: monitoring: include apache mod rewrite [puppet] - 10https://gerrit.wikimedia.org/r/421531 (https://phabricator.wikimedia.org/T190515) [13:46:30] (03CR) 10Arturo Borrero Gonzalez: [C: 032] lab: monitoring: include apache mod rewrite [puppet] - 10https://gerrit.wikimedia.org/r/421531 (https://phabricator.wikimedia.org/T190515) (owner: 10Arturo Borrero Gonzalez) [13:47:15] (03PS1) 10Jcrespo: Add script to fix replication on emergencies by reimporting from master [software] - 10https://gerrit.wikimedia.org/r/421536 [13:48:20] (03PS4) 10Daimona Eaytoy: Enable AbuseFilter runtime profile on more Wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420672 (https://phabricator.wikimedia.org/T175954) [13:48:43] (03CR) 10Ottomata: [C: 031] "SHEESSHSHHHH here's elukey again always making things better" [software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/421490 (owner: 10Elukey) [13:48:53] (03CR) 10Jcrespo: [C: 032] Add script to fix replication on emergencies by reimporting from master [software] - 10https://gerrit.wikimedia.org/r/421536 (owner: 10Jcrespo) [13:49:36] (03PS1) 10Jcrespo: Revert "toolsdb: Deploy replication filter to avoid replica lag" [puppet] - 10https://gerrit.wikimedia.org/r/421537 [13:49:46] (03PS2) 10Jcrespo: Revert "toolsdb: Deploy replication filter to avoid replica lag" [puppet] - 10https://gerrit.wikimedia.org/r/421537 [13:49:58] (03CR) 10Mobrovac: [C: 031] pooler_loop: manage case where pybal is turned off on the load-balancer [puppet] - 10https://gerrit.wikimedia.org/r/421527 (owner: 10Giuseppe Lavagetto) [13:51:01] (03CR) 10Marostegui: "Nice!" 
[software] - 10https://gerrit.wikimedia.org/r/421536 (owner: 10Jcrespo) [13:54:40] (03PS2) 10Ema: varnish: cleanup after upgrade to v5 [puppet] - 10https://gerrit.wikimedia.org/r/416652 (https://phabricator.wikimedia.org/T188545) [13:55:50] (03PS3) 10Giuseppe Lavagetto: pooler_loop: manage case where pybal is turned off on the load-balancer [puppet] - 10https://gerrit.wikimedia.org/r/421527 [13:55:53] (03PS3) 10Ema: varnish: cleanup after upgrade to v5 [puppet] - 10https://gerrit.wikimedia.org/r/416652 (https://phabricator.wikimedia.org/T188545) [13:56:16] (03CR) 10jerkins-bot: [V: 04-1] pooler_loop: manage case where pybal is turned off on the load-balancer [puppet] - 10https://gerrit.wikimedia.org/r/421527 (owner: 10Giuseppe Lavagetto) [14:01:53] (03PS4) 10Giuseppe Lavagetto: pooler_loop: manage case where pybal is turned off on the load-balancer [puppet] - 10https://gerrit.wikimedia.org/r/421527 [14:09:43] (03CR) 10Jcrespo: [C: 032] Revert "toolsdb: Deploy replication filter to avoid replica lag" [puppet] - 10https://gerrit.wikimedia.org/r/421537 (owner: 10Jcrespo) [14:10:31] (03PS5) 10Giuseppe Lavagetto: pooler_loop: manage case where pybal is turned off on the load-balancer [puppet] - 10https://gerrit.wikimedia.org/r/421527 [14:12:17] (03CR) 10Ema: [V: 032 C: 032] "pcc looks good, merging. https://puppet-compiler.wmflabs.org/compiler03/10620/" [puppet] - 10https://gerrit.wikimedia.org/r/416652 (https://phabricator.wikimedia.org/T188545) (owner: 10Ema) [14:12:26] (03PS4) 10Ema: varnish: cleanup after upgrade to v5 [puppet] - 10https://gerrit.wikimedia.org/r/416652 (https://phabricator.wikimedia.org/T188545) [14:12:43] (03CR) 10Ema: [V: 032 C: 032] varnish: cleanup after upgrade to v5 [puppet] - 10https://gerrit.wikimedia.org/r/416652 (https://phabricator.wikimedia.org/T188545) (owner: 10Ema) [14:13:01] <_joe_> ema: ahem [14:13:07] <_joe_> I was waiting for jenkins... 
[14:13:16] (03CR) 10Giuseppe Lavagetto: [C: 032] pooler_loop: manage case where pybal is turned off on the load-balancer [puppet] - 10https://gerrit.wikimedia.org/r/421527 (owner: 10Giuseppe Lavagetto) [14:13:28] <_joe_> :P [14:13:34] (03PS6) 10Giuseppe Lavagetto: pooler_loop: manage case where pybal is turned off on the load-balancer [puppet] - 10https://gerrit.wikimedia.org/r/421527 [14:13:55] ha! my jenkins was faster [14:14:06] 10Operations: Upgrade naos to stretch (and rename to deploy2001) - https://phabricator.wikimedia.org/T190524#4075855 (10MoritzMuehlenhoff) [14:14:49] <_joe_> no you cheated, with v+2 [14:14:52] <_joe_> :P [14:15:15] ema never cheats, you should know by now _joe_ [14:16:00] <_joe_> oh don't get me wrong [14:16:09] <_joe_> I do cheat jenkins and merge-snipe others [14:16:18] <_joe_> it's annoying when others do it to you though [14:16:41] <_joe_> that should not be allowed [14:16:55] <_joe_> prun [14:16:59] (03PS2) 10Umherirrender: Replace wfGetLBFactory [mediawiki-config] - 10https://gerrit.wikimedia.org/r/414310 [14:20:34] (03PS1) 10Arturo Borrero Gonzalez: labs: graphite: enable several required apache modules [puppet] - 10https://gerrit.wikimedia.org/r/421540 (https://phabricator.wikimedia.org/T190515) [14:22:26] (03CR) 10Arturo Borrero Gonzalez: [C: 032] labs: graphite: enable several required apache modules [puppet] - 10https://gerrit.wikimedia.org/r/421540 (https://phabricator.wikimedia.org/T190515) (owner: 10Arturo Borrero Gonzalez) [14:26:52] PROBLEM - puppet last run on labmon1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[ensure_present_mod_uwsgi] [14:27:54] 10Operations: Upgrade naos to stretch (and rename to deploy2001) - https://phabricator.wikimedia.org/T190524#4075890 (10Dzahn) a:03Dzahn [14:42:59] (03PS1) 10Ema: VCL: use hfp only for uncacheable candidates for conditional requests [puppet] - 10https://gerrit.wikimedia.org/r/421542 (https://phabricator.wikimedia.org/T180712) [14:43:29] (03CR) 10jerkins-bot: [V: 04-1] VCL: use hfp only for uncacheable candidates for conditional requests [puppet] - 10https://gerrit.wikimedia.org/r/421542 (https://phabricator.wikimedia.org/T180712) (owner: 10Ema) [14:45:47] 10Operations, 10monitoring, 10Patch-For-Review: Check for long running screen/tmux should mention usernames - https://phabricator.wikimedia.org/T181409#4075933 (10Dzahn) done! examples: https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=graphite2001&service=Long+running+screen%2Ftmux https... [14:46:04] 10Operations, 10monitoring, 10Patch-For-Review: Check long-running screen/tmux sessions - https://phabricator.wikimedia.org/T165348#4075935 (10Dzahn) [14:46:06] 10Operations, 10monitoring, 10Patch-For-Review: Check for long running screen/tmux should mention usernames - https://phabricator.wikimedia.org/T181409#4075934 (10Dzahn) 05Open>03Resolved [14:46:19] (03PS2) 10Ema: VCL: use hfp only for uncacheable candidates for conditional requests [puppet] - 10https://gerrit.wikimedia.org/r/421542 (https://phabricator.wikimedia.org/T180712) [14:51:20] (03CR) 10Reedy: [C: 031] Add cron job for expired userrights maintenance script [puppet] - 10https://gerrit.wikimedia.org/r/382631 (https://phabricator.wikimedia.org/T176754) (owner: 10EddieGP) [14:52:35] (03PS9) 10Dzahn: Add cron job for expired userrights maintenance script [puppet] - 10https://gerrit.wikimedia.org/r/382631 (https://phabricator.wikimedia.org/T176754) (owner: 10EddieGP) [14:55:13] (03CR) 10Dzahn: [C: 032] "puppet part looks good. 
script is already in production just run manually by people instead of a cron" [puppet] - 10https://gerrit.wikimedia.org/r/382631 (https://phabricator.wikimedia.org/T176754) (owner: 10EddieGP) [14:55:28] (03PS7) 10Muehlenhoff: Allow to selectively run time servers on Chrony [puppet] - 10https://gerrit.wikimedia.org/r/393581 (https://phabricator.wikimedia.org/T177742) [14:59:09] 10Operations, 10Pybal, 10Traffic: pybal 1.15.2 dies with obscure errors without python-prometheus-client - https://phabricator.wikimedia.org/T190527#4075954 (10Jgreen) [15:02:53] !log bawolff@tin Synchronized php-1.31.0-wmf.26/includes/api/ApiQueryUserContributions.php: T190507 (duration: 00m 59s) [15:02:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:04:38] (03CR) 10Muehlenhoff: [C: 032] Switch debdeploy clients to Python 3 [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/413397 (owner: 10Muehlenhoff) [15:06:18] (03PS2) 10RobH: adding cooltey to reserachers [puppet] - 10https://gerrit.wikimedia.org/r/420809 (https://phabricator.wikimedia.org/T190150) [15:07:26] (03PS1) 10Muehlenhoff: Bump changelog for new release [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/421543 [15:08:37] !log cache_codfw: begin reboots for retpoline kernel upgrades T188092 [15:08:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:09:42] (03PS1) 10Andrew Bogott: horizon: add a maintenance page, switched on via a hiera setting [puppet] - 10https://gerrit.wikimedia.org/r/421546 (https://phabricator.wikimedia.org/T189704) [15:09:54] (03CR) 10RobH: [C: 032] adding cooltey to reserachers [puppet] - 10https://gerrit.wikimedia.org/r/420809 (https://phabricator.wikimedia.org/T190150) (owner: 10RobH) [15:10:20] (03CR) 10jerkins-bot: [V: 04-1] horizon: add a maintenance page, switched on via a hiera setting [puppet] - 10https://gerrit.wikimedia.org/r/421546 (https://phabricator.wikimedia.org/T189704) (owner: 10Andrew Bogott) [15:11:26] 10Operations, 
10DBA, 10MediaWiki-General-or-Unknown, 10MW-1.31-release-notes (WMF-deploy-2018-03-13 (1.31.0-wmf.25)), and 2 others: Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#4075999 (10EddieGP) 05Open>03Resolved Thanks @Dzahn! With the cron... [15:12:27] (03PS2) 10Andrew Bogott: horizon: add a maintenance page, switched on via a hiera setting [puppet] - 10https://gerrit.wikimedia.org/r/421546 (https://phabricator.wikimedia.org/T189704) [15:12:53] (03CR) 10BryanDavis: [C: 031] Don't include mediawiki fonts list in toollabs::exec_environ (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/421515 (https://phabricator.wikimedia.org/T190135) (owner: 10Muehlenhoff) [15:14:22] (03PS1) 10Elukey: statistics: force A-E: gzip to workaround Yarn's UI precompr. content [puppet] - 10https://gerrit.wikimedia.org/r/421547 [15:14:52] (03CR) 10Elukey: [C: 032] statistics: force A-E: gzip to workaround Yarn's UI precompr. content [puppet] - 10https://gerrit.wikimedia.org/r/421547 (owner: 10Elukey) [15:15:00] (03CR) 10Muehlenhoff: Don't include mediawiki fonts list in toollabs::exec_environ (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/421515 (https://phabricator.wikimedia.org/T190135) (owner: 10Muehlenhoff) [15:16:43] (03CR) 10Muehlenhoff: [C: 032] Bump changelog for new release [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/421543 (owner: 10Muehlenhoff) [15:16:52] (03PS3) 10Andrew Bogott: horizon: add a maintenance page, switched on via a hiera setting [puppet] - 10https://gerrit.wikimedia.org/r/421546 (https://phabricator.wikimedia.org/T189704) [15:17:10] 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Access request to stat1005 and stat1006 for cooltey - https://phabricator.wikimedia.org/T190150#4076013 (10RobH) 05Open>03Resolved a:03RobH No objections have been noted, so access to researchers group has been merged live. Please allow 30 minut... 
[15:18:48] (03PS4) 10Andrew Bogott: horizon: add a maintenance page, switched on via a hiera setting [puppet] - 10https://gerrit.wikimedia.org/r/421546 (https://phabricator.wikimedia.org/T189704) [15:19:48] (03CR) 10Andrew Bogott: [C: 032] horizon: add a maintenance page, switched on via a hiera setting [puppet] - 10https://gerrit.wikimedia.org/r/421546 (https://phabricator.wikimedia.org/T189704) (owner: 10Andrew Bogott) [15:24:22] (03CR) 10Giuseppe Lavagetto: "See the comments on the control file, but as it is, this is an acceptable first step to at least try dynomite out." (034 comments) [debs/dynomite] - 10https://gerrit.wikimedia.org/r/421447 (owner: 10Aaron Schulz) [15:24:28] (03CR) 10Giuseppe Lavagetto: [C: 031] [WIP] Initial debianization [debs/dynomite] - 10https://gerrit.wikimedia.org/r/421447 (owner: 10Aaron Schulz) [15:25:58] (03PS1) 10Andrew Bogott: horizon: specify the 503 error page for maintenance mode [puppet] - 10https://gerrit.wikimedia.org/r/421549 (https://phabricator.wikimedia.org/T189704) [15:26:46] (03CR) 10Andrew Bogott: [C: 032] horizon: specify the 503 error page for maintenance mode [puppet] - 10https://gerrit.wikimedia.org/r/421549 (https://phabricator.wikimedia.org/T189704) (owner: 10Andrew Bogott) [15:27:14] (03CR) 10Dzahn: [C: 032] "[terbium:~] $ sudo crontab -u www-data -l | grep -A1 expired" [puppet] - 10https://gerrit.wikimedia.org/r/382631 (https://phabricator.wikimedia.org/T176754) (owner: 10EddieGP) [15:29:18] 10Operations, 10Parsoid, 10Release-Engineering-Team (Watching / External): Provide an archive endpoint for older Parsoid debs (on releases.wikimedia.org or elsewhere) - https://phabricator.wikimedia.org/T150672#4076058 (10ssastry) [15:29:51] (03PS1) 10Andrew Bogott: labtestweb: take labtesthorizon out of maintenance mode [puppet] - 10https://gerrit.wikimedia.org/r/421551 [15:30:36] (03CR) 10Andrew Bogott: [C: 032] labtestweb: take labtesthorizon out of maintenance mode [puppet] - 10https://gerrit.wikimedia.org/r/421551 
(owner: 10Andrew Bogott) [15:30:39] 10Operations, 10Parsoid, 10Release-Engineering-Team (Watching / External): Provide an archive endpoint for older Parsoid debs (on releases.wikimedia.org or elsewhere) - https://phabricator.wikimedia.org/T150672#4076061 (10Dzahn) a:03Dzahn [15:33:53] PROBLEM - High CPU load on API appserver on mw1290 is CRITICAL: CRITICAL - load average: 98.75, 39.83, 28.68 [15:34:18] (03PS1) 10Arturo Borrero Gonzalez: labs: graphite: install uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/421552 (https://phabricator.wikimedia.org/T190515) [15:34:46] (03CR) 10jerkins-bot: [V: 04-1] labs: graphite: install uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/421552 (https://phabricator.wikimedia.org/T190515) (owner: 10Arturo Borrero Gonzalez) [15:37:12] (03PS2) 10Herron: upgrade eqiad puppet masters to puppetdb4 [puppet] - 10https://gerrit.wikimedia.org/r/421532 (https://phabricator.wikimedia.org/T177253) [15:37:52] RECOVERY - High CPU load on API appserver on mw1290 is OK: OK - load average: 27.20, 31.69, 27.82 [15:38:11] (03CR) 10Herron: [C: 032] upgrade eqiad puppet masters to puppetdb4 [puppet] - 10https://gerrit.wikimedia.org/r/421532 (https://phabricator.wikimedia.org/T177253) (owner: 10Herron) [15:43:25] !log uploaded debdeploy 0.0.99.3 to apt.wikimedia.org (now based on Python 3 for the clients) [15:43:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:47:37] (03CR) 10Imarlier: [WIP] Initial debianization (032 comments) [debs/dynomite] - 10https://gerrit.wikimedia.org/r/421447 (owner: 10Aaron Schulz) [15:49:30] (03PS2) 10Muehlenhoff: Don't include mediawiki fonts list in toollabs::exec_environ [puppet] - 10https://gerrit.wikimedia.org/r/421515 (https://phabricator.wikimedia.org/T190135) [15:55:46] 10Operations, 10ops-eqiad: install conf1004-6 ssd upgrades - https://phabricator.wikimedia.org/T190230#4076157 (10Cmjohnson) a:05Cmjohnson>03Joe @Joe The ssds have been installed. 
The servers are currently powered off. Please resolve if satisfied [15:56:15] (03CR) 10Muehlenhoff: [WIP] Initial debianization (033 comments) [debs/dynomite] - 10https://gerrit.wikimedia.org/r/421447 (owner: 10Aaron Schulz) [16:06:59] 10Operations, 10cloud-services-team, 10netops: modify labs-hosts1-vlans for http load of installer kernel - https://phabricator.wikimedia.org/T190424#4076177 (10ayounsi) a:03RobH For ACLs, please be more specific, ideally mentioning a source/destination IP(s) and port(s). Taking a random labvirt* host as... [16:09:12] (03PS1) 10Andrew Bogott: labweb: clean up a bunch of obsolete horizon code [puppet] - 10https://gerrit.wikimedia.org/r/421557 [16:09:14] (03PS1) 10Andrew Bogott: labweb: remove obsolete wikitech code [puppet] - 10https://gerrit.wikimedia.org/r/421558 [16:14:13] (03PS2) 10Andrew Bogott: labweb: clean up a bunch of obsolete horizon code [puppet] - 10https://gerrit.wikimedia.org/r/421557 [16:14:27] (03PS2) 10Andrew Bogott: labweb: remove obsolete wikitech code [puppet] - 10https://gerrit.wikimedia.org/r/421558 [16:14:54] (03CR) 10Andrew Bogott: [C: 032] labweb: clean up a bunch of obsolete horizon code [puppet] - 10https://gerrit.wikimedia.org/r/421557 (owner: 10Andrew Bogott) [16:16:46] (03CR) 10Andrew Bogott: [C: 032] labweb: remove obsolete wikitech code [puppet] - 10https://gerrit.wikimedia.org/r/421558 (owner: 10Andrew Bogott) [16:18:33] 10Operations, 10cloud-services-team, 10netops: modify labs-hosts1-vlans for http load of installer kernel - https://phabricator.wikimedia.org/T190424#4076204 (10RobH) so the symptoms of this were us trying to PXE boot labvirt1021 (10.64.20.40) and labvirt1022 (10.64.20.41). During the PXE boot, it gets the... 
[16:18:38] (03PS1) 10Cmjohnson: Removal of mgmt dns db1043 [dns] - 10https://gerrit.wikimedia.org/r/421560 (https://phabricator.wikimedia.org/T187542) [16:18:46] 10Operations, 10cloud-services-team, 10netops: modify labs-hosts1-vlans for http load of installer kernel - https://phabricator.wikimedia.org/T190424#4076207 (10RobH) a:05RobH>03faidon [16:19:16] (03PS2) 10Cmjohnson: Removal of mgmt dns db1043 [dns] - 10https://gerrit.wikimedia.org/r/421560 (https://phabricator.wikimedia.org/T187542) [16:19:29] (03CR) 10Cmjohnson: [C: 032] Removal of mgmt dns db1043 [dns] - 10https://gerrit.wikimedia.org/r/421560 (https://phabricator.wikimedia.org/T187542) (owner: 10Cmjohnson) [16:21:08] 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests, 10Patch-For-Review: Decommission db1043 - https://phabricator.wikimedia.org/T187542#4076225 (10Cmjohnson) [16:21:26] 10Operations, 10DBA, 10Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4076229 (10Cmjohnson) [16:21:28] 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests, 10Patch-For-Review: Decommission db1043 - https://phabricator.wikimedia.org/T187542#3978426 (10Cmjohnson) 05Open>03Resolved [16:22:16] 10Operations, 10ops-eqiad, 10hardware-requests, 10Patch-For-Review: Decommission mw1259-mw1260 - https://phabricator.wikimedia.org/T187466#4076231 (10Cmjohnson) [16:22:27] 10Operations, 10Commons, 10MediaWiki-File-management, 10Multimedia, and 11 others: Define an official thumb API - https://phabricator.wikimedia.org/T66214#4076230 (10Fjalapeno) @kchapman we are interested in picking this up in Reading Infrastructure, but haven't been able to get to it. We would still like... 
[16:22:31] 10Operations, 10ops-eqiad, 10hardware-requests, 10Patch-For-Review: Decommission mw1259-mw1260 - https://phabricator.wikimedia.org/T187466#3976137 (10Cmjohnson) 05Open>03Resolved [16:24:33] 10Operations, 10Commons, 10MediaWiki-File-management, 10Multimedia, and 11 others: Define an official thumb API - https://phabricator.wikimedia.org/T66214#4076236 (10Fjalapeno) For context: We have lots of client code with work arounds for getting the right sizes of images. So much duplication and bugs. We... [16:24:59] (03PS1) 10Cmjohnson: Removing mgmt dns rcs1001-1002 [dns] - 10https://gerrit.wikimedia.org/r/421564 (https://phabricator.wikimedia.org/T170157) [16:25:31] (03PS5) 10Andrew Bogott: Move WMCS VMs back to the default environment. [puppet] - 10https://gerrit.wikimedia.org/r/410069 [16:25:37] (03PS2) 10Cmjohnson: Removing mgmt dns rcs1001-1002 [dns] - 10https://gerrit.wikimedia.org/r/421564 (https://phabricator.wikimedia.org/T170157) [16:26:07] (03CR) 10Cmjohnson: [C: 032] Removing mgmt dns rcs1001-1002 [dns] - 10https://gerrit.wikimedia.org/r/421564 (https://phabricator.wikimedia.org/T170157) (owner: 10Cmjohnson) [16:28:36] 10Operations, 10ops-eqiad, 10Analytics, 10Wikimedia-Stream, and 2 others: decommission rcs100[12] - https://phabricator.wikimedia.org/T170157#4076259 (10Cmjohnson) [16:29:06] 10Operations, 10ops-eqiad, 10Analytics, 10Wikimedia-Stream, and 2 others: decommission rcs100[12] - https://phabricator.wikimedia.org/T170157#3420970 (10Cmjohnson) 05Open>03Resolved [16:32:27] 10Operations, 10Ops-Access-Requests: Requesting access to stats machines for Lucas Werkmeister - https://phabricator.wikimedia.org/T190415#4076271 (10RobH) a:03Lucas_Werkmeister_WMDE [16:35:01] (03PS2) 10Arturo Borrero Gonzalez: labs: graphite: install uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/421552 (https://phabricator.wikimedia.org/T190515) [16:36:04] (03CR) 10Arturo Borrero Gonzalez: [C: 032] labs: graphite: install uwsgi [puppet] - 
10https://gerrit.wikimedia.org/r/421552 (https://phabricator.wikimedia.org/T190515) (owner: 10Arturo Borrero Gonzalez) [16:36:58] (03PS1) 10Cmjohnson: Removing mgmt dns for db1009 [dns] - 10https://gerrit.wikimedia.org/r/421567 (https://phabricator.wikimedia.org/T189216) [16:37:24] (03PS2) 10Cmjohnson: Removing mgmt dns for db1009 [dns] - 10https://gerrit.wikimedia.org/r/421567 (https://phabricator.wikimedia.org/T189216) [16:37:43] (03CR) 10Cmjohnson: [C: 032] Removing mgmt dns for db1009 [dns] - 10https://gerrit.wikimedia.org/r/421567 (https://phabricator.wikimedia.org/T189216) (owner: 10Cmjohnson) [16:38:30] 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests, 10Patch-For-Review: Decommission db1009 - https://phabricator.wikimedia.org/T189216#4076284 (10Cmjohnson) [16:38:36] 10Operations, 10DBA, 10cloud-services-team, 10Patch-For-Review: Failover m5 master from db1009 to db1073 - https://phabricator.wikimedia.org/T189005#4076286 (10Cmjohnson) [16:38:39] 10Operations, 10DBA, 10hardware-requests, 10Goal: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#4076287 (10Cmjohnson) [16:38:46] 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests, 10Patch-For-Review: Decommission db1009 - https://phabricator.wikimedia.org/T189216#4035423 (10Cmjohnson) 05Open>03Resolved [16:39:10] (03CR) 10Andrew Bogott: [C: 032] Move WMCS VMs back to the default environment. [puppet] - 10https://gerrit.wikimedia.org/r/410069 (owner: 10Andrew Bogott) [16:39:19] (03PS6) 10Andrew Bogott: Move WMCS VMs back to the default environment. 
[puppet] - 10https://gerrit.wikimedia.org/r/410069 [16:41:06] 10Operations, 10DBA, 10hardware-requests, 10Goal: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#4076293 (10Marostegui) [16:41:09] 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests, 10Patch-For-Review: Decommission db1043 - https://phabricator.wikimedia.org/T187542#4076292 (10Marostegui) [16:42:16] 10Operations, 10DBA, 10hardware-requests, 10Goal: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#3189449 (10Marostegui) [16:42:19] 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests, 10Patch-For-Review: Decommission db1011 - https://phabricator.wikimedia.org/T184703#4076294 (10Marostegui) [16:42:37] 10Operations, 10DBA, 10hardware-requests, 10Goal: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#3200902 (10Marostegui) [16:42:41] 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests: Decommission db1029 and db1031 - https://phabricator.wikimedia.org/T184054#4076296 (10Marostegui) [16:48:06] (03CR) 10BryanDavis: Don't include mediawiki fonts list in toollabs::exec_environ (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/421515 (https://phabricator.wikimedia.org/T190135) (owner: 10Muehlenhoff) [16:49:15] (03PS1) 10Cmjohnson: Removing mgmt dns logstash1001-1003 [dns] - 10https://gerrit.wikimedia.org/r/421570 (https://phabricator.wikimedia.org/T175830) [16:50:26] (03Abandoned) 10Andrew Bogott: nova-network dnsmasq: set a deployment-appropriate cname for 'puppet' [puppet] - 10https://gerrit.wikimedia.org/r/393841 (https://phabricator.wikimedia.org/T181375) (owner: 10Andrew Bogott) [16:50:32] (03PS2) 10Cmjohnson: Removing mgmt dns logstash1001-1003 [dns] - 10https://gerrit.wikimedia.org/r/421570 (https://phabricator.wikimedia.org/T175830) [16:50:47] (03CR) 10Cmjohnson: [C: 032] Removing mgmt dns logstash1001-1003 [dns] - 10https://gerrit.wikimedia.org/r/421570 
(https://phabricator.wikimedia.org/T175830) (owner: 10Cmjohnson) [16:54:59] 10Operations, 10ops-eqiad, 10Wikimedia-Logstash, 10hardware-requests, 10Patch-For-Review: decommission logstash100[1-3] - https://phabricator.wikimedia.org/T175830#4076324 (10Cmjohnson) [16:55:14] 10Operations, 10ops-eqiad, 10Wikimedia-Logstash, 10hardware-requests, 10Patch-For-Review: decommission logstash100[1-3] - https://phabricator.wikimedia.org/T175830#3604520 (10Cmjohnson) 05Open>03Resolved [16:57:29] (03PS1) 10Cmjohnson: Removing mgmt dns for iridium [dns] - 10https://gerrit.wikimedia.org/r/421571 (https://phabricator.wikimedia.org/T172487) [16:59:29] (03PS25) 10Paladox: Phabricator: Support php 7.2 under stretch [puppet] - 10https://gerrit.wikimedia.org/r/410245 (https://phabricator.wikimedia.org/T182832) [16:59:50] (03Abandoned) 10Cmjohnson: Removing mgmt dns for iridium [dns] - 10https://gerrit.wikimedia.org/r/421571 (https://phabricator.wikimedia.org/T172487) (owner: 10Cmjohnson) [16:59:52] (03PS26) 10Paladox: Phabricator: Support php 7.2 under stretch [puppet] - 10https://gerrit.wikimedia.org/r/410245 (https://phabricator.wikimedia.org/T182832) [17:02:38] (03PS1) 10Arturo Borrero Gonzalez: labs: monitoring: get libapache2-mod-uwsgi insalled [puppet] - 10https://gerrit.wikimedia.org/r/421572 (https://phabricator.wikimedia.org/T190515) [17:02:45] (03PS27) 10Paladox: Phabricator: Support php 7.2 under stretch [puppet] - 10https://gerrit.wikimedia.org/r/410245 (https://phabricator.wikimedia.org/T182832) [17:03:03] (03CR) 10jerkins-bot: [V: 04-1] labs: monitoring: get libapache2-mod-uwsgi insalled [puppet] - 10https://gerrit.wikimedia.org/r/421572 (https://phabricator.wikimedia.org/T190515) (owner: 10Arturo Borrero Gonzalez) [17:03:13] 10Operations, 10Cloud-Services, 10Developer-Relations: Use the term "developer account" for Wikimedia LDAP accounts - https://phabricator.wikimedia.org/T179461#4076362 (10Quiddity) [17:05:22] 10Operations, 10ops-codfw, 10Traffic: cp2006: Uncorrectable 
Memory Error - https://phabricator.wikimedia.org/T190540#4076372 (10ema) [17:05:34] (03PS1) 10Imarlier: coal: add logging, and handle ValueError case [puppet] - 10https://gerrit.wikimedia.org/r/421573 (https://phabricator.wikimedia.org/T110903) [17:05:37] 10Operations, 10ops-codfw, 10Traffic: cp2006: Uncorrectable Memory Error - https://phabricator.wikimedia.org/T190540#4076383 (10ema) p:05Triage>03Normal [17:06:55] (03PS2) 10Imarlier: coal: add logging, and handle ValueError case [puppet] - 10https://gerrit.wikimedia.org/r/421573 (https://phabricator.wikimedia.org/T110903) [17:07:20] (03PS1) 10RobH: decom of db1001, db1011, and db1016 [puppet] - 10https://gerrit.wikimedia.org/r/421574 (https://phabricator.wikimedia.org/T190262) [17:08:02] o/ is there someone that can help with T188913? i basically can't access the obama page on the beta cluster [17:08:03] T188913: Beta cluster Obama page often responds with 503 - https://phabricator.wikimedia.org/T188913 [17:09:21] 10Operations, 10ops-codfw, 10ops-eqiad, 10netops: Audit switch ports/descriptions/enable - https://phabricator.wikimedia.org/T189519#4076401 (10ayounsi) [17:09:37] (03PS2) 10BryanDavis: toolforge: Add missing *.wikipedia.org and schemes to CSP policy [puppet] - 10https://gerrit.wikimedia.org/r/421472 (https://phabricator.wikimedia.org/T130748) [17:11:34] (03PS2) 10Arturo Borrero Gonzalez: labs: monitoring: get libapache2-mod-uwsgi insalled [puppet] - 10https://gerrit.wikimedia.org/r/421572 (https://phabricator.wikimedia.org/T190515) [17:14:25] 10Operations, 10Cloud-Services, 10Developer-Relations: Use the term "developer account" for Wikimedia LDAP accounts - https://phabricator.wikimedia.org/T179461#4076423 (10Quiddity) 05Open>03Resolved Closing as resolved, with agreement to follow task title recommendation. 
[17:15:13] 10Operations, 10ops-codfw, 10ops-eqiad, 10netops: Audit switch ports/descriptions/enable - https://phabricator.wikimedia.org/T189519#4076425 (10RobH) Just changed on asw switch stack: robh@asw-a-eqiad# show | compare [edit interfaces] + ge-2/0/0 { + description db1001; + disable; + } + g... [17:15:53] (03CR) 10Muehlenhoff: Don't include mediawiki fonts list in toollabs::exec_environ (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/421515 (https://phabricator.wikimedia.org/T190135) (owner: 10Muehlenhoff) [17:15:58] (03PS3) 10Muehlenhoff: Don't include mediawiki fonts list in toollabs::exec_environ [puppet] - 10https://gerrit.wikimedia.org/r/421515 (https://phabricator.wikimedia.org/T190135) [17:16:03] (03CR) 10Imarlier: "What the log output now looks like, under normal runtime conditions:" [puppet] - 10https://gerrit.wikimedia.org/r/421573 (https://phabricator.wikimedia.org/T110903) (owner: 10Imarlier) [17:16:19] (03CR) 10Dzahn: "we talked about using PHP_INI_SCAN_DIR to tell php to scan a dir for .ini files so that we can only add to puppet what we actually customi" [puppet] - 10https://gerrit.wikimedia.org/r/410245 (https://phabricator.wikimedia.org/T182832) (owner: 10Paladox) [17:17:53] (03PS1) 10RobH: decom old db systems db1001, db1011, db1016 [dns] - 10https://gerrit.wikimedia.org/r/421578 (https://phabricator.wikimedia.org/T190262) [17:18:03] (03CR) 10jerkins-bot: [V: 04-1] decom old db systems db1001, db1011, db1016 [dns] - 10https://gerrit.wikimedia.org/r/421578 (https://phabricator.wikimedia.org/T190262) (owner: 10RobH) [17:19:11] (03CR) 10RobH: [C: 032] decom of db1001, db1011, and db1016 [puppet] - 10https://gerrit.wikimedia.org/r/421574 (https://phabricator.wikimedia.org/T190262) (owner: 10RobH) [17:20:29] (03PS3) 10Arturo Borrero Gonzalez: labs: monitoring: get libapache2-mod-uwsgi insalled [puppet] - 10https://gerrit.wikimedia.org/r/421572 (https://phabricator.wikimedia.org/T190515) [17:21:39] (03PS2) 10RobH: decom old db systems 
db1001, db1011, db1016 [dns] - 10https://gerrit.wikimedia.org/r/421578 (https://phabricator.wikimedia.org/T190262) [17:21:57] 10Operations, 10ops-codfw, 10ops-eqiad, 10netops: Audit switch ports/descriptions/enable - https://phabricator.wikimedia.org/T189519#4076475 (10ayounsi) [17:22:00] (03PS4) 10Arturo Borrero Gonzalez: labs: monitoring: get libapache2-mod-uwsgi insalled [puppet] - 10https://gerrit.wikimedia.org/r/421572 (https://phabricator.wikimedia.org/T190515) [17:22:28] (03CR) 10RobH: [C: 032] decom old db systems db1001, db1011, db1016 [dns] - 10https://gerrit.wikimedia.org/r/421578 (https://phabricator.wikimedia.org/T190262) (owner: 10RobH) [17:22:46] (03CR) 10Arturo Borrero Gonzalez: [C: 032] labs: monitoring: get libapache2-mod-uwsgi insalled [puppet] - 10https://gerrit.wikimedia.org/r/421572 (https://phabricator.wikimedia.org/T190515) (owner: 10Arturo Borrero Gonzalez) [17:23:08] 10Operations, 10ops-codfw, 10ops-eqiad, 10netops: Audit switch ports/descriptions/enable - https://phabricator.wikimedia.org/T189519#4044272 (10ayounsi) Removed them from the table from description. [17:26:47] 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests: Decommission db1016 - https://phabricator.wikimedia.org/T190179#4076488 (10RobH) a:05RobH>03Cmjohnson [17:26:52] RECOVERY - puppet last run on labmon1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [17:27:10] 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests: Decommission db1011 - https://phabricator.wikimedia.org/T184703#4076492 (10RobH) a:05RobH>03Cmjohnson [17:27:16] 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests: Decommission db1001 - https://phabricator.wikimedia.org/T190262#4076495 (10RobH) a:05RobH>03Cmjohnson [17:30:21] PROBLEM - Host iridium is DOWN: PING CRITICAL - Packet loss = 100% [17:30:21] PROBLEM - Host iridium.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [17:31:11] that must be a decom? [17:31:23] that has not been used in a long time. 
it's old-phab [17:31:53] robh: ^ was still in puppet? [17:34:00] odd mutante everything was checked off [17:34:01] https://phabricator.wikimedia.org/T172487 [17:37:30] RECOVERY - Host iridium.mgmt is UP: PING OK - Packet loss = 0%, RTA = 149.99 ms [17:42:25] cmjohnson: hmm.. i wonder ... maybe it is related to the puppetmaster/puppetdb work [17:42:29] (03PS1) 10Arturo Borrero Gonzalez: graphite: labs: create /srv/carbon/whisper/archived_metrics/ [puppet] - 10https://gerrit.wikimedia.org/r/421587 (https://phabricator.wikimedia.org/T190512) [17:42:38] i heard something about hosts reappearing [17:43:08] (03PS1) 10Umherirrender: Use namespaced PHPUnit\Framework\TestCase [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421588 (https://phabricator.wikimedia.org/T188166) [17:43:17] (03PS3) 10Imarlier: coal: add logging, and handle ValueError case [puppet] - 10https://gerrit.wikimedia.org/r/421573 (https://phabricator.wikimedia.org/T110903) [17:43:28] (03CR) 10Arturo Borrero Gonzalez: [C: 032] graphite: labs: create /srv/carbon/whisper/archived_metrics/ [puppet] - 10https://gerrit.wikimedia.org/r/421587 (https://phabricator.wikimedia.org/T190512) (owner: 10Arturo Borrero Gonzalez) [17:44:08] (03CR) 10Jforrester: [C: 031] Use namespaced PHPUnit\Framework\TestCase [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421588 (https://phabricator.wikimedia.org/T188166) (owner: 10Umherirrender) [17:45:40] mutante: maybe...the disks are wiped...not sure if this is meant for complete decom or adding back to spares..task seems to suggest it's going bye bye. robh any insight? [17:47:18] cmjohnson: i checked warranty expiration ... and WOW: [17:47:20] HW warranty expiration:2017-03-23 [17:47:40] first i thought "that's today" but it's actually 1 year ago of course [17:48:11] 10Operations, 10ops-codfw, 10Traffic: cp2006, cp2010: Uncorrectable Memory Error - https://phabricator.wikimedia.org/T190540#4076593 (10ema) [17:48:17] was about to say something about the odds of that... 
[17:48:28] (03PS28) 10Paladox: Phabricator: Support php 7.2 under stretch [puppet] - 10https://gerrit.wikimedia.org/r/410245 (https://phabricator.wikimedia.org/T182832) [17:48:40] (03PS29) 10Paladox: Phabricator: Support php 7.2 under stretch [puppet] - 10https://gerrit.wikimedia.org/r/410245 (https://phabricator.wikimedia.org/T182832) [17:53:53] mutante: i know..i think we could get more use out of it..considering some things are just now going away after 7 years [17:56:43] what system? [17:56:56] ohh, iridium [17:57:12] hrmm, it's out of warranty but only a year [17:57:19] reclaim to spare would likely be better if it's still working [17:57:32] but if not, decom. [18:01:44] 10Operations, 10Release Pipeline, 10Release-Engineering-Team (Watching / External): Update Debian package of Blubber - https://phabricator.wikimedia.org/T190551#4076636 (10dduvall) [18:02:09] 10Operations, 10Release Pipeline, 10Release-Engineering-Team (Watching / External): Update Debian package of Blubber - https://phabricator.wikimedia.org/T190551#4076646 (10dduvall) p:05Triage>03Normal [18:05:02] (03CR) 10Krinkle: [C: 04-1] "A few Qs." 
(033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/421573 (https://phabricator.wikimedia.org/T110903) (owner: 10Imarlier) [18:05:10] (03PS16) 10Bstorm: wiki replicas: script index creation for easier maintenance [puppet] - 10https://gerrit.wikimedia.org/r/417357 (https://phabricator.wikimedia.org/T181650) [18:06:41] (03Abandoned) 10Sbisson: Configure maps source for localized labels [puppet] - 10https://gerrit.wikimedia.org/r/420315 (https://phabricator.wikimedia.org/T112948) (owner: 10Sbisson) [18:08:47] (03Draft1) 10Paladox: Gerrit: Switch gc back on [puppet] - 10https://gerrit.wikimedia.org/r/421593 (https://phabricator.wikimedia.org/T190045) [18:08:57] (03PS2) 10Paladox: Gerrit: Switch gc back on [puppet] - 10https://gerrit.wikimedia.org/r/421593 (https://phabricator.wikimedia.org/T190045) [18:09:06] (03CR) 10Bstorm: "I'm going to merge this so that the files are there (they won't do anything or run on their own). This will wait on T190470 to actually r" [puppet] - 10https://gerrit.wikimedia.org/r/417357 (https://phabricator.wikimedia.org/T181650) (owner: 10Bstorm) [18:09:14] (03CR) 10Bstorm: [C: 032] wiki replicas: script index creation for easier maintenance [puppet] - 10https://gerrit.wikimedia.org/r/417357 (https://phabricator.wikimedia.org/T181650) (owner: 10Bstorm) [18:13:57] 10Operations, 10ops-eqiad, 10hardware-requests: decom iridium - https://phabricator.wikimedia.org/T172487#4076696 (10RobH) [18:15:19] 10Operations, 10ops-eqiad, 10hardware-requests: decom iridium - https://phabricator.wikimedia.org/T172487#3499956 (10RobH) So this had a different port listed and disabled than the actual system port. I just fixed it, so it'll stop calling into puppet and monitoring. Task description updated, ready for dec... 
[18:28:25] !log sbisson@tin Started deploy [kartotherian/deploy@a66ff1d]: Deploying i18n feature to maps-test* [18:28:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:28:54] !log sbisson@tin Finished deploy [kartotherian/deploy@a66ff1d]: Deploying i18n feature to maps-test* (duration: 00m 29s) [18:28:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:29:15] (03PS1) 10Ottomata: Add $kafka_enable_auto_commit and $kafka_enable_auto_offset_store params [puppet/kafkatee] - 10https://gerrit.wikimedia.org/r/421599 [18:34:24] (03PS4) 10Imarlier: coal: add logging, and handle ValueError case [puppet] - 10https://gerrit.wikimedia.org/r/421573 (https://phabricator.wikimedia.org/T110903) [18:34:27] (03CR) 10Imarlier: coal: add logging, and handle ValueError case (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/421573 (https://phabricator.wikimedia.org/T110903) (owner: 10Imarlier) [18:42:41] 10Operations, 10Commons, 10MediaWiki-File-management, 10Multimedia, and 11 others: Define an official thumb API - https://phabricator.wikimedia.org/T66214#4076790 (10Tgr) The problem here is that TechCom is using Phabricator in a different way from the rest of the movement. The normal way is that you crea... 
[18:53:52] 10Operations, 10netops, 10WMF-NDA: Avoid US RTT for eqsin traffic - https://phabricator.wikimedia.org/T190559#4076857 (10ayounsi) p:05Triage>03Normal [18:59:47] !log sbisson@tin Started deploy [kartotherian/deploy@f716cde]: Deploying i18n feature to maps-test* (take 2) [18:59:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:06:10] !log sbisson@tin Finished deploy [kartotherian/deploy@f716cde]: Deploying i18n feature to maps-test* (take 2) (duration: 06m 23s) [19:06:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:06:49] (03PS1) 10Urbanecm: Enable Flow on euwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421606 (https://phabricator.wikimedia.org/T190500) [19:07:22] !log sbisson@tin Started deploy [kartotherian/deploy@f716cde]: Deploying i18n feature to maps-test* (take 3) [19:07:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:10:10] (03CR) 10Chad: "Why do we need to disable the other git gc settings?" [puppet] - 10https://gerrit.wikimedia.org/r/421593 (https://phabricator.wikimedia.org/T190045) (owner: 10Paladox) [19:10:53] (03CR) 10Paladox: "> Why do we need to disable the other git gc settings?" 
[puppet] - 10https://gerrit.wikimedia.org/r/421593 (https://phabricator.wikimedia.org/T190045) (owner: 10Paladox) [19:11:09] (03PS30) 10Dzahn: Phabricator: Support php 7.2 under stretch [puppet] - 10https://gerrit.wikimedia.org/r/410245 (https://phabricator.wikimedia.org/T182832) (owner: 10Paladox) [19:11:41] !log sbisson@tin Finished deploy [kartotherian/deploy@f716cde]: Deploying i18n feature to maps-test* (take 3) (duration: 04m 19s) [19:11:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:12:09] no_justification https://groups.google.com/forum/#!topic/repo-discuss/lVR37Pm4G3c :) [19:15:22] (03CR) 10Dzahn: [C: 032] Phabricator: Support php 7.2 under stretch [puppet] - 10https://gerrit.wikimedia.org/r/410245 (https://phabricator.wikimedia.org/T182832) (owner: 10Paladox) [19:16:38] (03CR) 1020after4: "nice work! thanks dzahn and paladox!" [puppet] - 10https://gerrit.wikimedia.org/r/410245 (https://phabricator.wikimedia.org/T182832) (owner: 10Paladox) [19:16:43] :) [19:17:08] (03CR) 10Dzahn: [C: 032] "thanks! i like it now that it's much smaller and we are using the conf.d dir. 
please revert your cherry-pick and run again with regular ma" [puppet] - 10https://gerrit.wikimedia.org/r/410245 (https://phabricator.wikimedia.org/T182832) (owner: 10Paladox) [19:17:59] !log sbisson@tin Started deploy [kartotherian/deploy@f716cde]: Deploying i18n feature to maps-test* (take 4) [19:18:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:24:57] !log sbisson@tin Finished deploy [kartotherian/deploy@f716cde]: Deploying i18n feature to maps-test* (take 4) (duration: 06m 58s) [19:25:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:30:04] (03CR) 10BryanDavis: "> wss://tools.wmflabs.org (I think wss: is not included with 'self'" [puppet] - 10https://gerrit.wikimedia.org/r/421472 (https://phabricator.wikimedia.org/T130748) (owner: 10BryanDavis) [19:30:25] Can someone help me by merging https://gerrit.wikimedia.org/r/#/c/421573/ in the operations/puppet repo? It's a change to coal.py, which is a perf team utility that gets deployed via puppet. 
[19:31:27] (03PS3) 10Andrew Bogott: toolforge: Add missing *.wikipedia.org and schemes to CSP policy [puppet] - 10https://gerrit.wikimedia.org/r/421472 (https://phabricator.wikimedia.org/T130748) (owner: 10BryanDavis) [19:32:04] (03CR) 10Andrew Bogott: [C: 032] toolforge: Add missing *.wikipedia.org and schemes to CSP policy [puppet] - 10https://gerrit.wikimedia.org/r/421472 (https://phabricator.wikimedia.org/T130748) (owner: 10BryanDavis) [19:33:56] (03CR) 10Krinkle: [C: 04-1] coal: add logging, and handle ValueError case (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/421573 (https://phabricator.wikimedia.org/T110903) (owner: 10Imarlier) [19:37:05] (03CR) 10Krinkle: coal: add logging, and handle ValueError case (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/421573 (https://phabricator.wikimedia.org/T110903) (owner: 10Imarlier) [19:37:07] (03CR) 10Krinkle: [C: 031] coal: add logging, and handle ValueError case [puppet] - 10https://gerrit.wikimedia.org/r/421573 (https://phabricator.wikimedia.org/T110903) (owner: 10Imarlier) [19:38:44] mutante: If you have minute, ^ is ready for deploy to fix an issue from yesterday's deploy. [19:38:54] (03CR) 10Dzahn: [C: 032] coal: add logging, and handle ValueError case [puppet] - 10https://gerrit.wikimedia.org/r/421573 (https://phabricator.wikimedia.org/T110903) (owner: 10Imarlier) [19:39:06] i was already looking at it and the +1 helped :) [19:39:11] (03PS5) 10Dzahn: coal: add logging, and handle ValueError case [puppet] - 10https://gerrit.wikimedia.org/r/421573 (https://phabricator.wikimedia.org/T110903) (owner: 10Imarlier) [19:39:11] Thx [19:39:48] ++ thanks :-) [19:46:18] marlier: and now it's actually on the server, got a little distraction here [19:46:34] merged on puppetmaster [19:48:43] Sweet, I'll keep an eye on it. Thanks, mutante [19:50:13] mutante works https://phab-stretch.wmflabs.org :) [19:50:26] (only needed two puppet runs once i added the phab class). 
[19:50:37] second run is because i had to do bin/storage upgrade after the first run [19:50:52] (which won't be needed in prod as the mysql server is located off the host) [19:52:01] paladox: cool! :) [19:52:13] thanks for the stretch work [19:52:52] you're welcome :) [19:53:18] 10Operations, 10Phabricator, 10Patch-For-Review, 10Release-Engineering-Team (Kanban), and 2 others: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#4077084 (10Paladox) [19:53:22] 10Operations, 10Phabricator, 10Patch-For-Review, 10Release-Engineering-Team (Someday): Add support for stretch in the phabricator puppet class - https://phabricator.wikimedia.org/T187127#4077083 (10Paladox) 05Open>03Resolved [19:53:49] 10Operations, 10Phabricator, 10Release-Engineering-Team (Someday): Add support for stretch in the phabricator puppet class - https://phabricator.wikimedia.org/T187127#3965428 (10Paladox) [19:54:17] (03CR) 10Aaron Schulz: [C: 031] Replace wfGetLBFactory [mediawiki-config] - 10https://gerrit.wikimedia.org/r/414310 (owner: 10Umherirrender) [19:58:20] 10Operations, 10Phabricator, 10Release-Engineering-Team: Reimage both phab1001 and phab2001 to stretch - https://phabricator.wikimedia.org/T190568#4077108 (10Paladox) [19:58:52] 10Operations, 10Phabricator, 10Release-Engineering-Team: Reimage both phab1001 and phab2001 to stretch - https://phabricator.wikimedia.org/T190568#4077123 (10Paladox) [19:58:58] 10Operations, 10Phabricator, 10Patch-For-Review, 10Release-Engineering-Team (Kanban), and 2 others: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3948368 (10Paladox) [19:59:13] 10Operations, 10Phabricator, 10Release-Engineering-Team: Reimage both phab1001 and phab2001 to stretch - https://phabricator.wikimedia.org/T190568#4077108 (10Paladox) p:05Triage>03High Inheriting status from parent task 
[19:59:29] 10Operations, 10Phabricator, 10Release-Engineering-Team: Reimage both phab1001 and phab2001 to stretch - https://phabricator.wikimedia.org/T190568#4077126 (10mmodell) [20:01:00] 10Operations, 10Phabricator, 10Release-Engineering-Team: Reimage both phab1001 and phab2001 to stretch - https://phabricator.wikimedia.org/T190568#4077108 (10mmodell) [20:02:37] 10Puppet, 10Beta-Cluster-Infrastructure, 10Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259#2192875 (10Andrew) As of today there are 16 shinken alerts (most puppet but at least one disk warning) on this project, and three VMs that are shut down but not... [20:05:41] (03CR) 10Aaron Schulz: [WIP] Initial debianization (031 comment) [debs/dynomite] - 10https://gerrit.wikimedia.org/r/421447 (owner: 10Aaron Schulz) [20:13:04] (03PS2) 10Aaron Schulz: [WIP] Initial debianization [debs/dynomite] - 10https://gerrit.wikimedia.org/r/421447 [20:17:33] (03PS1) 10Ottomata: Use --new.consumer for main -> jumbo mirror maker [puppet] - 10https://gerrit.wikimedia.org/r/421617 (https://phabricator.wikimedia.org/T189464) [20:17:55] (03CR) 10Ottomata: "Tested in labs, going to try to apply this monday" [puppet] - 10https://gerrit.wikimedia.org/r/421617 (https://phabricator.wikimedia.org/T189464) (owner: 10Ottomata) [20:17:59] (03CR) 10jerkins-bot: [V: 04-1] Use --new.consumer for main -> jumbo mirror maker [puppet] - 10https://gerrit.wikimedia.org/r/421617 (https://phabricator.wikimedia.org/T189464) (owner: 10Ottomata) [20:18:25] (03PS2) 10Ottomata: Use --new.consumer for main -> jumbo mirror maker [puppet] - 10https://gerrit.wikimedia.org/r/421617 (https://phabricator.wikimedia.org/T189464) [20:20:04] 10Operations, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External): Provide cross-dc redundancy (active-active or active-passive) to all important misc services - https://phabricator.wikimedia.org/T156937#4077192 (10mmodell) [20:20:45] 10Operations, 
10Patch-For-Review, 10Release-Engineering-Team (Watching / External): Provide cross-dc redundancy (active-active or active-passive) to all important misc services - https://phabricator.wikimedia.org/T156937#2990470 (10mmodell) [20:21:21] 10Operations, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External): Provide cross-dc redundancy (active-active or active-passive) to all important misc services - https://phabricator.wikimedia.org/T156937#2990470 (10mmodell) [20:21:31] 10Operations, 10Phabricator, 10RelEng-Archive-FY201718-Q1, 10Patch-For-Review: setup/install phab1001.eqiad.wmnet - https://phabricator.wikimedia.org/T163938#4077212 (10mmodell) [20:26:42] (03PS1) 10Dzahn: check_long_procs: don't cut off username [puppet] - 10https://gerrit.wikimedia.org/r/421620 (https://phabricator.wikimedia.org/T181409) [20:28:49] 10Operations, 10Analytics, 10DBA, 10EventBus, and 6 others: High (2-3x) write and connection load on enwiki databases - https://phabricator.wikimedia.org/T189204#4077242 (10mobrovac) 05Open>03Resolved a:03Pchelolo I agree that all of the issues have been fixed, but to my understanding the scope of th... 
[20:33:02] (03PS1) 10Ottomata: Don't add IPv6 in Cloud VPS [puppet] - 10https://gerrit.wikimedia.org/r/421621 (https://phabricator.wikimedia.org/T190571) [20:34:20] (03CR) 10Ottomata: [C: 032] Don't add IPv6 in Cloud VPS [puppet] - 10https://gerrit.wikimedia.org/r/421621 (https://phabricator.wikimedia.org/T190571) (owner: 10Ottomata) [20:43:03] (03PS2) 10Dzahn: check_long_procs: don't cut off username [puppet] - 10https://gerrit.wikimedia.org/r/421620 (https://phabricator.wikimedia.org/T181409) [20:45:11] (03CR) 10Dzahn: [C: 032] check_long_procs: don't cut off username [puppet] - 10https://gerrit.wikimedia.org/r/421620 (https://phabricator.wikimedia.org/T181409) (owner: 10Dzahn) [20:59:18] (03CR) 10Herron: "https://puppet-compiler.wmflabs.org/compiler03/10636/" [puppet] - 10https://gerrit.wikimedia.org/r/421617 (https://phabricator.wikimedia.org/T189464) (owner: 10Ottomata) [21:05:10] (03PS3) 10Aaron Schulz: [WIP] Initial debianization [debs/dynomite] - 10https://gerrit.wikimedia.org/r/421447 [21:14:20] PROBLEM - puppet last run on stat1004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_statistics_mediawiki] [21:18:00] PROBLEM - puppet last run on stat1006 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_statistics_mediawiki] [21:31:08] 10Operations, 10ops-eqiad, 10Cloud-VPS, 10cloud-services-team: Rack/cable/configure asw2-b-eqiad switch stack - https://phabricator.wikimedia.org/T183585#4077379 (10ayounsi) Aiming at doing the asw-b to asw2-b migration on April 10th (3pm UTC, 11am EDT, 8am PDT), 4h. This is for the sake of picking a date,... [21:31:51] PROBLEM - puppet last run on stat1005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[git_pull_statistics_mediawiki] [21:32:12] I think those need git prune too no_justification ^^ [21:35:17] !log delete indices for deleted wikis (from deleted.dblist) in eqiad and codfw elasticsearch clusters: alswikiquote, alswiktionary, mowiki, mowiktionary, ukwikimedia [21:35:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:36:49] 10Operations, 10ops-eqiad, 10Cloud-VPS, 10cloud-services-team: Rack/cable/configure asw2-b-eqiad switch stack - https://phabricator.wikimedia.org/T183585#4077402 (10ayounsi) [21:46:10] 10Operations, 10Phabricator, 10Release-Engineering-Team: Reimage both phab1001 and phab2001 to stretch - https://phabricator.wikimedia.org/T190568#4077453 (10Dzahn) a:03Dzahn [21:47:04] 10Operations, 10ops-eqiad, 10Cloud-VPS, 10cloud-services-team: Rack/cable/configure asw2-b-eqiad switch stack - https://phabricator.wikimedia.org/T183585#4077456 (10Marostegui) From a DB point of view I don't think that will be possible on that given date. Enwiki master is on that row, as well as one of... [21:48:32] (03CR) 10Dzahn: [C: 031] admin: contint-admins to restart Jenkins via systemd [puppet] - 10https://gerrit.wikimedia.org/r/408555 (https://phabricator.wikimedia.org/T190277) (owner: 10Hashar) [21:50:53] twentyafterfour mutante was wondering about phab's db. [21:51:03] because it is stored externally. [21:51:18] we may be blocked on codfw getting db servers for the db [21:51:20] like gerrit2001 [21:51:27] can phab in codfw connect to the db? 
[21:55:03] Operations, monitoring, Services (watching): Add Reading Infrastructure engineers to contacts for RI-maintained services - https://phabricator.wikimedia.org/T189524#4077494 (Dzahn) p:Triage>High [22:37:24] Operations, monitoring, Services (watching): Add Reading Infrastructure engineers to contacts for RI-maintained services - https://phabricator.wikimedia.org/T189524#4077634 (Dzahn) ``` 1312,1348d1311 < < define contact{ < contact_name mholloway < alias... [22:39:18] Operations, monitoring, Services (watching): Add Reading Infrastructure engineers to contacts for RI-maintained services - https://phabricator.wikimedia.org/T189524#4077635 (Dzahn) created the contacts in private repo. now we can add new contactgroups in the public repo that use them as members the... [22:44:42] PROBLEM - Disk space on elastic1023 is CRITICAL: DISK CRITICAL - free space: /srv 61138 MB (12% inode=99%) [22:50:42] RECOVERY - Disk space on elastic1023 is OK: DISK OK [22:51:06] (PS1) Dzahn: icinga: add contactgroups mobileapps, readinglists [puppet] - https://gerrit.wikimedia.org/r/421664 (https://phabricator.wikimedia.org/T189524) [22:55:32] (PS2) Dzahn: icinga: add contactgroups mobileapps, readinglists [puppet] - https://gerrit.wikimedia.org/r/421664 (https://phabricator.wikimedia.org/T189524) [22:56:14] (CR) Dzahn: [C: 2] icinga: add contactgroups mobileapps, readinglists [puppet] - https://gerrit.wikimedia.org/r/421664 (https://phabricator.wikimedia.org/T189524) (owner: Dzahn) [23:05:05] duuh..
"check for a missing closing bracket" of course :p [23:05:51] (PS1) Dzahn: icinga: add missing closing bracket in contacts.cfg [puppet] - https://gerrit.wikimedia.org/r/421667 [23:06:22] (PS2) Dzahn: icinga: add missing closing bracket in contacts.cfg [puppet] - https://gerrit.wikimedia.org/r/421667 [23:07:37] (CR) Dzahn: [C: 2] icinga: add missing closing bracket in contacts.cfg [puppet] - https://gerrit.wikimedia.org/r/421667 (owner: Dzahn) [23:08:52] PROBLEM - Check correctness of the icinga configuration on einsteinium is CRITICAL: Icinga configuration contains errors [23:12:36] (PS1) Dzahn: icinga: rename bsitzmann to bearnd in reading infra groups [puppet] - https://gerrit.wikimedia.org/r/421668 (https://phabricator.wikimedia.org/T189524) [23:12:59] (CR) jerkins-bot: [V: -1] icinga: rename bsitzmann to bearnd in reading infra groups [puppet] - https://gerrit.wikimedia.org/r/421668 (https://phabricator.wikimedia.org/T189524) (owner: Dzahn) [23:33:59] (PS2) Dzahn: icinga: rename bsitzmann to bearnd in reading infra groups [puppet] - https://gerrit.wikimedia.org/r/421668 (https://phabricator.wikimedia.org/T189524) [23:34:21] (CR) Dzahn: [C: 2] icinga: rename bsitzmann to bearnd in reading infra groups [puppet] - https://gerrit.wikimedia.org/r/421668 (https://phabricator.wikimedia.org/T189524) (owner: Dzahn) [23:38:57] RECOVERY - Check correctness of the icinga configuration on einsteinium is OK: Icinga configuration is correct [23:47:37] PROBLEM - puppet last run on lvs5003 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [23:49:10] (PS1) Dzahn: icinga/mobileapps: add mobileapp contacts to service [puppet] - https://gerrit.wikimedia.org/r/421676 (https://phabricator.wikimedia.org/T189524) [23:49:32] (CR) jerkins-bot: [V: -1] icinga/mobileapps: add mobileapp contacts to service [puppet] - https://gerrit.wikimedia.org/r/421676 (https://phabricator.wikimedia.org/T189524) (owner: Dzahn) [23:53:50] (PS2) Dzahn: icinga/mobileapps: add mobileapp contacts to service [puppet] - https://gerrit.wikimedia.org/r/421676 (https://phabricator.wikimedia.org/T189524)