[00:06:34] RECOVERY - puppet last run on cp3038 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[01:27:04] PROBLEM - puppet last run on db1056 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:55:04] RECOVERY - puppet last run on db1056 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[01:55:14] PROBLEM - puppet last run on mc1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:00:06] !log l10nupdate@tin LocalisationUpdate failed: git pull of core failed
[02:00:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:00:22] orly
[02:01:22] oh look
[02:01:24] another casualty
[02:01:25] reedy@tin:/var/lib/l10nupdate/mediawiki/core$ sudo -u l10nupdate git pull
[02:01:25] remote: internal server error
[02:01:25] fatal: protocol error: bad pack header
[02:05:04] !log fixed localisationupdate clone of mw core on tin due to T151676
[02:05:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:05:17] T151676: gerrit jgit gc caused mediawiki/core repo problems - https://phabricator.wikimedia.org/T151676
[02:08:00] cba running manually, it'll run again in 24h
[02:23:14] RECOVERY - puppet last run on mc1006 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[03:23:14] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 747.02 seconds
[03:40:14] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 180.12 seconds
[04:07:24] PROBLEM - puppet last run on iron is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:08:54] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=3638.30 Read Requests/Sec=5309.10 Write Requests/Sec=721.30 KBytes Read/Sec=23965.60 KBytes_Written/Sec=11018.40
[04:14:54] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=0.50 Read Requests/Sec=15.30 Write Requests/Sec=1.20 KBytes Read/Sec=173.60 KBytes_Written/Sec=112.00
[04:35:24] RECOVERY - puppet last run on iron is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[05:06:24] PROBLEM - puppet last run on mw1193 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:35:24] RECOVERY - puppet last run on mw1193 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[06:51:04] PROBLEM - puppet last run on elastic1035 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[apt-transport-https]
[06:57:51] (PS3) Marostegui: site.pp: Change db1044 binlog to ROW [puppet] - https://gerrit.wikimedia.org/r/323504 (https://phabricator.wikimedia.org/T150802)
[06:59:20] (CR) Marostegui: [C: 2] site.pp: Change db1044 binlog to ROW [puppet] - https://gerrit.wikimedia.org/r/323504 (https://phabricator.wikimedia.org/T150802) (owner: Marostegui)
[07:03:58] !log Stop MySQL on db1044 - (depooled) maintenance - T150802
[07:04:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:04:12] T150802: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802
[07:05:11] (PS2) Marostegui: sanitarium2.my.cnf: Disable parallel replication [puppet] - https://gerrit.wikimedia.org/r/323145
[07:06:29] (CR) Marostegui: [C: 2] sanitarium2.my.cnf: Disable parallel replication [puppet] - https://gerrit.wikimedia.org/r/323145 (owner: Marostegui)
[07:08:04] !log Stop MySQl on db1095 - maintenance T150802
[07:08:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:16:22] (CR) Marostegui: [C: 2] db-eqiad.php: Added comment for db1044 [mediawiki-config] - https://gerrit.wikimedia.org/r/323505 (https://phabricator.wikimedia.org/T150802) (owner: Marostegui)
[07:17:05] (Merged) jenkins-bot: db-eqiad.php: Added comment for db1044 [mediawiki-config] - https://gerrit.wikimedia.org/r/323505 (https://phabricator.wikimedia.org/T150802) (owner: Marostegui)
[07:18:54] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Added comments to db1044 status - T150802 (duration: 00m 45s)
[07:19:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:19:05] T150802: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802
[07:20:04] RECOVERY - puppet last run on elastic1035 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[07:27:44] PROBLEM - Improperly owned -0:0- files in /srv/mediawiki-staging on mira is CRITICAL: Improperly owned (0:0) files in /srv/mediawiki-staging
[07:32:14] PROBLEM - puppet last run on mw1212 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:33:42] (PS1) Marostegui: db-eqiad.php: Repool db1092 [mediawiki-config] - https://gerrit.wikimedia.org/r/323791 (https://phabricator.wikimedia.org/T151272)
[07:35:26] (CR) Marostegui: [C: 2] db-eqiad.php: Repool db1092 [mediawiki-config] - https://gerrit.wikimedia.org/r/323791 (https://phabricator.wikimedia.org/T151272) (owner: Marostegui)
[07:36:10] (Merged) jenkins-bot: db-eqiad.php: Repool db1092 [mediawiki-config] - https://gerrit.wikimedia.org/r/323791 (https://phabricator.wikimedia.org/T151272) (owner: Marostegui)
[07:38:26] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1092 - T151272 (duration: 00m 47s)
[07:38:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:38:37] T151272: db1092 crash - https://phabricator.wikimedia.org/T151272
[07:39:05] Operations, DBA, Patch-For-Review: db1092 crash - https://phabricator.wikimedia.org/T151272#2826351 (Marostegui) Open>Resolved I have repooled this server after a week of no issues.
[07:42:46] !log Deploying alter table s5 dewiki.revision - T148967
[07:42:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:42:59] T148967: Fix PK on S5 dewiki.revision - https://phabricator.wikimedia.org/T148967
[07:47:24] PROBLEM - puppet last run on labcontrol1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:57:23] Operations, DBA, Wikidata, netops, Performance: DispatchChanges: Avoid long-lasting connections to the master DB - https://phabricator.wikimedia.org/T151681#2824480 (Marostegui) Holding connections on the master: if there are 5-10 jobs running it shouldn't be a big deal as I assume only 10 c...
[08:00:14] RECOVERY - puppet last run on mw1212 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[08:12:04] PROBLEM - puppet last run on mw1210 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:15:24] RECOVERY - puppet last run on labcontrol1001 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[08:41:04] RECOVERY - puppet last run on mw1210 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[08:44:29] (Abandoned) Elukey: Force a 404 on each HTTP request landing to a non configured domain [puppet] - https://gerrit.wikimedia.org/r/322268 (https://phabricator.wikimedia.org/T137176) (owner: Elukey)
[09:05:31] (CR) Giuseppe Lavagetto: [C: 1] "makes sense, thank you!" [puppet] - https://gerrit.wikimedia.org/r/323517 (https://phabricator.wikimedia.org/T137345) (owner: Elukey)
[09:09:03] !log Deploy ALTER table (add an index) db1040 (master) commonswiki.revision - T147305
[09:09:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:09:16] T147305: Unify commonswiki.revision - https://phabricator.wikimedia.org/T147305
[09:25:19] Operations, DBA, Wikidata, netops, Performance: DispatchChanges: Avoid long-lasting connections to the master DB - https://phabricator.wikimedia.org/T151681#2826465 (jcrespo) @Manuel, @Daniel Actually it is a problem, because masters have a limit of CPU# or 32 active threads on the pool of co...
[09:30:35] (PS3) Elukey: Avoid Redis IPsec replication if the host doesn't need it. [puppet] - https://gerrit.wikimedia.org/r/323517 (https://phabricator.wikimedia.org/T137345)
[09:38:02] (CR) Elukey: [C: 2] Avoid Redis IPsec replication if the host doesn't need it. [puppet] - https://gerrit.wikimedia.org/r/323517 (https://phabricator.wikimedia.org/T137345) (owner: Elukey)
[09:40:24] PROBLEM - salt-minion processes on thumbor1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:41:14] PROBLEM - puppet last run on dbproxy1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:41:14] RECOVERY - salt-minion processes on thumbor1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[09:41:18] elukey: good morning :) If you are into redis, I found out a CLI utility that shows random stats about redis instances https://github.com/junegunn/redis-stat :D
[09:41:23] it has colors!
[09:42:13] hashar: nice!!!
[09:43:17] there is even an embedded web server to show some kind of dashboard
[09:43:26] then we probably get all of that sent to statsd/grafana
[09:46:02] hashar: what I'd like to try first is sending stats to prometheus, because we might achieve a simialar thing.. but the CLI is awesome
[09:49:04] elukey: isn't there a prometheus / diamond connector ?
[09:49:17] eg get Prometheus to syphon the data collected/defined via diamond
[09:50:23] (PS1) Elukey: Add mc1019 to site.pp [puppet] - https://gerrit.wikimedia.org/r/323805 (https://phabricator.wikimedia.org/T137345)
[09:50:31] or maybe Prometheus can read from graphite
[09:51:44] hashar: afaiu there is a prometheus light http server listening on a port for each server that you monitor, and then the prometheus masters (not sure if called in this way but you got what I mean) polls regularly
[09:51:48] <_joe_> bi,'
[09:51:51] <_joe_> errr
[09:52:10] <_joe_> yes, don't use diamond for new things I'd say
[09:52:40] <_joe_> but I'll ask godog to write some docs/guidance for everyone to follow, if it's not already there
[09:54:52] Operations, Monitoring: dbstore1001 backup jobs failed between 2016-10-19 and 2016-11-23 - https://phabricator.wikimedia.org/T151579#2826507 (Volans) p:Triage>Normal
[09:55:49] Operations, ops-codfw: db2042 disk predictive failure - https://phabricator.wikimedia.org/T150974#2826508 (Volans) p:Triage>Normal
[09:56:42] (CR) Elukey: [C: 2] Add mc1019 to site.pp [puppet] - https://gerrit.wikimedia.org/r/323805 (https://phabricator.wikimedia.org/T137345) (owner: Elukey)
[09:58:24] Operations, ops-codfw, DBA: db2042 disk predictive failure - https://phabricator.wikimedia.org/T150974#2826515 (Marostegui)
[10:10:14] RECOVERY - puppet last run on dbproxy1004 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[10:13:03] Operations, Traffic: several 502 Bad Gateway - https://phabricator.wikimedia.org/T151686#2826529 (Joe) a:Joe>None
[10:14:39] Operations, Traffic: several 502 Bad Gateway - https://phabricator.wikimedia.org/T151686#2824625 (Joe) Please do not assign tickets to me directly, as others that could have more time to fix this would not look into it assuming I am, which is not true at the moment.
[10:17:44] RECOVERY - Improperly owned -0:0- files in /srv/mediawiki-staging on mira is OK: Files ownership is ok.
[10:17:58] (PS1) Elukey: Add temporary override to mc1019 hiera config to allow Redis config [puppet] - https://gerrit.wikimedia.org/r/323807 (https://phabricator.wikimedia.org/T137345)
[10:18:11] <_joe_> !log stopping the dedicated commonswiki htmlCacheUpdate job runner, T151196
[10:18:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:18:23] T151196: Job queue size growing since ~12:00 on 2016-11-19 - https://phabricator.wikimedia.org/T151196
[10:18:44] RECOVERY - Improperly owned -0:0- files in /srv/mediawiki-staging on tin is OK: Files ownership is ok.
[10:19:07] !log fixed permissions of files in /srv/mediawiki-staging on tin and mira
[10:19:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:31:06] Operations, DBA, Labs: fstrim: Operation not supported on Labs DBs - https://phabricator.wikimedia.org/T151746#2826574 (Volans)
[10:34:22] !log fixed permissions of old /var/log/hhvm/error.log-20160829 on osmium
[10:34:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:36:21] (PS2) Elukey: Add temporary override to mc1019 hiera config to allow Redis config [puppet] - https://gerrit.wikimedia.org/r/323807 (https://phabricator.wikimedia.org/T137345)
[10:45:17] (PS1) Jcrespo: redactatron: Integrate centralauth redaction into cols.txt [software/redactatron] - https://gerrit.wikimedia.org/r/323809 (https://phabricator.wikimedia.org/T103011)
[10:48:52] Operations: Cron conflict for kafkatee logrotate on oxygen - https://phabricator.wikimedia.org/T151748#2826635 (Volans)
[10:49:47] (CR) Jcrespo: "Seems to work as intended:" [software/redactatron] - https://gerrit.wikimedia.org/r/323809 (https://phabricator.wikimedia.org/T103011) (owner: Jcrespo)
[10:50:27] PROBLEM - puppet last run on wasat is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:50:58] Operations, Traffic: several 502 Bad Gateway - https://phabricator.wikimedia.org/T151686#2826652 (Paladox) I believe this was fixed when godog restarted the api servers. Because there were a lot of errors being logged in #wikimedia-operations
[10:54:23] (CR) MarcoAurelio: "I'll schedule this for SWAT shortly. If someone has any objection to this being merged, please say so." [mediawiki-config] - https://gerrit.wikimedia.org/r/321660 (https://phabricator.wikimedia.org/T150752) (owner: MarcoAurelio)
[10:54:53] (CR) Marostegui: [C: 1] redactatron: Integrate centralauth redaction into cols.txt [software/redactatron] - https://gerrit.wikimedia.org/r/323809 (https://phabricator.wikimedia.org/T103011) (owner: Jcrespo)
[10:58:28] doctaxon: hi, are you still getting the 502 errors described in T151686?
[10:58:29] T151686: several 502 Bad Gateway - https://phabricator.wikimedia.org/T151686
[10:59:29] (CR) MarcoAurelio: [C: 1] wikitech cloudadmin: remove right that no longer exists [mediawiki-config] - https://gerrit.wikimedia.org/r/323708 (owner: Alex Monk)
[11:19:27] RECOVERY - puppet last run on wasat is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[11:24:33] (PS3) Elukey: Add temporary override to mc1019 hiera config to allow Redis config [puppet] - https://gerrit.wikimedia.org/r/323807 (https://phabricator.wikimedia.org/T137345)
[11:25:12] (PS3) Jcrespo: [WIP] Create script to check that sanitarium filtering is working [puppet] - https://gerrit.wikimedia.org/r/323525 (https://phabricator.wikimedia.org/T150802)
[11:25:58] (CR) jenkins-bot: [V: -1] [WIP] Create script to check that sanitarium filtering is working [puppet] - https://gerrit.wikimedia.org/r/323525 (https://phabricator.wikimedia.org/T150802) (owner: Jcrespo)
[11:28:27] (PS4) Elukey: [WIP] Add temporary override to mc1019 hiera config to allow Redis config [puppet] - https://gerrit.wikimedia.org/r/323807 (https://phabricator.wikimedia.org/T137345)
[11:32:34] (CR) Elukey: "https://puppet-compiler.wmflabs.org/4683/" [puppet] - https://gerrit.wikimedia.org/r/323807 (https://phabricator.wikimedia.org/T137345) (owner: Elukey)
[11:37:58] Operations: Cron conflict for kafkatee logrotate on oxygen - https://phabricator.wikimedia.org/T151748#2826801 (elukey) Thanks! I already tried to fix another similar issue for kafkatee+oxygen in T132322, mentioning it since I might be the root cause :)
[11:38:37] PROBLEM - puppet last run on cp3038 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[11:46:05] Operations, Puppet: Update puppet-lint to 2.* - https://phabricator.wikimedia.org/T144667#2826830 (hashar) CI now runs puppet-lint from the generic job that runs `rake`, hence there is not much more we can do on CI side. Switching to puppet-lint 2 can be done as I described in my previous comment, and th...
[11:46:25] Operations, Puppet, Epic, Need-volunteer, Patch-For-Review: align puppet-lint config with coding style - https://phabricator.wikimedia.org/T93645#2221613 (hashar) There is a proposal to bump `puppet-lint` to version 2 at T144667
[11:53:57] (Draft2) MarcoAurelio: Add arbcom_cswiki to $private_wikis in realm.pp [puppet] - https://gerrit.wikimedia.org/r/323814 (https://phabricator.wikimedia.org/T151731)
[11:59:14] (PS1) Giuseppe Lavagetto: docker: add package provider [puppet] - https://gerrit.wikimedia.org/r/323815
[11:59:16] (PS1) Giuseppe Lavagetto: calico: add module/profile to use as kubernetes networking [puppet] - https://gerrit.wikimedia.org/r/323816
[12:05:14] Operations, Performance-Team, Thumbor: Thumbor 404s when the original has a ? in its filename - https://phabricator.wikimedia.org/T150760#2826898 (Gilles)
[12:07:37] RECOVERY - puppet last run on cp3038 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[12:17:19] !log Killed the Wikidata json dumpers on snapshot1007 due to T151356. Will be restarted once a fix has been deployed.
[12:17:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:17:30] T151356: Wikibase\Repo\Store\Sql\SqlEntityIdPager::fetchIds query slow - https://phabricator.wikimedia.org/T151356
[12:35:26] !log Stop replication db1052 (depooled) - maintenance - T150802
[12:35:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:35:39] T150802: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802
[12:45:37] Reports of Malwarebytes blocking Wikipedia access, anything we can do about it?
[12:46:40] Operations, Traffic, Patch-For-Review: varnishapi.py AttributeError: VSM_Close - https://phabricator.wikimedia.org/T151561#2826978 (ema) Open>Resolved a:ema
[12:55:44] (PS1) Ema: varnish: rename varnishlog4.py into varnishlog.py [puppet] - https://gerrit.wikimedia.org/r/323822 (https://phabricator.wikimedia.org/T150660)
[12:56:35] (PS1) Marostegui: db-codfw.php: Repool db2049 [mediawiki-config] - https://gerrit.wikimedia.org/r/323823 (https://phabricator.wikimedia.org/T150876)
[12:59:00] (CR) Marostegui: [C: 2] db-codfw.php: Repool db2049 [mediawiki-config] - https://gerrit.wikimedia.org/r/323823 (https://phabricator.wikimedia.org/T150876) (owner: Marostegui)
[12:59:50] (Merged) jenkins-bot: db-codfw.php: Repool db2049 [mediawiki-config] - https://gerrit.wikimedia.org/r/323823 (https://phabricator.wikimedia.org/T150876) (owner: Marostegui)
[13:01:48] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2049 - T150876 (duration: 00m 45s)
[13:01:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:01:59] T150876: db2049 overheated and restarted - https://phabricator.wikimedia.org/T150876
[13:04:36] Operations, ops-codfw, DBA, Patch-For-Review: db2049 overheated and restarted - https://phabricator.wikimedia.org/T150876#2827020 (Marostegui) Open>Resolved This server didn't have any other issue and not even when it gots its cpu burned for some hours. So I have pooled it back
[13:41:05] (PS1) Tim Landscheidt: puppetmaster: Do not generate different password hashes on each run [puppet] - https://gerrit.wikimedia.org/r/323828 (https://phabricator.wikimedia.org/T151760)
[13:43:48] (CR) Elukey: [C: 1] varnish: rename varnishlog4.py into varnishlog.py [puppet] - https://gerrit.wikimedia.org/r/323822 (https://phabricator.wikimedia.org/T150660) (owner: Ema)
[13:43:56] (PS1) Ema: varnish: make python varnish scripts skip PURGE requests [puppet] - https://gerrit.wikimedia.org/r/323829 (https://phabricator.wikimedia.org/T151643)
[13:44:16] (CR) Tim Landscheidt: "For testing I suggest:" [puppet] - https://gerrit.wikimedia.org/r/323828 (https://phabricator.wikimedia.org/T151760) (owner: Tim Landscheidt)
[13:45:57] (CR) Ema: [C: 2] varnish: rename varnishlog4.py into varnishlog.py [puppet] - https://gerrit.wikimedia.org/r/323822 (https://phabricator.wikimedia.org/T150660) (owner: Ema)
[13:46:00] (CR) Elukey: [C: 1] varnish: make python varnish scripts skip PURGE requests [puppet] - https://gerrit.wikimedia.org/r/323829 (https://phabricator.wikimedia.org/T151643) (owner: Ema)
[13:49:17] PROBLEM - puppet last run on cp3006 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/lib/python2.7/dist-packages/varnishlog.py]
[13:57:05] Operations, Traffic, Patch-For-Review: python-varnishapi daemons seeing "Log overrun" constantly - https://phabricator.wikimedia.org/T151643#2823504 (ema) Me and @elukey have been working on this a bit on Friday. Specifically, we have tried to bump `vsl_space`: ``` vsl_space · Units: by...
[13:59:17] RECOVERY - puppet last run on cp3006 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[14:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Dear anthropoid, the time has come. Please deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161128T1400).
[14:00:04] Urbanecm: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be available during the process.
[14:01:10] I can SWAT today!
[14:01:16] Urbanecm: around for SWAT?
[14:06:22] Operations, Performance-Team, Thumbor: Thumbor should handle "temp" thumbnail requests - https://phabricator.wikimedia.org/T151441#2827124 (Gilles)
[14:06:42] Operations, Icinga, scap: expose hosts in maintenance state so we can prevent scap from running on them - https://phabricator.wikimedia.org/T100777#2827126 (hashar) +#scap there might be another task already
[14:07:20] Sorry for my lateness, I'm ready for SWAT.
[14:07:35] no problem, I was just checking if you are around
[14:07:44] starting with swat, you are the only one today
[14:07:54] Okay.
[14:09:26] (PS5) Zfilipin: HD logos for multiple wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/322247 (https://phabricator.wikimedia.org/T150618) (owner: Urbanecm)
[14:09:45] Hello. I'll have a config patch to add in a few minutes.
[14:09:57] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 602 600 - REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4461481 keys, up 28 days 5 hours - replication_delay is 602
[14:09:59] Dereckson: sure, I'm around, ping me when ready
[14:10:15] * Dereckson nods.
[14:11:32] (CR) Zfilipin: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/322247 (https://phabricator.wikimedia.org/T150618) (owner: Urbanecm)
[14:11:42] (CR) Urbanecm: [C: 1] "Fine for me." [mediawiki-config] - https://gerrit.wikimedia.org/r/323695 (https://phabricator.wikimedia.org/T151570) (owner: MarcoAurelio)
[14:11:57] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4463437 keys, up 28 days 5 hours - replication_delay is 0
[14:11:57] db2068 should be under warranty
[14:11:59] Urbanecm_: can you test 322247 at test server, once it is there?
[14:12:11] (Merged) jenkins-bot: HD logos for multiple wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/322247 (https://phabricator.wikimedia.org/T150618) (owner: Urbanecm)
[14:12:15] 322247 is testable, the others aren't.
[14:12:34] Urbanecm_: yes, the other two are throttling, right
[14:13:03] Yeah
[14:13:59] Operations, Goal: Make services manageable by systemd - https://phabricator.wikimedia.org/T97402#2827146 (hashar) Open>Resolved a:hashar Most services have or are being migrated to Jessie and they are ported to systemd/base::service_unit in the process. I dont think this task services any p...
[14:14:15] (PS3) Aude: Move interwiki sorting orders to config [mediawiki-config] - https://gerrit.wikimedia.org/r/323556 (https://phabricator.wikimedia.org/T111023)
[14:14:30] Urbanecm_: 322247 is at mwdebug1002, please test
[14:14:36] (CR) Aude: [C: 2] "need to verify that the meta wiki sorting orders are not missing anything" [mediawiki-config] - https://gerrit.wikimedia.org/r/323556 (https://phabricator.wikimedia.org/T111023) (owner: Aude)
[14:15:16] Going to test it.
[14:15:55] (CR) Aude: [C: -2] "@daniel InterwikiSorting is not yet deployed" [mediawiki-config] - https://gerrit.wikimedia.org/r/323556 (https://phabricator.wikimedia.org/T111023) (owner: Aude)
[14:16:11] zeljkof: 322247 works, you can deploy it.
[14:16:24] Urbanecm_: ok, deploying
[14:16:28] Thanks
[14:16:53] (PS1) Rush: labsdb: cleanup labs/db/views presentation [puppet] - https://gerrit.wikimedia.org/r/323830
[14:16:57] (PS4) Jcrespo: [WIP] Create script to check that sanitarium filtering is working [puppet] - https://gerrit.wikimedia.org/r/323525 (https://phabricator.wikimedia.org/T150802)
[14:18:10] !log zfilipin@tin Synchronized static/images/project-logos/: SWAT: [[gerrit:322247|HD logos for multiple wikis (T150618)]] (duration: 00m 47s)
[14:18:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:18:21] T150618: [GOAL] HD logo for all Wikipedias - https://phabricator.wikimedia.org/T150618
[14:18:42] Urbanecm_: logos are up, now the settings file...
[14:18:51] Okay...
[14:19:11] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:322247|HD logos for multiple wikis (T150618)]] (duration: 00m 49s)
[14:19:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:19:31] Urbanecm_: ok, all done, please test on production sites
[14:19:42] gah, sorry
[14:19:50] (PS3) Zfilipin: [throttle] Exception for #MOWomenOnWikipedia Edit-A-Thon [mediawiki-config] - https://gerrit.wikimedia.org/r/323555 (https://phabricator.wikimedia.org/T151650) (owner: Urbanecm)
[14:19:57] clicked +2 then clicked -2 (what i wanted)
[14:20:19] zeljkof: working.
[14:20:36] Urbanecm_: great, working on 323555
[14:20:41] zeljkof: okay
[14:20:47] PROBLEM - HP RAID on db2068 is CRITICAL: CRITICAL: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12, Controller, Battery/Capacitor - Failed: 1I:1:4
[14:20:48] ACKNOWLEDGEMENT - HP RAID on db2068 is CRITICAL: CRITICAL: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12, Controller, Battery/Capacitor - Failed: 1I:1:4 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T151763
[14:20:52] Operations, ops-codfw: Degraded RAID on db2068 - https://phabricator.wikimedia.org/T151763#2827157 (ops-monitoring-bot)
[14:21:55] I will take care of that
[14:21:58] Operations, ops-codfw: Degraded RAID on db2068 - https://phabricator.wikimedia.org/T151763#2827157 (jcrespo) This one should be under warranty.
[14:22:19] ^ that was fast jaime! :)
[14:23:25] See my comment at 14:11:57: db2068 should be under warranty
[14:24:17] Urbanecm_: looks like jenkins is busy, still waiting for rebase
[14:24:39] zeljkof: Okay, please poke me when I'll be required to do something.
[14:24:57] PROBLEM - puppet last run on relforge1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:24:59] (PS3) Rush: Adding views for two PageAssessments tables for Labs [puppet] - https://gerrit.wikimedia.org/r/321845 (owner: Kaldari)
[14:25:01] Urbanecm_: is there anything you can check once both patches are deployed?
[14:25:28] zeljkof: I think I can't check anything before and even after deployment...
[14:26:15] Urbanecm_: that's what I've thought :) in that case, you are pretty much free to go :) I will finish the deploy
[14:26:29] (PS1) Dereckson: Allow sysop to revoke users from some groups on ne.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/323833 (https://phabricator.wikimedia.org/T148171)
[14:26:41] (CR) Rush: [C: 2 V: 2] Adding views for two PageAssessments tables for Labs [puppet] - https://gerrit.wikimedia.org/r/321845 (owner: Kaldari)
[14:26:44] Okay, thanks zeljkof.
[14:26:48] Bye!
[14:27:36] Operations, ops-codfw: Degraded RAID on db2068 - https://phabricator.wikimedia.org/T151763#2827173 (Marostegui) That output is indeed correct, the disk has failed and should be replaced: ``` hpssacli ctrl slot=0 pd all show detail physicaldrive 1I:1:4 Port: 1I Box: 1 Bay: 4...
[14:27:45] zeljkof: this one: https://gerrit.wikimedia.org/r/323833 - I'm adding it to the calendar.
[14:28:07] I also see we've a throttle rule: 323555, I'll look this one just after to check the date
[14:28:13] Dereckson: ok, not sure what is going on with jenkins, waiting for one rebase for a long time...
[14:29:26] ah you've already merged 323555
[14:29:28] hashar: any idea why is jenkins so busy? 323555,3 is waiting for 9 minutes for rebase :( https://integration.wikimedia.org/zuul/
[14:29:36] (CR) jenkins-bot: [V: -1] [WIP] Create script to check that sanitarium filtering is working [puppet] - https://gerrit.wikimedia.org/r/323525 (https://phabricator.wikimedia.org/T150802) (owner: Jcrespo)
[14:29:47] zeljkof: it is full
[14:29:57] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479
[14:30:01] Dereckson: I'm still waiting for 323555 to rebase :(
[14:30:05] just +2 it
[14:30:07] PROBLEM - cassandra-c CQL 10.64.32.204:9042 on restbase1012 is CRITICAL: connect to address 10.64.32.204 and port 9042: Connection refused
[14:30:07] PROBLEM - cassandra-c CQL 10.64.0.119:9042 on restbase1011 is CRITICAL: connect to address 10.64.0.119 and port 9042: Connection refused
[14:30:10] that will enter the gate-and-submit
[14:30:17] PROBLEM - cassandra-c SSL 10.64.0.119:7001 on restbase1011 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused
[14:30:17] hashar: ok, thanks
[14:30:17] which takes priority over everything else
[14:30:17] PROBLEM - cassandra-c SSL 10.64.32.204:7001 on restbase1012 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused
[14:30:25] (CR) Zfilipin: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/323555 (https://phabricator.wikimedia.org/T151650) (owner: Urbanecm)
[14:30:27] PROBLEM - cassandra-c service on restbase1011 is CRITICAL: CRITICAL - Expecting active but unit cassandra-c is failed
[14:30:46] the pool will be made larger tomorrow
[14:30:48] that will help
[14:30:57] PROBLEM - Check systemd state on restbase1011 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[14:30:57] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4466567 keys, up 28 days 6 hours - replication_delay is 0
[14:31:06] (Merged) jenkins-bot: [throttle] Exception for #MOWomenOnWikipedia Edit-A-Thon [mediawiki-config] - https://gerrit.wikimedia.org/r/323555 (https://phabricator.wikimedia.org/T151650) (owner: Urbanecm)
[14:31:07] PROBLEM - Check systemd state on restbase1012 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[14:31:27] PROBLEM - cassandra-c service on restbase1012 is CRITICAL: CRITICAL - Expecting active but unit cassandra-c is failed
[14:31:28] !log labsdb1001 maintain-views --databases testwiki enwikivoyage enwiki --table page_assessments --debug
[14:31:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:31:52] (PS1) Niharika29: Convert wikis to numerical sorting and uca collation [mediawiki-config] - https://gerrit.wikimedia.org/r/323834 (https://phabricator.wikimedia.org/T149002)
[14:31:55] (PS2) Dereckson: Allow sysop to revoke users from some groups on ne.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/323833 (https://phabricator.wikimedia.org/T148171)
[14:33:34] !log labsdb1001 maintain-views --databases testwiki enwikivoyage enwiki --table page_assessments_projects --debug
[14:33:36] hashar: should I rebase 323557 (another patch) or just +2 it?
[14:33:38] !log zfilipin@tin Synchronized wmf-config/throttle.php: SWAT: [[gerrit:323555|[throttle] Exception for #MOWomenOnWikipedia Edit-A-Thon (T151650)]] (duration: 00m 45s)
[14:33:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:33:51] (PS2) Zfilipin: [throttle] Remove old throttle rules [mediawiki-config] - https://gerrit.wikimedia.org/r/323557 (owner: Urbanecm)
[14:33:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:33:51] zeljkof: your guess?
[14:33:51] T151650: #MOWomenOnWikipedia Edit-A-Thon on 2016-12-01 - https://phabricator.wikimedia.org/T151650 [14:33:59] zeljkof: patches need to be fast forward [14:34:02] so you always need to rebase [14:34:06] then just +2 [14:34:10] hashar: well, I can rebase, but then not wait for jenkins :) [14:34:13] exactly [14:34:21] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323557 (owner: 10Urbanecm) [14:34:24] though you want to make sure the rebased patch is still valid (most of the time it will :D ) [14:35:15] (03Merged) 10jenkins-bot: [throttle] Remove old throttle rules [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323557 (owner: 10Urbanecm) [14:36:12] 06Operations, 10ops-codfw, 10DBA: Degraded RAID on db2068 - https://phabricator.wikimedia.org/T151763#2827191 (10Volans) p:05Triage>03Normal [14:37:23] !log zfilipin@tin Synchronized wmf-config/throttle.php: SWAT: [[gerrit:323557|[throttle] Remove old throttle rules]] (duration: 00m 44s) [14:37:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:38:55] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323833 (https://phabricator.wikimedia.org/T148171) (owner: 10Dereckson) [14:39:01] (03PS3) 10Zfilipin: Allow sysop to revoke users from some groups on ne.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323833 (https://phabricator.wikimedia.org/T148171) (owner: 10Dereckson) [14:39:10] (03CR) 10Zfilipin: Allow sysop to revoke users from some groups on ne.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323833 (https://phabricator.wikimedia.org/T148171) (owner: 10Dereckson) [14:39:18] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323833 (https://phabricator.wikimedia.org/T148171) (owner: 10Dereckson) [14:39:55] (03Merged) 10jenkins-bot: Allow sysop to revoke users from some groups on ne.wikipedia [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/323833 (https://phabricator.wikimedia.org/T148171) (owner: 10Dereckson) [14:42:57] RECOVERY - Check systemd state on restbase1011 is OK: OK - running: The system is fully operational [14:43:27] RECOVERY - cassandra-c service on restbase1011 is OK: OK - cassandra-c is active [14:44:27] RECOVERY - cassandra-c SSL 10.64.0.119:7001 on restbase1011 is OK: SSL OK - Certificate restbase1011-c valid until 2017-09-12 15:34:08 +0000 (expires in 288 days) [14:44:33] Dereckson: can you test 323833 at mwdebug1002, once it is there? [14:44:36] yes [14:45:07] RECOVERY - cassandra-c CQL 10.64.0.119:9042 on restbase1011 is OK: TCP OK - 0.000 second response time on 10.64.0.119 port 9042 [14:45:11] Dereckson: ok, it's there, please test [14:46:09] oh, mwdebug1002, not mwdebug1001, ok [14:46:38] Dereckson: that's what the docs say to do :) https://wikitech.wikimedia.org/wiki/SWAT_deploys/Deployers#Test_Canary [14:46:43] works [14:46:49] ok, deploying then [14:47:46] zeljkof: so, to test anything related to user rights group, it's testable by checking https://en.wikipedia.org/wiki/Special:ListGroupRights or the equivalent for the wiki modified [14:51:38] Dereckson: thanks [14:52:05] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:323833|Allow sysop to revoke users from some groups on ne.wikipedia (T148171)]] (duration: 00m 45s) [14:52:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:52:17] T148171: Proposal for changing Importer, Patroller, Rollbackers & Autopatrolled user right settings on Nepali Wikipedia - https://phabricator.wikimedia.org/T148171 [14:52:25] Thanks for the deploy. 
[14:52:27] Dereckson: deployed, please check on production [14:52:57] RECOVERY - puppet last run on relforge1001 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [14:53:07] RECOVERY - Check systemd state on restbase1012 is OK: OK - running: The system is fully operational [14:53:27] RECOVERY - cassandra-c service on restbase1012 is OK: OK - cassandra-c is active [14:53:54] !log EU SWAT finished [14:54:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:54:47] thank you for deploying with #releng, see you soon! :) [14:55:07] RECOVERY - cassandra-c CQL 10.64.32.204:9042 on restbase1012 is OK: TCP OK - 0.000 second response time on 10.64.32.204 port 9042 [14:55:17] RECOVERY - cassandra-c SSL 10.64.32.204:7001 on restbase1012 is OK: SSL OK - Certificate restbase1012-c valid until 2017-09-12 15:34:16 +0000 (expires in 288 days) [14:59:35] (03PS2) 10Ema: varnish: make python varnish scripts skip PURGE requests [puppet] - 10https://gerrit.wikimedia.org/r/323829 (https://phabricator.wikimedia.org/T151643) [14:59:49] (03CR) 10Ema: [C: 032 V: 032] varnish: make python varnish scripts skip PURGE requests [puppet] - 10https://gerrit.wikimedia.org/r/323829 (https://phabricator.wikimedia.org/T151643) (owner: 10Ema) [15:00:05] addshore: Dear anthropoid, the time has come. Please deploy RevisionSlider updates (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161128T1500). [15:00:05] Addshore: A patch you scheduled for RevisionSlider updates is about to be deployed. Please be available during the process. [15:00:06] zeljkof: awesome! my go! :) [15:00:07] PROBLEM - check_mysql on lutetium is CRITICAL: Cant connect to local MySQL server through socket /tmp/mysql.sock (2) [15:00:29] addshore: go! 
:) [15:05:07] RECOVERY - check_mysql on lutetium is OK: Uptime: 184 Threads: 1 Questions: 24956 Slow queries: 0 Opens: 5084 Flush tables: 2 Open tables: 64 Queries per second avg: 135.630 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [15:06:44] bah, zeljkof git fetch was working fine for you I am guessing on tin? [15:07:07] internal server error, fatal: protocol error: bad pack header for me... I think I just saw a ticket about this [15:07:07] addshore: yes. problems? [15:07:17] uh oh [15:07:20] did I break anything? [15:07:42] https://phabricator.wikimedia.org/T151676 [15:07:59] (03PS1) 10Gehel: elasticsearch - Relforge cluster should be accessible from labs networks [puppet] - 10https://gerrit.wikimedia.org/r/323839 [15:08:32] addshore: see mail from chad to wikitech, I think [15:08:42] there was something wrong with gerrit over the weekend [15:09:34] ahh okay, well, I'll give that a go on tin for core [15:13:04] !log fixed /srv/mediawiki-staging/php-1.29.0-wmf.3 clone of mw core on tin due to T151676 [15:13:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:13:16] T151676: gerrit jgit gc caused mediawiki/core repo problems - https://phabricator.wikimedia.org/T151676 [15:14:16] (03CR) 10DCausse: [C: 031] elasticsearch - Relforge cluster should be accessible from labs networks [puppet] - 10https://gerrit.wikimedia.org/r/323839 (owner: 10Gehel) [15:19:19] (03CR) 10Faidon Liambotis: [C: 04-1] RAID: get RAID status improvement for MegaCLI [puppet] - 10https://gerrit.wikimedia.org/r/322249 (https://phabricator.wikimedia.org/T151043) (owner: 10Volans) [15:20:35] (03PS2) 10Gehel: elasticsearch - Relforge cluster should be accessible from labs networks [puppet] - 10https://gerrit.wikimedia.org/r/323839 [15:21:36] hmm, zeljkof something is still not going right for me here.... :/ [15:23:01] hashar, ostriches: could you take a look at problems addshore is having? 
[15:23:55] (03CR) 10Gehel: "Puppet compiler seems to agree: https://puppet-compiler.wmflabs.org/4686/" [puppet] - 10https://gerrit.wikimedia.org/r/323839 (owner: 10Gehel) [15:23:57] I have merged 4 things in extensions/RevisionSlider on php-1.29.0-wmf.3 branch, and rebased and updated submodules as per normal [15:24:18] scap pulled to mwdebug1002, but the code updates are not there. [15:24:34] poor addshore [15:25:09] supposedly mediawiki/core clones from this weekend's issue should all have been fixed [15:25:15] just prior to this I had to fix the clone [15:25:47] I got the error exactly as it was in the email & ticket, followed the steps and then the git fetch worked just fine. [15:25:50] so you get code on tin which is not pulled ? [15:26:57] wait, no, the code updates for some reason are not on tin either [15:27:17] It looks like it depends on the git version as to whether things break [15:27:19] (03CR) 10Alexandros Kosiaris: [C: 031] elasticsearch - Relforge cluster should be accessible from labs networks (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/323839 (owner: 10Gehel) [15:27:22] Your branch and 'origin/wmf/1.29.0-wmf.3' have diverged, and have 4 and 4 different commits each, respectively. [15:29:04] okay "git log HEAD..origin/wmf/1.29.0-wmf.3" still shows my commits [15:29:04] (03PS3) 10Jcrespo: Add arbcom_cswiki to $private_wikis in realm.pp [puppet] - 10https://gerrit.wikimedia.org/r/323814 (https://phabricator.wikimedia.org/T151731) (owner: 10MarcoAurelio) [15:29:44] git rebase results in "You have unstaged changes."
[15:30:09] apparently I overlooked that on the first pass [15:30:24] (03CR) 10Jcrespo: [C: 032] Add arbcom_cswiki to $private_wikis in realm.pp [puppet] - 10https://gerrit.wikimedia.org/r/323814 (https://phabricator.wikimedia.org/T151731) (owner: 10MarcoAurelio) [15:31:12] (03PS3) 10Gehel: elasticsearch - Relforge cluster should be accessible from labs networks [puppet] - 10https://gerrit.wikimedia.org/r/323839 [15:32:29] hashar: would the apparent change in api.php on tin on the branch be causing this (relating to https://phabricator.wikimedia.org/T151702) ? [15:32:56] addshore: turn it into a local commit? [15:32:58] (03PS2) 10Andrew Bogott: puppetmaster: Do not generate different password hashes on each run [puppet] - 10https://gerrit.wikimedia.org/r/323828 (https://phabricator.wikimedia.org/T151760) (owner: 10Tim Landscheidt) [15:33:08] And kick ori for leaving it as a live hack? ;) [15:33:22] tbh, it could be a commit on gerrit [15:34:14] addshore: or you stash and unstash for simplicity [15:34:23] (03CR) 10Andrew Bogott: [C: 032] "Thanks!" 
[puppet] - 10https://gerrit.wikimedia.org/r/323828 (https://phabricator.wikimedia.org/T151760) (owner: 10Tim Landscheidt) [15:34:35] and if that is a local hack [15:34:40] definitely wants to be either sent to Gerrit [15:34:48] or exported under /srv/patches [15:35:06] !log restarting db1069 (sanitarium) instances to apply new replication filters T151752 [15:35:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:35:16] T151752: Prepare and check storage layer for the future private wiki arbcom-cs.wikipedia.org - https://phabricator.wikimedia.org/T151752 [15:35:20] ^this may create temporary lag (a few minutes) on labs replicas [15:35:22] /srv/patches isn't necessary [15:35:26] it wasn't a security fix [15:35:49] live hacks must be tracked somewhere [15:35:53] either gerrit or /srv/patches [15:36:10] we can't have dozens of developers randomly crafting commits in randoms repos [15:36:34] hashar: sure, I'm meaning it should just go on gerrit [15:36:34] (03CR) 10Daniel Kinzler: [C: 031] Use entity types for the repoNamespaces Wikibase client setting [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323347 (owner: 10Hoo man) [15:36:41] !log Turned change by ori to local commit on tin in core on .3 branch [15:36:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:37:05] Reedy: if it is not sensible sure :} [15:37:13] sorry, I misunderstood [15:37:19] it's a public patch... to fix an explosion onwiki [15:37:33] :) [15:38:03] okay, and now my code is on mwdebug1002 just fine :) Thanks! [15:38:17] That all can down to be overlooking the failed rebase the first time around.... 
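The failed rebase discussed above came from an uncommitted live hack sitting in the working tree. A minimal sketch of a pre-rebase check follows, assuming the text of `git status --porcelain` (v1 format) has already been captured; the function name `blocking_changes` and the sample file names are hypothetical and not part of any tool mentioned in the channel.

```python
def blocking_changes(porcelain_output: str) -> list:
    """Return paths of tracked files that have uncommitted changes.

    Expects the text printed by `git status --porcelain` (v1 format):
    two status characters, one space, then the path.
    """
    blocked = []
    for line in porcelain_output.splitlines():
        if len(line) < 4:
            continue  # blank or malformed line, nothing to report
        status, path = line[:2], line[3:]
        if status in ("??", "!!"):
            # Untracked and ignored files do not trigger the
            # "You have unstaged changes" refusal from git rebase.
            continue
        blocked.append(path)
    return blocked


# Hypothetical sample: one modified tracked file, one untracked file.
sample = " M api.php\n?? notes.txt\n"
print(blocking_changes(sample))  # ['api.php']
```

If the list is non-empty, the options raised in the channel apply: turn the change into a local commit, send it to Gerrit, or stash it and unstash it after the rebase.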
[15:38:19] *me [15:39:03] !log addshore@tin Started scap: RevisionSlider updates - [[gerrit:323384]], [[gerrit:323520]], [[gerrit:323521]], [[gerrit:323808]] [15:39:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:43:08] (03PS4) 10Gehel: elasticsearch - Relforge cluster should be accessible from labs networks [puppet] - 10https://gerrit.wikimedia.org/r/323839 [15:45:19] marostegui, could have I accidentaly reverted your alter table, or did it already finish? [15:45:38] which one? db1069? [15:45:43] yes [15:45:53] let me check [15:46:10] (03CR) 10Gehel: [C: 032] elasticsearch - Relforge cluster should be accessible from labs networks [puppet] - 10https://gerrit.wikimedia.org/r/323839 (owner: 10Gehel) [15:46:28] jynus: yep, it died [15:46:34] :-/ [15:46:56] how long was it running for, 7 hours? [15:47:20] maybe it is not worth it for labs [15:47:24] yes, around that [15:47:40] is it revision? [15:47:45] yep [15:47:54] if it is, I would skip db1069 and labs1/3 [15:48:30] jynus: sounds good to me, less alters :) [15:48:43] too much lag for not many rewards [15:48:53] we can do them on labsdb1009/10/11 [15:49:10] much faster and innodb [15:49:49] sure, I was doing it in across the shard for consistency really [15:50:11] db1095 is importing now, right? [15:50:42] we need to restart it when it finishes for T151752 [15:50:43] T151752: Prepare and check storage layer for the future private wiki arbcom-cs.wikipedia.org - https://phabricator.wikimedia.org/T151752 [15:50:57] jynus: yes it is [15:51:18] jynus: I think it will be ready tomorrow morning, is that a blocker? [15:52:25] well, it can be added at any time [15:55:08] 06Operations, 10ops-ulsfo, 10netops: lvs4002 power supply failure - https://phabricator.wikimedia.org/T151273#2827362 (10RobH) Since both this and recently died power supply on cp4008 are out of warranty, the current plan is to steal the other power supply from cp4008 to replace the bad one in lvs4002. 
[15:59:03] (03PS1) 10MarcoAurelio: [DNM] Initial configuration for arbcom_cs.wikipedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323843 (https://phabricator.wikimedia.org/T151731) [16:01:59] (03PS2) 10MarcoAurelio: [DNM] Initial configuration for arbcom_cs.wikipedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323843 (https://phabricator.wikimedia.org/T151731) [16:05:15] jynus, to clarify, the patch to block replication of arbcom_cswiki is not yet active on all the servers it needs to be on? [16:05:36] Krenair, I am deploying it right now [16:05:47] but I think it has to wait for deploy on sanitarium2 [16:05:53] because there is maintenance ongoing [16:05:59] will clarify on the ticket soon [16:06:13] ty [16:07:01] jynus / Krenair Initial configuration is now uploaded as well, but of course it won't be deployed yet [16:07:03] (03CR) 10Alex Monk: [C: 04-2] "It seems the DB servers may not be ready to block replication of this yet, marking as CR-2 until status is clarified" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323843 (https://phabricator.wikimedia.org/T151731) (owner: 10MarcoAurelio) [16:07:33] yep, just leaving a -2 to avoid any nasty surprises [16:08:16] (03CR) 10MarcoAurelio: "This does not need CR-2. It's already tagged as DNM (do not merge) and it's of course waiting for the pertinent approvals."
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/323843 (https://phabricator.wikimedia.org/T151731) (owner: 10MarcoAurelio) [16:08:32] meh, just saw [16:09:20] (03PS2) 10Rush: labsdb: cleanup labs/db/views presentation [puppet] - 10https://gerrit.wikimedia.org/r/323830 [16:09:34] (03CR) 10Alex Monk: "No patch truly *needs* -2, we use it anyway for good measure" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323843 (https://phabricator.wikimedia.org/T151731) (owner: 10MarcoAurelio) [16:10:09] 06Operations, 10Monitoring: icinga "max concurrent checks" limits reached - https://phabricator.wikimedia.org/T1242#2827381 (10akosiaris) 05Open>03Resolved a:03akosiaris This has been resolved. The limit has been raised since the deprecation of check_sslXNN and the migration of neon to einsteinium/tegmen... [16:10:21] Does jenkins know about DNM? [16:10:25] I bet he'll just merge it anyway [16:10:30] (03PS3) 10GWicke: RESTBase configuration for fi.wikivoyage.org [puppet] - 10https://gerrit.wikimedia.org/r/323696 (https://phabricator.wikimedia.org/T151570) (owner: 10MarcoAurelio) [16:10:40] Reedy: I doubt it [16:10:48] addshore: Jenkins is a dick [16:10:56] (03CR) 10GWicke: [C: 031] RESTBase configuration for fi.wikivoyage.org [puppet] - 10https://gerrit.wikimedia.org/r/323696 (https://phabricator.wikimedia.org/T151570) (owner: 10MarcoAurelio) [16:11:30] xD [16:13:23] shall arbcom_cs be added to https://phabricator.wikimedia.org/diffusion/ODNS/browse/master/templates/wikimedia.org;c82c8257b2e8c5296d1c48150e9366576dc2bb11$536 ? 
[16:14:01] yes mafk [16:14:12] that repo [16:14:14] but not that file [16:14:50] (03CR) 10Jcrespo: [C: 031] labsdb: cleanup labs/db/views presentation [puppet] - 10https://gerrit.wikimedia.org/r/323830 (owner: 10Rush) [16:14:59] it's in templates/wikipedia.org [16:15:25] says 'parking' [16:16:09] (03PS1) 10Rush: maintain-views: allow table specification for custom view [puppet] - 10https://gerrit.wikimedia.org/r/323848 [16:16:43] templates/wikipedia.org:arbcom-de 600 IN DYNA geoip!text-addrs [16:16:58] mafk, which file says that? [16:18:01] I'm curious, how resilient is SCAP, for example, if my ssh session dropped mid sync... what would happen? [16:18:19] (03PS2) 10Rush: maintain-views: allow table specification for custom view [puppet] - 10https://gerrit.wikimedia.org/r/323848 [16:18:24] addshore: I think it'll keep running [16:18:25] addshore: it would die mid-sync [16:18:36] Reedy: would it? [16:18:39] Dunno... [16:18:42] Hah! [16:18:50] addshore: use tmux (or screen) [16:18:53] I know with agent forwarding, it would cause problems [16:18:54] the python script is attached to the tty of the ssh session [16:19:29] Krenair, mafk: https://phabricator.wikimedia.org/T151752#2827396 [16:19:56] We don't use agent forwarding [16:20:04] jynus: certainly not an emergency [16:20:11] Dereckson: yes, I should use screen (not used tmux before) for scap syncs, I always forget how long they take... [16:20:18] Krenair: But we did [16:20:19] Reedy: yeah, we got rid of that bit but the command and control process on the deploy server is still attached to a tty so I'm pretty sure it would die with the ssh session [16:20:27] jynus, thanks. 
we're in no rush to get wikis created, let's wait [16:20:41] tmux > screen [16:21:00] Maybe we should try and find out someday [16:21:05] syncing a readme or something [16:21:18] readme will finish too soon [16:21:50] addshore: tmux offers you a nice config, an easy way to divide the window into panes and a nice bottom bar, it uses a client/server model [16:21:55] mafk: the import jynus mentions should be finished tomorrow or maybe even later today [16:22:21] Dereckson: by your description it sounds like I have seen people using that before! [16:22:23] marostegui: I said it was /not/ an emergency :) [16:22:35] addshore: http://www.dayid.org/comp/tm.html explains how to do in tmux what you currently do in screen [16:22:43] mafk: haha :) [16:23:18] I think sometimes people don't understand what I try to say. Probably my fault. [16:23:22] mafk: you're also handling dns/apache part? [16:23:28] Dereckson: yep [16:23:32] ok [16:23:35] sort of [16:23:46] If I'm lost, I'll ring the bell [16:23:47] mafk: No, I did, I was just giving some extra information so you could know when it will be finished :-) [16:23:53] 06Operations, 10Beta-Cluster-Infrastructure, 10Gerrit, 13Patch-For-Review: gerrit jgit gc caused mediawiki/core repo problems - https://phabricator.wikimedia.org/T151676#2827415 (10ArielGlenn) A git pull on tin for operations/dumps.git worked just fine. Maybe knock that off the list? [16:27:57] 06Operations, 10Traffic: several 502 Bad Gateway - https://phabricator.wikimedia.org/T151686#2827426 (10elukey) @Paladox: would you mind writing down the URLs that you are trying to access? I know that you put all the data but having the actual links would help me and other people not super used to write API...
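The earlier advice to run long syncs under tmux (because the controlling process is attached to the tty of the SSH session) can be sketched as follows. This is only an illustration under stated assumptions: the helper names, the session name `deploy`, and the `sleep 600` stand-in for a long-running command are all hypothetical. The tmux subcommands and flags used (`new-session -d -s`, `attach-session -t`) are standard tmux options.

```python
import shlex


def tmux_detached(session: str, command: list) -> list:
    """Build the argv that starts `command` inside a detached tmux session.

    The command keeps running inside the tmux server even if the SSH
    connection that launched it goes away.
    """
    return ["tmux", "new-session", "-d", "-s", session, shlex.join(command)]


def tmux_attach(session: str) -> list:
    """Build the argv that re-attaches to a named session after reconnecting."""
    return ["tmux", "attach-session", "-t", session]


# "sleep 600" is a placeholder for whatever long-running command is needed.
print(tmux_detached("deploy", ["sleep", "600"]))
print(tmux_attach("deploy"))
```

Either list can be handed to `subprocess.run(argv, check=True)` on a host where tmux is installed.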
[16:28:20] (03PS1) 10Thcipriani: SCAP: Bump version to 3.4.0-1 [puppet] - 10https://gerrit.wikimedia.org/r/323852 [16:28:46] (03CR) 10Thcipriani: [C: 04-1] "Needs new package on carbon before approval" [puppet] - 10https://gerrit.wikimedia.org/r/323852 (owner: 10Thcipriani) [16:28:47] 06Operations, 10ops-codfw: mw2092 - disk issue - https://phabricator.wikimedia.org/T151427#2827428 (10Papaul) a:03Papaul [16:29:10] (03Draft2) 10MarcoAurelio: DNS configuration for arbcom_cs.wikipedia.org [dns] - 10https://gerrit.wikimedia.org/r/323851 (https://phabricator.wikimedia.org/T151731) [16:29:23] 06Operations, 10Traffic: several 502 Bad Gateway - https://phabricator.wikimedia.org/T151686#2827429 (10Paladox) @elukey Hi it should all be fixed now. The link I tried was en.wikipedia.org, trying to add something to the watchlist and then removed it caused problems but should all be fixed now by godog restar... [16:29:47] DNS config: https://gerrit.wikimedia.org/r/#/c/323851/ [16:29:54] 06Operations, 10Gerrit, 07Beta-Cluster-reproducible, 13Patch-For-Review: gerrit jgit gc caused mediawiki/core repo problems - https://phabricator.wikimedia.org/T151676#2827430 (10Krenair) [16:30:19] 06Operations, 10Continuous-Integration-Config, 06Operations-Software-Development: Flake8 for python files without extension in puppet repo - https://phabricator.wikimedia.org/T144169#2827435 (10hashar) `gfind . -type f ! -path '*/.git/*' -a ! -name '*.*' -exec file --mime {} +|grep x-python|cut -d\ -f1` sho... [16:31:11] 06Operations, 10Continuous-Integration-Config, 06Operations-Software-Development: Add shell scripts CI validations - https://phabricator.wikimedia.org/T148494#2827438 (10hashar) 170 files based on `gfind . -type f ! -path '*/.git/*' -a ! -name '*.*' -exec file --mime {} +|grep x-shell|cut -d\ -f1` ``` ./fil... 
[16:34:03] 06Operations, 10Ops-Access-Requests, 13Patch-For-Review: Shell access to californium for bd808 - https://phabricator.wikimedia.org/T151424#2816784 (10RobH) @Volans: Please note the 3 day wait only applies on business days. So the 3 day wait actually ends on Monday, November 28th. Thanks! [16:34:28] !log addshore@tin Finished scap: RevisionSlider updates - [[gerrit:323384]], [[gerrit:323520]], [[gerrit:323521]], [[gerrit:323808]] (duration: 55m 25s) [16:34:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:35:18] !log RevisionSlider updates window finished 35 mins late! [16:35:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:43:22] (03PS5) 10Jcrespo: Create script to check that sanitarium filtering is working [puppet] - 10https://gerrit.wikimedia.org/r/323525 (https://phabricator.wikimedia.org/T150802) [16:43:45] 06Operations, 10ArchCom-RfC, 06Commons, 10MediaWiki-File-management, and 14 others: Define an official thumb API - https://phabricator.wikimedia.org/T66214#2827486 (10GWicke) @gilles, the client can communicate the exact format(s) it prefers using either the URL, or via the Accept header. For the vast ma... [16:44:00] (03CR) 10Rush: [C: 032] maintain-views: allow table specification for custom view [puppet] - 10https://gerrit.wikimedia.org/r/323848 (owner: 10Rush) [16:44:02] (03CR) 10jenkins-bot: [V: 04-1] Create script to check that sanitarium filtering is working [puppet] - 10https://gerrit.wikimedia.org/r/323525 (https://phabricator.wikimedia.org/T150802) (owner: 10Jcrespo) [16:45:31] 06Operations, 06Multimedia, 10Traffic, 13Patch-For-Review, and 2 others: Thumbnails failing to render sporadically (ERR_CONNECTION_CLOSED or ERR_SSL_BAD_RECORD_MAC_ALERT) - https://phabricator.wikimedia.org/T148917#2827488 (10BBlack) Can't be a regression of the specific TLS bug we had here. [16:46:17] PROBLEM - puppet last run on ms-fe1001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [16:46:31] (03PS3) 10Rush: labsdb: cleanup labs/db/views presentation [puppet] - 10https://gerrit.wikimedia.org/r/323830 [16:49:56] (03PS1) 10Marostegui: db-codfw.php: Depool db2048 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323854 (https://phabricator.wikimedia.org/T149553) [16:49:58] <_joe_> !log re-started the commonswiki htmlCacheUpdate dedicated jobrunner (T151196) [16:50:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:50:09] T151196: Job queue size growing since ~12:00 on 2016-11-19 - https://phabricator.wikimedia.org/T151196 [16:50:16] <_joe_> !log upgrading HHVM on canaries [16:50:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:52:03] (03PS2) 10Hoo man: Use entity types for the repoNamespaces Wikibase client setting [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323347 [16:52:29] 06Operations, 10Analytics, 10Monitoring: Switch jmxtrans from statsd to graphite line protocol - https://phabricator.wikimedia.org/T73322#2827542 (10Milimetric) 05Open>03declined Shifting focus to using Prometheus instead. [16:55:07] PROBLEM - check_puppetrun on boron is CRITICAL: CRITICAL: Puppet has 1 failures [16:55:32] ^^^ boron...fixing. [16:55:57] PROBLEM - puppet last run on analytics1040 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:56:57] PROBLEM - puppet last run on mw1276 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. 
Failed resources (up to 3 shown): Package[hhvm],Package[hhvm-dbg] [16:58:03] (03PS2) 10Dzahn: admin: create group striker-users [puppet] - 10https://gerrit.wikimedia.org/r/323121 (https://phabricator.wikimedia.org/T151424) [17:00:07] RECOVERY - check_puppetrun on boron is OK: OK: Puppet is currently enabled, last run 237 seconds ago with 0 failures [17:02:27] (03CR) 10Andrew Bogott: [C: 031] "This needs to wait until Tuesday, right? Or Wednesday?" [puppet] - 10https://gerrit.wikimedia.org/r/323121 (https://phabricator.wikimedia.org/T151424) (owner: 10Dzahn) [17:02:41] (03CR) 10MarcoAurelio: [C: 031] Allow __NOINDEX__ on all namespaces on meta. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321713 (https://phabricator.wikimedia.org/T150245) (owner: 10Dereckson) [17:04:55] (03CR) 10MarcoAurelio: "Another solution would be to add this to the 'shell' user group and make it implicit for bots, sysops, etc." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/303183 (owner: 10BryanDavis) [17:07:38] 06Operations, 10Continuous-Integration-Config, 06Operations-Software-Development: Flake8 for python files without extension in puppet repo - https://phabricator.wikimedia.org/T144169#2827575 (10Volans) @hashar thanks for the update, this is still on my radar, but I didn't had time to work on/drive it yet [17:14:22] (03CR) 10Daniel Kinzler: "I'm not sure I understand the intended migration path. Can we have that as a check-list in the ticket?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/323556 (https://phabricator.wikimedia.org/T111023) (owner: 10Aude) [17:15:17] RECOVERY - puppet last run on ms-fe1001 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [17:15:19] (03CR) 10BryanDavis: "> Another solution would be to add this to the 'shell' user group and" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/303183 (owner: 10BryanDavis) [17:17:51] (03CR) 10MarcoAurelio: "So an 'autopatrolled' with 'autopatrol' => 'true', and then add 'autopatrol' to shell, shellmanagers, sysops, cloudadmins and contentadmin" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/303183 (owner: 10BryanDavis) [17:20:57] <_joe_> !log turned on the second commonswiki htmlCacheUpdate dedicated jobrunner (T151196) [17:21:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:21:08] T151196: Job queue size growing since ~12:00 on 2016-11-19 - https://phabricator.wikimedia.org/T151196 [17:22:49] (03CR) 10Jcrespo: [C: 031] db-codfw.php: Depool db2048 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323854 (https://phabricator.wikimedia.org/T149553) (owner: 10Marostegui) [17:23:57] RECOVERY - puppet last run on analytics1040 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [17:25:57] RECOVERY - puppet last run on mw1276 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [17:30:16] bblack: hi! quick question... in the event of a need to run purgeList.php to remove some banner from Varnish, who has permissions to do so? Any deployer? Is it, just follow the instructions here: https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment#Run_a_maintenance_script_on_a_wiki ? [17:31:00] AndyRussG: anyone with access to tin can [17:31:18] cat/echo | mwscript purgeList.php [17:31:19] profit [17:32:55] Reedy: cool thx! 
Mmm I don't understand what u meant here: "cat/echo | mwscript purgeList.php" ^ [17:33:08] (pipe from an empty echo?) [17:33:17] well, cat/echo isn't valid either [17:33:20] either as a one liner [17:33:21] * AndyRussG tries it [17:33:36] echo "https://foo.bar/whatever" | mwscript purgeList.php [17:33:51] or a cat a file if you've some list of it [17:34:00] Reedy: ah right gotcha! [17:35:32] Yeah I didn't notice the "<" in usage here... https://www.mediawiki.org/wiki/Manual:PurgeList.php [17:36:24] thx much! [17:38:00] Reedy: can u think of any ill effects of testing it on a test banner? [17:38:34] Similar to purging any cache I guess.. [17:38:57] 06Operations, 06Analytics-Kanban, 10hardware-requests: stat1001 replacement box in eqiad - https://phabricator.wikimedia.org/T149911#2827706 (10RobH) a:05mark>03RobH [17:39:15] 06Operations, 10ops-codfw: mw2092 - disk issue - https://phabricator.wikimedia.org/T151427#2827709 (10Papaul) a:05Papaul>03Volans Disk replacement and re-image complete. [17:42:17] (03PS5) 10Elukey: [WIP] Add temporary override to mc1019 hiera config to allow Redis config [puppet] - 10https://gerrit.wikimedia.org/r/323807 (https://phabricator.wikimedia.org/T137345) [17:44:19] (03PS6) 10Elukey: [WIP] Add temporary override to mc1019 hiera config to allow Redis config [puppet] - 10https://gerrit.wikimedia.org/r/323807 (https://phabricator.wikimedia.org/T137345) [17:46:17] PROBLEM - puppet last run on ms-fe1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:47:18] 06Operations, 10ops-codfw: mw2092 - disk issue - https://phabricator.wikimedia.org/T151427#2827769 (10Volans) a:05Volans>03None De-assigning from myself, I just cleaned it's conftool status, better to have some with more expertise on this cluster to check it before re-adding it to production. 
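The purgeList.php usage explained above (URLs piped to the script on standard input, either from a single echo or from a file of URLs) can be sketched as below. The helper names `purge_payload` and `purge` and the second URL are hypothetical; the command itself is copied from the channel, and any extra arguments a given setup needs are not shown. The `run` parameter exists only so the sketch can be exercised without the real script.

```python
import subprocess


def purge_payload(urls) -> str:
    """Join URLs one per line, skipping blank entries."""
    cleaned = [u.strip() for u in urls if u.strip()]
    return "".join(u + "\n" for u in cleaned)


def purge(urls, run=subprocess.run):
    """Pipe the URLs to `mwscript purgeList.php` on stdin.

    Equivalent in spirit to: echo "<url>" | mwscript purgeList.php
    """
    return run(
        ["mwscript", "purgeList.php"],
        input=purge_payload(urls),
        text=True,
        check=True,
    )


# Show only what would be sent on stdin; nothing is executed here.
print(repr(purge_payload(["https://foo.bar/whatever", "  ", "https://foo.bar/other"])))
```

Calling `purge([...])` only works on a host where `mwscript` is available, such as the deployment server mentioned in the discussion.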
[17:50:33] (03PS7) 10Elukey: [WIP] Add temporary override to mc1019 hiera config to allow Redis config [puppet] - 10https://gerrit.wikimedia.org/r/323807 (https://phabricator.wikimedia.org/T137345) [17:50:42] (03PS2) 10BryanDavis: Add autopatrolled group for wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/303183 [17:50:53] (03CR) 10jenkins-bot: [V: 04-1] Add autopatrolled group for wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/303183 (owner: 10BryanDavis) [17:54:25] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/4689/" [puppet] - 10https://gerrit.wikimedia.org/r/323807 (https://phabricator.wikimedia.org/T137345) (owner: 10Elukey) [17:56:51] (03PS3) 10BryanDavis: Add autopatrolled group for wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/303183 [17:57:59] (03PS8) 10Elukey: [WIP] Add temporary dc to Redis config to allow a eqiad replica [puppet] - 10https://gerrit.wikimedia.org/r/323807 (https://phabricator.wikimedia.org/T137345) [17:58:19] (03CR) 10BryanDavis: "PS2 partially adresses feedback from MarcoAurelio. PS3 is a manual rebase." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/303183 (owner: 10BryanDavis) [18:00:04] gehel: Respected human, time to deploy Weekly Wikidata query service deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161128T1800). Please do the needful. [18:00:04] SMalyshev: A patch you scheduled for Weekly Wikidata query service deployment window is about to be deployed. Please be available during the process. 
[18:00:16] (03CR) 10BryanDavis: "> So an 'autopatrolled' with 'autopatrol' => 'true', and then add" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/303183 (owner: 10BryanDavis) [18:00:35] SMalyshev: deploy on wdqs-test done, feel free to check [18:00:56] (03PS3) 10Dzahn: admin: create group striker-users [puppet] - 10https://gerrit.wikimedia.org/r/323121 (https://phabricator.wikimedia.org/T151424) [18:01:58] gehel: give me a min [18:02:22] SMalyshev: as many as you want! [18:02:53] (03CR) 10Elukey: "Added Faidon, Giuseppe and Alex to get some feedback. This code review is still in WIP but it would be great to know if the idea is worth " [puppet] - 10https://gerrit.wikimedia.org/r/323807 (https://phabricator.wikimedia.org/T137345) (owner: 10Elukey) [18:03:32] gehel: looks ok to me [18:03:40] (03CR) 10Dzahn: [C: 031] "i believe the waiting period has ended already and the ticket says "The 3 days waiting period will end Sat, Nov 26, 00:55 UTC" too. but i " [puppet] - 10https://gerrit.wikimedia.org/r/323121 (https://phabricator.wikimedia.org/T151424) (owner: 10Dzahn) [18:04:01] (03PS4) 10Dzahn: admin: create group striker-users [puppet] - 10https://gerrit.wikimedia.org/r/323121 (https://phabricator.wikimedia.org/T151424) [18:05:17] PROBLEM - puppet last run on db1061 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:05:32] 06Operations, 10Traffic, 13Patch-For-Review: HTTP/1.1 keepalive for local nginx->varnish conns - https://phabricator.wikimedia.org/T107749#2827829 (10BBlack) Should have resolved/rejected this back in T107749#2662491 - at this point it's just a collector of semi-related commits, but I don't think we plan to... 
[18:05:34] (03PS5) 10Dzahn: admin: create group striker-users, add b808 [puppet] - 10https://gerrit.wikimedia.org/r/323121 (https://phabricator.wikimedia.org/T151424) [18:05:37] 06Operations, 10Traffic, 13Patch-For-Review: Support websockets in cache_misc - https://phabricator.wikimedia.org/T134870#2827831 (10BBlack) [18:05:40] 06Operations, 10Traffic, 13Patch-For-Review: HTTP/1.1 keepalive for local nginx->varnish conns - https://phabricator.wikimedia.org/T107749#2827830 (10BBlack) 05stalled>03declined [18:06:08] (03CR) 10Alex Monk: "Maybe it was a mistake to do group-based instead of right-based password policies." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/322094 (https://phabricator.wikimedia.org/T150951) (owner: 10Reedy) [18:07:48] (03PS3) 10Smalyshev: Limit concurrent connections by client IP [puppet] - 10https://gerrit.wikimedia.org/r/319010 (https://phabricator.wikimedia.org/T108488) [18:08:59] !log deploying latest WDQS GUI and Blazegraph [18:09:00] (03CR) 10Dzahn: "no, actually neither, what robh said "@Volans: Please note the 3 day wait only applies on business days. 
So the 3 day wait actually ends o" [puppet] - 10https://gerrit.wikimedia.org/r/323121 (https://phabricator.wikimedia.org/T151424) (owner: 10Dzahn) [18:09:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:09:55] (03CR) 10Andrew Bogott: "ok then :)" [puppet] - 10https://gerrit.wikimedia.org/r/323121 (https://phabricator.wikimedia.org/T151424) (owner: 10Dzahn) [18:11:22] (03CR) 10Dzahn: [C: 032] "alright, with +1 from Andrew and jenkins-bot likes it" [puppet] - 10https://gerrit.wikimedia.org/r/323121 (https://phabricator.wikimedia.org/T151424) (owner: 10Dzahn) [18:11:28] (03PS6) 10Dzahn: admin: create group striker-users, add b808 [puppet] - 10https://gerrit.wikimedia.org/r/323121 (https://phabricator.wikimedia.org/T151424) [18:12:39] SMalyshev: deployment complete for WDQS, tests look good [18:14:17] RECOVERY - puppet last run on ms-fe1001 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [18:14:48] (03PS1) 10BBlack: tlsproxy: remove unused keepalives complexity [puppet] - 10https://gerrit.wikimedia.org/r/323865 (https://phabricator.wikimedia.org/T107749) [18:14:50] (03PS1) 10BBlack: tlsproxy: be explicit about Conn:close [puppet] - 10https://gerrit.wikimedia.org/r/323866 (https://phabricator.wikimedia.org/T107749) [18:14:59] gehel: great! [18:15:18] SMalyshev: I'll still deploy the connection limiting on nginx [18:15:51] gehel: thanks :) let's see if somebody complains (shouldn't but who knows) [18:16:38] 06Operations, 10scap: Decide on /var/lib vs /home as locations of homedir for mwdeploy - https://phabricator.wikimedia.org/T86971#980532 (10demon) I see zero reason we can't move the homedir to /var/lib. 
[18:19:16] (03PS4) 10Gehel: Limit concurrent connections by client IP [puppet] - 10https://gerrit.wikimedia.org/r/319010 (https://phabricator.wikimedia.org/T108488) (owner: 10Smalyshev) [18:19:38] (03PS1) 10Chad: Move mwdeploy home to /var/lib where it belongs, it's a system user [puppet] - 10https://gerrit.wikimedia.org/r/323867 (https://phabricator.wikimedia.org/T86971) [18:22:32] 06Operations, 10Ops-Access-Requests, 13Patch-For-Review: Shell access to californium for bd808 - https://phabricator.wikimedia.org/T151424#2816784 (10Dzahn) on californium: ``` Notice: /Stage[main]/Admin/Admin::Hashgroup[striker-users]/Admin::Group[striker-users]/Group[striker-users]/ensure: created Notice... [18:22:54] 06Operations, 10Ops-Access-Requests: Shell access to californium for bd808 - https://phabricator.wikimedia.org/T151424#2827898 (10Dzahn) [18:23:12] (03CR) 10Gehel: [C: 032] Limit concurrent connections by client IP [puppet] - 10https://gerrit.wikimedia.org/r/319010 (https://phabricator.wikimedia.org/T108488) (owner: 10Smalyshev) [18:23:17] (03PS5) 10Gehel: Limit concurrent connections by client IP [puppet] - 10https://gerrit.wikimedia.org/r/319010 (https://phabricator.wikimedia.org/T108488) (owner: 10Smalyshev) [18:23:29] 06Operations, 10Ops-Access-Requests: Shell access to californium for bd808 - https://phabricator.wikimedia.org/T151424#2816784 (10Dzahn) 05Open>03Resolved a:03Dzahn [18:24:17] (03PS6) 10Jcrespo: Create script to check that sanitarium filtering is working [puppet] - 10https://gerrit.wikimedia.org/r/323525 (https://phabricator.wikimedia.org/T150802) [18:26:13] (03PS1) 10EBernhardson: Increase Cirrus interwiki load test to 100% [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323868 (https://phabricator.wikimedia.org/T149740) [18:27:46] (03CR) 10jenkins-bot: [V: 04-1] Create script to check that sanitarium filtering is working [puppet] - 10https://gerrit.wikimedia.org/r/323525 (https://phabricator.wikimedia.org/T150802) (owner: 10Jcrespo) [18:29:50] 
(03CR) 10Chad: [C: 04-1] "And by ASAP I mean by Friday, but not today." [puppet] - 10https://gerrit.wikimedia.org/r/323655 (https://phabricator.wikimedia.org/T151676) (owner: 10Reedy) [18:33:17] RECOVERY - puppet last run on db1061 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [18:34:49] 06Operations, 06Analytics-Kanban, 10hardware-requests: stat1001 replacement box in eqiad - https://phabricator.wikimedia.org/T149911#2827948 (10Ottomata) BTW we should name the new box something other than stat1001. We can use an element name if one is available, and that makes sense. This box hosts severa... [18:37:33] (03CR) 10Dzahn: [C: 031] ""/home is a fairly standard concept, but it is clearly a site-specific filesystem. [9] The setup will differ from host to host. Therefore," [puppet] - 10https://gerrit.wikimedia.org/r/323867 (https://phabricator.wikimedia.org/T86971) (owner: 10Chad) [18:42:10] (03PS1) 10Andrew Bogott: lighttpd nodes: customize lighttpd.tmpfile.conf [puppet] - 10https://gerrit.wikimedia.org/r/323869 (https://phabricator.wikimedia.org/T142932) [18:48:38] (03PS2) 10Dzahn: Shinken wmflabs: remove Chris McMahon [puppet] - 10https://gerrit.wikimedia.org/r/323536 (owner: 10Hashar) [18:50:15] (03CR) 10Dzahn: [C: 032] Shinken wmflabs: remove Chris McMahon [puppet] - 10https://gerrit.wikimedia.org/r/323536 (owner: 10Hashar) [18:50:51] (03PS2) 10Dzahn: Remove Antoine from beta cluster notifications [puppet] - 10https://gerrit.wikimedia.org/r/323537 (owner: 10Hashar) [18:52:20] 06Operations, 10Ops-Access-Requests, 06Discovery, 06Maps, and 2 others: Requesting access to analytics-privatedata-users for technical user discovery-stats - https://phabricator.wikimedia.org/T151063#2828111 (10Ottomata) INNNNTeresting. The `stats` group is also [[ https://github.com/wikimedia/operations-... 
[18:53:17] (03CR) 10Rush: [C: 031] lighttpd nodes: customize lighttpd.tmpfile.conf [puppet] - 10https://gerrit.wikimedia.org/r/323869 (https://phabricator.wikimedia.org/T142932) (owner: 10Andrew Bogott) [18:53:57] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479 [18:54:04] (03CR) 10Dzahn: [C: 032] Remove Antoine from beta cluster notifications [puppet] - 10https://gerrit.wikimedia.org/r/323537 (owner: 10Hashar) [18:54:49] (03CR) 10Yuvipanda: "Isn't this a complete NOOP because tmpfiles is a systemd feature and Trusty runs upstart?" [puppet] - 10https://gerrit.wikimedia.org/r/323869 (https://phabricator.wikimedia.org/T142932) (owner: 10Andrew Bogott) [18:54:57] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4517415 keys, up 28 days 10 hours - replication_delay is 0 [18:56:02] (03CR) 10Yuvipanda: [C: 04-1] "(feel free to override my -1 if upstart also provides this functionality)" [puppet] - 10https://gerrit.wikimedia.org/r/323869 (https://phabricator.wikimedia.org/T142932) (owner: 10Andrew Bogott) [18:57:29] 06Operations, 06Analytics-Kanban, 10hardware-requests: stat1001 replacement box in eqiad - https://phabricator.wikimedia.org/T149911#2828163 (10elukey) >>! In T149911#2827948, @Ottomata wrote: > BTW we should name the new box something other than stat1001. We can use an element name if one is available, and... [19:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Dear anthropoid, the time has come. Please deploy Morning SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161128T1900). [19:00:04] hoo and ebernhardson: A patch you scheduled for Morning SWAT(Max 8 patches) is about to be deployed. Please be available during the process. [19:00:31] Who wants to SWAT? 
I can also deploy myself [19:01:21] I can SWAT today [19:01:35] great [19:02:42] jouncebot: swat roulette [19:03:46] hoo: "Want" is such a strong word ;-) [19:03:57] ^ [19:04:13] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323347 (owner: 10Hoo man) [19:08:14] (03PS1) 10Chad: Add open.dblist that covers non-closed and non-deleted wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323873 [19:09:01] (03PS1) 10ArielGlenn: remove get_lockinfo and bogus info about when run started [dumps] - 10https://gerrit.wikimedia.org/r/323874 (https://phabricator.wikimedia.org/T133547) [19:09:16] (03CR) 10Jcrespo: "This must be an old version of pep8 running, the standard agrees with my style: https://www.python.org/dev/peps/pep-0008/#should-a-line-br" [puppet] - 10https://gerrit.wikimedia.org/r/323525 (https://phabricator.wikimedia.org/T150802) (owner: 10Jcrespo) [19:10:59] (03PS1) 10Chad: Make flowprivate.dblist auto-managed by dblist math [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323875 [19:11:32] hrm. This is a bad time for the gate-and-submit queue evidently [19:12:15] thcipriani: fun! [19:12:25] Yeah :/ [19:14:09] (03CR) 10ArielGlenn: [C: 04-1] "This requires changes to other scripts (dumps) that use the private dblist for flow first." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323875 (owner: 10Chad) [19:14:48] (03Abandoned) 10Chad: Make flowprivate.dblist auto-managed by dblist math [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323875 (owner: 10Chad) [19:14:59] ostriches: it can go but not today [19:15:06] if you wanna give me til tomorrow [19:15:07] It's not worth it :p [19:15:08] lol [19:15:10] heh [19:15:12] ok [19:16:42] (03CR) 10Tim Landscheidt: "@yuvipanda: You may be right. 
Looking again on Trusty instances, I see "/lib/systemd/systemd-udevd --daemon" and "/lib/systemd/systemd-lo" [puppet] - 10https://gerrit.wikimedia.org/r/323869 (https://phabricator.wikimedia.org/T142932) (owner: 10Andrew Bogott) [19:17:30] come on jenkins... review me please [19:19:48] ostriches: any chance it could be contint + dump repo problem? ^^ [19:20:36] apergos jenkins is slow tonight [19:20:40] https://integration.wikimedia.org/zuul/ [19:20:46] (03Merged) 10jenkins-bot: Use entity types for the repoNamespaces Wikibase client setting [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323347 (owner: 10Hoo man) [19:21:30] paladox: and just as you say it, it finished p [19:21:36] all right, I'll just be patient [19:21:48] lol [19:22:00] apergos most tests use nodepool and can get slow at peak times [19:22:32] (03PS3) 10ArielGlenn: move config defaults to a separate method [dumps] - 10https://gerrit.wikimedia.org/r/322451 (https://phabricator.wikimedia.org/T133547) [19:22:35] (03PS1) 10Jdlrobson: Disable Popups A/B test on Russian and Italian Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323878 (https://phabricator.wikimedia.org/T144490) [19:23:02] is there any space in the swat window for a small config change thcipriani ? [19:23:03] I don't care if it takes awhile, it's not broken and that's the important thing [19:23:10] !log Stupidly issued iptables -F on restbase2012 [19:23:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:23:29] hoo: https://gerrit.wikimedia.org/r/#/c/323347/ is live on mwdebug1002, if there is anything to check there [19:23:55] thcipriani: I can briefly check enwiki on that one [19:24:15] jdlrobson: sure, go ahead and put it on the list. caveat emptor: jenkins is making swat pretty slow. 
[19:24:27] thanks thcipriani added to wiki page [19:24:28] [ https://gerrit.wikimedia.org/r/323878 Disable Popups A/B test on Russian and Italian Wikipedias ] [19:24:32] thanks [19:24:37] (03CR) 10Dzahn: "compiler still says there is a change" [puppet] - 10https://gerrit.wikimedia.org/r/323349 (owner: 10Paladox) [19:24:53] urandom: ^ fixed restbase2012 via console [19:25:05] thcipriani: Looks good [19:25:14] hoo: ok going live. [19:25:22] godog: thank you! [19:25:30] np [19:25:39] hoo: is your second change this one? https://gerrit.wikimedia.org/r/#/c/323871/ [19:26:05] thcipriani: Yes, but I need to update the Wikidata repo first [19:26:09] so kind of [19:26:17] you probably want to continue with the other changes first [19:27:05] ok, thanks [19:27:05] (03PS2) 10Jdlrobson: Disable Popups A/B test on Russian and Italian Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323878 (https://phabricator.wikimedia.org/T144490) [19:27:18] !log thcipriani@tin Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:323347|Use entity types for the repoNamespaces Wikibase client setting]] (duration: 00m 45s) [19:27:27] ^ hoo first change live everywhere [19:27:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:27:42] Great :) [19:27:56] thcipriani: Second change: https://gerrit.wikimedia.org/r/323880 [19:27:56] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323868 (https://phabricator.wikimedia.org/T149740) (owner: 10EBernhardson) [19:28:01] (03PS2) 10Thcipriani: Increase Cirrus interwiki load test to 100% [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323868 (https://phabricator.wikimedia.org/T149740) (owner: 10EBernhardson) [19:28:04] (03CR) 10Thcipriani: Increase Cirrus interwiki load test to 100% [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323868 (https://phabricator.wikimedia.org/T149740) (owner: 10EBernhardson) [19:28:07] Feel free to +2 at any time… Jenkins usually 
takes a bit on that repo [19:28:09] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323868 (https://phabricator.wikimedia.org/T149740) (owner: 10EBernhardson) [19:28:14] some days jenkins [19:28:18] 06Operations, 10ops-codfw, 06Analytics-Kanban, 10EventBus: rack/setup kafka2003 - https://phabricator.wikimedia.org/T150340#2828323 (10Ottomata) [19:28:30] (03PS1) 10Ottomata: Add kafka2003 to main-codfw Kafka cluster and configure eventbus there [puppet] - 10https://gerrit.wikimedia.org/r/323881 (https://phabricator.wikimedia.org/T150340) [19:28:47] hoo: +2'd [19:28:52] thanks [19:29:09] !log swift eqiad-prod: ms-be1027 to weight 3000 - T136631 [19:29:10] (03PS2) 10Ottomata: Add kafka2003 to main-codfw Kafka cluster and configure eventbus there [puppet] - 10https://gerrit.wikimedia.org/r/323881 (https://phabricator.wikimedia.org/T150340) [19:29:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:29:20] T136631: rack/setup/deploy ms-be102[2-7] - https://phabricator.wikimedia.org/T136631 [19:29:37] (03PS3) 10Jdlrobson: Disable Popups A/B test on Russian and Italian Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323878 (https://phabricator.wikimedia.org/T144490) [19:31:11] (03CR) 10Dzahn: [C: 04-1] "in modules/phabricator/manifests/redirector.pp we already have variables $mysql_user and $mysql_pass that are changed by this." 
[puppet] - 10https://gerrit.wikimedia.org/r/323349 (owner: 10Paladox) [19:31:13] (03PS3) 10ArielGlenn: move some adds/changes-specific code out of miscdumpslib [dumps] - 10https://gerrit.wikimedia.org/r/322452 (https://phabricator.wikimedia.org/T133547) [19:31:23] (03Merged) 10jenkins-bot: Increase Cirrus interwiki load test to 100% [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323868 (https://phabricator.wikimedia.org/T149740) (owner: 10EBernhardson) [19:31:39] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323878 (https://phabricator.wikimedia.org/T144490) (owner: 10Jdlrobson) [19:31:48] (03PS4) 10Thcipriani: Disable Popups A/B test on Russian and Italian Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323878 (https://phabricator.wikimedia.org/T144490) (owner: 10Jdlrobson) [19:31:53] (03CR) 10Thcipriani: Disable Popups A/B test on Russian and Italian Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323878 (https://phabricator.wikimedia.org/T144490) (owner: 10Jdlrobson) [19:31:58] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323878 (https://phabricator.wikimedia.org/T144490) (owner: 10Jdlrobson) [19:32:30] ebernhardson: your change is live on mwdebug1002 if there is anything to check there [19:32:53] thcipriani: should be all good, nothing really to check it's just increasing a % [19:33:00] * thcipriani nods [19:33:03] going live everywhere [19:33:22] (03PS4) 10Rush: labsdb: cleanup labs/db/views presentation [puppet] - 10https://gerrit.wikimedia.org/r/323830 [19:34:46] (03Merged) 10jenkins-bot: Disable Popups A/B test on Russian and Italian Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323878 (https://phabricator.wikimedia.org/T144490) (owner: 10Jdlrobson) [19:34:49] !log thcipriani@tin Synchronized wmf-config/CirrusSearch-production.php: SWAT: [[gerrit:323868|Increase Cirrus interwiki load test to 100%]] (T149740) 
(duration: 00m 46s) [19:34:54] ^ ebernhardson live everywhere [19:34:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:35:00] T149740: Run load tests of cross-project searching to verify its stability - https://phabricator.wikimedia.org/T149740 [19:35:41] (03PS3) 10ArielGlenn: start moving adds/changes methods out to incr_dumps module [dumps] - 10https://gerrit.wikimedia.org/r/322491 (https://phabricator.wikimedia.org/T133547) [19:35:41] jdlrobson: your change is live on mwdebug1002, check please [19:35:44] (03PS3) 10Ottomata: Add kafka2003 to main-codfw Kafka cluster and configure eventbus there [puppet] - 10https://gerrit.wikimedia.org/r/323881 (https://phabricator.wikimedia.org/T150340) [19:35:53] on it! [19:36:26] (03CR) 10Ottomata: [C: 032 V: 032] Add kafka2003 to main-codfw Kafka cluster and configure eventbus there [puppet] - 10https://gerrit.wikimedia.org/r/323881 (https://phabricator.wikimedia.org/T150340) (owner: 10Ottomata) [19:36:28] (03CR) 10Dzahn: "use "$mysql_admin_user" and "$mysql_admin_pass" instead of "$mysql_user" and "$mysql_pass" should be the easiest way to avoid that side ef" [puppet] - 10https://gerrit.wikimedia.org/r/323349 (owner: 10Paladox) [19:37:02] thcipriani: thanks! 
[19:38:46] (03CR) 10Rush: [C: 032 V: 032] labsdb: cleanup labs/db/views presentation [puppet] - 10https://gerrit.wikimedia.org/r/323830 (owner: 10Rush) [19:38:51] thcipriani: looksgood [19:38:54] (03PS7) 10Paladox: Phabricator: Allow setting mysql.user and mysql.pass (part2) [puppet] - 10https://gerrit.wikimedia.org/r/323349 [19:39:01] (03PS5) 10Rush: labsdb: cleanup labs/db/views presentation [puppet] - 10https://gerrit.wikimedia.org/r/323830 [19:39:02] jdlrobson: ok, going live [19:39:37] (03PS8) 10Paladox: Phabricator: Allow setting mysql.user and mysql.pass (part2) [puppet] - 10https://gerrit.wikimedia.org/r/323349 [19:40:16] (03CR) 10Rush: [V: 032] labsdb: cleanup labs/db/views presentation [puppet] - 10https://gerrit.wikimedia.org/r/323830 (owner: 10Rush) [19:40:37] PROBLEM - DPKG on kafka2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:40:47] PROBLEM - MD RAID on kafka2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:40:57] PROBLEM - Check systemd state on kafka2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:40:57] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:323878|Disable Popups A/B test on Russian and Italian Wikipedias]] T144490 (duration: 00m 45s) [19:41:03] ^ jdlrobson live everywhere [19:41:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:41:08] T144490: Stop Hovercards A/B tests - https://phabricator.wikimedia.org/T144490 [19:41:27] RECOVERY - DPKG on kafka2003 is OK: All packages OK [19:41:37] RECOVERY - MD RAID on kafka2003 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0 [19:41:47] RECOVERY - Check systemd state on kafka2003 is OK: OK - running: The system is fully operational [19:41:51] Oook [19:42:41] (03CR) 10Tim Landscheidt: [C: 04-1] "Yep, the issue lies with /etc/init.d/lighttpd. I'll explain on Phabricator." 
[puppet] - 10https://gerrit.wikimedia.org/r/323869 (https://phabricator.wikimedia.org/T142932) (owner: 10Andrew Bogott) [19:42:49] !log T151086: bootstrap of restbase2012-a.codfw.wmnet starting... [19:42:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:43:00] T151086: RESTBase cluster expansion - https://phabricator.wikimedia.org/T151086 [19:44:01] thcipriani: The Wikidata change has been merged [19:44:19] the relevant code is not used in web-request or jobs [19:44:20] ah, finally :) [19:44:28] so you can just deploy it [19:44:37] PROBLEM - puppet last run on kafka2003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 1 minute ago with 1 failures. Failed resources (up to 3 shown): Package[python-tornado] [19:44:39] okie doke. Doing. [19:44:41] and I'll then test it myself on terbium (it's used in a maintenance script) [19:45:34] (03PS3) 10ArielGlenn: MiscDir becomes MiscDumpDir. naming is hard, etc. [dumps] - 10https://gerrit.wikimedia.org/r/322509 [19:48:09] (03PS2) 10Rush: gridengine: refactor and establish norms [puppet] - 10https://gerrit.wikimedia.org/r/322149 [19:48:35] (03PS1) 10Filippo Giunchedi: hhvm: fix hhvm-needs-restart logic for memory [puppet] - 10https://gerrit.wikimedia.org/r/323887 (https://phabricator.wikimedia.org/T151702) [19:48:57] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 637 600 - REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4528735 keys, up 28 days 11 hours - replication_delay is 637 [19:50:00] (03CR) 10Rush: [C: 032 V: 032] gridengine: refactor and establish norms [puppet] - 10https://gerrit.wikimedia.org/r/322149 (owner: 10Rush) [19:50:12] !log thcipriani@tin Synchronized php-1.29.0-wmf.3/extensions/Wikidata: SWAT: [[gerrit:323880|Update Wikibase: Use the "redirect" table in SqlEntityIdPager]] T151356 (duration: 02m 11s) [19:50:17] ^ hoo live now [19:50:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log 
[19:50:24] T151356: Wikibase\Repo\Store\Sql\SqlEntityIdPager::fetchIds query slow - https://phabricator.wikimedia.org/T151356 [19:50:45] thcipriani: thanks, will test [19:50:52] (03PS1) 10Rush: labsdb: remove user_property fields deemed sensitive [puppet] - 10https://gerrit.wikimedia.org/r/323889 [19:53:41] (03PS1) 10Ottomata: Update version of python-tornado used from jessie-backports [puppet] - 10https://gerrit.wikimedia.org/r/323890 (https://phabricator.wikimedia.org/T150340) [19:54:27] PROBLEM - Kafka MirrorMaker main-eqiad_to_main-codfw on kafka2001 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, regex args kafka.tools.MirrorMaker.+/etc/kafka/mirror/main-eqiad_to_main-codfw/producer\.properties [19:54:27] PROBLEM - Check systemd state on kafka2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [19:54:58] yaya [19:55:40] (03CR) 10Ottomata: [C: 032 V: 032] Update version of python-tornado used from jessie-backports [puppet] - 10https://gerrit.wikimedia.org/r/323890 (https://phabricator.wikimedia.org/T150340) (owner: 10Ottomata) [19:57:18] thanks for fitting me in thcipriani [19:57:31] RECOVERY - Kafka MirrorMaker main-eqiad_to_main-codfw on kafka2001 is OK: PROCS OK: 1 process with command name java, regex args kafka.tools.MirrorMaker.+/etc/kafka/mirror/main-eqiad_to_main-codfw/producer\.properties [19:57:31] RECOVERY - Check systemd state on kafka2001 is OK: OK - running: The system is fully operational [19:57:31] RECOVERY - puppet last run on kafka2003 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [19:57:41] jdlrobson: no problem at all :) [19:57:52] (03PS3) 10ArielGlenn: move more incremental-related methods out to incr_dumps module [dumps] - 10https://gerrit.wikimedia.org/r/322510 (https://phabricator.wikimedia.org/T133547) [19:59:43] (03PS1) 10Ottomata: Fix typo for Kafka mirror maker jmxtrans attr [puppet] - 10https://gerrit.wikimedia.org/r/323892 [20:00:17] 
(03CR) 10Dzahn: "looks much better now http://puppet-compiler.wmflabs.org/4692/" [puppet] - 10https://gerrit.wikimedia.org/r/323349 (owner: 10Paladox) [20:00:39] (03PS2) 10Ottomata: Fix typo for Kafka mirror maker jmxtrans attr [puppet] - 10https://gerrit.wikimedia.org/r/323892 [20:00:43] (03CR) 10Ottomata: [C: 032 V: 032] Fix typo for Kafka mirror maker jmxtrans attr [puppet] - 10https://gerrit.wikimedia.org/r/323892 (owner: 10Ottomata) [20:02:57] (03PS1) 10Ottomata: Add kafka2003 into conftool for eventbus service Bug: T150340 [puppet] - 10https://gerrit.wikimedia.org/r/323893 [20:03:17] (03CR) 10Ottomata: [C: 032 V: 032] Add kafka2003 into conftool for eventbus service Bug: T150340 [puppet] - 10https://gerrit.wikimedia.org/r/323893 (owner: 10Ottomata) [20:03:30] (03CR) 10Rush: [C: 032 V: 032] "This was hotpatched and is the reality in production now" [puppet] - 10https://gerrit.wikimedia.org/r/323889 (owner: 10Rush) [20:03:53] (03PS2) 10Rush: labsdb: remove user_property fields deemed sensitive [puppet] - 10https://gerrit.wikimedia.org/r/323889 [20:04:04] (03CR) 10Rush: [V: 032] labsdb: remove user_property fields deemed sensitive [puppet] - 10https://gerrit.wikimedia.org/r/323889 (owner: 10Rush) [20:04:13] (03PS3) 10ArielGlenn: move methods that dump things into the IncrDump class in incr_dump [dumps] - 10https://gerrit.wikimedia.org/r/322511 (https://phabricator.wikimedia.org/T133547) [20:04:27] ottomata: I caught 'Andrew Otto: Add kafka2003 into conftool for eventbus service Bug: T150340 (d25b914)' on puppet-merge [20:04:27] T150340: rack/setup kafka2003 - https://phabricator.wikimedia.org/T150340 [20:04:43] ottomata: looked fine so I rolled w/ it [20:04:50] chasemp: thanks sorry [20:04:55] thats good [20:05:04] !log T150679 changes for user_properties view on labsdb1001 and 1003 [20:05:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:05:44] !log rolling restart of eventlogging-service-eventbus in eqiad to pick up new 
python-tornado version bump from jessie backports (so it doesn't bite us unexpectedly later) [20:05:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:09:43] !log Started dumpwikidatajson.sh on snapshot1007 (T151787) [20:09:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:09:53] T151787: No wikidata JSON dump for 2 weeks - https://phabricator.wikimedia.org/T151787 [20:13:18] (03PS2) 10ArielGlenn: add run method to the IncrDump class to be used by the generate wrapper [dumps] - 10https://gerrit.wikimedia.org/r/322512 (https://phabricator.wikimedia.org/T133547) [20:15:11] (03PS9) 10Dzahn: Phabricator: Allow setting mysql.user and mysql.pass (part2) [puppet] - 10https://gerrit.wikimedia.org/r/323349 (owner: 10Paladox) [20:16:53] jouncebot, next [20:16:53] In 0 hour(s) and 43 minute(s): Services – Parsoid / OCG / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161128T2100) [20:19:31] (03PS2) 10ArielGlenn: a bit of pylint: order of imports, var initialization type whines [dumps] - 10https://gerrit.wikimedia.org/r/322513 [20:22:20] !log wikitech-static: moved to REL1_28 [20:22:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:25:14] (03PS2) 10ArielGlenn: Change last few config options from 'incr' to 'misc' [dumps] - 10https://gerrit.wikimedia.org/r/322515 (https://phabricator.wikimedia.org/T133547) [20:26:51] PROBLEM - mobileapps endpoints health on scb1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[20:27:11] PROBLEM - are wikitech and wt-static in sync on silver is CRITICAL: wikitech-static CRIT - failed to fetch timestamp from wikitech-static [20:27:31] PROBLEM - are wikitech and wt-static in sync on labtestweb2001 is CRITICAL: wikitech-static CRIT - failed to fetch timestamp from wikitech-static [20:27:51] RECOVERY - mobileapps endpoints health on scb1003 is OK: All endpoints are healthy [20:28:11] RECOVERY - are wikitech and wt-static in sync on silver is OK: wikitech-static OK - wikitech and wikitech-static in sync (69962 200000s) [20:28:31] RECOVERY - are wikitech and wt-static in sync on labtestweb2001 is OK: wikitech-static OK - wikitech and wikitech-static in sync (69979 200000s) [20:29:21] (apparently one particular Semantic extension isn't working in 1.28 yet) [20:30:58] It's...static content. Why does it need SMW? [20:32:17] to parse the content? ;p [20:32:26] it imports content from wikitech ostriches [20:32:31] Yes, I'm well aware. [20:32:40] which includes SMW syntax and such [20:32:46] I can't imagine that's actually useful though. If we're down enough that we're needing to use wikitech-static, I'm kinda doubting we care about the instance pages parsing. [20:33:17] Then again, I think the instance pages are dumb anyway. [20:34:00] lool [20:34:03] honestly, if something other than wikitech is dead enough that we can't reach wikitech, and we need to read wikitech to figure out what to do, we're probably already in pretty improbable and awful territory :) [20:34:04] okay [20:34:13] open a ticket, if ops agree I'll see if I can get rid of it for you [20:35:01] PROBLEM - puppet last run on cerium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:35:09] I think this will be an "if labs agrees"... 
you might want to put them on the ticket straight up [20:36:39] that's likely if it really is only labs pages using SMW functionality [20:43:59] I'm getting the problem reported in https://phabricator.wikimedia.org/T151676 [20:44:08] tried the quick fix but no joy [20:44:54] ostriches, ^ [20:46:07] edsanders: mw core or another repo? [20:46:28] mw core [20:47:25] edsanders: One sec, lemme tail the log while you're actively trying to fetch [20:47:36] Maybe we can finally spot the log with a reliably reproduction [20:47:49] ok [20:48:14] just did it: [20:48:20] $ git fetch gerrit [20:48:21] fatal: internal server error [20:48:21] remote: internal server error [20:48:21] fatal: protocol error: bad pack header [20:49:01] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4536224 keys, up 28 days 12 hours - replication_delay is 0 [20:49:24] Got it [20:50:23] Ok, there's two sha1s it's upset about [20:52:01] PROBLEM - puppet last run on db1071 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:52:16] Reedy, edsanders: https://phabricator.wikimedia.org/P4534 [20:53:19] (03CR) 10ArielGlenn: [C: 032] fix new flake8 whines [dumps] - 10https://gerrit.wikimedia.org/r/322484 (owner: 10ArielGlenn) [20:53:41] oooh [20:55:07] Bad objects locally [20:55:48] Reedy: They're both the result of updating the CentralNotice submodule on a wmf branch [20:55:52] That...seems suspect. [20:56:04] Hmmm [20:56:15] isn't CentralNotice a special snowflake extension with a custom deployment branch name? [20:56:20] 06Operations, 10Annual-Report, 13Patch-For-Review: add subdomain for annual report 2016 - https://phabricator.wikimedia.org/T151798#2828774 (10Dzahn) [20:56:21] As they all track the same branch, it causes a lot of bumps [20:56:22] Er, not. 
[20:56:22] yeah [20:56:23] One is [20:56:30] Krenair: Yes, it is [20:56:32] wmf_deploy [20:56:35] Operations, Annual-Report, Patch-For-Review: add subdomain for annual report 2016 - https://phabricator.wikimedia.org/T151798#2828442 (Dzahn) p:Triage>Normal [20:56:53] Operations, Annual-Report: add subdomain for annual report 2016 - https://phabricator.wikimedia.org/T151798#2828442 (Dzahn) [20:56:57] !log filippo@tin Synchronized php-1.29.0-wmf.3/api.php: Revert bandaid from ori (duration: 00m 53s) [20:56:57] Wait, yes both they are. [20:56:59] Got confused. [20:57:03] Ok, that's suspect. [20:57:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:57:14] Have they done anything weird? [20:58:01] is there something I can do locally to get it working - I have patches to commit :) [20:58:39] Reedy, Krenair: The submodule subscription *should* point to the right branch though, if .gitmodules is to be believed. [20:58:45] edsanders: Fresh clones are known to work [20:58:50] Actually. [20:58:56] Do you have any wmf branches locally? [20:59:08] *me is curious* [20:59:23] ooh [20:59:27] let me check my other clone [20:59:39] how do I see all local branches? [20:59:45] git branch -a [20:59:56] * master [20:59:56] remotes/origin/HEAD -> origin/master [20:59:56] remotes/origin/REL1_23 [20:59:56] remotes/origin/REL1_27 [20:59:56] remotes/origin/master [21:00:04] gwicke, cscott, arlolra, subbu, bearND, mdholloway, halfak, Amir1, and yurik: Respected human, time to deploy Services – Parsoid / OCG / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161128T2100). Please do the needful. [21:00:12] Ok so no. [21:00:14] Hmmmm [21:00:16] oh wait [21:00:19] I'm in the wrong folder atm [21:00:43] um [21:00:56] I have tons of "remotes/gerrit/wmf/1.xxx" [21:01:10] Ah ok let's try something [21:01:13] about 200 [21:01:21] One of my clones of core has..
[21:01:21] wmf/1.29.0-wmf.1 [21:01:21] wmf/1.29.0-wmf.2 [21:01:26] but fine for updating [21:01:32] edsanders: `git remote prune {gerrt,origin}` [21:01:34] no parsoid deploy today [21:01:34] Then try fetching [21:01:36] I'm curious. [21:01:36] (CR) Dzahn: [C: 2] "no-op per compiler, watching on iridium" [puppet] - https://gerrit.wikimedia.org/r/323349 (owner: Paladox) [21:02:01] RECOVERY - puppet last run on cerium is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [21:02:07] Eh, prune won't kill that, nvm [21:02:14] Can't hurt tho [21:02:35] oh - it worked! [21:02:41] Oooooh [21:02:44] Ok, fun times. [21:02:46] That's the workaround [21:02:50] And it's totally a wmf branch problem [21:02:58] (which means in time it'll magically disappear) [21:03:10] ostriches: Krenair sorry, CN issues? [21:03:16] thanks all [21:03:16] Eh, wmf branch problems [21:03:18] Not CN itself [21:03:26] paladox/twentyafterfour: applying change on iridium. the part2 of the "let us configure mysql user/pass in hiera" to get labs testing unblocked [21:04:09] and done, no-op on iridium [21:04:51] Reedy: Ok, so I'm about 99% sure master/release branches are fine. [21:04:58] Cleaning up the old wmf branches will help [21:05:01] I'm just poking at my .2 clone [21:05:04] And pruning the remote is the workaround [21:06:26] WFM [21:08:07] (CR) ArielGlenn: [C: 2] move IncrDumpLib to miscdumpslib and rename classes and methods accordingly [dumps] - https://gerrit.wikimedia.org/r/322450 (https://phabricator.wikimedia.org/T133547) (owner: ArielGlenn) [21:08:09] Waittttt [21:08:18] (PS1) Andrew Bogott: Override dpkg to specify mode 1777 for /var/run/lighttpd [puppet] - https://gerrit.wikimedia.org/r/323905 [21:08:21] Did somebody prune all of the wmf/1.28.0-* branches already? [21:08:26] Or are they *missing*? [21:08:42] I'm sure they'd been pruned [21:08:53] wait..
[21:09:00] remotes/origin/wmf/1.27.0-wmf.1 [21:09:01] remotes/origin/wmf/1.27.0-wmf.10 [21:09:24] ostriches: ah K thx :) [21:09:55] (CR) ArielGlenn: [C: 2] remove get_lockinfo and bogus info about when run started [dumps] - https://gerrit.wikimedia.org/r/323874 (https://phabricator.wikimedia.org/T133547) (owner: ArielGlenn) [21:10:09] mmm, it just pruned it all [21:10:23] Reedy: So basically, I think it's because the old branches have disappeared (and gc pruned the unreferenced objects) but the local clone wants to know about those refs. [21:10:31] (CR) jenkins-bot: [V: -1] Override dpkg to specify mode 1777 for /var/run/lighttpd [puppet] - https://gerrit.wikimedia.org/r/323905 (owner: Andrew Bogott) [21:10:41] I'm not *entirely* sure why this hasn't been a problem before, except that auto-gc is a new(ish) thing since we moved from 2.8 -> 2.12 [21:10:45] that sounds possible [21:11:22] ostriches: Did you not fix the RELEASE-NOTES for 1.28? [21:11:27] Reason #845 mw/core as deployed should be a "fork" of mw/core and not branches on the core repo. [21:11:32] Reedy: Prolly not :p [21:11:33] Whoops [21:11:48] (PS2) Andrew Bogott: Override dpkg to specify mode 1777 for /var/run/lighttpd [puppet] - https://gerrit.wikimedia.org/r/323905 [21:11:50] I was just gonna nuke 1.28 from maste [21:11:50] r [21:12:10] I'll e-mail the announce list and mention it. Someone's gonna notice :p [21:12:18] xD [21:13:07] https://github.com/wikimedia/mediawiki/compare/1.28.0-rc.1...1.28.0 [21:15:12] (CR) Dzahn: [C: -1] "thanks for doing this!
i compiled it but we still have an issue:" [puppet] - https://gerrit.wikimedia.org/r/323333 (owner: BryanDavis) [21:15:48] (CR) jenkins-bot: [V: -1] Override dpkg to specify mode 1777 for /var/run/lighttpd [puppet] - https://gerrit.wikimedia.org/r/323905 (owner: Andrew Bogott) [21:16:11] (CR) ArielGlenn: [C: 2] move config defaults to a separate method [dumps] - https://gerrit.wikimedia.org/r/322451 (https://phabricator.wikimedia.org/T133547) (owner: ArielGlenn) [21:16:45] (PS3) Andrew Bogott: Override dpkg to specify mode 1777 for /var/run/lighttpd [puppet] - https://gerrit.wikimedia.org/r/323905 [21:16:48] Operations, ChangeProp, EventBus, MediaWiki-JobQueue, and 4 others: Asynchronous processing in production: one queue to rule them all - https://phabricator.wikimedia.org/T149408#2828878 (GWicke) >>! In T149408#2805198, @Legoktm wrote: >>>! In T149408#2753327, @Pchelolo wrote: >> @bd808 What are t... [21:16:59] ostriches: https://gerrit.wikimedia.org/r/323908 [21:17:18] (CR) Dzahn: "yea, that seems like a chicken/egg problem which would be fixed by the included hiera change, i guess" [puppet] - https://gerrit.wikimedia.org/r/323333 (owner: BryanDavis) [21:17:34] Reedy: https://lists.wikimedia.org/pipermail/mediawiki-announce/2016-November/000205.html [21:18:45] mutante: oh. does pcc not see hiera changes in the same patch set?
[21:18:57] Operations, Analytics-Kanban: setup/install thorium/wmf4726 as stat1001 replacement - https://phabricator.wikimedia.org/T151816#2828882 (RobH) [21:19:01] RECOVERY - puppet last run on db1071 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [21:19:04] bd808: i am a bit surprised, but it seems like not [21:19:40] would have been nicer to show it's no-op, but the error seems to make sense [21:20:08] Operations, Analytics-Kanban: setup/install thorium/wmf4726 as stat1001 replacement - https://phabricator.wikimedia.org/T151816#2828882 (RobH) [21:20:18] hmmm... I wonder if the problem is that the hiera role file needs to be moved too? [21:20:23] Operations, Analytics-Kanban, hardware-requests: stat1001 replacement box in eqiad - https://phabricator.wikimedia.org/T149911#2768740 (RobH) Open>Resolved [21:20:36] bd808: ooh, yes [21:20:36] to role/common/logastash/collector.yaml [21:20:41] it does [21:20:50] but there is so much else in it [21:21:19] if we just created the new file with the new setting [21:21:25] Operations, ops-eqiad: update label/racktables visible label for thorium/wmf4726 - https://phabricator.wikimedia.org/T151818#2828917 (RobH) [21:21:25] and not move it [21:21:36] i think then it would be a nice no-op.. should we try that? [21:21:38] I honestly don't quite understand how the magic hira roll lookup stuff works [21:22:18] I think all the other settings have defaults [21:22:29] but we need all of those settings on logstash100[123] [21:22:35] stuff in role/common/ does look at the role name [21:23:24] mutante: if you have time to mess with it feel free to amend. I have some other stuff I'm supposed to get done today. :) [21:23:27] can i just try if it compiles [21:23:31] ok!
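Editorial sketch of the workaround discussed between 21:01 and 21:10 (pruning stale remote-tracking refs after old wmf/* branches were deleted upstream). It is illustrative only: the `upstream.git`, `seed`, and `clone` repositories and the `wmf/demo` branch are hypothetical stand-ins built in a temporary directory, and it does not reproduce the Gerrit "bad pack header" error itself. It only shows that `git remote prune` drops remote-tracking refs whose upstream branch no longer exists.

```shell
#!/bin/sh
# Sandbox sketch: every repository and branch name here is hypothetical.
set -eu

sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT
cd "$sandbox"

# Bare "upstream" standing in for the Gerrit remote.
git init -q --bare upstream.git

# Seed it with master plus one deployment-style branch (wmf/demo).
git init -q seed
git -C seed -c user.name=demo -c user.email=demo@example.org \
    commit -q --allow-empty -m "initial commit"
git -C seed push -q "$sandbox/upstream.git" \
    HEAD:refs/heads/master HEAD:refs/heads/wmf/demo

# A second repository that tracks the upstream as "origin".
git init -q clone
git -C clone remote add origin "$sandbox/upstream.git"
git -C clone fetch -q origin

# Upstream later deletes the old deployment branch.
git -C seed push -q "$sandbox/upstream.git" :refs/heads/wmf/demo

# The clone still lists the deleted branch until its remote is pruned.
before=$(git -C clone branch -r | grep -c 'origin/wmf/' || true)
git -C clone remote prune origin
after=$(git -C clone branch -r | grep -c 'origin/wmf/' || true)

echo "stale wmf refs before prune: $before, after: $after"
```

In a real checkout the equivalent step would be `git remote prune <remote>` (or `git fetch --prune <remote>`) followed by a normal fetch. Whether that clears the server-side error depends on the cause ostriches hypothesises in the log, which this sandbox does not test.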
[21:23:38] (PS1) RobH: setting dns entries for thorium [dns] - https://gerrit.wikimedia.org/r/323910 [21:25:13] (CR) RobH: [C: 2] setting dns entries for thorium [dns] - https://gerrit.wikimedia.org/r/323910 (owner: RobH) [21:27:16] !log mobileapps deployed 1d09b98 [21:27:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:29:25] (PS1) RobH: thorium setup params [puppet] - https://gerrit.wikimedia.org/r/323943 [21:30:00] (CR) RobH: [C: 2] thorium setup params [puppet] - https://gerrit.wikimedia.org/r/323943 (owner: RobH) [21:30:52] (PS3) Dzahn: logstash: Break logstash.pp up into individual classes [puppet] - https://gerrit.wikimedia.org/r/323333 (owner: BryanDavis) [21:31:59] (CR) ArielGlenn: [C: 2] move some adds/changes-specific code out of miscdumpslib [dumps] - https://gerrit.wikimedia.org/r/322452 (https://phabricator.wikimedia.org/T133547) (owner: ArielGlenn) [21:33:30] (CR) Volans: [C: -1] "I've added a bunch of comments inline." (14 comments) [puppet] - https://gerrit.wikimedia.org/r/323525 (https://phabricator.wikimedia.org/T150802) (owner: Jcrespo) [21:38:33] (CR) Rush: [C: 1] Override dpkg to specify mode 1777 for /var/run/lighttpd [puppet] - https://gerrit.wikimedia.org/r/323905 (owner: Andrew Bogott) [21:39:51] PROBLEM - Kafka Broker Replica Max Lag on kafka2003 is CRITICAL: CRITICAL: 70.00% of data above the critical threshold [10000.0] [21:40:39] das ok^ [21:40:42] i'm balancing things with new broker [21:44:31] (CR) ArielGlenn: [C: 2] start moving adds/changes methods out to incr_dumps module [dumps] - https://gerrit.wikimedia.org/r/322491 (https://phabricator.wikimedia.org/T133547) (owner: ArielGlenn) [21:46:04] Operations, Icinga, Labs, Labs-Infrastructure, Monitoring: labtestcontrol2001 should not make Icinga page us - https://phabricator.wikimedia.org/T120047#2829025 (Dzahn) Yes, it should be fine to re-enable these.
The whole ticket was about not sending SMS, that is about the special "sms" conta... [21:46:26] (PS4) Andrew Bogott: Override dpkg to specify mode 1777 for /var/run/lighttpd [puppet] - https://gerrit.wikimedia.org/r/323905 [21:47:59] (CR) Andrew Bogott: [C: 2] Override dpkg to specify mode 1777 for /var/run/lighttpd [puppet] - https://gerrit.wikimedia.org/r/323905 (owner: Andrew Bogott) [21:50:24] (CR) Dzahn: [C: -1] "no, like you suspected this is because the yaml file in role/common/ also needs to be moved. i tested it by compiling PS3 where i just put" [puppet] - https://gerrit.wikimedia.org/r/323333 (owner: BryanDavis) [21:51:46] (PS4) Dzahn: logstash: Break logstash.pp up into individual classes [puppet] - https://gerrit.wikimedia.org/r/323333 (owner: BryanDavis) [21:52:13] (PS1) Andrew Bogott: /fully/qualify an exec path [puppet] - https://gerrit.wikimedia.org/r/323971 [21:53:51] (CR) Andrew Bogott: [C: 2] /fully/qualify an exec path [puppet] - https://gerrit.wikimedia.org/r/323971 (owner: Andrew Bogott) [21:53:59] ostriches the run gc button is available to run for everyne [21:54:02] everyone [21:54:08] i see it on operations/puppet [21:54:29] (PS4) Reedy: Make and re-use list of all elevated WMF groups [mediawiki-config] - https://gerrit.wikimedia.org/r/322094 (https://phabricator.wikimedia.org/T150951) [21:54:38] (CR) ArielGlenn: [C: 2] MiscDir becomes MiscDumpDir. naming is hard, etc. [dumps] - https://gerrit.wikimedia.org/r/322509 (owner: ArielGlenn) [21:55:10] (CR) Reedy: [C: 2] Make and re-use list of all elevated WMF groups [mediawiki-config] - https://gerrit.wikimedia.org/r/322094 (https://phabricator.wikimedia.org/T150951) (owner: Reedy) [21:55:46] (Merged) jenkins-bot: Make and re-use list of all elevated WMF groups [mediawiki-config] - https://gerrit.wikimedia.org/r/322094 (https://phabricator.wikimedia.org/T150951) (owner: Reedy) [21:55:56] paladox: Try now?
[21:56:02] Ok [21:56:21] ostriches it is now hidden [21:56:22] thanks [21:56:23] Reedy, you don't appear to have written the definition into the file? [21:56:43] Do we really care enough? [21:56:45] paladox: Mistake on my part. [21:56:46] I certainly don't [21:56:54] ostriches didn't we raise the object limit? [21:56:56] to 100mb [21:57:38] ostriches operations/puppet is showing 20mb [21:58:01] I see it is a limit for puppet [22:00:04] dapatrick, bawolff, and Reedy: Dear anthropoid, the time has come. Please deploy Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161128T2200). [22:00:28] !log reedy@tin Synchronized wmf-config/InitialiseSettings.php: OATHAuth for more groups and bump their password requirements (duration: 00m 45s) [22:00:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:00:42] (Draft1) Paladox: Phabricator: Create user vcs and group vcs [puppet] - https://gerrit.wikimedia.org/r/323972 [22:00:44] (Draft2) Paladox: Phabricator: Create user vcs and group vcs [puppet] - https://gerrit.wikimedia.org/r/323972 [22:01:15] (PS1) Chad: Clean up old branches, programatically [mediawiki-config] - https://gerrit.wikimedia.org/r/323974 [22:01:25] !log reedy@tin Synchronized wmf-config/CommonSettings.php: OATHAuth for more groups and bump their password requirements (duration: 00m 45s) [22:01:32] paladox: I set it a limit on specific repos [22:01:37] Really, most people never need 100mb [22:01:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:01:40] Oh [22:01:40] But a few repos do [22:01:43] So, there we are. [22:01:47] thanks [22:01:56] ostriches i managed to get phabricator to install now [22:01:57] on labs [22:02:02] Just a few errors left [22:02:03] That's nice [22:02:07] mostly related to users [22:02:31] ostriches https://phabricator.wikimedia.org/P4526 [22:02:32] Hmm...
That creates groups we don't want on wikis [22:02:58] ostriches i am hopping i fixed it in https://gerrit.wikimedia.org/r/323972 [22:03:02] better fix that [22:04:24] (PS1) Reedy: Only add OATHAuth right to group if it exists on the wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/323975 [22:04:51] RECOVERY - Kafka Broker Replica Max Lag on kafka2003 is OK: OK: Less than 50.00% above the threshold [1000.0] [22:05:09] (CR) Reedy: [C: 2] Only add OATHAuth right to group if it exists on the wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/323975 (owner: Reedy) [22:05:38] (Merged) jenkins-bot: Only add OATHAuth right to group if it exists on the wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/323975 (owner: Reedy) [22:06:23] (CR) ArielGlenn: [C: 2] move more incremental-related methods out to incr_dumps module [dumps] - https://gerrit.wikimedia.org/r/322510 (https://phabricator.wikimedia.org/T133547) (owner: ArielGlenn) [22:07:04] !log reedy@tin Synchronized wmf-config/CommonSettings.php: Only add oathauth-enable if the group exists on the wiki (duration: 00m 43s) [22:07:04] (PS3) Reedy: Log users elevated groups on login attempts [mediawiki-config] - https://gerrit.wikimedia.org/r/321938 [22:07:10] (CR) Reedy: "Rebased so we can use https://gerrit.wikimedia.org/r/#/c/322094" (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/321938 (owner: Reedy) [22:07:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:08:52] Operations, Icinga, Labs, Labs-Infrastructure, Monitoring: labtestcontrol2001 should not make Icinga page us - https://phabricator.wikimedia.org/T120047#2829166 (Dzahn) How to check directly on einsteinium in the actually genereated results which checks are paging via the "sms" contact group....
[22:12:06] (PS2) Chad: Clean up old branches, programatically [mediawiki-config] - https://gerrit.wikimedia.org/r/323974 [22:14:19] Operations, Icinga, Labs, Labs-Infrastructure, Monitoring: labtestcontrol2001 should not make Icinga page us - https://phabricator.wikimedia.org/T120047#2829180 (Dzahn) proof that the same service "nova_conductor" gets different contact_groups whether it's on a "test" host or not: ``` [eins... [22:17:47] Operations, Analytics-Kanban: setup/install thorium/wmf4726 as stat1001 replacement - https://phabricator.wikimedia.org/T151816#2829198 (RobH) [22:19:06] (PS4) Reedy: Log users elevated groups on login attempts [mediawiki-config] - https://gerrit.wikimedia.org/r/321938 [22:20:28] (PS5) Reedy: Log users elevated groups on login attempts [mediawiki-config] - https://gerrit.wikimedia.org/r/321938 [22:25:40] (Abandoned) Andrew Bogott: lighttpd nodes: customize lighttpd.tmpfile.conf [puppet] - https://gerrit.wikimedia.org/r/323869 (https://phabricator.wikimedia.org/T142932) (owner: Andrew Bogott) [22:28:00] (PS4) ArielGlenn: move methods that dump things into the IncrDump class in incr_dump [dumps] - https://gerrit.wikimedia.org/r/322511 (https://phabricator.wikimedia.org/T133547) [22:29:40] PROBLEM - configured eth on thorium is CRITICAL: Return code of 255 is out of bounds [22:30:40] RECOVERY - configured eth on thorium is OK: OK - interfaces up [22:33:30] (PS6) Reedy: Log users elevated groups on login attempts [mediawiki-config] - https://gerrit.wikimedia.org/r/321938 [22:36:36] (CR) Gergő Tisza: [C: 1] Log users elevated groups on login attempts (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/321938 (owner: Reedy) [22:39:13] (PS7) Reedy: Log users elevated groups on login attempts [mediawiki-config] - https://gerrit.wikimedia.org/r/321938 [22:39:17] (CR) Reedy: [C: 2] Log users elevated groups on login attempts [mediawiki-config] - 
https://gerrit.wikimedia.org/r/321938 (owner: Reedy) [22:40:19] (Merged) jenkins-bot: Log users elevated groups on login attempts [mediawiki-config] - https://gerrit.wikimedia.org/r/321938 (owner: Reedy) [22:43:28] !log reedy@tin Synchronized wmf-config/CommonSettings.php: Log users elevated groups on login attempts (duration: 00m 47s) [22:43:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:47:15] Operations, Analytics-Kanban: setup/install thorium/wmf4726 as stat1001 replacement - https://phabricator.wikimedia.org/T151816#2829242 (RobH) [22:48:00] Operations, Analytics-Kanban: setup/install thorium/wmf4726 as stat1001 replacement - https://phabricator.wikimedia.org/T151816#2828882 (RobH) a:RobH>Ottomata Assigning this to @Ottomata for his review. The system has been installed and calls into puppet, it can have its info appended into site.... [22:52:50] PROBLEM - puppet last run on wtp1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:00:54] (CR) Kaldari: [C: 1] Convert wikis to numerical sorting and uca collation [mediawiki-config] - https://gerrit.wikimedia.org/r/323834 (https://phabricator.wikimedia.org/T149002) (owner: Niharika29) [23:02:24] (CR) 20after4: [C: 2] Clean up old branches, programatically [mediawiki-config] - https://gerrit.wikimedia.org/r/323974 (owner: Chad) [23:09:03] (PS5) ArielGlenn: move methods that dump things into the IncrDump class in incr_dump [dumps] - https://gerrit.wikimedia.org/r/322511 (https://phabricator.wikimedia.org/T133547) [23:09:35] Dereckson: hey! you arund?
[23:10:54] (PS7) Jdlrobson: Switch MobileFrontend to extension registration [mediawiki-config] - https://gerrit.wikimedia.org/r/314748 (https://phabricator.wikimedia.org/T147092) (owner: Dereckson) [23:10:59] (CR) Jdlrobson: "rebased" [mediawiki-config] - https://gerrit.wikimedia.org/r/314748 (https://phabricator.wikimedia.org/T147092) (owner: Dereckson) [23:11:22] (CR) ArielGlenn: [C: 2] move methods that dump things into the IncrDump class in incr_dump [dumps] - https://gerrit.wikimedia.org/r/322511 (https://phabricator.wikimedia.org/T133547) (owner: ArielGlenn) [23:11:59] (PS1) Reedy: Fix undefined $user [mediawiki-config] - https://gerrit.wikimedia.org/r/323985 [23:12:02] (PS3) Jdlrobson: Clean unused MobileFrontend variables [mediawiki-config] - https://gerrit.wikimedia.org/r/315985 (owner: Dereckson) [23:12:06] Dereckson: any issues with me getting https://gerrit.wikimedia.org/r/#/c/314748 swatted at 4pm? [23:12:12] (CR) Jdlrobson: [C: 1] Clean unused MobileFrontend variables [mediawiki-config] - https://gerrit.wikimedia.org/r/315985 (owner: Dereckson) [23:12:48] ^ cc yurik since it touches zero [23:12:48] (CR) Reedy: [C: 2] Fix undefined $user [mediawiki-config] - https://gerrit.wikimedia.org/r/323985 (owner: Reedy) [23:13:02] (CR) Jdlrobson: [C: 1] Switch MobileFrontend to extension registration [mediawiki-config] - https://gerrit.wikimedia.org/r/314748 (https://phabricator.wikimedia.org/T147092) (owner: Dereckson) [23:13:32] (Merged) jenkins-bot: Fix undefined $user [mediawiki-config] - https://gerrit.wikimedia.org/r/323985 (owner: Reedy) [23:14:06] (PS3) ArielGlenn: add run method to the IncrDump class to be used by the generate wrapper [dumps] - https://gerrit.wikimedia.org/r/322512 (https://phabricator.wikimedia.org/T133547) [23:14:08] (CR) Yurik: [C: 1] Clean unused MobileFrontend variables [mediawiki-config] - https://gerrit.wikimedia.org/r/315985 (owner: 
Dereckson) [23:14:40] !log reedy@tin Synchronized wmf-config/CommonSettings.php: fix undefined user (duration: 00m 48s) [23:14:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:16:24] (PS1) Jforrester: Labs: Don't enable VectorResponsive, it's nowhere near ready yet [mediawiki-config] - https://gerrit.wikimedia.org/r/323986 [23:17:50] (CR) ArielGlenn: [C: 2] add run method to the IncrDump class to be used by the generate wrapper [dumps] - https://gerrit.wikimedia.org/r/322512 (https://phabricator.wikimedia.org/T133547) (owner: ArielGlenn) [23:19:50] RECOVERY - puppet last run on wtp1002 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [23:20:29] (Abandoned) Chad: Add open.dblist that covers non-closed and non-deleted wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/323873 (owner: Chad) [23:20:33] !log demon@tin Synchronized scap/plugins/clean.py: Completeness, testing, etc (duration: 00m 43s) [23:20:36] (CR) Jforrester: "Revert forthcoming in If171ff3." 
[mediawiki-config] - https://gerrit.wikimedia.org/r/225642 (owner: Legoktm) [23:20:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:20:59] (PS3) ArielGlenn: move options specific to adds/changes into args dict [dumps] - https://gerrit.wikimedia.org/r/322514 (https://phabricator.wikimedia.org/T133547) [23:23:58] (CR) Mattflaschen: [C: 2] "No commits to this file since Nov 22, 2015" [mediawiki-config] - https://gerrit.wikimedia.org/r/323986 (owner: Jforrester) [23:24:24] (PS2) Chad: MWVersion: Use the version directly from multiversion [mediawiki-config] - https://gerrit.wikimedia.org/r/321598 [23:24:32] (Merged) jenkins-bot: Labs: Don't enable VectorResponsive, it's nowhere near ready yet [mediawiki-config] - https://gerrit.wikimedia.org/r/323986 (owner: Jforrester) [23:25:10] PROBLEM - check_mysql on lutetium is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1435 [23:26:21] (PS2) Brian Wolff: Expand Content-Security-Policy on upload test to fr. [puppet] - https://gerrit.wikimedia.org/r/318490 (https://phabricator.wikimedia.org/T117618) [23:27:34] (CR) ArielGlenn: [C: 2] move options specific to adds/changes into args dict [dumps] - https://gerrit.wikimedia.org/r/322514 (https://phabricator.wikimedia.org/T133547) (owner: ArielGlenn) [23:27:38] bblack: I was wondering, if you had some time this week, if you could look at https://gerrit.wikimedia.org/r/#/c/318490/ ? 
I'd like to extend the test of using CSP on uploads to some larger wikis, since it seemed to work fine on the smaller one [23:27:57] !log mattflaschen@tin Synchronized wmf-config/InitialiseSettings-labs.php: Beta Cluster only (duration: 00m 44s) [23:28:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:29:14] (PS3) ArielGlenn: Change last few config options from 'incr' to 'misc' [dumps] - https://gerrit.wikimedia.org/r/322515 (https://phabricator.wikimedia.org/T133547) [23:30:06] (CR) Chad: [C: 2] MWVersion: Use the version directly from multiversion [mediawiki-config] - https://gerrit.wikimedia.org/r/321598 (owner: Chad) [23:30:10] PROBLEM - check_mysql on lutetium is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1735 [23:30:44] (Merged) jenkins-bot: MWVersion: Use the version directly from multiversion [mediawiki-config] - https://gerrit.wikimedia.org/r/321598 (owner: Chad) [23:32:53] (CR) ArielGlenn: [C: 2] Change last few config options from 'incr' to 'misc' [dumps] - https://gerrit.wikimedia.org/r/322515 (https://phabricator.wikimedia.org/T133547) (owner: ArielGlenn) [23:33:43] jdlrobson: if you're available to test it with me, yes we can schedule it for swat [23:33:49] !log demon@tin Synchronized w/: (no message) (duration: 00m 48s) [23:33:59] Dereckson: lets do it! [23:33:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:34:09] Dereckson: do you want me to add the patches to wikitech ? [23:34:23] https://gerrit.wikimedia.org/r/#/c/315985 and https://gerrit.wikimedia.org/r/#/c/315985 [23:34:25] * Dereckson nods. [23:34:33] on it!
[23:34:51] I can focus on config variables, logs while you'll check everything is still fine at features level [23:35:08] Yay it didn't break [23:35:10] PROBLEM - check_mysql on lutetium is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2035 [23:35:14] * ostriches wipes sweat off brow [23:36:26] (PS3) ArielGlenn: a bit of pylint: order of imports, var initialization type whines [dumps] - https://gerrit.wikimedia.org/r/322513 [23:36:41] Dereckson: scheduled :) https://wikitech.wikimedia.org/w/index.php?title=Deployments&type=revision&diff=1036985&oldid=1036940 [23:37:27] (CR) ArielGlenn: [C: 2] a bit of pylint: order of imports, var initialization type whines [dumps] - https://gerrit.wikimedia.org/r/322513 (owner: ArielGlenn) [23:37:37] ostriches: … yet. [23:37:47] Eh, it would've broken fast and hard [23:37:49] Or not at all [23:38:07] jouncebot: refresh [23:38:09] I refreshed my knowledge about deployments. [23:40:10] PROBLEM - check_mysql on lutetium is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2335 [23:40:45] (Abandoned) ArielGlenn: move some methods into miscdumpslib that will be reused for other misc dumps [dumps] - https://gerrit.wikimedia.org/r/322453 (https://phabricator.wikimedia.org/T133547) (owner: ArielGlenn) [23:45:10] PROBLEM - check_mysql on lutetium is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2635 [23:47:33] (PS2) Chad: Move 2 web entry points to w/ where they belong [mediawiki-config] - https://gerrit.wikimedia.org/r/321599 [23:48:41] (PS1) Chad: Remove default.html [mediawiki-config] - https://gerrit.wikimedia.org/r/323990 [23:49:00] Operations, ops-codfw, DC-Ops: ms-be2025 controller failure - https://phabricator.wikimedia.org/T151201#2829400 (fgiunchedi) @papaul I think the culprit might be a failing / failed cache module, see also ``` Cache Status: Permanently Disabled Cache Status 
Details: Cache disabled; battery/capacitor i... [23:50:10] RECOVERY - check_mysql on lutetium is OK: Uptime: 31684 Threads: 4 Questions: 5941567 Slow queries: 61 Opens: 1392203 Flush tables: 2 Open tables: 64 Queries per second avg: 187.525 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [23:51:32] (PS3) Filippo Giunchedi: prometheus: add vhtcpd stats via node-exporter [puppet] - https://gerrit.wikimedia.org/r/323559 (https://phabricator.wikimedia.org/T147429) [23:54:39] (CR) Chad: [C: 2] Move 2 web entry points to w/ where they belong [mediawiki-config] - https://gerrit.wikimedia.org/r/321599 (owner: Chad) [23:56:04] (Merged) jenkins-bot: Move 2 web entry points to w/ where they belong [mediawiki-config] - https://gerrit.wikimedia.org/r/321599 (owner: Chad) [23:56:39] !log demon@tin Started scap: moving some stuff around, pruned old branches too [23:56:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log