[02:10:51] Operations, Ops-Access-Requests: Access needed for people.wikimedia.org for showcasing - https://phabricator.wikimedia.org/T143465#2569112 (Legoktm) You know you can also use tools.wmflabs.org for this kind of stuff too?
[02:23:24] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.15) (duration: 10m 26s)
[02:23:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:29:09] !log l10nupdate@tin ResourceLoader cache refresh completed at Mon Aug 22 02:29:09 UTC 2016 (duration 5m 45s)
[02:29:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:25:26] RECOVERY - MariaDB Slave Lag: s2 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 0.22 seconds
[05:46:15] PROBLEM - tools homepage -admin tool- on tools.wmflabs.org is CRITICAL: HTTP CRITICAL - No data received from host
[05:46:52] Platonides: FP *!*@kartik.lustfield.net
[05:50:05] RECOVERY - tools homepage -admin tool- on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 3670 bytes in 0.064 second response time
[06:27:56] RECOVERY - Disk space on scb1001 is OK: DISK OK
[06:44:06] PROBLEM - tools homepage -admin tool- on tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 531 bytes in 0.055 second response time
[06:52:05] RECOVERY - tools homepage -admin tool- on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 3670 bytes in 0.051 second response time
[06:57:43] (CR) Muehlenhoff: [C: 1] "Looks fine. The additional diskspace for the font packages is negligible, even if they might be entirely unnecessary on API servers."
[puppet] - https://gerrit.wikimedia.org/r/231284 (https://phabricator.wikimedia.org/T84777) (owner: Dzahn)
[07:08:07] (PS4) Giuseppe Lavagetto: scap: use conftool data to populate dsh groups [puppet] - https://gerrit.wikimedia.org/r/283201 (https://phabricator.wikimedia.org/T132529)
[07:23:12] Operations, Packaging, Phabricator: upload php-mailparse and python-phabricator to jessie - https://phabricator.wikimedia.org/T138689#2407363 (MoritzMuehlenhoff) @mmodell: The php-mailparse package currently in Debian stretch uses a fairly recent modern packaging and a few build dependencies which a...
[07:25:06] Operations, Packaging, Phabricator: upload php-mailparse and python-phabricator to jessie - https://phabricator.wikimedia.org/T138689#2570992 (mmodell) @MoritzMuehlenhoff: That's what I was trying to suggest with my previous comment. What we have now works, though I haven't tried to compile it for je...
[07:29:22] Operations, Packaging, Phabricator: upload php-mailparse and python-phabricator to jessie - https://phabricator.wikimedia.org/T138689#2570995 (MoritzMuehlenhoff) @mmodell : Sure, I'll take care of that
[07:32:14] Operations, codfw-rollout: Turn on etcd TLS for intra-cluster communications - https://phabricator.wikimedia.org/T135128#2570997 (Joe)
[07:32:16] Operations, Technical-Debt, codfw-rollout: Reduce etcd technical debt - https://phabricator.wikimedia.org/T135122#2570998 (Joe)
[07:32:18] Operations, Patch-For-Review: Create backup/restore scripts for etcd - https://phabricator.wikimedia.org/T135129#2570996 (Joe) Open>Resolved
[07:40:24] PROBLEM - tools homepage -admin tool- on tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 355 bytes in 0.004 second response time
[07:50:15] RECOVERY - tools homepage -admin tool- on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 3670 bytes in 0.093 second response time
[08:03:43] Operations, HHVM: Upgrade all mw*
servers to debian jessie - https://phabricator.wikimedia.org/T143536#2571031 (Joe)
[08:18:25] Operations, HHVM: Upgrade all mw* servers to debian jessie - https://phabricator.wikimedia.org/T143536#2571048 (Joe) Number of systems to reimage: - Jobrunners: 25 - Appservers: 123 - Api: 97 - Videoscalers: 3 - Script/deployment servers: 4 So we need to reimage 252 servers. Videoscalers and the script...
[08:22:47] Operations, HHVM: Upgrade all mw* servers to debian jessie - https://phabricator.wikimedia.org/T143536#2571031 (MoritzMuehlenhoff) Six of the image scalers in codfw also need to be reimaged; all except mw208[67]
[08:23:58] (PS1) Muehlenhoff: package_builder: Add pkg-php-tools to list of installed packages [puppet] - https://gerrit.wikimedia.org/r/305963
[08:24:00] (PS1) Muehlenhoff: package_builder: Add dh-php5 to list of installed packages [puppet] - https://gerrit.wikimedia.org/r/305964
[08:24:02] (PS1) Muehlenhoff: package_builder: Add php5-dev to list of installed packages [puppet] - https://gerrit.wikimedia.org/r/305965
[08:26:33] (CR) Muehlenhoff: [C: 2] package_builder: Add pkg-php-tools to list of installed packages [puppet] - https://gerrit.wikimedia.org/r/305963 (owner: Muehlenhoff)
[08:27:07] Operations, Beta-Cluster-Infrastructure, HHVM: Beta-cluster web server fills up /var/log with Apache logs - https://phabricator.wikimedia.org/T75262#2571091 (hashar) Open>Resolved a:hashar
[08:27:20] (CR) Muehlenhoff: [C: 2] package_builder: Add dh-php5 to list of installed packages [puppet] - https://gerrit.wikimedia.org/r/305964 (owner: Muehlenhoff)
[08:28:02] (CR) Muehlenhoff: [C: 2] package_builder: Add php5-dev to list of installed packages [puppet] - https://gerrit.wikimedia.org/r/305965 (owner: Muehlenhoff)
[08:30:54] Operations, WMF-Legal, WMF-NDA-Requests: ZhouZ needs access to WMF-NDA group - https://phabricator.wikimedia.org/T98722#2571095 (Aklapper) >>!
In T98722#2544134, @Qgil wrote: > What about a task / email by their manager? It should be simple to verify the manager's username/email. Task comment, prob...
[08:32:10] Operations, Services, User-mobrovac: Move all Node.JS services to Jessie and Node 4 - https://phabricator.wikimedia.org/T124989#2571097 (hashar) Rest of services currently on SCA will be migrated to Jessie / SCB via T96017. Feel free to close this T124989 since all the other sub tasks have been comp...
[08:33:37] (CR) Filippo Giunchedi: [C: 1] mediawiki: include fonts in role::mediawiki::webserver [puppet] - https://gerrit.wikimedia.org/r/231284 (https://phabricator.wikimedia.org/T84777) (owner: Dzahn)
[08:36:23] Operations, Packaging, Phabricator: upload php-mailparse and python-phabricator to jessie - https://phabricator.wikimedia.org/T138689#2571099 (MoritzMuehlenhoff) Open>Resolved a:MoritzMuehlenhoff php-mailparse 2.1.6-1~jessie1 has been built for jessie-wikimedia and uploaded to carbon. Clo...
[08:39:34] (PS1) Giuseppe Lavagetto: puppetmaster: install conftool [puppet] - https://gerrit.wikimedia.org/r/305967
[08:39:39] Operations, Graphite, Labs: lots of graphite metrics under "instances" created - https://phabricator.wikimedia.org/T143405#2571110 (fgiunchedi) @yuvipanda the above would remove also recent files, sth like `find .
-type f -mtime +672 -delete` and delete empty directories too afterwards
[08:40:55] RECOVERY - HHVM jobrunner on mw1162 is OK: HTTP OK: HTTP/1.1 200 OK - 222 bytes in 0.015 second response time
[08:41:09] (PS2) Giuseppe Lavagetto: puppetmaster: install conftool [puppet] - https://gerrit.wikimedia.org/r/305967
[08:41:11] !log restarted hhvm on mw1162, was deadlocked
[08:41:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:43:20] <_joe_> !log restarting hhvm on mw1278, deadlock in HPHP::Treadmill::getAgeOldestRequest
[08:43:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:51:32] (PS2) Muehlenhoff: statsd proxy: Limit to production networks [puppet] - https://gerrit.wikimedia.org/r/303532
[08:52:09] (CR) Giuseppe Lavagetto: [C: 2] puppetmaster: install conftool [puppet] - https://gerrit.wikimedia.org/r/305967 (owner: Giuseppe Lavagetto)
[08:54:26] (PS3) Muehlenhoff: statsd proxy: Limit to production networks [puppet] - https://gerrit.wikimedia.org/r/303532
[08:55:15] RECOVERY - Apache HTTP on mw1278 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.027 second response time
[08:58:06] (CR) Muehlenhoff: [C: 2] statsd proxy: Limit to production networks [puppet] - https://gerrit.wikimedia.org/r/303532 (owner: Muehlenhoff)
[09:06:22] (PS1) Giuseppe Lavagetto: wmflib: fix typo in conftool function [puppet] - https://gerrit.wikimedia.org/r/305968
[09:07:11] (CR) Giuseppe Lavagetto: [C: 2 V: 2] wmflib: fix typo in conftool function [puppet] - https://gerrit.wikimedia.org/r/305968 (owner: Giuseppe Lavagetto)
[09:12:32] !log stopping db2034 for cloning and reimage
[09:12:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:16:09] there will be mediawiki errors regarding 10.192.32.5, ignore those, they are not user-facing
[09:21:37] (PS1) Muehlenhoff: Kafka brokers: Limit access to
production and fundraising networks [puppet] - https://gerrit.wikimedia.org/r/305969
[09:21:45] so ignore those 15 errors/minute
[09:23:18] (PS5) Giuseppe Lavagetto: scap: use conftool data to populate dsh groups [puppet] - https://gerrit.wikimedia.org/r/283201 (https://phabricator.wikimedia.org/T132529)
[09:27:26] <_joe_> grr I underestimated the complexity of interacting tools and puppet-compiler
[09:27:48] (PS1) Jcrespo: Reinstall db2034 with jessie [puppet] - https://gerrit.wikimedia.org/r/305970
[09:41:28] Platonides: hello, could you add the patch to one of the SWAT windows at https://wikitech.wikimedia.org/wiki/Deployments#Week_of_August_22nd ? <-- 20:29:45 < Platonides> I guess I can +2 my own patch backported by another user? 20:31:21 < bd.808> a backport to a release branch typically would only be merged during a SWAT release
[09:49:15] (PS1) Giuseppe Lavagetto: puppet_compiler: allow compiling manifests using the conftool function [puppet] - https://gerrit.wikimedia.org/r/305971
[09:54:51] (PS1) Filippo Giunchedi: mariadb: add mysql/node prometheus metrics for db2034 [puppet] - https://gerrit.wikimedia.org/r/305972 (https://phabricator.wikimedia.org/T126757)
[09:56:45] jynus: ^
[09:57:12] (PS2) Giuseppe Lavagetto: puppet_compiler: allow compiling manifests using the conftool function [puppet] - https://gerrit.wikimedia.org/r/305971
[09:58:08] (CR) Giuseppe Lavagetto: [C: 2 V: 2] puppet_compiler: allow compiling manifests using the conftool function [puppet] - https://gerrit.wikimedia.org/r/305971 (owner: Giuseppe Lavagetto)
[09:59:08] oh, I crashed my browser with my first prometheus query :-/
[10:01:06] heehe oops, too much data I guess
[10:02:55] (CR) Jcrespo: [C: 2] mariadb: add mysql/node prometheus metrics for db2034 [puppet] - https://gerrit.wikimedia.org/r/305972 (https://phabricator.wikimedia.org/T126757) (owner: Filippo Giunchedi)
[10:03:01] (PS2) Jcrespo: mariadb: add mysql/node
prometheus metrics for db2034 [puppet] - https://gerrit.wikimedia.org/r/305972 (https://phabricator.wikimedia.org/T126757) (owner: Filippo Giunchedi)
[10:13:25] (PS2) Jcrespo: Reinstall db2034 with jessie [puppet] - https://gerrit.wikimedia.org/r/305970
[10:16:20] (CR) Jcrespo: [C: 2] Reinstall db2034 with jessie [puppet] - https://gerrit.wikimedia.org/r/305970 (owner: Jcrespo)
[10:17:32] !log starting rolling restart of elasticsearch eqiad for JVM and elasticsearch upgrade
[10:17:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:19:44] (PS1) Filippo Giunchedi: mariadb: collect prometheus stats in codfw [puppet] - https://gerrit.wikimedia.org/r/305976 (https://phabricator.wikimedia.org/T126757)
[10:24:22] (CR) Jcrespo: [C: 1] mariadb: collect prometheus stats in codfw [puppet] - https://gerrit.wikimedia.org/r/305976 (https://phabricator.wikimedia.org/T126757) (owner: Filippo Giunchedi)
[10:24:36] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[10:25:33] (CR) Filippo Giunchedi: [C: 2] mariadb: collect prometheus stats in codfw [puppet] - https://gerrit.wikimedia.org/r/305976 (https://phabricator.wikimedia.org/T126757) (owner: Filippo Giunchedi)
[10:25:42] (PS2) Filippo Giunchedi: mariadb: collect prometheus stats in codfw [puppet] - https://gerrit.wikimedia.org/r/305976 (https://phabricator.wikimedia.org/T126757)
[10:31:28] (CR) Gehel: "Yes, relforge ends up not being monitored (bad).
A specific role for relforge is waiting for review in https://gerrit.wikimedia.org/r/#/c/" [puppet] - https://gerrit.wikimedia.org/r/305519 (https://phabricator.wikimedia.org/T133844) (owner: Gehel)
[10:35:57] Operations, MediaWiki-extensions-UniversalLanguageSelector, Traffic, Patch-For-Review: ULS GeoIP should not use meta.wm.o/geoiplookup - https://phabricator.wikimedia.org/T143270#2571251 (Nikerabbit)
[10:49:50] (PS6) Giuseppe Lavagetto: scap: use conftool data to populate dsh groups [puppet] - https://gerrit.wikimedia.org/r/283201 (https://phabricator.wikimedia.org/T132529)
[10:52:15] Operations, Services, User-mobrovac: Migrate SCA cluster to SCB (Jessie and Node 4.2) - https://phabricator.wikimedia.org/T96017#2571268 (mobrovac)
[10:52:24] Operations, Services, User-mobrovac: Move all Node.JS services to Jessie and Node 4 - https://phabricator.wikimedia.org/T124989#2571269 (mobrovac) Open>Resolved a:mobrovac
[10:58:54] (PS7) Giuseppe Lavagetto: scap: use conftool data to populate dsh groups [puppet] - https://gerrit.wikimedia.org/r/283201 (https://phabricator.wikimedia.org/T132529)
[11:00:04] akosiaris: Respected human, time to deploy wikidiff2 upgrade T140443 (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160822T1100). Please do the needful.
[11:00:04] MaxSem: A patch you scheduled for wikidiff2 upgrade T140443 is about to be deployed. Please be available during the process.
[11:02:38] (CR) Giuseppe Lavagetto: [C: 2] scap: use conftool data to populate dsh groups [puppet] - https://gerrit.wikimedia.org/r/283201 (https://phabricator.wikimedia.org/T132529) (owner: Giuseppe Lavagetto)
[11:09:26] PROBLEM - puppet last run on mira is CRITICAL: CRITICAL: puppet fail
[11:11:44] PROBLEM - puppet last run on bast2001 is CRITICAL: CRITICAL: puppet fail
[11:16:27] (PS1) ArielGlenn: scheduler: on HUP, clean up zombies after killing child processes [dumps] - https://gerrit.wikimedia.org/r/305982
[11:19:05] PROBLEM - puppet last run on bast1001 is CRITICAL: CRITICAL: puppet fail
[11:20:44] PROBLEM - puppet last run on bast3001 is CRITICAL: CRITICAL: puppet fail
[11:32:05] PROBLEM - puppet last run on neon is CRITICAL: CRITICAL: puppet fail
[11:32:48] <_joe_> mira is my fault
[11:32:52] <_joe_> maybe the others too
[11:33:21] <_joe_> yeah
[11:33:29] <_joe_> at least I just found out how to fix those
[11:33:45] PROBLEM - puppet last run on bast4001 is CRITICAL: CRITICAL: puppet fail
[11:35:06] <_joe_> uhm will take a bit longer than expected, let me revert for now
[11:35:17] (PS1) Giuseppe Lavagetto: Revert "scap: use conftool data to populate dsh groups" [puppet] - https://gerrit.wikimedia.org/r/305985
[11:35:33] (CR) Giuseppe Lavagetto: [C: 2 V: 2] Revert "scap: use conftool data to populate dsh groups" [puppet] - https://gerrit.wikimedia.org/r/305985 (owner: Giuseppe Lavagetto)
[11:40:15] RECOVERY - puppet last run on mira is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:45:17] https://phabricator.wikimedia.org/T143545 <-- if someone can have a look at this it would be appreciated
[11:45:43] I cannot understand people finding autowelcomes so harmful
[11:45:44] RECOVERY - puppet last run on bast1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:45:51] but still...
[11:46:10] Vito, how is that related to Operations ?
[11:47:17] RECOVERY - puppet last run on bast3001 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[11:48:31] Vito: #pywikibot instead?
[11:51:34] Operations, MediaWiki-extensions-UniversalLanguageSelector, Traffic, Patch-For-Review: ULS GeoIP should not use meta.wm.o/geoiplookup - https://phabricator.wikimedia.org/T143270#2571316 (BBlack) Ah, yes, I see now for country_code it does via https://github.com/wikimedia/mediawiki-extensions-Univ...
[11:55:42] (CR) BBlack: [C: 2] letsencrypt: de-duplicate subjects in acme-setup [puppet] - https://gerrit.wikimedia.org/r/305833 (owner: Alex Monk)
[11:55:48] (PS2) BBlack: letsencrypt: de-duplicate subjects in acme-setup [puppet] - https://gerrit.wikimedia.org/r/305833 (owner: Alex Monk)
[11:55:55] (CR) BBlack: [V: 2] letsencrypt: de-duplicate subjects in acme-setup [puppet] - https://gerrit.wikimedia.org/r/305833 (owner: Alex Monk)
[11:57:24] !log restbase deploy start of e9e5ff1
[11:57:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:00:14] PROBLEM - check_mysql on fdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2884
[12:00:16] RECOVERY - puppet last run on bast4001 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[12:00:44] RECOVERY - puppet last run on neon is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[12:01:27] (PS1) Ema: cache_misc: Allow PATCH requests for grafana [puppet] - https://gerrit.wikimedia.org/r/305990
[12:03:20] Hallo
[12:03:41] European time SWAT begins in an hour from now, does it?
[12:03:55] andre__: lol, I wrote that in the wrong window
[12:05:15] RECOVERY - check_mysql on fdb2001 is OK: Uptime: 2229727 Threads: 1 Questions: 92375233 Slow queries: 13312 Opens: 4406 Flush tables: 2 Open tables: 546 Queries per second avg: 41.428 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 126
[12:08:24] (CR) Hashar: "The 30 minutes timeout is hardcoded in the job. We would want a way to vary it depending on the Gerrit project for which the build is tri" [debs/contenttranslation/giella-sme] - https://gerrit.wikimedia.org/r/294430 (https://phabricator.wikimedia.org/T120087) (owner: KartikMistry)
[12:08:49] (PS2) Ema: cache_misc: Allow PATCH requests [puppet] - https://gerrit.wikimedia.org/r/305990
[12:09:04] RECOVERY - puppet last run on bast2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[12:10:25] !log restbase deploy end of e9e5ff1
[12:10:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:10:38] Platonides, Glaisher: why was kartik banned?
[12:11:16] 07.26 -!- mode/#wikimedia-operations [+b *!*@kartik.lustfield.net] by Platonides
[12:11:26] must be the bot
[12:11:55] aharoni: yup
[12:13:18] Nemo_bis: which bot? and why is banning real people?
[12:14:32] aharoni: yes it does
[12:15:14] Operations, hardware-requests: codfw/eqiad:(4+4) hardware access request for ORES - https://phabricator.wikimedia.org/T142578#2539946 (mark) @RobH: time frame is soon, no particular deadline but also no reason to delay this.
[12:17:04] Nikerabbit: there are some false positives, hopefully a temporary thing until the triggering vandal gets sick
[12:17:14] (I don't remember who can whitelist people, other than Platonides )
[12:23:48] (PS1) Muehlenhoff: Use DB_LOG_AUTOREMOVE for openldap database [puppet] - https://gerrit.wikimedia.org/r/305992 (https://phabricator.wikimedia.org/T143302)
[12:38:26] I've CR+2 the two wmf15 changes, so they will be ready for SWAT already merged by Zuul.
[12:52:39] for those tuning us now, remember there are mediawiki errors from db 10.192.32.5 that you should ignore
[12:53:57] good afternoon
[12:54:11] Hello.
[12:56:16] (CR) Filippo Giunchedi: [C: 1] "thanks Ema! FWIW I noticed this when trying to update grafana's config via the webui and it'd fail on PATCH" [puppet] - https://gerrit.wikimedia.org/r/305990 (owner: Ema)
[12:58:42] Dereckson: I am just back from vacations so I am rather rusty on the deployment front :D
[13:00:04] hashar, Dereckson, addshore, and aude: Dear anthropoid, the time has come. Please deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160822T1300).
[13:00:04] matt_flaschen, Urbanecm, Dereckson, Addshore, and Aharoni: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be available during the process.
[13:00:17] Around
[13:00:42] Dereckson: are you taking care of the couple wmf-15 changes?
[13:00:48] *waves*
[13:01:13] Present
[13:01:21] hashar one is for addshore
[13:01:31] er for Amir1
[13:01:52] proceed with the AS_HOOK_ERROR so ?
:)
[13:02:16] hey, I'm not officially a deployer yet
[13:02:26] No one told me or added my name
[13:02:30] * hashar brings deployer 101 handbook to Amir1
[13:02:33] now you are :)
[13:02:47] Thanks :)
[13:03:06] I guess if you are able to check the outcome of deployment, that is all what matters for now
[13:03:07] Amir1: https://wikitech.wikimedia.org/wiki/How_to_deploy_code is pretty good as reference for extension backports
[13:03:17] I checked the check list and the only thing I need is gerrit right for wmf-deployment
[13:03:26] the intent is to eventually have everyone to be able to deploy during swat windows
[13:03:27] I already have deployment operational rights
[13:03:48] oh, Amir1 you need adding to the gerrit group?
[13:03:53] yup
[13:04:01] i believe all of us can do that
[13:04:25] matt_flaschen: how long is your Flow scripts going to take ?
[13:04:52] Dereckson: can you push the wmf.15 patches you had +2ed for Amir1 and yourself?
[13:04:59] Amir1: you're not in the deployment ldap group! :)
[13:05:15] addshore: it wasn't in the check list there
[13:05:19] Or I forgot
[13:05:22] let me check
[13:05:23] hashar: sure
[13:05:42] but Amir1 none of that matters anyway as this is swat and others are here to deploy your changes :)
[13:05:45] hashar, I think like 10 minutes.
[13:05:55] hashar: before maybe do the config changes as they're more lightweight to test
[13:06:07] https://wikitech.wikimedia.org/wiki/SWAT_deploys#New_SWAT_Team_member_check-list
[13:06:25] ok lets do the config changes
[13:06:37] Amir1: point 2 on that list :) "Shell and deploy access in production"
[13:06:52] (CR) Dereckson: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/305777 (https://phabricator.wikimedia.org/T143397) (owner: Urbanecm)
[13:06:54] addshore: yeah, I already requested SWAT deploys before :)
[13:06:54] (CR) Hashar: [C: 2] "swat" [mediawiki-config] - https://gerrit.wikimedia.org/r/305777 (https://phabricator.wikimedia.org/T143397) (owner: Urbanecm)
[13:07:29] (Merged) jenkins-bot: Enable transwiki upload for tcywiki [mediawiki-config] - https://gerrit.wikimedia.org/r/305777 (https://phabricator.wikimedia.org/T143397) (owner: Urbanecm)
[13:07:52] https://github.com/wikimedia/operations-puppet/blob/production/modules/admin/data/data.yaml#L47
[13:07:56] isn't it enough?
[13:07:56] Hallo.
[13:08:21] yo Amir1 much confusion between Amirs today ;)
[13:08:34] I have a SWAT patch scheduled.
[13:08:35] I can login to tin, I can run mwscript (I didn't try it though) and mistakenly I logged in to terbium
[13:08:40] aharoni: ohhhh
[13:09:02] I see, I thought it's about adding me to deployers
[13:09:06] have fun you all!
[13:09:10] :D
[13:09:18] Urbanecm: your change is live on mw1099
[13:09:33] oh
[13:09:37] I have just run scap
[13:09:40] sorry
[13:09:55] not sure if there is much way to check whether it works properly via mw1099 is there?
[13:09:56] !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Enable transwiki upload for tcywiki T143397 (duration: 00m 58s)
[13:09:57] T143397: Enable import in Tulu Wikipedia - https://phabricator.wikimedia.org/T143397
[13:10:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:10:23] A minimal check is Special:Import doesn't throw an exception I guess.
[13:10:36] (PS1) Giuseppe Lavagetto: scap: use conftool data to populate dsh groups [puppet] - https://gerrit.wikimedia.org/r/305996 (https://phabricator.wikimedia.org/T132529)
[13:10:37] but indeed, that needs someone with the rights on tcy to check
[13:10:56] It gives me You do not have permission to import pages from another wiki, for the following reason:
[13:10:56] The action you have requested is limited to users in one of the groups: Administrators, Importers, Transwiki importers.
[13:11:01] (PS2) Hashar: [cleanup] Remove old throttle rules [mediawiki-config] - https://gerrit.wikimedia.org/r/305780 (owner: Urbanecm)
[13:11:17] yeah you are not in the group
[13:11:22] I have no right to import so I can't check it fully. Should I ask them at the phab task?
[13:11:24] hashar: I know :)
[13:11:29] I guess you can reply on the task that it is now deployed
[13:11:30] Urbanecm: ask to the original poster on Phabricator to check if it works
[13:11:35] Okay
[13:11:35] and one admin / importer on that wiki can confirm
[13:11:38] Urbanecm: and move the task to the Done column
[13:11:44] Urbanecm: once verified, you mark it resolved
[13:11:46] working on it..
[13:11:58] (CR) Hashar: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/305780 (owner: Urbanecm)
[13:12:16] Dereckson and hashar: Is it deployed everywhere?
[13:12:24] yes
[13:12:25] (Merged) jenkins-bot: [cleanup] Remove old throttle rules [mediawiki-config] - https://gerrit.wikimedia.org/r/305780 (owner: Urbanecm)
[13:12:27] I have copy pasted the change into https://etherpad.wikimedia.org/p/SWAT
[13:12:31] Okay, thanks.
[13:12:59] live on mw1099: Remove old throttle rules [mediawiki-config] - https://gerrit.wikimedia.org/r/305780
[13:13:14] PROBLEM - Host db2034 is DOWN: PING CRITICAL - Packet loss = 100%
[13:13:42] Dereckson: Asked.
[13:13:54] Fine.
[13:13:56] hashar: How can I test it? I can't change the time...
[13:14:01] aharoni: your change is live on mw1099
[13:14:16] the code is fine urbanecm :)
[13:14:34] it is just to manually verify that nothing goes wrong due to some typo that humans/CI would have missed I guess
[13:14:37] Urbanecm: for misc config changes, going to the wiki and check it doesn't throw a fatal error
[13:14:37] Dereckson: is it live on any actual Wikipedia?
[13:14:46] aharoni: no, only on mw1099.eqiad.wmnet
[13:14:54] db2034 downtime has expired
[13:14:55] RECOVERY - Host db2034 is UP: PING OK - Packet loss = 0%, RTA = 36.40 ms
[13:14:58] aharoni: you need to follow https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug to test it
[13:15:13] while I hadn't completed its install
[13:15:16] (PS1) KartikMistry: Deploy Compact Language Links out of beta for Tulu Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/305997
[13:15:21] I will renew its alerts
[13:15:23] Dereckson: well, let me try... I can only see that the code is live.
[13:15:24] aharoni: there is an extension for Firefox and one for Chrome, you install it, pick mw1099 as server, turn the button on
[13:15:32] yeah, I know, let's see...
[13:16:12] alert's acks
[13:16:18] jynus: thanks !
[13:16:33] !log hashar@tin Synchronized wmf-config/throttle.php: [cleanup] Remove old throttle rules (duration: 00m 48s)
[13:16:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:16:48] Thanks for your deploys Dereckson and hashar
[13:16:49] !
[13:16:51] Amir1: [cleanup] Remove old throttle rules
[13:16:54] you're welcome
[13:16:56] Amir1: https://chrome.google.com/webstore/detail/wikimediadebug/binmakecefompkjggiklgjenddjoifbb
[13:17:05] Dereckson: are you sure it's deployed on mw1099? I don't see any difference there
[13:17:08] Urbanecm: thank you for the config tweaks
[13:17:14] Nikerabbit: let me check
[13:17:16] You're welcome hashar
[13:17:33] hashar: I guess you mean aharoni
[13:17:40] (PS3) Hashar: Restrict local upload on ar.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/305573 (https://phabricator.wikimedia.org/T142450) (owner: Dereckson)
[13:17:42] I don't have any patches for today SWAT
[13:17:45] oh sorry
[13:18:14] Dereckson: OK, I enabled the extension and selected mw 1099
[13:18:20] can I now test it on a live Wikipedia?
[13:18:48] aharoni: yes
[13:19:03] hmm, I still don't see a change on https://pl.wikipedia.org/wiki/Germanie . The error is still there.
[13:19:08] Shall I wait a minute and test again?
[13:19:14] doing the couple config changes that dereckson has proposed
[13:19:34] (CR) Hashar: [C: 2] Restrict local upload on ar.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/305573 (https://phabricator.wikimedia.org/T142450) (owner: Dereckson)
[13:19:39] aharoni: Nikerabbit: fixed
[13:19:50] Nikerabbit: thanks to have noticed that
[13:20:01] (Merged) jenkins-bot: Restrict local upload on ar.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/305573 (https://phabricator.wikimedia.org/T142450) (owner: Dereckson)
[13:20:15] Dereckson: Nikerabbit Aha! Seems fixed.
[13:20:22] Let me test another little something...
[13:21:04] Yeah, OK, with the X-Wikimedia-Debug on it all seems fixed.
[13:21:09] works for me too
[13:21:18] live on mw1099 Restrict local upload on ar.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/305573
[13:21:46] This way of testing is pretty awesome, actually. First time I'm doing it.
[13:22:08] hashar: ar works for upload navigation link, and for user barred to upload
[13:22:14] hashar: I'll ask on the task for other group check
[13:22:23] so looks good to me
[13:22:51] The action you have requested is limited to users in one of the groups: Administrators, Editors, Reviewers, Uploaders.
[13:22:54] seems good
[13:23:17] yeah looks good to me
[13:23:27] mw1099 points me to upload wizard or some other helper
[13:23:53] that's a local help page explaining to go to the wizard or if rights to the local page
[13:24:40] yup
[13:24:45] which confirms the link is fine as well
[13:24:47] scappino it
[13:24:51] scapping it
[13:25:09] !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Restrict local upload on ar.wikipedia - T142450 (duration: 00m 49s)
[13:25:11] T142450: create "uploader" permission in arwiki - https://phabricator.wikimedia.org/T142450
[13:25:11] (PS2) Hashar: Enable WikidataPageBanner on ro.wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/304938 (https://phabricator.wikimedia.org/T142963) (owner: Dereckson)
[13:25:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:25:28] https://gerrit.wikimedia.org/r/#/c/304938/ Enable WikidataPageBanner on ro.wikivoyage
[13:25:32] I guess that is for aude as well
[13:25:58] I dont even know what WikidataPageBanner is for :(
[13:26:14] (CR) Hashar: [C: 2] Enable WikidataPageBanner on ro.wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/304938 (https://phabricator.wikimedia.org/T142963) (owner: Dereckson)
[13:26:40] (Merged) jenkins-bot: Enable WikidataPageBanner on ro.wikivoyage [mediawiki-config] -
https://gerrit.wikimedia.org/r/304938 (https://phabricator.wikimedia.org/T142963) (owner: Dereckson)
[13:27:08] it is on mw1099 now
[13:27:54] Testing ro.wikivoyage change.
[13:28:25] * hashar tries https://ro.wikivoyage.org/wiki/Franța
[13:28:48] So now that my change is deployed on mw1099 and I tested it and found that it works, what's the next step? Will it be deployed to production during the window?
[13:29:02] aharoni: I'll deploy it just after hashar is done with the config changes
[13:30:04] I dont even know how to test it :(
[13:30:08] can't find a good page candidate
[13:30:33] ah https://ro.wikivoyage.org/wiki/Rom%C3%A2nia
[13:30:36] https://ro.wikivoyage.org/wiki/Salzburg doesn't seem to work
[13:31:07] :(
[13:31:15] hashar: can I help?
[13:31:28] yeah
[13:31:29] Enable WikidataPageBanner on ro.wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/304938
[13:31:33] is deployed on mw1099
[13:31:41] trying to figure out whether it works properly on ro.wikivoyage.org
[13:31:57] ah
[13:32:02] it works when we specify an image
[13:32:10] it's the fetch image from wikidata property part which doesn't work
[13:32:14] https://ro.wikivoyage.org/wiki/Salzburg
[13:32:35] hashar: looking
[13:32:43] Dereckson: I got the image showing
[13:33:33] looks like the basic works and the image property can be figured out later?
[13:33:51] * Dereckson nods
[13:33:54] I see an image on Salzburg with the X-debug enabled, and no image without it.
[13:34:05] echo $wgWPBBannerProperty [13:34:06] P948 [13:34:07] P948 (An Untitled Masterwork) - https://phabricator.wikimedia.org/P948 [13:34:15] aharoni: yes but I manually specified the image [13:34:50] report your finding on T142963 [13:34:50] T142963: Please enable WikidataPageBanner on ro.wikivoyage - https://phabricator.wikimedia.org/T142963 [13:34:55] and we can follow up after the window [13:34:55] aharoni: normally if you write it without argument, it should pick the value of the $wgWPBBannerProperty at https://www.wikidata.org/wiki/Q34713 [13:35:09] !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Enable WikidataPageBanner on ro.wikivoyage - T142963 (duration: 00m 49s) [13:35:09] T142963: Please enable WikidataPageBanner on ro.wikivoyage - https://phabricator.wikimedia.org/T142963 [13:35:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:35:37] (03PS3) 10Hashar: Add Collection render note for articles & rdf2latex [mediawiki-config] - 10https://gerrit.wikimedia.org/r/305012 (https://phabricator.wikimedia.org/T135613) (owner: 10Addshore) [13:35:55] addshore: any clue how to test ^^^ ? [13:36:01] yup, I can do it :) [13:36:03] is it on 1099? [13:36:12] soon [13:36:18] (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/305012 (https://phabricator.wikimedia.org/T135613) (owner: 10Addshore) [13:36:45] (03Merged) 10jenkins-bot: Add Collection render note for articles & rdf2latex [mediawiki-config] - 10https://gerrit.wikimedia.org/r/305012 (https://phabricator.wikimedia.org/T135613) (owner: 10Addshore) [13:36:48] * hashar updates https://etherpad.wikimedia.org/p/SWAT [13:37:07] addshore: it is on mw1099 [13:37:27] ack, checked and looks good to roll out everywhere [13:37:43] matt_flaschen: we will next do the couple wmf.15 patches and then get to your flow / script thing. [13:37:55] hashar, ok, thanks. [13:38:21] addshore: scapping [13:38:33] hashar: great! 
[13:38:41] aharoni: is your Compact Language Links: Apply toLowerCase when reading featured articles (task TT143527) working on mw1099 ? [13:38:49] if so lets deploy it [13:39:05] !log hashar@tin Synchronized wmf-config/CommonSettings.php: Add Collection render note for articles & rdf2latex -t - T135613 (duration: 00m 49s) [13:39:06] T135613: [GTWL] Include hint about excluded tables when generating a PDF - https://phabricator.wikimedia.org/T135613 [13:39:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:39:10] [= [13:39:12] Glaisher: I'm back, for when you want to test that :) [13:39:13] hashar: okay I'm going to deploy the two next changes for wmf [13:39:17] (03PS2) 10Giuseppe Lavagetto: scap: use conftool data to populate dsh groups [puppet] - 10https://gerrit.wikimedia.org/r/305996 (https://phabricator.wikimedia.org/T132529) [13:39:24] * mafk is fusing his brains building a regexp [13:39:25] Dereckson: hold on [13:39:27] scap complains on canaries [13:39:58] =o [13:40:08] na it is me [13:40:12] did a wrong command bah [13:40:19] (03CR) 10Ema: [C: 032] cache_misc: Allow PATCH requests [puppet] - 10https://gerrit.wikimedia.org/r/305990 (owner: 10Ema) [13:40:52] !log hashar@tin Synchronized wmf-config/CommonSettings.php: Add Collection render note for articles rdf2latex -t - T135613 (duration: 00m 48s) [13:40:52] T135613: [GTWL] Include hint about excluded tables when generating a PDF - https://phabricator.wikimedia.org/T135613 [13:40:55] Dereckson: all good for the wmf.15 couple patches :} [13:40:56] srry [13:40:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:40:58] (03CR) 10ArielGlenn: [C: 032] scheduler: on HUP, clean up zombies after killing child processes [dumps] - 10https://gerrit.wikimedia.org/r/305982 (owner: 10ArielGlenn) [13:41:02] !log dereckson@tin Started scap: php-1.28.0-wmf.15/extensions/UniversalLanguageSelector / resources/js/ext.uls.compactlinks.js Apply 
toLowerCase when reading featured articles (T143527) [13:41:03] T143527: nds-nl is not consistently lowercased in compact language links - https://phabricator.wikimedia.org/T143527 [13:41:06] !log dereckson@tin scap aborted: php-1.28.0-wmf.15/extensions/UniversalLanguageSelector / resources/js/ext.uls.compactlinks.js Apply toLowerCase when reading featured articles (T143527) (duration: 00m 04s) [13:41:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:41:07] T143527: nds-nl is not consistently lowercased in compact language links - https://phabricator.wikimedia.org/T143527 [13:41:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:41:52] hashar: yes, it works through mw1099 [13:41:56] \O/ [13:42:10] (03PS3) 10Hashar: Set Flow as default for User talk on kabwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/305670 (https://phabricator.wikimedia.org/T140588) (owner: 10Mattflaschen) [13:42:15] !log dereckson@tin Synchronized php-1.28.0-wmf.15/extensions/UniversalLanguageSelector/resources/js/ext.uls.compactlinks.js: Apply toLowerCase when reading featured articles (T143527) (duration: 00m 50s) [13:42:16] T143527: nds-nl is not consistently lowercased in compact language links - https://phabricator.wikimedia.org/T143527 [13:42:16] aharoni: live in prod ^ [13:42:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:42:29] next is [wmf.15] 305773 Fix unknown constant AS_HOOK_ERROR issue in ProofreadPage (task T143471) [13:42:29] T143471: Couldn't find constant ProofreadPage::AS_HOOK_ERROR in ProofreadPage::onEditFilterMergedContent - https://phabricator.wikimedia.org/T143471 [13:42:44] matt_flaschen: get ready :} [13:43:19] !log dereckson@tin Synchronized php-1.28.0-wmf.15/extensions/ProofreadPage/ProofreadPage.body.php: Fix unknown constant AS_HOOK_ERROR issue in ProofreadPage (T143471) (duration: 00m 48s) [13:43:20] T143471: Couldn't find constant 
ProofreadPage::AS_HOOK_ERROR in ProofreadPage::onEditFilterMergedContent - https://phabricator.wikimedia.org/T143471 [13:43:22] hashar, ready. Does that mean "run the scripts now, you're next", or "get ready to run them"? [13:43:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:43:31] matt_flaschen: the later. Get ready to run them [13:43:47] Dereckson is finishing up the deployments of mw wmf.15 backports [13:43:51] I'm done. [13:44:20] !log stopping db1073 to clone compressed Innodb data to db2034 [13:44:21] Dereckson: have you done " [wmf.15] 305773 Fix unknown constant AS_HOOK_ERROR issue in ProofreadPage (task T143471)" ? [13:44:21] T143471: Couldn't find constant ProofreadPage::AS_HOOK_ERROR in ProofreadPage::onEditFilterMergedContent - https://phabricator.wikimedia.org/T143471 [13:44:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:44:33] hashar: yes, I have [13:44:35] I still don't see it live... but it's probably just waiting to really update. [13:44:37] \O/ [13:44:40] matt_flaschen: it is all your! [13:45:03] hashar: Dereckson Nikerabbit — tested, all fixed in production. Thanks for the support! [13:45:08] matt_flaschen: I did rebase https://gerrit.wikimedia.org/r/#/c/305670/ [13:45:09] You're welcome. [13:45:17] Thanks [13:45:19] aharoni: next time we will share a screen I guess and level you up :) [13:45:20] hashar, okay, starting now. [13:46:06] Dereckson: can you follow up on the WikidataPageBanner requiring the image to be given for some reason on T142963 ? [13:46:06] T142963: Please enable WikidataPageBanner on ro.wikivoyage - https://phabricator.wikimedia.org/T142963 [13:46:43] hashar: yes, I'll handle Phabricator reporting/resolve tasks in a few minutes. 
[13:47:08] Dereckson: and I have manually purged " https://ro.wikivoyage.org/wiki/Salzburg " [13:48:11] hashar: ack'ed [13:48:23] (03CR) 10Hashar: [C: 031] "Cluster ready :)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/305670 (https://phabricator.wikimedia.org/T140588) (owner: 10Mattflaschen) [13:48:55] matt_flaschen: and take your time. There are no other deployment slots following so we can overflow :} [13:49:45] addshore: I noticed a few: "Error: Couldn't find trailer dictionary" and "Error: Couldn't read xref table" might be related to pdf [13:49:47] Yeah, shouldn't take long, but I'm checking properly. [13:50:27] hashar: I also spotted them, not sure if they were to do with my patch though [13:59:08] (03CR) 10Ottomata: [C: 031] "Hm, I think this seems fine." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/305969 (owner: 10Muehlenhoff) [14:02:47] (03PS12) 10Yuvipanda: [WIP] Introduce 'clush' module and toollabs role [puppet] - 10https://gerrit.wikimedia.org/r/305804 [14:05:15] (03PS2) 10Muehlenhoff: Kafka brokers: Limit access to production and fundraising networks [puppet] - 10https://gerrit.wikimedia.org/r/305969 [14:07:27] PROBLEM - puppet last run on mw2175 is CRITICAL: CRITICAL: puppet fail [14:12:04] hashar, ready for you to deploy the config. Thanks. 
[14:14:20] :) [14:14:29] (03CR) 10Hashar: [C: 032] Set Flow as default for User talk on kabwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/305670 (https://phabricator.wikimedia.org/T140588) (owner: 10Mattflaschen) [14:14:55] matt_flaschen: https://gerrit.wikimedia.org/r/#/c/305670/ needs your CR-2 to be removed [14:15:04] (03CR) 10Ottomata: [C: 031] Kafka brokers: Limit access to production and fundraising networks [puppet] - 10https://gerrit.wikimedia.org/r/305969 (owner: 10Muehlenhoff) [14:15:51] (03CR) 10Mattflaschen: "Ready" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/305670 (https://phabricator.wikimedia.org/T140588) (owner: 10Mattflaschen) [14:15:57] 06Operations, 06Discovery, 06Discovery-Search-Backlog, 10Elasticsearch: Make elasticsearch configuration more robust to loss of network connectivity - https://phabricator.wikimedia.org/T143552#2571551 (10Gehel) [14:16:05] hashar, sorry, forgot about that, done. [14:16:24] (03CR) 10Hashar: Set Flow as default for User talk on kabwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/305670 (https://phabricator.wikimedia.org/T140588) (owner: 10Mattflaschen) [14:16:27] (03CR) 10Hashar: [C: 032] Set Flow as default for User talk on kabwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/305670 (https://phabricator.wikimedia.org/T140588) (owner: 10Mattflaschen) [14:16:57] (03Merged) 10jenkins-bot: Set Flow as default for User talk on kabwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/305670 (https://phabricator.wikimedia.org/T140588) (owner: 10Mattflaschen) [14:18:17] matt_flaschen: deployed on mw1099 [14:21:04] matt_flaschen: that is supposed to make https://kab.wikipedia.org/ user talk page to be Flow isn't it ? [14:21:36] ah [14:21:45] https://kab.wikipedia.org/wiki/Amyannan_umsqedac:Hashar No such action [14:21:45] The page is being handled by Flow, but the Title class indicates that the content model is 'wikitext'. [14:22:05] rev_content_model is not populated yet maybe? 
[14:22:38] hashar, I ran the scripts. Maybe you visited after the script but before the config was deployed. [14:22:41] Let me look. [14:22:58] logged in /visited after I ran scap pull on mw1099 [14:23:58] hashar, https://kab.wikipedia.org/wiki/Amyannan_umsqedac:Totally_new_user_talk_page is fine on 1099. [14:24:10] that is a user without any talk page https://kab.wikipedia.org/wiki/Amyannan_umsqedac:CommonsDelinker [14:24:21] and in recent change [14:24:25] some bot / user welcomed me [14:24:29] (diff | hist) . . N Amyannan umsqedac:Hashar‎; 14:19 . . (+65)‎ . . ‎Loveless (talk | contribs)‎ (Ansuf !) [14:24:33] https://kab.wikipedia.org/w/index.php?title=Amyannan_umsqedac:Hashar&curid=11594&action=history [14:24:37] hashar, yeah, when you visit it auto-creates your talk page. [14:24:39] that yield the issue somehow [14:24:47] bah [14:24:50] that is crazy :( [14:25:33] https://kab.wikipedia.org/wiki/Amyannan_umsqedac:Totally_new_user_talk_page and https://kab.wikipedia.org/wiki/Amyannan_umsqedac:CommonsDelinker look fine (empty Flow). [14:25:37] matt_flaschen: I am probably the only one impacted. Other accounts looks fine [14:25:57] hashar, I think it might impact users going forward too. I need to check. [14:26:25] okk [14:26:34] but maybe it is "just" a race condition [14:26:52] let me know when I can scap it on all mw servers [14:29:57] hashar, it's not NewUserMessage (the extension for this). Looks to be a bot in this case: https://kab.wikipedia.org/w/index.php?title=Amyannan_umsqedac:Hashar&action=history . 
[14:30:16] yup [14:30:43] so maybe it created the page and rev_content_model hasn't been populated properly [14:31:40] (03PS15) 10BBlack: varnish: switch from libGeoIP to libmaxminddb [puppet] - 10https://gerrit.wikimedia.org/r/253619 (https://phabricator.wikimedia.org/T99226) (owner: 10Faidon Liambotis) [14:31:51] (03CR) 10BBlack: [C: 032 V: 032] varnish: switch from libGeoIP to libmaxminddb [puppet] - 10https://gerrit.wikimedia.org/r/253619 (https://phabricator.wikimedia.org/T99226) (owner: 10Faidon Liambotis) [14:32:04] heading to rest room brb [14:32:06] hashar, yeah, the bot created it as wikitext when it saw your new account, since it wasn't using 1099. [14:32:13] yup [14:32:27] PROBLEM - CirrusSearch codfw 95th percentile latency - more_like on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [2000.0] [14:32:36] gotta check whether some new account + bot welcome message would still trigger the same issue [14:32:43] 06Operations, 06Discovery, 06Discovery-Search-Backlog, 10Elasticsearch: Make elasticsearch configuration more robust to loss of network connectivity - https://phabricator.wikimedia.org/T143552#2571597 (10dcausse) Would it make sense to tune our settings the same way we tune mysql? [14:34:46] RECOVERY - puppet last run on mw2175 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:36:03] !log text cache puppets disabled, cp1065 testing merged https://gerrit.wikimedia.org/r/253619 [14:36:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:36:37] RECOVERY - CirrusSearch codfw 95th percentile latency - more_like on graphite1001 is OK: OK: Less than 20.00% above the threshold [1200.0] [14:40:27] matt_flaschen: and also I have browsed via mw1099 but the bot was running on some other mw app not having the change yet. That might explain it [14:40:33] Right, that's what I meant. [14:40:44] lets generalize so ? 
[14:41:16] hashar, I think the bot welcome will fail after it's fully deployed, either because Flow doesn't support directApiEditing or because the bot doesn't have editcontentmodel (might depend whether it passes in a content model, but those should cover it I think). [14:41:23] hashar, yeah, let's go ahead. [14:41:35] I confirmed it doesn't have editcontentmodel. [14:41:51] Afterwards, I will test, and then ask that the bot be turned off on this wiki if it doesn't support Flow. [14:42:08] hashar, actually wait. [14:42:21] * hashar waits [14:42:40] OK, good. I still need to fix your page, got into the forest when I should have been on the tree. [14:44:19] hashar, sorry. Now you can go. Confirmed yours is fixed when viewing on mw1099. [14:44:56] matt_flaschen: I am pushing on the rest of mw servers so [14:45:05] Yep, thanks. [14:45:56] (When I ran the mwscript populateContentModel.php --wiki=kabwiki --ns=3 --table=revision yours was the only row affected this time) [14:46:04] !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Set Flow as default for User talk on kabwiki - T140588 (duration: 00m 59s) [14:46:05] T140588: Enable Flow on all kab.wikipedia talk pages - https://phabricator.wikimedia.org/T140588 [14:46:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:46:32] matt_flaschen: awesome. Thank you for your patience :} [14:46:39] !log text caches: geoip testing looks good, re-enabling+running puppet for the rest [14:46:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:46:50] !log European swat completed 8/8 100% [14:46:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:46:59] hashar, let me test a new user now. 
[14:51:39] :} [14:51:45] (03PS8) 10BBlack: GeoIP VCL: re-set old IPv6 no-data cookies [puppet] - 10https://gerrit.wikimedia.org/r/305419 (https://phabricator.wikimedia.org/T99226) [14:51:53] (03CR) 10BBlack: [C: 032 V: 032] GeoIP VCL: re-set old IPv6 no-data cookies [puppet] - 10https://gerrit.wikimedia.org/r/305419 (https://phabricator.wikimedia.org/T99226) (owner: 10BBlack) [14:56:29] hashar, everything looks good. I can post where I should (https://kab.wikipedia.org/wiki/Amyannan_umsqedac:Hashar , https://kab.wikipedia.org/wiki/Amyannan_umsqedac:Test_2016-08-22_Flow , https://kab.wikipedia.org/wiki/Amyannan_umsqedac:Issimo_15 , https://kab.wikipedia.org/wiki/Amyannan_umsqedac:MSchottlender-WMF ), and the bot didn't do anything (neither a page nor a bogus notification on the recently created users). [14:57:32] 06Operations, 06Labs: grafana-labs.wikimedia.org doesn't reflect grafana-labs-admin.wikimedia.org - https://phabricator.wikimedia.org/T143556#2571642 (10yuvipanda) [14:58:08] matt_flaschen: excellent! thank you :) [14:58:27] matt_flaschen: possibly the bot dies out due to some API error it would have [14:58:32] will get fixed by the bot operator I guess [14:59:25] hashar, yes, it should either get "content model edit denied" or "no direct API editing". I'm posting to notify them right now. [14:59:40] matt_flaschen: excellent thank you! [15:01:32] 06Operations, 10ops-codfw, 10DBA: es2004 has a dead disk, but it is not under warranty - https://phabricator.wikimedia.org/T143220#2561047 (10Papaul) @jcrespo I have 10x2TB 7.2k disks on site for spare that I can use to replace the faulty disk. 
[15:03:01] (03PS1) 10Andrew Bogott: Fix for incorrect quota check in Horizon [puppet] - 10https://gerrit.wikimedia.org/r/306004 (https://phabricator.wikimedia.org/T142379) [15:04:55] (03CR) 10Amire80: [C: 031] Deploy Compact Language Links out of beta for Tulu Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/305997 (owner: 10KartikMistry) [15:05:07] 06Operations, 10ops-codfw, 10DBA: es2004 has a dead disk, but it is not under warranty - https://phabricator.wikimedia.org/T143220#2571677 (10Papaul) a:03jcrespo Disk replacement complete. [15:06:16] 06Operations, 10ops-codfw, 10DBA: es2004 has a dead disk, but it is not under warranty - https://phabricator.wikimedia.org/T143220#2571679 (10jcrespo) @Papaul, if you have 10, and you do not mind using 1 for this (not a priority server), just replacing the disk will be faster than rebuild the RAID! So procee... [15:06:26] (03CR) 10Rush: [C: 032] Fix for incorrect quota check in Horizon [puppet] - 10https://gerrit.wikimedia.org/r/306004 (https://phabricator.wikimedia.org/T142379) (owner: 10Andrew Bogott) [15:11:29] 06Operations, 10MediaWiki-extensions-CentralNotice, 10Traffic, 13Patch-For-Review: CN: Stop using the geoiplookup HTTPS service (always use the Cookie) - https://phabricator.wikimedia.org/T143271#2571686 (10BBlack) [15:11:32] 06Operations, 10MediaWiki-extensions-UniversalLanguageSelector, 10Traffic, 13Patch-For-Review: ULS GeoIP should not use meta.wm.o/geoiplookup - https://phabricator.wikimedia.org/T143270#2571687 (10BBlack) [15:11:41] 06Operations, 06MediaWiki-Stakeholders-Group, 10Traffic, 07Developer-notice, and 2 others: Get rid of geoiplookup service - https://phabricator.wikimedia.org/T100902#2571690 (10BBlack) [15:11:45] 06Operations, 10Fundraising-Backlog, 10Traffic, 13Patch-For-Review: Switch Varnish's GeoIP code to libmaxminddb/GeoIP2 - https://phabricator.wikimedia.org/T99226#2571684 (10BBlack) 05Open>03Resolved a:03BBlack [15:14:45] (03CR) 10Paladox: "Hi, this needs submitting or re 
c+2 please." [puppet] - 10https://gerrit.wikimedia.org/r/306004 (https://phabricator.wikimedia.org/T142379) (owner: 10Andrew Bogott) [15:23:26] (03CR) 10Jforrester: "This is Import, not Upload. Please check to make sure it's right in future to make the git logs more helpful. :-)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/305777 (https://phabricator.wikimedia.org/T143397) (owner: 10Urbanecm) [15:24:22] (03CR) 10Urbanecm: "Terrible typo... Sorry! I'll double check the message in the future!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/305777 (https://phabricator.wikimedia.org/T143397) (owner: 10Urbanecm) [15:25:21] 06Operations, 10DBA: Display lag on grafana, tendril and dbtree from pt-heartbeat instead (or in addition) of Seconds_Behind_Master - https://phabricator.wikimedia.org/T141968#2571730 (10jcrespo) [15:26:06] 06Operations, 10DBA, 05Prometheus-metrics-monitoring: Decide storage backend for performance schema monitoring stats - https://phabricator.wikimedia.org/T119619#2571731 (10jcrespo) {F4385044} [15:28:18] 06Operations, 06Commons, 10Wikimedia-SVG-rendering, 07User-notice: SVG files larger than 10 MB cannot be thumbnailed - https://phabricator.wikimedia.org/T111815#2571732 (10matmarex) All thumbnails on that page render for me. The oldest version of the file seems to be thumbnailed incorrectly, but it is nev... [15:28:47] 06Operations, 10DBA, 10Traffic, 06WMF-Legal, and 2 others: dbtree loads third party resources (from jquery.com and google.com) - https://phabricator.wikimedia.org/T96499#2571733 (10jcrespo) Grafana should substitute the graphing library {F4385044}. The only thing left is substituting code to generate a tre... [15:29:22] (03CR) 10Jforrester: "It happens. I've done worse. 
:-)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/305777 (https://phabricator.wikimedia.org/T143397) (owner: 10Urbanecm) [15:37:25] 06Operations, 10Analytics: Implement stats_print in kafkatee - https://phabricator.wikimedia.org/T76345#2571781 (10Milimetric) 05Open>03declined [15:39:12] 06Operations, 10Traffic: High number of failed inbound TFO connections in esams Mon-Fri - https://phabricator.wikimedia.org/T143562#2571799 (10ema) [15:44:06] (03PS3) 10Giuseppe Lavagetto: scap: use conftool data to populate dsh groups [puppet] - 10https://gerrit.wikimedia.org/r/305996 (https://phabricator.wikimedia.org/T132529) [15:44:55] <_joe_> thcipriani: so, take a look here ^^ [15:45:13] <_joe_> I think we should do the same with dsh groups for scap3 repositories [15:45:19] 06Operations, 10Traffic: High number of failed inbound TFO connections in esams Mon-Fri - https://phabricator.wikimedia.org/T143562#2571842 (10ema) p:05Triage>03Normal [15:45:46] * thcipriani looks [15:45:59] <_joe_> thcipriani: derive dsh groups from conftool [15:47:17] 06Operations, 10Traffic: High number of failed inbound TFO connections in esams Mon-Fri - https://phabricator.wikimedia.org/T143562#2571799 (10BBlack) Perhaps this is a mobile carrier doing CGNAT that constantly flips souce IPs for TCP traffic from the same phones, thus constantly breaking otherwise-valid rece... [15:49:10] (03PS1) 10Hashar: clouseau: fix setup.py / add tox with flake8 [software] - 10https://gerrit.wikimedia.org/r/306010 (https://phabricator.wikimedia.org/T143559) [15:50:47] _joe_: nice! seems like it should work for beta etc, just have to not define conftool in hieradata, correct? 
[15:50:47] (03PS3) 10Muehlenhoff: Provide override file for base::service_unit [puppet] - 10https://gerrit.wikimedia.org/r/305635 [15:52:00] <_joe_> thcipriani: yes, or we could create a single-node etcd cluster in beta [15:52:05] <_joe_> and use it for conftool [15:53:09] we've been talking more about host pool/depooling via scap, would be nice to have a place to test things with that. [15:53:46] there is a beta deployment-conf(?) host already I think. It's been a bit since I've looked at conftool on beta. [15:54:05] <_joe_> brb [15:54:10] I'll show this to the folk in the deployment cabal meeting today, see if they have additional thoughts, lgtm though. [15:56:19] 06Operations, 10ops-codfw, 06Discovery: rack/setup/deploy wqds200[12] - https://phabricator.wikimedia.org/T142864#2571926 (10Papaul) [15:58:50] the mediawiki errors from db2034 should have stopped now, although the maintenance is still in progress [15:59:13] (03PS4) 10Muehlenhoff: Provide override file for base::service_unit [puppet] - 10https://gerrit.wikimedia.org/r/305635 [16:01:31] 06Operations, 06Labs: grafana-labs.wikimedia.org doesn't reflect grafana-labs-admin.wikimedia.org - https://phabricator.wikimedia.org/T143556#2571946 (10fgiunchedi) a:03fgiunchedi [16:14:20] (03CR) 10BryanDavis: [C: 04-1] "memcached needs to be reconciled with openstack::horizon::service" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/301505 (https://phabricator.wikimedia.org/T141014) (owner: 10BryanDavis) [16:19:37] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [16:19:40] (03CR) 10Ottomata: [C: 031] Disable unprivileged user namespaces on trusty systems [puppet] - 10https://gerrit.wikimedia.org/r/304474 (https://phabricator.wikimedia.org/T142567) (owner: 10Muehlenhoff) [16:21:06] !log installing wezen new syslog server [16:21:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:31:22] (03PS19) 10BryanDavis: 
Provision Striker via scap3 [puppet] - 10https://gerrit.wikimedia.org/r/301505 (https://phabricator.wikimedia.org/T141014) [16:32:47] (03CR) 10Dereckson: [C: 031] "Ready for deployment. Requires https://gerrit.wikimedia.org/r/#/c/304973/2/TranslateHooks.php which has already been merged." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/305801 (https://phabricator.wikimedia.org/T143073) (owner: 10Glaisher) [16:32:50] 06Operations, 10Traffic: High number of failed inbound TFO connections in esams Mon-Fri - https://phabricator.wikimedia.org/T143562#2572034 (10ema) From https://www1.icsi.berkeley.edu/~barath/papers/tfo-conext11.pdf section 4.3: > some carrier-grade NAT configurations use different public IP addresses for new... [16:35:48] 06Operations, 10Traffic: Stop using persistent storage in our backend varnish layers. - https://phabricator.wikimedia.org/T142848#2572068 (10BBlack) I should add another overall point here about TTLs: One key thing that will become unblocked post-Varnish4 (so, early CY2017) is reducing our normal (other than... [16:35:53] (03CR) 10BryanDavis: "PS19 fixes a potential conflict with configuration of local memcached on californium from role::horizon." (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/301505 (https://phabricator.wikimedia.org/T141014) (owner: 10BryanDavis) [16:41:59] 06Operations, 07HHVM: Upgrade all mw* servers to debian jessie - https://phabricator.wikimedia.org/T143536#2571031 (10Papaul) @Joe I can help with the re-image [16:49:51] 06Operations, 10ops-eqiad, 06DC-Ops, 10Traffic, and 2 others: rack/setup new eqiad lvs machines - https://phabricator.wikimedia.org/T104458#2572147 (10BBlack) [16:49:54] 06Operations, 10ops-eqiad, 06DC-Ops, 10netops: asw-d-eqiad SNMP failures - https://phabricator.wikimedia.org/T112781#2572145 (10BBlack) 05Open>03stalled According to etherpad today, this is "Pending 10G migration with new hardware" (for Row D in eqiad, I think). 
[16:56:22] 06Operations, 06Discovery, 06Discovery-Search-Backlog, 10Elasticsearch: Check that elasticsearch actually uses shard allocation awareness - https://phabricator.wikimedia.org/T143571#2572176 (10Gehel) [16:56:37] (03PS1) 10Glaisher: Remove English for all groups from $wgTranslateBlacklist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306015 (https://phabricator.wikimedia.org/T124013) [16:57:04] !log T143226: Perform major compaction on local_group_wikipedia_T_parsoid_html.data, restbase1013-a.eqiad.wmnet [16:57:05] T143226: Cluster-wide major compactions: parsoid-html - https://phabricator.wikimedia.org/T143226 [16:57:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:59:15] (03PS2) 10Glaisher: Remove English for all groups from $wgTranslateBlacklist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306015 (https://phabricator.wikimedia.org/T124013) [17:00:04] gehel: Dear anthropoid, the time has come. Please deploy Weekly Wikidata query service deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160822T1700). [17:00:04] SMalyshev: A patch you scheduled for Weekly Wikidata query service deployment window is about to be deployed. Please be available during the process. [17:00:12] jouncebot: o/ [17:00:29] (03CR) 10Kaldari: [C: 04-1] "Holding off for now, per Aaron's feedback. May need to switch the hook we're using..." 
[puppet] - 10https://gerrit.wikimedia.org/r/305871 (owner: 10Kaldari) [17:00:36] (03PS5) 10Gehel: Make Updater proper service [puppet] - 10https://gerrit.wikimedia.org/r/303626 (https://phabricator.wikimedia.org/T116754) (owner: 10Smalyshev) [17:01:58] (03CR) 10Gehel: [C: 032] Make Updater proper service [puppet] - 10https://gerrit.wikimedia.org/r/303626 (https://phabricator.wikimedia.org/T116754) (owner: 10Smalyshev) [17:07:02] (03PS1) 10Madhuvishy: nfs: Mount scratch from labstore1001 on different mount path [puppet] - 10https://gerrit.wikimedia.org/r/306019 [17:08:13] (03CR) 10jenkins-bot: [V: 04-1] nfs: Mount scratch from labstore1001 on different mount path [puppet] - 10https://gerrit.wikimedia.org/r/306019 (owner: 10Madhuvishy) [17:08:28] (03PS2) 10Madhuvishy: nfs: Mount scratch from labstore1001 on different mount path [puppet] - 10https://gerrit.wikimedia.org/r/306019 [17:09:01] !log deploying latest GUI + updater version on wdqs100? servers [17:09:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:09:32] (03CR) 10jenkins-bot: [V: 04-1] nfs: Mount scratch from labstore1001 on different mount path [puppet] - 10https://gerrit.wikimedia.org/r/306019 (owner: 10Madhuvishy) [17:11:54] (03PS3) 10Madhuvishy: nfs: Mount scratch from labstore1001 on different mount path [puppet] - 10https://gerrit.wikimedia.org/r/306019 [17:13:20] SMalyshev: deployment completed, test queries OK, GUI looks good, nothing suspicious in the logs. Feel free to do additional testing! [17:13:41] gehel: thanks! 
[17:14:22] (03PS8) 10Paladox: Add gbp.conf file for debian [debs/gerrit] - 10https://gerrit.wikimedia.org/r/301841 [17:16:07] 06Operations, 10Traffic: Sort query parameters on urls - https://phabricator.wikimedia.org/T143574#2572294 (10Jhernandez) [17:18:15] (03PS4) 10Paladox: Bring back ostriches (Chad) change with no "" [puppet] - 10https://gerrit.wikimedia.org/r/304977 [17:18:48] (03CR) 10Paladox: [C: 031] Gerrit: make auth_type configurable for labs [puppet] - 10https://gerrit.wikimedia.org/r/303355 (owner: 10Paladox) [17:18:56] (03PS9) 10Paladox: Gerrit: make auth_type configurable for labs [puppet] - 10https://gerrit.wikimedia.org/r/303355 [17:19:06] (03CR) 10Paladox: Gerrit: make auth_type configurable for labs [puppet] - 10https://gerrit.wikimedia.org/r/303355 (owner: 10Paladox) [17:19:08] (03CR) 10Chad: "Can you use a more descriptive commit message? Also preferably one that doesn't include my IRC nick so I'm not getting pinged senselessly " [puppet] - 10https://gerrit.wikimedia.org/r/304977 (owner: 10Paladox) [17:21:21] 06Operations, 06Labs, 13Patch-For-Review: move nfs /scratch to labstore1003 - https://phabricator.wikimedia.org/T134896#2572313 (10chasemp) @madhuvishy has been formalizing our logic for depooling/pooling grid exec nodes and so with T140483 resolved we hope to rolling this out without rebooted. 0. stage cha... [17:21:26] 06Operations, 10Traffic: Sort query parameters on urls - https://phabricator.wikimedia.org/T143574#2572317 (10Jhernandez) I'm out of my depths here, so sorry if it is a stupid question. I'm interested to understand why we're not doing this already. The why is that there's an interest in sending more traffic t... 
[17:22:23] (03PS5) 10Paladox: Gerrit: Minor config tidying to avoid puppet/init inconsistencies [puppet] - 10https://gerrit.wikimedia.org/r/304977 [17:23:10] (03CR) 10Paladox: "ok @Chad done :)" [puppet] - 10https://gerrit.wikimedia.org/r/304977 (owner: 10Paladox) [17:23:47] (03PS4) 10Madhuvishy: nfs: Mount scratch from labstore1001 on different mount path [puppet] - 10https://gerrit.wikimedia.org/r/306019 (https://phabricator.wikimedia.org/T134896) [17:32:44] (03CR) 10Dduvall: "> Sorry, I do not have a context for this, can you provide the" [puppet] - 10https://gerrit.wikimedia.org/r/305737 (owner: 10Dduvall) [17:33:19] (03Abandoned) 10Dduvall: labs: Allow LVM mount permissions to remain unmanaged [puppet] - 10https://gerrit.wikimedia.org/r/305737 (owner: 10Dduvall) [17:33:31] 06Operations, 10ops-eqiad, 10Elasticsearch, 13Patch-For-Review: Reclaim nobelium - https://phabricator.wikimedia.org/T142581#2572336 (10debt) Removing the Discovery tags - I believe we're done with our portion. [17:35:37] 06Operations, 06Discovery, 10Elasticsearch, 03Discovery-Search-Sprint, 13Patch-For-Review: Elasticsearch logs are not send to logstash after 2.3.3 upgrade - https://phabricator.wikimedia.org/T136696#2572338 (10Gehel) Current patch isn't working. Moving this to backlog until I get time to dig more into it. [17:35:39] !log Restart Cassandra instances to apply updated certificates, restbase-test2001.codfw.wmnet [17:35:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:36:25] (03PS1) 10Madhuvishy: tools: Add script that helps manage sge exec nodes [puppet] - 10https://gerrit.wikimedia.org/r/306025 (https://phabricator.wikimedia.org/T134896) [17:36:37] PROBLEM - puppet last run on install2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[17:38:42] 06Operations, 06Discovery, 10Elasticsearch, 03Discovery-Search-Sprint, 13Patch-For-Review: Elasticsearch logs are not send to logstash after 2.3.3 upgrade - https://phabricator.wikimedia.org/T136696#2572364 (10debt) p:05Triage>03Normal [17:38:47] RECOVERY - puppet last run on install2001 is OK: OK: Puppet is currently enabled, last run 21 minutes ago with 0 failures [17:38:51] (03PS2) 10Muehlenhoff: contint::firewall: Limit to production networks [puppet] - 10https://gerrit.wikimedia.org/r/301627 [17:39:27] 06Operations, 06Labs, 13Patch-For-Review: move nfs /scratch to labstore1003 - https://phabricator.wikimedia.org/T134896#2572367 (10yuvipanda) You can do the same for k8s. You can depool https://wikitech.wikimedia.org/wiki/Tools_Kubernetes#Depooling_a_node, do your thing, repool. That will work for all the k8... [17:39:31] (03CR) 10Madhuvishy: [C: 04-1] "Do not merge right now (This patch is part of /data/scratch migration to labstore1003)" [puppet] - 10https://gerrit.wikimedia.org/r/306019 (https://phabricator.wikimedia.org/T134896) (owner: 10Madhuvishy) [17:39:39] (03CR) 10Dduvall: "> Probably this is the cause for 305737 (see my comment there why I" [puppet] - 10https://gerrit.wikimedia.org/r/305668 (https://phabricator.wikimedia.org/T138778) (owner: 10Dduvall) [17:42:25] 06Operations, 06Labs, 13Patch-For-Review: move nfs /scratch to labstore1003 - https://phabricator.wikimedia.org/T134896#2572382 (10madhuvishy) Yeah I'm familiar with doing this for k8s worker nodes - did this a bunch of times while helping @yuvipanda recreate worker nodes couple weeks ago. [17:44:28] ticket "Install and use JDK 8 for Android CI testing" is resolved, but gerrit change "contint: Java 8 on Jessie slaves" is not merged. can both be true? 
[17:45:22] mutante: probably a question for the task not a random comment in -operations [17:46:31] (03PS1) 10Muehlenhoff: wdqs: Limit to domain networks [puppet] - 10https://gerrit.wikimedia.org/r/306027 [17:46:46] greg-g: already done [17:46:57] task and gerrit [17:47:35] 06Operations, 10Ops-Access-Requests: Access to people.wikimedia.org for Volker_E - https://phabricator.wikimedia.org/T143465#2572432 (10Andrew) [17:48:07] 06Operations, 10Wikimedia-Logstash, 03Discovery-Search-Sprint: Elasticsearch restarts are failing in the logstash cluster - https://phabricator.wikimedia.org/T142357#2572434 (10Gehel) @dcausse, @EBernhardson: It does not look like we can remove those fields after the fact. It looks to me like our options are... [17:48:28] !log cassandra: replace certs for restbase-test200[123]-[ab] - T120662 [17:48:29] T120662: Track/alert cassandra certs expiration - https://phabricator.wikimedia.org/T120662 [17:48:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:50:52] (03PS1) 10Ladsgroup: Enable ORES review tool for English Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306028 (https://phabricator.wikimedia.org/T140003) [17:51:26] PROBLEM - puppet last run on install2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:51:29] 10Ops-Access-Reviews: basion/rutherfordium access for Volker_E - https://phabricator.wikimedia.org/T143579#2572462 (10Andrew) [18:00:04] anomie, ostriches, thcipriani, hashar, and twentyafterfour: Respected human, time to deploy Morning SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160822T1800). Please do the needful. [18:00:04] Amir1: A patch you scheduled for Morning SWAT(Max 8 patches) is about to be deployed. Please be available during the process. [18:00:16] Hey! [18:01:14] I can SWAT [18:01:29] thcipriani: thanks :) [18:03:30] Amir1: how long do you think PopulateDatabase will take for enwiki? 
[18:03:51] the same like others [18:03:52] because the number of revs is limited to 5K [18:04:01] ah, gotcha [18:04:03] it would be great to run it several times later [18:04:09] !log Restarting Cassandra instances to apply new TLS cert restbase-test2001.codfw.wmnet (for reals this time) [18:04:13] but not a big deal [18:04:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:04:16] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306028 (https://phabricator.wikimedia.org/T140003) (owner: 10Ladsgroup) [18:04:38] (03PS6) 10Ppchelko: ChangeProp: Update config for the new driver [puppet] - 10https://gerrit.wikimedia.org/r/305414 [18:04:41] (03Merged) 10jenkins-bot: Enable ORES review tool for English Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306028 (https://phabricator.wikimedia.org/T140003) (owner: 10Ladsgroup) [18:05:09] (03CR) 10Ppchelko: ChangeProp: Update config for the new driver (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/305414 (owner: 10Ppchelko) [18:08:46] Amir1: running populatedatabase now [18:08:56] yess [18:09:14] Amir1: populatedatabase still running, but change live on mw1099 [18:09:20] kk [18:10:38] (03CR) 10Jcrespo: "So, the reason why we do not do that in production is because if we run out of temporary space, things like replication will break; which " [puppet] - 10https://gerrit.wikimedia.org/r/305668 (https://phabricator.wikimedia.org/T138778) (owner: 10Dduvall) [18:11:49] Amir1: seeing a lot of stuff like: Syntax Warning: Failed parsing page 1 using hint tables [18:12:18] Can you send a pic? 
[18:12:33] some is okay because the revision is not publicly visible [18:12:33] it's in the logs [18:12:44] Syntax Warning: Failed to get object num from hint tables for page 4 [18:13:37] !log Restarting Cassandra instances to apply new TLS cert restbase-test2002.codfw.wmnet [18:13:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:13:47] (03CR) 10Smalyshev: "I don't know what's the difference between the two really, but if it looks good to others I have no objections." [puppet] - 10https://gerrit.wikimedia.org/r/306027 (owner: 10Muehlenhoff) [18:14:19] * Amir1 searching in logstash [18:14:39] Amir1: check https://logstash.wikimedia.org/app/kibana#/dashboard/Fatal-Monitor [18:14:54] !log Restarting Cassandra instances to apply new TLS cert restbase-test2003.codfw.wmnet [18:14:55] (03CR) 10Smalyshev: "Adding Addshore as the author of port 8888 part." [puppet] - 10https://gerrit.wikimedia.org/r/306027 (owner: 10Muehlenhoff) [18:14:56] unclear what this is from since I haven't actually deployed anything yet [18:14:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:15:13] oh, much easier [18:16:50] (03CR) 10Addshore: [C: 031] wdqs: Limit to domain networks [puppet] - 10https://gerrit.wikimedia.org/r/306027 (owner: 10Muehlenhoff) [18:16:58] thcipriani: they happen in mw1295 [18:17:05] definitely not related to ores [18:17:07] I see no issues there SMalyshev :) [18:17:16] and some other nodes [18:17:17] (03PS1) 10Filippo Giunchedi: install_server: use GPT for wezen [puppet] - 10https://gerrit.wikimedia.org/r/306030 (https://phabricator.wikimedia.org/T143146) [18:17:29] But I still can't see where they are coming from [18:17:47] (03PS2) 10Filippo Giunchedi: install_server: use GPT for wezen [puppet] - 10https://gerrit.wikimedia.org/r/306030 (https://phabricator.wikimedia.org/T143146) [18:18:11] addshore: ok, please also add on gerrit, so whoever will be merging it knows you're ok with it :) 
[18:18:18] (03CR) 10Smalyshev: [C: 031] wdqs: Limit to domain networks [puppet] - 10https://gerrit.wikimedia.org/r/306027 (owner: 10Muehlenhoff) [18:18:22] +1ed it :) [18:18:26] thanks! [18:18:31] Amir1: ok, populatedatabase is done, doublecheck mw1099 [18:18:33] please :) [18:18:43] sure [18:18:58] okay, done [18:19:05] It works as epxpected [18:19:09] *expected [18:19:15] (03PS1) 10Hashar: Wrapper to invoke clouseau tox from root dir [software] - 10https://gerrit.wikimedia.org/r/306032 (https://phabricator.wikimedia.org/T143559) [18:19:17] (03PS1) 10Hashar: Add flake8 at root of repo [software] - 10https://gerrit.wikimedia.org/r/306033 (https://phabricator.wikimedia.org/T143559) [18:19:37] Amir1: ack, ok, rolling everywhere [18:20:40] (03CR) 10Filippo Giunchedi: [C: 032] install_server: use GPT for wezen [puppet] - 10https://gerrit.wikimedia.org/r/306030 (https://phabricator.wikimedia.org/T143146) (owner: 10Filippo Giunchedi) [18:21:28] \o/ [18:22:19] 06Operations, 06Services, 10Wikimedia-Logstash: Kibana / logstash dashboards timing out consistently since Kibana upgrade - https://phabricator.wikimedia.org/T141384#2572776 (10EBernhardson) for completeness, i've double checked a few days of indices and we are now seeing around 200 fields per day for restba... 
[18:22:33] papaul: ^ done [18:23:26] !log T143226: Perform major compaction on local_group_wikipedia_T_parsoid_html.data, restbase1010-a.eqiad.wmnet [18:23:27] T143226: Cluster-wide major compactions: parsoid-html - https://phabricator.wikimedia.org/T143226 [18:23:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:23:40] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:306028|Enable ORES review tool for English Wikipedia (T140003)]] (duration: 01m 04s) [18:23:41] T140003: Deploy ORES review tool in English Wikipedia - https://phabricator.wikimedia.org/T140003 [18:23:43] ^ Amir1 live everywhere [18:23:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:23:52] * Amir1 dances a little [18:23:57] 06Operations, 06Services, 10Wikimedia-Logstash: Kibana / logstash dashboards timing out consistently since Kibana upgrade - https://phabricator.wikimedia.org/T141384#2572794 (10GWicke) Thank you for your help in figuring this one out, @EBernhardson & @bd808! 
[18:24:07] !log T143226: Perform major compaction on local_group_wikipedia_T_parsoid_html.data, restbase1011-a.eqiad.wmnet [18:24:08] T143226: Cluster-wide major compactions: parsoid-html - https://phabricator.wikimedia.org/T143226 [18:24:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:24:47] !log T143226: Perform major compaction on local_group_wikipedia_T_parsoid_html.data, restbase1012-a.eqiad.wmnet [18:24:48] T143226: Cluster-wide major compactions: parsoid-html - https://phabricator.wikimedia.org/T143226 [18:24:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:25:24] !log T143226: Perform major compaction on local_group_wikipedia_T_parsoid_html.data, restbase1009-a.eqiad.wmnet [18:25:25] T143226: Cluster-wide major compactions: parsoid-html - https://phabricator.wikimedia.org/T143226 [18:25:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:25:29] My first rollback using ores in en.wp https://en.wikipedia.org/w/index.php?title=Template:HTML&diff=735727055&oldid=735727011 [18:25:49] mwhahaha [18:26:00] powerless vandals [18:26:01] !log T143226: Perform major compaction on local_group_wikipedia_T_parsoid_html.data, restbase1014-a.eqiad.wmnet [18:26:02] T143226: Cluster-wide major compactions: parsoid-html - https://phabricator.wikimedia.org/T143226 [18:26:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:26:26] !log T143226: Perform major compaction on local_group_wikipedia_T_parsoid_html.data, restbase1015-a.eqiad.wmnet [18:26:27] godog: ok thank you [18:26:27] T143226: Cluster-wide major compactions: parsoid-html - https://phabricator.wikimedia.org/T143226 [18:26:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:29:29] !log deploying eventlogging eventbus and a topic config patch, will depool each node as i do and check that all is well [18:29:33] Logged the 
message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:29:53] !log T143226: Perform major compaction on local_group_wikipedia_T_parsoid_html.data, restbase2008-a.codfw.wmnet [18:29:54] T143226: Cluster-wide major compactions: parsoid-html - https://phabricator.wikimedia.org/T143226 [18:29:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:30:25] !log T143226: Perform major compaction on local_group_wikipedia_T_parsoid_html.data, restbase2009-a.codfw.wmnet [18:30:26] T143226: Cluster-wide major compactions: parsoid-html - https://phabricator.wikimedia.org/T143226 [18:30:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:30:48] (03PS1) 10Madhuvishy: labstore - Clarify drbd resource initialization docs [puppet] - 10https://gerrit.wikimedia.org/r/306036 [18:31:30] 06Operations, 10Wikimedia-Logstash, 03Discovery-Search-Sprint: Elasticsearch restarts are failing in the logstash cluster - https://phabricator.wikimedia.org/T142357#2572829 (10EBernhardson) @gehel correct that we can't remove fields after the fact, dump and reload is basically the only option elasticsearch... [18:32:07] !log otto@palladium conftool action : set/pooled=no; selector: kafka2001.eqiad.wmnet [18:32:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:32:45] !log otto@palladium conftool action : set/pooled=no; selector: kafka2001.codfw.wmnet [18:32:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:34:45] 06Operations, 10Wikimedia-Logstash, 03Discovery-Search-Sprint: Elasticsearch restarts are failing in the logstash cluster - https://phabricator.wikimedia.org/T142357#2572832 (10Gehel) Restarts during the 2.3.4 upgrade also hang. I did not dig deep enough, but I think we do have crons to do those merges regul... 
[18:35:36] !log otto@palladium conftool action : set/pooled=yes; selector: kafka2001.codfw.wmnet [18:35:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:36:10] !log otto@palladium conftool action : set/pooled=no; selector: kafka2002.codfw.wmnet [18:36:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:39:04] !log otto@palladium conftool action : set/pooled=yes; selector: kafka2002.codfw.wmnet [18:39:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:39:41] !log otto@palladium conftool action : set/pooled=no; selector: kafka1001.eqiad.wmnet [18:39:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:44:36] !log otto@palladium conftool action : set/pooled=yes; selector: kafka1001.eqiad.wmnet [18:44:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:45:01] !log otto@palladium conftool action : set/pooled=no; selector: kafka1002.eqiad.wmnet [18:45:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:47:11] !log otto@palladium conftool action : set/pooled=yes; selector: kafka1002.eqiad.wmnet [18:47:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:47:57] (03PS2) 10Andrew Bogott: Fix for incorrect quota check in Horizon [puppet] - 10https://gerrit.wikimedia.org/r/306004 (https://phabricator.wikimedia.org/T142379) [18:51:47] PROBLEM - MD RAID on install2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[18:53:17] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 646 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 4892878 keys - replication_delay is 646 [18:53:57] RECOVERY - MD RAID on install2001 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [18:56:30] 06Operations, 10Ops-Access-Requests, 06Research-and-Data, 10Research-collaborations, 13Patch-For-Review: Analytics cluster access request for ISI Foundation team - https://phabricator.wikimedia.org/T141634#2572914 (10DarTar) Thanks @RobH [19:02:57] (03CR) 10Dduvall: "> So, the reason why we do not do that in production is because if we" [puppet] - 10https://gerrit.wikimedia.org/r/305668 (https://phabricator.wikimedia.org/T138778) (owner: 10Dduvall) [19:06:07] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 4851015 keys - replication_delay is 0 [19:12:57] PROBLEM - MD RAID on install2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:15:56] Nikerabbit: whitelisted [19:16:57] RECOVERY - MD RAID on install2001 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [19:16:59] Platonides: but the ban is still set? 
[19:17:20] argh [19:17:28] hmm [19:17:28] no [19:17:39] someone must have removed it already [19:17:49] -!- 51 - #wikimedia-operations: ban *!*@kartik.lustfield.net [by Platonides, 49808 secs ago] [19:17:53] I had removed it from another channel, but not from here [19:17:58] strange [19:18:01] ahg [19:18:09] I was doing: /mode #wikimedia-dev -b *!*@kartik.lustfield.net [19:18:27] thanks [19:20:02] feel free to feel free to deopme and unban whoever got affected [19:20:35] in the future [19:22:43] (03PS1) 10Rush: nfs-mount-manager: --target is not universally supported for mount [puppet] - 10https://gerrit.wikimedia.org/r/306041 (https://phabricator.wikimedia.org/T134896) [19:24:48] PROBLEM - restbase endpoints health on restbase-test2001 is CRITICAL: /page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage returned the unexpected status 500 (expecting: 200): /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Test Mathoid - check test formula returned the unexpected status 500 (expecting: 200): /page/html/{title} (Get html by title from storage) is CRITICAL: Te [19:25:09] PROBLEM - MD RAID on install2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[19:25:20] (03CR) 10Rush: [C: 032] nfs-mount-manager: --target is not universally supported for mount [puppet] - 10https://gerrit.wikimedia.org/r/306041 (https://phabricator.wikimedia.org/T134896) (owner: 10Rush) [19:26:57] RECOVERY - restbase endpoints health on restbase-test2001 is OK: All endpoints are healthy [19:27:07] RECOVERY - MD RAID on install2001 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [19:32:45] (03PS1) 10Rush: nova-computer: add virt-top package [puppet] - 10https://gerrit.wikimedia.org/r/306043 [19:34:59] (03CR) 10Andrew Bogott: [C: 032] nova-computer: add virt-top package [puppet] - 10https://gerrit.wikimedia.org/r/306043 (owner: 10Rush) [19:36:55] 06Operations, 10Cassandra: Address abnormally wide partitions - https://phabricator.wikimedia.org/T143056#2573028 (10Eevans) [19:44:53] 06Operations, 10Traffic: Sort query parameters on urls - https://phabricator.wikimedia.org/T143574#2573043 (10BBlack) [19:44:55] 06Operations, 10MediaWiki-General-or-Unknown, 06Services, 10Traffic: Investigate query parameter normalization for MW/services - https://phabricator.wikimedia.org/T138093#2573041 (10BBlack) [19:45:07] PROBLEM - puppet last run on palladium is CRITICAL: CRITICAL: puppet fail [19:45:43] Platonides: I don't have op here as far as I know [19:47:08] RECOVERY - puppet last run on palladium is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [19:49:48] you should :P [19:50:07] 06Operations, 10MediaWiki-General-or-Unknown, 06Services, 10Traffic: Investigate query parameter normalization for MW/services - https://phabricator.wikimedia.org/T138093#2573059 (10BBlack) The default-parameter problem probably deserves it's own separate ticket and solution. Except for egregious legacy c... 
[19:55:57] PROBLEM - restbase endpoints health on restbase-test2001 is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Test Mathoid - check test formula returned the unexpected status 500 (expecting: 200) [19:58:07] RECOVERY - restbase endpoints health on restbase-test2001 is OK: All endpoints are healthy [20:00:04] gwicke, cscott, arlolra, subbu, bearND, mdholloway, halfak, and Amir1: Dear anthropoid, the time has come. Please deploy Services – Parsoid / OCG / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160822T2000). [20:05:34] !log starting parsoid deploy [20:05:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:08:13] !log synced new parsoid code; restarted parsoid on wtp1001 as a canary [20:08:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:08:53] !log wezen - signing puppet certs, salt-key, initial run [20:08:57] PROBLEM - MD RAID on install2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [20:08:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:10:55] 06Operations, 10ops-codfw, 13Patch-For-Review: rack/setup/deploy wezen (codfw syslog) - https://phabricator.wikimedia.org/T143146#2573135 (10Papaul) [20:10:57] RECOVERY - MD RAID on install2001 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [20:13:15] !log finished deploying parsoid sha df53a991 [20:13:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:18:58] PROBLEM - restbase endpoints health on restbase-test2001 is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Test Mathoid - check test formula returned the unexpected status 500 (expecting: 200) [20:20:55] Any root around who can nuke bromine:/home/demon/build/ for me? 
Files in there got stolen by Chris at some point :P [20:21:52] sec [20:22:24] you own them now, toss what you like, ostriches [20:22:29] thx! [20:22:31] yw [20:22:58] RECOVERY - restbase endpoints health on restbase-test2001 is OK: All endpoints are healthy [20:25:12] 06Operations, 10ops-codfw, 13Patch-For-Review: rack/setup/deploy wezen (codfw syslog) - https://phabricator.wikimedia.org/T143146#2573182 (10Papaul) [20:27:03] 06Operations, 10ops-codfw, 13Patch-For-Review: rack/setup/deploy wezen (codfw syslog) - https://phabricator.wikimedia.org/T143146#2558409 (10Papaul) a:05Papaul>03fgiunchedi @fgiunchedi The installation is complete, I am assigning you the task for service implementation . Thanks [20:27:37] PROBLEM - restbase endpoints health on restbase-test2002 is CRITICAL: /page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage returned the unexpected status 500 (expecting: 200): /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Test Mathoid - check test formula returned the unexpected status 500 (expecting: 200): /page/html/{title} (Get html by title from storage) is CRITICAL: Te [20:29:46] RECOVERY - restbase endpoints health on restbase-test2002 is OK: All endpoints are healthy [20:49:49] (03PS1) 10BryanDavis: logstash: Tag Striker messages for indexing [puppet] - 10https://gerrit.wikimedia.org/r/306055 [20:50:26] 06Operations, 07HHVM: Upgrade all mw* servers to debian jessie - https://phabricator.wikimedia.org/T143536#2571031 (10greg) Out of curiosity (and so we know what to do with MW-Vagrant) What's the timeline for this? 
[20:54:48] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479 [20:56:48] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 4850868 keys - replication_delay is 0 [20:56:54] (03PS2) 10BryanDavis: logstash: Tag Striker messages for indexing [puppet] - 10https://gerrit.wikimedia.org/r/306055 (https://phabricator.wikimedia.org/T143172) [20:57:06] (03PS1) 10Papaul: DNS: Add mgmt and production DNS for wqds200[12] Bug:T142864 [dns] - 10https://gerrit.wikimedia.org/r/306056 (https://phabricator.wikimedia.org/T142864) [20:59:57] (03PS5) 10Madhuvishy: nfs: Modify /data/scratch on nfs clients to point to mount from labstore1003 [puppet] - 10https://gerrit.wikimedia.org/r/306019 (https://phabricator.wikimedia.org/T134896) [21:00:04] dapatrick and bawolff: Respected human, time to deploy Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160822T2100). Please do the needful. [21:01:47] PROBLEM - restbase endpoints health on restbase-test2002 is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Test Mathoid - check test formula returned the unexpected status 500 (expecting: 200) [21:03:38] RECOVERY - restbase endpoints health on restbase-test2002 is OK: All endpoints are healthy [21:05:49] PROBLEM - IPv4 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 36 probes of 424 (alerts on 19) - https://atlas.ripe.net/measurements/1791307/#!map [21:08:33] 06Operations, 10ops-codfw, 06Discovery: codfw: rack/setup/deploy wqds200[12]switch configuration - https://phabricator.wikimedia.org/T143613#2573416 (10Papaul) [21:09:26] 06Operations, 10ops-codfw, 06Discovery: rack/setup/deploy wqds200[12] - https://phabricator.wikimedia.org/T142864#2549225 (10Papaul) [21:17:09] PROBLEM - MegaRAID on install2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[21:17:58] PROBLEM - configured eth on install2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:19:07] PROBLEM - restbase endpoints health on restbase-test2001 is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Test Mathoid - check test formula returned the unexpected status 500 (expecting: 200) [21:19:08] !log Restarting restbase staging in codfw [21:19:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:19:17] PROBLEM - restbase endpoints health on restbase-test2003 is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Test Mathoid - check test formula returned the unexpected status 500 (expecting: 200) [21:19:17] RECOVERY - MegaRAID on install2001 is OK: OK: no disks configured for RAID [21:19:26] (03PS2) 10Madhuvishy: tools: Add script that helps manage sge exec nodes [puppet] - 10https://gerrit.wikimedia.org/r/306025 (https://phabricator.wikimedia.org/T134896) [21:19:58] RECOVERY - configured eth on install2001 is OK: OK - interfaces up [21:21:07] RECOVERY - restbase endpoints health on restbase-test2001 is OK: All endpoints are healthy [21:21:17] RECOVERY - restbase endpoints health on restbase-test2003 is OK: All endpoints are healthy [21:23:58] RECOVERY - IPv4 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 0 probes of 425 (alerts on 19) - https://atlas.ripe.net/measurements/1791307/#!map [21:27:45] (03CR) 10Rush: [C: 032] labstore - Clarify drbd resource initialization docs [puppet] - 10https://gerrit.wikimedia.org/r/306036 (owner: 10Madhuvishy) [21:28:26] (03PS2) 10Madhuvishy: labstore - Clarify drbd resource initialization docs [puppet] - 10https://gerrit.wikimedia.org/r/306036 [21:28:35] (03CR) 10Madhuvishy: [V: 032] labstore - Clarify drbd resource initialization docs [puppet] - 10https://gerrit.wikimedia.org/r/306036 (owner: 10Madhuvishy) [21:34:08] (03PS9) 10Dduvall: beta: Create and mount LVM volumes for mariadb [puppet] - 
10https://gerrit.wikimedia.org/r/305668 (https://phabricator.wikimedia.org/T138778) [21:34:18] PROBLEM - IPv4 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 42 probes of 425 (alerts on 19) - https://atlas.ripe.net/measurements/1791307/#!map [21:40:19] RECOVERY - IPv4 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 1 probes of 425 (alerts on 19) - https://atlas.ripe.net/measurements/1791307/#!map [21:42:04] 06Operations, 10ops-eqiad, 10Elasticsearch, 13Patch-For-Review: Reclaim nobelium - https://phabricator.wikimedia.org/T142581#2573639 (10debt) [21:53:12] (03PS1) 10BBlack: text VCL: 403 geoiplookup w/o referer [puppet] - 10https://gerrit.wikimedia.org/r/306065 (https://phabricator.wikimedia.org/T100902) [21:55:51] 06Operations, 10Ops-Access-Requests: Requesting access to stat1002/stat1004 for Jdlrobson - https://phabricator.wikimedia.org/T141811#2573736 (10Jdlrobson) Thanks @RobH and everyone involved. I can confirm that I have access now :) [22:01:04] (03PS3) 10Rush: tools: mount scratch on labstore1003 as well [puppet] - 10https://gerrit.wikimedia.org/r/305657 (https://phabricator.wikimedia.org/T134896) [22:02:57] PROBLEM - IPv4 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 32 probes of 425 (alerts on 19) - https://atlas.ripe.net/measurements/1791307/#!map [22:08:58] RECOVERY - IPv4 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 0 probes of 425 (alerts on 19) - https://atlas.ripe.net/measurements/1791307/#!map [22:11:59] (03CR) 10Alex Monk: "Isn't this going to break kafka brokers in labs?" [puppet] - 10https://gerrit.wikimedia.org/r/305969 (owner: 10Muehlenhoff) [22:15:36] (03CR) 10Madhuvishy: [C: 032] tools: mount scratch on labstore1003 as well [puppet] - 10https://gerrit.wikimedia.org/r/305657 (https://phabricator.wikimedia.org/T134896) (owner: 10Rush) [22:19:43] 06Operations, 10Traffic, 13Patch-For-Review: Decom bits.wikimedia.org hostname - https://phabricator.wikimedia.org/T107430#2573813 (10BBlack) >>! 
In T107430#2534974, @Krinkle wrote: >>>! In T107430#2520799, @Krinkle wrote: >> The Commons app for Android (previously by Wikimedia, now community-maintained) als... [22:20:18] PROBLEM - Host mr1-eqiad.oob is DOWN: PING CRITICAL - Packet loss = 100% [22:24:13] Platonides: ping? [22:24:17] pong [22:24:29] what did I break today? [22:24:31] Platonides: you could schedule https://gerrit.wikimedia.org/r/#/c/305596/ to the SWAT in 35 minutes [22:24:59] how to schedule it? [22:25:22] add it to the table at https://wikitech.wikimedia.org/wiki/Deployments#Week_of_August_22nd and be present at the deployment window to test it [22:25:37] longer version and documentation is at https://wikitech.wikimedia.org/wiki/SWAT_deploys [22:27:04] not sure if I'll be around [22:29:34] there are windows sooner at 15h and 20h UTC too [22:30:53] (if you can be present in half an hour, but not for too long, we can also deploy your fix first) [22:40:15] maybe… [22:40:34] ALVARO MOLINA IS STUPID [22:40:35] ALVARO MOLINA IS STUPID [22:40:37] ALVARO MOLINA IS STUPID [22:40:40] LALALLALA [22:40:44] SHIT NOOOOOOOO [22:42:56] thanks, Platonides [22:43:31] !log Run maintenance script namespaceDupes.php on azwiktionary (T143580) [22:43:31] T143580: Namespace problem on Azerbaijani Wiktionary - https://phabricator.wikimedia.org/T143580 [22:43:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:43:49] np ori [22:44:48] It's annoying, namespaceDupes.php --merge gives me a db error occurring in LinksDeletionUpdate->doUpdate() LinksUpdate::acquirePageLock(DBConnRef, integer) [22:45:07] he has been doing it on many channels [22:45:32] (this script's --merge option worked well until now) [22:46:14] DatabaseBase::getScopedLockAndFlush: Flushing an explicit transaction, getting out of sync! [22:50:17] PROBLEM - MD RAID on install2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:51:41] arg [22:51:43] commit 63a3911a67507731695bad3188f486219a563b7d [22:51:55] Improvements to RefreshLinksJob/DeleteLinksJob locking [22:52:17] RECOVERY - MD RAID on install2001 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [22:52:29] RECOVERY - Host mr1-eqiad.oob is UP: PING OK - Packet loss = 0%, RTA = 2.59 ms [22:53:47] AaronSchulz: it seems your last commit could have broken an option of the namespaceDupes maintenance script, see T143631 [22:53:47] T143631: namespaceDupes.php --merge can throw a DBUnexpectedError DatabaseBase::{closure}: Flushing an explicit transaction, getting out of sync! exception - https://phabricator.wikimedia.org/T143631 [22:54:48] (03PS4) 10Thcipriani: scap: use conftool data to populate dsh groups [puppet] - 10https://gerrit.wikimedia.org/r/305996 (https://phabricator.wikimedia.org/T132529) (owner: 10Giuseppe Lavagetto) [22:54:50] (03PS1) 10Thcipriani: Beta Scap: dsh groups in hieradata [puppet] - 10https://gerrit.wikimedia.org/r/306070 [22:56:03] (03CR) 10Thcipriani: [C: 031] "LGTM. Added one inline comment + a dependent patch set to keep beta happy (cherry-picked there)" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/305996 (https://phabricator.wikimedia.org/T132529) (owner: 10Giuseppe Lavagetto) [23:00:04] RoanKattouw, ostriches, MaxSem, and Dereckson: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160822T2300). [23:00:04] Dereckson: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [23:00:36] Platonides: so, you want we deploy it now? [23:00:42] ok [23:02:10] You've already used https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug? 
If not, you have an extension to install (one is available for Chrome, one for Firefox) so that your browser can add a header instructing the load balancer to send your request to a staging server, mw1099 [23:03:06] oh localisation :/ [23:03:30] I know the header [23:03:34] but don't have it installed [23:03:50] Ah, l10n will be fine, no new key. [23:04:11] (and probably already updated from master cherry-picking last night by l10nupdate) [23:04:31] localisation? [23:04:39] Hello [23:04:49] I've come to ask for help [23:05:50] Platonides: oh, sorry, I was looking at another patch [23:06:02] an idiot in #wikimedia-ayuda banned me for no reason.. help [23:06:02] your change doesn't have an l10n change [23:06:10] hi wpayuda64628, you want #wikimedia-ops [23:06:16] I just wanted to learn how to edit on Wikinoticias [23:06:30] this channel is for Wikimedia operations (servers stuff) [23:06:39] Dereckson: it's the troll [23:06:39] a guy called Matiia banned me there [23:06:43] Platonides: ah [23:06:46] help me [23:06:56] And don't pay attention to platonides [23:08:27] it doesn't install… [23:08:47] platonicos [23:08:50] I love you [23:08:58] PROBLEM - MD RAID on install2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:09:18] Platonides, maybe +b $j:#wikipedia-es-ops? [23:09:20] Platonides: live on mw1099 [23:09:46] Dereckson: this extension doesn't install [23:09:55] it remains in "checking…" [23:10:04] on Chrome or Firefox?
[23:10:13] (PS2) Dereckson: Fix user namespaces on Slovak Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/305785 (https://phabricator.wikimedia.org/T143472) [23:10:39] Chrome [23:10:49] I already closed and reopened it [23:11:07] maybe with a new profile… [23:11:34] worked [23:11:50] good, so you can test your change on mw1099 :) [23:12:04] (beware, it selects mw1017 by default) [23:12:04] yep [23:12:35] (CR) Dereckson: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/305785 (https://phabricator.wikimedia.org/T143472) (owner: Dereckson) [23:12:36] https://es.wikipedia.org/wiki/Especial:FiltroAntiAbusos/history/9 fails on live [23:12:50] and fails in 1099 :S [23:13:01] (Merged) jenkins-bot: Fix user namespaces on Slovak Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/305785 (https://phabricator.wikimedia.org/T143472) (owner: Dereckson) [23:13:31] no error in the logs [23:14:37] it's in mw1099? [23:14:58] I've just checked the file checksums, yes, your change has been deployed. On both Tin and mw1099 I have 68226f76d93b17c31c74ba8cac03ab30 Views/AbuseFilterViewDiff.php [23:16:16] 305785 live on mw1099 too [23:17:06] I do have abusefilter-modify [23:17:16] hmm [23:17:29] 305785 works [23:17:53] but I get abusefilter-history-error-hidden [23:19:03] !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Fix user namespaces on Slovak Wikipedia (T143472) (duration: 00m 58s) [23:19:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:19:22] is the output cached? :s [23:19:51] legoktm: ping? [23:20:15] hi [23:20:34] (iffy internet right now, what's up?) [23:20:46] legoktm: https://gerrit.wikimedia.org/r/#/c/305596/ is on mw1099 [23:20:52] but doesn't seem to work [23:20:53] any idea?
[23:21:41] oh [23:21:42] hmm [23:21:44] I think I know [23:21:53] I'm going to live hack on tin real quick [23:22:35] !log NamespaceDupes maintenance script run on sk.wikipedia [23:22:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:23:02] legoktm: currently deployed on mw1099, not in full cluster by the way [23:23:11] is it not on tin? [23:23:15] yes it is [23:23:32] oh [23:23:35] tin /srv/mediawiki-staging + mw1099 [23:23:38] my bad [23:24:27] heh, with live hacks it's easier to debug :) [23:25:14] Platonides: especially since you can do that on servers not in prod [23:25:29] (well in the prod cluster, but not user-facing) [23:26:00] (PS20) BryanDavis: Provision Striker via scap3 [puppet] - https://gerrit.wikimedia.org/r/301505 (https://phabricator.wikimedia.org/T141014) [23:27:14] not that many people would notice this if I hacked in real prod [23:27:28] (CR) BBlack: [C: 2] text VCL: 403 geoiplookup w/o referer [puppet] - https://gerrit.wikimedia.org/r/306065 (https://phabricator.wikimedia.org/T100902) (owner: BBlack) [23:27:29] but better not to do it user-facing, indeed ;) [23:27:34] (PS2) BBlack: text VCL: 403 geoiplookup w/o referer [puppet] - https://gerrit.wikimedia.org/r/306065 (https://phabricator.wikimedia.org/T100902) [23:27:36] (CR) BBlack: [V: 2] text VCL: 403 geoiplookup w/o referer [puppet] - https://gerrit.wikimedia.org/r/306065 (https://phabricator.wikimedia.org/T100902) (owner: BBlack) [23:29:32] now it works [23:29:36] Platonides: try now? [23:29:36] using mw1099 [23:29:41] Heh [23:29:48] so uh [23:30:35] I'll submit a patch in a sec ;) [23:30:36] did you change anything? [23:32:00] yes [23:37:15] Cherry-picked for wmf/1.28.0-wmf.15 as https://gerrit.wikimedia.org/r/#/c/306077/ [23:38:20] Dereckson: lgtm, do you want to deploy it? you'll need to revert my local changes in the repo though [23:39:37] so wmf/1.28.0-wmf.15 will be 306077, right?
[23:40:26] (ah just stash, you didn't commit, ok) [23:41:41] yeah [23:48:33] 306076 for master merged, I CR+2 306077 for wmf15 [23:50:48] PROBLEM - Host mr1-eqiad.oob is DOWN: PING CRITICAL - Packet loss = 100% [23:54:19] PROBLEM - MD RAID on install2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:55:08] Operations, Phabricator: networking: allow ssh between iridium and phab2001 - https://phabricator.wikimedia.org/T143363#2574048 (mmodell) >>! In T143363#2566159, @Dzahn wrote: > regarding that reinstall, btw @20after4 my thoughts were like "once phab2001 is installed and working, are we going to make tha... [23:55:56] Platonides: legoktm: 306077 live on mw1099 [23:56:27] RECOVERY - MD RAID on install2001 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [23:56:54] green light to send it to prod? [23:57:17] Dereckson: it works [23:57:23] Nice. [23:57:25] go ahead [23:58:18] !log dereckson@tin Synchronized php-1.28.0-wmf.15/extensions/AbuseFilter/Views/: Let abusefilter-modify users see history of hidden filters ([[gerrit:305596]]+[[gerrit:306077]], T143365) (duration: 00m 50s) [23:58:19] T143365: abusefilter-modify should let you view histories of private filters - https://phabricator.wikimedia.org/T143365 [23:58:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:59:15] SWAT done.