[00:32:26] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479
[00:34:56] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 4926183 keys - replication_delay is 0
[01:37:04] PROBLEM - puppet last run on elastic2001 is CRITICAL: CRITICAL: puppet fail
[01:55:44] Puppet, Labs, Phabricator: Update phabricator puppet role to support use on labs - https://phabricator.wikimedia.org/T144112#2589893 (Peachey88)
[02:02:06] RECOVERY - puppet last run on elastic2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[02:25:14] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.16) (duration: 11m 25s)
[02:25:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:31:06] !log l10nupdate@tin ResourceLoader cache refresh completed at Mon Aug 29 02:31:06 UTC 2016 (duration 5m 53s)
[02:31:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:51:41] Any operations guy around?
[03:51:47] (not urgent)
[03:51:57] Just a simple question.
[03:54:19] We were asked at Commons ‘how big’ the database of files is… https://commons.wikimedia.org/wiki/Special:MediaStatistics claims about 87 terabytes… I was wondering if that is ‘top versions’, or the entire history of old revisions.
[04:25:37] Revent: I believe "Statistics about uploaded file types. This only includes the most recent version of a file. Old or deleted versions of files are excluded." would indicate it's only the current revision
[04:27:00] Oh derp, I should read better. :P
[04:27:13] Thanks.
[04:58:47] (PS1) Giuseppe Lavagetto: puppetmasters: support multiple git masters [puppet] - https://gerrit.wikimedia.org/r/307236 (https://phabricator.wikimedia.org/T143869)
[04:59:38] (Abandoned) Giuseppe Lavagetto: [WiP] puppetmaster::gitclone: support primary/secundary masters [puppet] - https://gerrit.wikimedia.org/r/306833 (owner: Giuseppe Lavagetto)
[04:59:48] (CR) jenkins-bot: [V: -1] puppetmasters: support multiple git masters [puppet] - https://gerrit.wikimedia.org/r/307236 (https://phabricator.wikimedia.org/T143869) (owner: Giuseppe Lavagetto)
[05:04:40] (PS2) Giuseppe Lavagetto: puppetmasters: support multiple git masters [puppet] - https://gerrit.wikimedia.org/r/307236 (https://phabricator.wikimedia.org/T143869)
[05:46:35] (PS2) KartikMistry: WIP: Remove cxserver restbase_url [puppet] - https://gerrit.wikimedia.org/r/306674 (https://phabricator.wikimedia.org/T129284)
[06:10:57] (PS3) 20after4: WIP: Scap swat command [mediawiki-config] - https://gerrit.wikimedia.org/r/306259 (https://phabricator.wikimedia.org/T142880)
[06:47:25] (PS2) Muehlenhoff: Ship a script to rewrite group memberships after enabling the memberof overlay [puppet] - https://gerrit.wikimedia.org/r/306905 (https://phabricator.wikimedia.org/T142817)
[07:54:00] (CR) Muehlenhoff: [C: 2] Ship a script to rewrite group memberships after enabling the memberof overlay [puppet] - https://gerrit.wikimedia.org/r/306905 (https://phabricator.wikimedia.org/T142817) (owner: Muehlenhoff)
[08:09:06] (CR) Muehlenhoff: [C: 2] Disable unprivileged user namespaces on labvirt nodes running 4.4 HWE kernels [puppet] - https://gerrit.wikimedia.org/r/306910
(https://phabricator.wikimedia.org/T142567) (owner: Muehlenhoff)
[08:09:10] (PS2) Muehlenhoff: Disable unprivileged user namespaces on labvirt nodes running 4.4 HWE kernels [puppet] - https://gerrit.wikimedia.org/r/306910 (https://phabricator.wikimedia.org/T142567)
[08:18:54] !log Upgrading Zuul server on gallium zuul 2.1.0-391-gbc58ea3-wmf2precise1 => zuul_2.5.0-8-gcbc7f62-wmf1precise1 T144088
[08:18:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:19:07] stashbot: ping
[08:23:25] Operations, Continuous-Integration-Infrastructure, Zuul: Upgrade Zuul on gallium to 2.5.0-8-gcbc7f62-wmf1precise1 - https://phabricator.wikimedia.org/T144088#2590129 (hashar) Got Zuul upgraded on gallium (the Zuul server). Dependencies upgrades I failed to add in the changelog: | Module | Old | New...
[08:26:09] (CR) Muehlenhoff: "Looks good to me, but db1075 still includes base::firewall in site.pp, that one can be dropped as well." [puppet] - https://gerrit.wikimedia.org/r/306918 (owner: Jcrespo)
[08:29:47] (CR) Volans: "Although a bit late, my 2 cents ;)" (7 comments) [puppet] - https://gerrit.wikimedia.org/r/306905 (https://phabricator.wikimedia.org/T142817) (owner: Muehlenhoff)
[08:32:04] volans: this is a one-off script, which will be obsolete by the end of this week, feel free to submit your changes as a patch, but it will be of limited use :-)
[08:32:36] moritzm: so obsolete that we'll drop it from the repo?
:)
[08:33:18] in that case nevermind, ignore my comments, I was missing the context ;)
[08:33:43] I think so, all further group changes will be updated by slapo-memberof, this is only needed to retroactively update the previously existing entries
[08:34:03] ok
[08:34:11] then sorry for the noise
[08:34:28] but thanks for the pointer towards getpass, I was unaware of that :-)
[08:35:43] (PS1) Elukey: Raise the Varnishkafka maximum incomplete transactions to 5000 [puppet] - https://gerrit.wikimedia.org/r/307246
[08:36:37] (PS3) Alexandros Kosiaris: ores: Define extra config for ores [puppet] - https://gerrit.wikimedia.org/r/306839 (https://phabricator.wikimedia.org/T143567) (owner: Ladsgroup)
[08:36:41] (CR) Alexandros Kosiaris: [C: 2 V: 2] ores: Define extra config for ores [puppet] - https://gerrit.wikimedia.org/r/306839 (https://phabricator.wikimedia.org/T143567) (owner: Ladsgroup)
[08:40:52] (CR) Elukey: "Puppet compiler looks good: https://puppet-compiler.wmflabs.org/3867/" [puppet] - https://gerrit.wikimedia.org/r/307246 (owner: Elukey)
[08:41:06] (CR) Filippo Giunchedi: [C: 1] "LGTM, package is uploaded, I'll merge this later today once Tyler is online" [puppet] - https://gerrit.wikimedia.org/r/307028 (owner: Thcipriani)
[08:42:25] (PS5) Jcrespo: mysql: Clean up puppet code related to code databases [puppet] - https://gerrit.wikimedia.org/r/306918
[08:42:45] (PS6) Jcrespo: mysql: Clean up puppet code related to code databases [puppet] - https://gerrit.wikimedia.org/r/306918
[08:47:20] (CR) Filippo Giunchedi: [C: 1] "LGTM, pcc https://puppet-compiler.wmflabs.org/3868/" [puppet] - https://gerrit.wikimedia.org/r/306936 (https://phabricator.wikimedia.org/T126757) (owner: Jcrespo)
[08:48:51] (PS3) Jcrespo: Labsdb: include labs salt groups and prometheus monitoring for dbs [puppet] - https://gerrit.wikimedia.org/r/306936 (https://phabricator.wikimedia.org/T126757)
[08:49:56] (CR)
Muehlenhoff: [C: 1] "Looks good to me" [puppet] - https://gerrit.wikimedia.org/r/306918 (owner: Jcrespo)
[08:50:03] (CR) Ema: [C: 1] Raise the Varnishkafka maximum incomplete transactions to 5000 [puppet] - https://gerrit.wikimedia.org/r/307246 (owner: Elukey)
[08:50:05] (CR) Jcrespo: [C: 2] Labsdb: include labs salt groups and prometheus monitoring for dbs [puppet] - https://gerrit.wikimedia.org/r/306936 (https://phabricator.wikimedia.org/T126757) (owner: Jcrespo)
[08:50:23] (CR) Filippo Giunchedi: "@Daniel yeah we can either remove those or rename the "-srv" version to the current name (i.e. without "-srv")" [puppet] - https://gerrit.wikimedia.org/r/306501 (owner: Dzahn)
[08:52:30] (PS2) Elukey: Raise the Varnishkafka maximum incomplete transactions to 5000 [puppet] - https://gerrit.wikimedia.org/r/307246
[08:55:19] labs didn't have the users added, doing it now
[08:56:22] (CR) Elukey: [C: 2] Raise the Varnishkafka maximum incomplete transactions to 5000 [puppet] - https://gerrit.wikimedia.org/r/307246 (owner: Elukey)
[09:06:53] (CR) Filippo Giunchedi: [C: -1] "overall LGTM, some comments about the code, also it'd be nice to use the if __name__ == '__main__' idiom and have a standalone function do" (3 comments) [puppet] - https://gerrit.wikimedia.org/r/303531 (https://phabricator.wikimedia.org/T116742) (owner: Muehlenhoff)
[09:07:56] (PS7) Jcrespo: mysql: Clean up puppet code related to code databases [puppet] - https://gerrit.wikimedia.org/r/306918
[09:08:32] some gerrit issues, maybe?
[09:08:49] of the web interface, not git
[09:10:53] (CR) Jcrespo: [C: 2] mysql: Clean up puppet code related to code databases [puppet] - https://gerrit.wikimedia.org/r/306918 (owner: Jcrespo)
[09:12:40] (CR) Alexandros Kosiaris: [C: 1] "Looks nice, thanks. Had to RTFM myself for the clear of Environment= and EnvironmentFile= part as well. Thanks!"
[puppet] - https://gerrit.wikimedia.org/r/306225 (https://phabricator.wikimedia.org/T143210) (owner: Muehlenhoff)
[09:13:34] moritzm, firewall rules applied cleanly and created no connectivity issues
[09:13:40] which was my only concern
[09:15:08] ok, great. there's also a further ferm change from my side; migrating from $INTERNAL->DOMAIN_NETWORKS, but will make a separate patch, didn't want to entangle this with your patch
[09:15:56] Operations, MediaWiki-General-or-Unknown, Services, Traffic: Investigate query parameter normalization for MW/services - https://phabricator.wikimedia.org/T138093#2590240 (Jhernandez) > Worst case we could simply skip sort-normalization in v3 for any query-string that contains brackets. That sou...
[09:16:47] (CR) Alexandros Kosiaris: [C: 2] apertium-en-ca: New upstream release and Jessie build [debs/contenttranslation/apertium-en-ca] - https://gerrit.wikimedia.org/r/294264 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[09:17:27] (CR) Alexandros Kosiaris: [C: 2] apertium-isl: Rebuild for Jessie and cleanup [debs/contenttranslation/apertium-isl] - https://gerrit.wikimedia.org/r/296050 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[09:18:08] (CR) Alexandros Kosiaris: [C: 2] apertium-hbs: New upstream snapshot and rebuild [debs/contenttranslation/apertium-hbs] - https://gerrit.wikimedia.org/r/305498 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[09:19:01] (CR) Alexandros Kosiaris: [C: 2] apertium-eus: Rebuild for Jessie and other fixes [debs/contenttranslation/apertium-eus] - https://gerrit.wikimedia.org/r/294673 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[09:20:28] moritzm: re: https://gerrit.wikimedia.org/r/#/c/303531/ another possible way would be to push stats to graphite/statsd on a periodic basis, e.g.
to the "daily" hierarchy, you'd only need to scan the most recent history
[09:20:49] (CR) Alexandros Kosiaris: [C: 2] apertium-en-es: Rebuilt for Jessie [debs/contenttranslation/apertium-en-es] - https://gerrit.wikimedia.org/r/294314 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[09:21:53] (CR) Alex Monk: Bump scap version to 3.2.4-1 [puppet] - https://gerrit.wikimedia.org/r/307028 (owner: Thcipriani)
[09:22:17] (CR) Alexandros Kosiaris: [C: 2] giella-sme: Initial Debian packaging [debs/contenttranslation/giella-sme] - https://gerrit.wikimedia.org/r/294430 (https://phabricator.wikimedia.org/T120087) (owner: KartikMistry)
[09:24:59] godog: yeah, I was thinking of doing that in a second step
[09:25:11] Operations, Traffic: varnishkafka frequently disconnects from kafka servers - https://phabricator.wikimedia.org/T144158#2590248 (ema)
[09:26:00] (PS2) Jcrespo: dbproxy: add prometheus node monitoring [puppet] - https://gerrit.wikimedia.org/r/306937 (https://phabricator.wikimedia.org/T126757)
[09:26:08] moritzm: ok! that works too, doesn't look like the history files are ever purged/rotated
[09:27:19] Operations, Traffic: varnishkafka frequently disconnects from kafka servers - https://phabricator.wikimedia.org/T144158#2590260 (ema) p:Triage>Normal
[09:28:08] (CR) Filippo Giunchedi: [C: 1] "LGTM, thanks!" [puppet] - https://gerrit.wikimedia.org/r/306225 (https://phabricator.wikimedia.org/T143210) (owner: Muehlenhoff)
[09:28:43] (CR) Jcrespo: [C: -1] "We cannot apply this until we apply the firewall to all proxies."
[puppet] - https://gerrit.wikimedia.org/r/306937 (https://phabricator.wikimedia.org/T126757) (owner: Jcrespo)
[09:28:49] !log upgrading cp4007 to Varnish 4 (T131502)
[09:28:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:30:28] (PS2) Jcrespo: es2001-4: add node exporter to this standalones hosts [puppet] - https://gerrit.wikimedia.org/r/306939 (https://phabricator.wikimedia.org/T126757)
[09:31:09] (CR) jenkins-bot: [V: -1] es2001-4: add node exporter to this standalones hosts [puppet] - https://gerrit.wikimedia.org/r/306939 (https://phabricator.wikimedia.org/T126757) (owner: Jcrespo)
[09:32:54] (PS1) Ema: Upgrade cp4007 (ulsfo cache_upload) to Varnish 4 [puppet] - https://gerrit.wikimedia.org/r/307247 (https://phabricator.wikimedia.org/T131502)
[09:34:14] (CR) Ema: [C: 2] Upgrade cp4007 (ulsfo cache_upload) to Varnish 4 [puppet] - https://gerrit.wikimedia.org/r/307247 (https://phabricator.wikimedia.org/T131502) (owner: Ema)
[09:34:21] (PS2) Jcrespo: Update regex to include new labsdb and proxy machines [puppet] - https://gerrit.wikimedia.org/r/301095
[09:35:46] (CR) Volans: [C: -1] Generate stats for monthly package upgrade activity (11 comments) [puppet] - https://gerrit.wikimedia.org/r/303531 (https://phabricator.wikimedia.org/T116742) (owner: Muehlenhoff)
[09:35:48] (CR) Jcrespo: [C: 2] Update regex to include new labsdb and proxy machines [puppet] - https://gerrit.wikimedia.org/r/301095 (owner: Jcrespo)
[09:35:53] (PS3) Jcrespo: Update regex to include new labsdb and proxy machines [puppet] - https://gerrit.wikimedia.org/r/301095
[09:35:57] (CR) Jcrespo: [V: 2] Update regex to include new labsdb and proxy machines [puppet] - https://gerrit.wikimedia.org/r/301095 (owner: Jcrespo)
[09:36:48] (PS3) Jcrespo: Remove unused accounts from unneeded functionalities with large uid [puppet] - https://gerrit.wikimedia.org/r/304203
[09:38:10]
(CR) Jcrespo: [C: 2] Remove unused accounts from unneeded functionalities with large uid [puppet] - https://gerrit.wikimedia.org/r/304203 (owner: Jcrespo)
[09:44:09] !log ema@palladium conftool action : set/pooled=yes; selector: cp4007.ulsfo.wmnet
[09:44:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:45:54] (PS1) Niharika29: Switch enwiki to uca-default collation [mediawiki-config] - https://gerrit.wikimedia.org/r/307248 (https://phabricator.wikimedia.org/T136150)
[09:46:49] Operations, Traffic: varnishkafka frequently disconnects from kafka servers - https://phabricator.wikimedia.org/T144158#2590303 (elukey) Two kafka brokers seems to handle less traffic from the same date: {F4412361} Maybe it is only a matter of rebalancing the kafka topic partition leaders?
[09:46:59] (PS1) Jcrespo: prometheus: add labsdb eqiad hosts to monitoring [puppet] - https://gerrit.wikimedia.org/r/307249 (https://phabricator.wikimedia.org/T126757)
[09:48:59] (PS1) Jcrespo: prometheus: Add parsercaches on eqiad (and fix the ones on codfw) [puppet] - https://gerrit.wikimedia.org/r/307250 (https://phabricator.wikimedia.org/T126757)
[09:52:32] (CR) Filippo Giunchedi: [C: 1] prometheus: Add parsercaches on eqiad (and fix the ones on codfw) [puppet] - https://gerrit.wikimedia.org/r/307250 (https://phabricator.wikimedia.org/T126757) (owner: Jcrespo)
[09:52:51] (CR) Filippo Giunchedi: [C: 1] prometheus: add labsdb eqiad hosts to monitoring [puppet] - https://gerrit.wikimedia.org/r/307249 (https://phabricator.wikimedia.org/T126757) (owner: Jcrespo)
[09:55:19] !log depooled mw1266, running some tests with systemd unit override and hhvm update
[09:55:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:56:55] (CR) Jcrespo: [C: 2] prometheus: add labsdb eqiad hosts to monitoring [puppet] - https://gerrit.wikimedia.org/r/307249
(https://phabricator.wikimedia.org/T126757) (owner: Jcrespo)
[09:57:07] (CR) Jcrespo: [C: 2] prometheus: Add parsercaches on eqiad (and fix the ones on codfw) [puppet] - https://gerrit.wikimedia.org/r/307250 (https://phabricator.wikimedia.org/T126757) (owner: Jcrespo)
[09:57:58] (PS1) Filippo Giunchedi: cassandra: add ssl monitoring only for ssl-enabled hosts [puppet] - https://gerrit.wikimedia.org/r/307251 (https://phabricator.wikimedia.org/T120662)
[09:58:33] !log Executed 'kafka preferred-replica-election' on kafka1012 to rebalance Kafka broker leaders (T144158)
[09:58:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:06:07] (CR) Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/3870/" [puppet] - https://gerrit.wikimedia.org/r/307251 (https://phabricator.wikimedia.org/T120662) (owner: Filippo Giunchedi)
[10:08:44] "Package prometheus-node-exporter is not available" on labsdb1005 (?)
[10:09:11] !log disabling puppet on jessie mediawiki hosts in eqiad for staged merged of https://gerrit.wikimedia.org/r/#/c/306225/
[10:09:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:09:25] oh, it is a precise
[10:10:48] (PS1) Jcrespo: prometheus: add misc eqiad hosts to mysqld exporter [puppet] - https://gerrit.wikimedia.org/r/307254 (https://phabricator.wikimedia.org/T126757)
[10:13:09] jynus: ah, I haven't bothered with uploading to precise but the trusty package might just work
[10:13:17] do not bother
[10:13:29] oh, you want to try forcing it?
[10:13:37] I was going to add an exception
[10:13:50] but I will try manually installing, that host has to go away soon
[10:14:04] yup worth a manual try first
[10:15:11] Operations, Traffic: varnishkafka frequently disconnects from kafka servers - https://phabricator.wikimedia.org/T144158#2590327 (elukey) There is also another interesting thing to notice: * from Aug 8th at ~18:30 UTC these messages were logged for various brokers: ``` ... Aug 08 18:30:28 cp3008 varnishk...
[10:20:55] (PS3) Muehlenhoff: Provide a systemd override unit for hhvm [puppet] - https://gerrit.wikimedia.org/r/306225 (https://phabricator.wikimedia.org/T143210)
[10:21:55] (CR) Elukey: [C: 1] cassandra: add ssl monitoring only for ssl-enabled hosts [puppet] - https://gerrit.wikimedia.org/r/307251 (https://phabricator.wikimedia.org/T120662) (owner: Filippo Giunchedi)
[10:22:46] (CR) Muehlenhoff: [C: 2] Provide a systemd override unit for hhvm [puppet] - https://gerrit.wikimedia.org/r/306225 (https://phabricator.wikimedia.org/T143210) (owner: Muehlenhoff)
[10:25:07] (PS1) ArielGlenn: abstract out code for adds/changes dumps generation, for general library [dumps] - https://gerrit.wikimedia.org/r/307257 (https://phabricator.wikimedia.org/T133547)
[10:25:22] (CR) jenkins-bot: [V: -1] abstract out code for adds/changes dumps generation, for general library [dumps] - https://gerrit.wikimedia.org/r/307257 (https://phabricator.wikimedia.org/T133547) (owner: ArielGlenn)
[10:28:09] (PS2) ArielGlenn: abstract out code for adds/changes dumps generation, for general library [dumps] - https://gerrit.wikimedia.org/r/307257 (https://phabricator.wikimedia.org/T133547)
[10:29:38] (PS1) Muehlenhoff: Revert "Provide a systemd override unit for hhvm" [puppet] - https://gerrit.wikimedia.org/r/307260
[10:31:01] (CR) Muehlenhoff: [C: 2] Revert "Provide a systemd override unit for hhvm" [puppet] - https://gerrit.wikimedia.org/r/307260 (owner:
Muehlenhoff)
[10:34:57] apergos: feel free to add me to ^^^ if you want me to have a look ;)
[10:35:53] volans: sure, but it's way too soon. this is my "put the patch somewhere in gerrit in case my laptop dies" strategy
[10:36:15] yeah, I saw the WIP
[10:36:24] I need to move the caller script over, then break the whole patch down into little bite-sized pieces for review :-)
[10:37:49] actually I will add you now, but don't review now, it's pointless :-D
[10:38:04] ok
[10:50:43] Operations, Continuous-Integration-Infrastructure, Zuul: Upgrade Zuul on gallium to 2.5.0-8-gcbc7f62-wmf2precise1 - https://phabricator.wikimedia.org/T144088#2590397 (hashar)
[10:51:53] Operations, Continuous-Integration-Infrastructure, Zuul: Upgrade Zuul on gallium to 2.5.0-8-gcbc7f62-wmf2precise1 - https://phabricator.wikimedia.org/T144088#2588273 (hashar) Open>Resolved I have build yet another package to cherry pick a few more patches we needed. New package is on gallium...
[10:52:14] !log Upgrading Zuul server on gallium zuul_2.5.0-8-gcbc7f62-wmf1precise1 zuul_2.5.0-8-gcbc7f62-wmf2precise1 T144088
[10:52:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:52:41] Operations, Continuous-Integration-Infrastructure, Zuul: Upgrade Zuul on gallium to 2.5.0-8-gcbc7f62-wmf2precise1 - https://phabricator.wikimedia.org/T144088#2590406 (hashar) Resolved>Open did not meant to resolve it sorry.
Still need upload to apt.wm.o
[11:07:17] !log T107306 uploaded to apt.wikimedia.org jessie-wikimedia: apertium-en-ca_0.9.3~r61328-2+wmf1
[11:07:17] !log T107306 uploaded to apt.wikimedia.org jessie-wikimedia: apertium-en-es_0.8.0+svn~57502-2+wmf1
[11:07:17] !log T107306 uploaded to apt.wikimedia.org jessie-wikimedia: apertium-eus_0.1.0-1+wmf1
[11:07:17] !log T107306 uploaded to apt.wikimedia.org jessie-wikimedia: apertium-hbs_0.5.0~r68212-1+wmf1
[11:07:18] !log T107306 uploaded to apt.wikimedia.org jessie-wikimedia: apertium-isl_0.1.0~r65494-1+wmf1
[11:07:18] !log T107306 uploaded to apt.wikimedia.org jessie-wikimedia: giella-sme_0.0.20150917~r121176-1+wmf1
[11:07:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:07:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:07:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:07:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:07:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:07:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:10:01] (PS1) Filippo Giunchedi: [WIP] monitoring: add check_prometheus define [puppet] - https://gerrit.wikimedia.org/r/307269
[11:10:30] (PS1) Muehlenhoff: Provide a systemd override unit for hhvm [puppet] - https://gerrit.wikimedia.org/r/307270 (https://phabricator.wikimedia.org/T143210)
[11:12:01] (CR) KartikMistry: "recheck" [debs/contenttranslation/apertium-urd-hin] - https://gerrit.wikimedia.org/r/296368 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[11:12:12] (CR) Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-hbs-eng] - https://gerrit.wikimedia.org/r/296049 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[11:12:15] (CR) Alexandros Kosiaris: "recheck"
[debs/contenttranslation/apertium-hbs-slv] - https://gerrit.wikimedia.org/r/296203 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[11:12:17] (CR) Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-urd-hin] - https://gerrit.wikimedia.org/r/296368 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[11:12:18] (CR) KartikMistry: "recheck" [debs/contenttranslation/apertium-isl-eng] - https://gerrit.wikimedia.org/r/296157 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[11:12:21] (CR) Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-sme-nob] - https://gerrit.wikimedia.org/r/295185 (https://phabricator.wikimedia.org/T120087) (owner: KartikMistry)
[11:12:23] (CR) Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-isl-eng] - https://gerrit.wikimedia.org/r/296157 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[11:12:25] (CR) Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-kaz-tat] - https://gerrit.wikimedia.org/r/296369 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[11:12:26] (CR) KartikMistry: "recheck" [debs/contenttranslation/apertium-sme-nob] - https://gerrit.wikimedia.org/r/295185 (https://phabricator.wikimedia.org/T120087) (owner: KartikMistry)
[11:12:29] (CR) Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-kaz] - https://gerrit.wikimedia.org/r/296366 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[11:15:30] Operations, Continuous-Integration-Config, Operations-Software-Development: Flake8 for python files without extension in puppet repo - https://phabricator.wikimedia.org/T144169#2590514 (Volans)
[11:28:00] volans: spoiler, Phabricator has a poll application https://phabricator.wikimedia.org/vote/ :]
[11:28:11] which you can then embed in tasks iirc :D
[11:28:29] (CR) Alexandros Kosiaris: "recheck"
[debs/contenttranslation/apertium-urd-hin] - https://gerrit.wikimedia.org/r/296368 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[11:29:43] (CR) Alexandros Kosiaris: [C: 2] apertium-sme-nob: Initial Debian packaging [debs/contenttranslation/apertium-sme-nob] - https://gerrit.wikimedia.org/r/295185 (https://phabricator.wikimedia.org/T120087) (owner: KartikMistry)
[11:30:14] (CR) Alexandros Kosiaris: [C: 2] apertium-isl-eng: New upstream, rebuild for Jessie [debs/contenttranslation/apertium-isl-eng] - https://gerrit.wikimedia.org/r/296157 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[11:30:32] (CR) Alexandros Kosiaris: [C: 2] apertium-hbs-slv: New upstream, rebuild for Jessie and cleanup [debs/contenttranslation/apertium-hbs-slv] - https://gerrit.wikimedia.org/r/296203 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[11:30:38] Operations, Traffic: varnishkafka frequently disconnects from kafka servers - https://phabricator.wikimedia.org/T144158#2590539 (elukey) Most of the errors seemed to be related to kafka1012 and kafka1018, the ones that were not acting as leaders. Rebalancing the brokers helped and the disconnects stopped...
[11:30:49] (CR) Alexandros Kosiaris: [C: 2] apertium-hbs-eng: New upstream, rebuild for Jessie and cleanup [debs/contenttranslation/apertium-hbs-eng] - https://gerrit.wikimedia.org/r/296049 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[11:31:26] !log reimaging mw2088 to jessie
[11:31:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:35:33] (CR) Alexandros Kosiaris: [C: 2] apertium-urd-hin: Rebuild for Jessie and cleanup [debs/contenttranslation/apertium-urd-hin] - https://gerrit.wikimedia.org/r/296368 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[11:36:42] (CR) Alexandros Kosiaris: "the "unrepresentable changes to source" still stands."
[debs/contenttranslation/apertium-kaz] - https://gerrit.wikimedia.org/r/296366 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[11:38:21] Operations, Continuous-Integration-Config, Operations-Software-Development: Flake8 for python files without extension in puppet repo - https://phabricator.wikimedia.org/T144169#2590544 (hashar) A bit of context for the CI part: The Jenkins job `operations-puppet-tox` is pretty simple, it basically:...
[11:38:53] volans: on https://phabricator.wikimedia.org/T144169 you might want to list the commands that give the stats python: 80 files python3: 16 files .
[11:38:59] might be useful to track progress later on
[11:44:54] Operations, HHVM: Upgrade all mw* servers to debian jessie - https://phabricator.wikimedia.org/T143536#2590551 (elukey) Keeping track of the current status in https://etherpad.wikimedia.org/p/trusty-mw-reimage
[11:52:33] !log reimaging mw209[01] to jessie
[11:52:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:54:24] (PS1) Muehlenhoff: puppet_compiler: Restrict to labs networks [puppet] - https://gerrit.wikimedia.org/r/307275
[12:04:38] hashar: thanks for the poll and sure I'll add more details, also replying to you on the 1st possibility
[12:04:56] I didn't explain myself well :) after lunch I'll update it
[12:05:08] !log T107306 uploaded to apt.wikimedia.org jessie-wikimedia: apertium-hbs-eng_0.1.0~r57598-1+wmf1
[12:05:08] !log T107306 uploaded to apt.wikimedia.org jessie-wikimedia: apertium-hbs-slv_0.1.0~r59294-1+wmf1
[12:05:08] !log T107306 uploaded to apt.wikimedia.org jessie-wikimedia: apertium-isl-eng_0.1.0~r66083-1+wmf1
[12:05:08] !log T107306 uploaded to apt.wikimedia.org jessie-wikimedia: apertium-sme-nob_0.6.0~r61921-1+wmf1
[12:05:08] !log T107306 uploaded to apt.wikimedia.org jessie-wikimedia: apertium-urd-hin_0.1.0~r64379-1+wmf1
[12:05:09] volans: I don't mind either way.
But would like to keep the jenkins job as dumb as possible (eg just invoke 'tox')
[12:05:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:05:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:05:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:05:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:05:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:11:28] akosiaris: are those .deb building just fine in Jenkins ?
[12:11:39] hashar: yes
[12:11:48] piuparts is failing but it is not voting
[12:11:53] so we've ignored it for now
[12:12:05] and it is failing cause it is not using jessie-wikimedia from what I can tell
[12:12:08] yeah I think there is something off in the systemd file installation
[12:12:15] looks like it forgets to remove some files/symlinks
[12:14:55] Operations, Deployment-Systems, scap, Scap3: Make keyholder work with systemd - https://phabricator.wikimedia.org/T144043#2590601 (MoritzMuehlenhoff) a:MoritzMuehlenhoff I'll look into that.
[12:22:48] (PS1) Alexandros Kosiaris: logging::mediawiki: Remove redundant NRPE ferm::rule [puppet] - https://gerrit.wikimedia.org/r/307278
[12:22:50] (PS1) Alexandros Kosiaris: nrpe: update ferm::rule to remove INTERNAL [puppet] - https://gerrit.wikimedia.org/r/307279
[12:22:53] (PS9) Thiemo Mättig (WMDE): Fix phabricator expanding links [puppet] - https://gerrit.wikimedia.org/r/306413 (https://phabricator.wikimedia.org/T75997) (owner: Paladox)
[12:23:56] Operations, MediaWiki-JobQueue, Regression: Restore 30 minutes delayed list update to no waiting, to stop killing sandbox functionality - https://phabricator.wikimedia.org/T139893#2590607 (Aklapper) Summary: That [[ https://en.wikipedia.org/wiki/User:ManosHacker/sandbox | example page ]] uses [[ http...
[12:25:10] (CR) Muehlenhoff: [C: 1] logging::mediawiki: Remove redundant NRPE ferm::rule [puppet] - https://gerrit.wikimedia.org/r/307278 (owner: Alexandros Kosiaris)
[12:26:39] (PS7) Hashar: cache: vary statsd_server with hiera [puppet] - https://gerrit.wikimedia.org/r/249490 (https://phabricator.wikimedia.org/T116898)
[12:28:48] (CR) Muehlenhoff: nrpe: update ferm::rule to remove INTERNAL (1 comment) [puppet] - https://gerrit.wikimedia.org/r/307279 (owner: Alexandros Kosiaris)
[12:29:48] Operations, Ops-Access-Requests, Research-and-Data, Research-collaborations, Research-management: Request access to data for WDQS research - https://phabricator.wikimedia.org/T142780#2590614 (AlexKrauseTUD) * have signed the L3 doc * I have acreated a useraccount through https://www.mediawiki...
[12:29:52] (CR) Mobrovac: "Has this been moved somewhere else? How will CXServer now get the articles?" [puppet] - https://gerrit.wikimedia.org/r/306674 (https://phabricator.wikimedia.org/T129284) (owner: KartikMistry)
[12:30:46] (CR) Hashar: "I have rebased this on tip of production. The only change is that there are now check_procs defined in module/varnish/manifests/logging/ " [puppet] - https://gerrit.wikimedia.org/r/249490 (https://phabricator.wikimedia.org/T116898) (owner: Hashar)
[12:34:52] (PS10) Thiemo Mättig (WMDE): Avoid breaking full phabricator URLs [puppet] - https://gerrit.wikimedia.org/r/256663 (https://phabricator.wikimedia.org/T75997)
[12:36:56] (PS1) Ema: Upgrade upload ulsfo to Varnish 4 [puppet] - https://gerrit.wikimedia.org/r/307282 (https://phabricator.wikimedia.org/T131502)
[12:38:02] (CR) BBlack: [C: 1] Upgrade upload ulsfo to Varnish 4 [puppet] - https://gerrit.wikimedia.org/r/307282 (https://phabricator.wikimedia.org/T131502) (owner: Ema)
[12:39:42] (CR) Hashar: "Actually, it is probably for beta cluster to keep logging under varnish.
instead of trying to inject the BetaMediaWiki prefix everywhere. " [puppet] - 10https://gerrit.wikimedia.org/r/249490 (https://phabricator.wikimedia.org/T116898) (owner: 10Hashar) [12:40:32] (03CR) 10Ema: [C: 032] Upgrade upload ulsfo to Varnish 4 [puppet] - 10https://gerrit.wikimedia.org/r/307282 (https://phabricator.wikimedia.org/T131502) (owner: 10Ema) [12:42:27] !log upgrading cp4013 to Varnish 4 (T131502) [12:42:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:42:55] !log ema@palladium conftool action : set/pooled=no; selector: cp4013.ulsfo.wmnet [12:43:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:43:29] 06Operations, 10Beta-Cluster-Infrastructure, 10Traffic, 13Patch-For-Review, 07WorkType-Maintenance: On beta cluster varnish stats process points to production statsd - https://phabricator.wikimedia.org/T116898#2590674 (10hashar) I have rebased the Puppet patch https://gerrit.wikimedia.org/r/#/c/249490/... [12:45:08] (03PS1) 10Filippo Giunchedi: hieradata: add prometheus_nodes for ulsfo/esams [puppet] - 10https://gerrit.wikimedia.org/r/307284 [12:46:07] 06Operations, 10Beta-Cluster-Infrastructure, 10Traffic, 13Patch-For-Review, 07WorkType-Maintenance: On beta cluster varnish stats process points to production statsd - https://phabricator.wikimedia.org/T116898#2590718 (10hashar) The patch has been cherry picked on beta cluster for quite a while already s... 
[12:46:50] (03CR) 10Filippo Giunchedi: [C: 031] prometheus: add misc eqiad hosts to mysqld exporter [puppet] - 10https://gerrit.wikimedia.org/r/307254 (https://phabricator.wikimedia.org/T126757) (owner: 10Jcrespo) [12:49:35] (03PS2) 10Filippo Giunchedi: hieradata: add prometheus_nodes for ulsfo/esams [puppet] - 10https://gerrit.wikimedia.org/r/307284 [12:50:28] (03PS8) 10BBlack: cache: vary statsd_server with hiera [puppet] - 10https://gerrit.wikimedia.org/r/249490 (https://phabricator.wikimedia.org/T116898) (owner: 10Hashar) [12:51:22] !log ema@palladium conftool action : set/pooled=yes; selector: cp4013.ulsfo.wmnet [12:51:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:51:51] (03CR) 10BBlack: [C: 032 V: 032] cache: vary statsd_server with hiera [puppet] - 10https://gerrit.wikimedia.org/r/249490 (https://phabricator.wikimedia.org/T116898) (owner: 10Hashar) [12:52:40] (03CR) 10KartikMistry: "@mobrovac, config.yaml doesn't contain restbase_url - is that mistake? see, https://phabricator.wikimedia.org/diffusion/GCXS/browse/master" [puppet] - 10https://gerrit.wikimedia.org/r/306674 (https://phabricator.wikimedia.org/T129284) (owner: 10KartikMistry) [12:53:27] hashar: around for european swat? [12:53:31] yeah [12:53:40] jouncebot is dead :(- [12:54:43] what happened to poor jouncebot? [12:55:11] hashar: hangout? [12:55:39] 06Operations, 10Continuous-Integration-Config, 06Operations-Software-Development: Flake8 for python files without extension in puppet repo - https://phabricator.wikimedia.org/T144169#2590769 (10Volans) [12:57:25] (03PS3) 10Filippo Giunchedi: hieradata: add prometheus_nodes for ulsfo/esams [puppet] - 10https://gerrit.wikimedia.org/r/307284 [12:57:59] 06Operations, 10Continuous-Integration-Config, 06Operations-Software-Development: Flake8 for python files without extension in puppet repo - https://phabricator.wikimedia.org/T144169#2590775 (10Volans) >>! 
In T144169#2590544, @hashar wrote: > So for 1 //Fix the Jenkins job to search for those files and inclu... [13:00:23] zeljkof: na I think I will skip hangout today [13:00:42] Amir1: MatmaRex Urbanecm tto dcausse European SWAT deploy starting now [13:00:49] o/ [13:00:55] you are first :] [13:00:55] hi [13:00:56] hey [13:00:59] where's jouncebot? :( [13:01:05] died :( [13:01:09] 'ello 'ello 'ello folks [13:01:34] hashar: want to do swat, or should I do it? [13:01:38] (03CR) 10Filippo Giunchedi: [C: 032] hieradata: add prometheus_nodes for ulsfo/esams [puppet] - 10https://gerrit.wikimedia.org/r/307284 (owner: 10Filippo Giunchedi) [13:01:38] lets SWAT https://gerrit.wikimedia.org/r/#/c/307261/ "Fallback to QueryString if we detect acronyms" for dcausse [13:02:11] zeljkof: guess I am hot to handle the patches :D [13:02:49] dcausse: do you have a way to check that patch with the Wikimedia Debug extension and mw1099 ? [13:02:54] hashar: yes [13:02:58] great [13:03:48] hashar: ok, you are the man then, ping me if you need help ;) [13:04:07] MatmaRex: I have +2ed your patch https://gerrit.wikimedia.org/r/#/c/306951/ will push it after dcausse one [13:04:29] zeljkof: sure thing :] [13:04:53] I should have CR+2 ed those patches earlier [13:05:07] (03Abandoned) 10Filippo Giunchedi: prometheus: add mysql mediawiki production db discovery [puppet] - 10https://gerrit.wikimedia.org/r/296596 (https://phabricator.wikimedia.org/T126757) (owner: 10Filippo Giunchedi) [13:05:16] (03Abandoned) 10Filippo Giunchedi: prometheus: generate mysql targets from mw config [puppet] - 10https://gerrit.wikimedia.org/r/296595 (https://phabricator.wikimedia.org/T126757) (owner: 10Filippo Giunchedi) [13:05:53] (03PS2) 10Filippo Giunchedi: prometheus: return 204 on / [puppet] - 10https://gerrit.wikimedia.org/r/306671 [13:07:32] (03CR) 10Filippo Giunchedi: [C: 032] prometheus: return 204 on / [puppet] - 10https://gerrit.wikimedia.org/r/306671 (owner: 10Filippo Giunchedi) [13:08:21] dcausse: going to pull 
on mw1099 [13:08:36] ok [13:08:44] !log Pulled https://gerrit.wikimedia.org/r/#/c/307261/ on mw1099 [13:08:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:08:51] should be ready for testing [13:09:08] MatmaRex: Amir1 Urbanecm I am going to rebase all your mw config patches [13:09:14] thanks [13:09:19] hashar: it works as expected :) [13:09:23] neat [13:10:17] dcausse: going to be live soonish [13:10:58] !log hashar@tin Synchronized php-1.28.0-wmf.16/extensions/CirrusSearch/includes/Query/FullTextSimpleMatchQueryBuilder.php: Fallback to QueryString if we detect acronyms T143541 (duration: 00m 50s) [13:11:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:11:16] dcausse: it is live ! [13:11:21] hashar: thanks! :) [13:11:56] godog: a bunch of servers are complaining because /usr/bin/scap no such file or directory. I guess puppet cant find the package :( Servers: mw2088 mw2090 and mw2091 [13:12:23] 06Operations, 06Community-Tech, 10wikidiff2, 13Patch-For-Review: Deploy new version of wikidiff2 package - https://phabricator.wikimedia.org/T140443#2464963 (10MoritzMuehlenhoff) I just reimaged a jessie scaler and it fails to run puppet due to being able to find hhvm-wikidiff2, existing servers have 1.3.5... 
[13:12:25] (03PS1) 10Jcrespo: Workaround still existing, but irrelevant, precise hosts [puppet] - 10https://gerrit.wikimedia.org/r/307285 (https://phabricator.wikimedia.org/T126757) [13:13:09] (03CR) 10Hashar: [C: 032] Disable ORES for reverted, goodfaith and wp10 models [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306904 (https://phabricator.wikimedia.org/T143988) (owner: 10Ladsgroup) [13:13:12] Amir1: I am landing your https://gerrit.wikimedia.org/r/#/c/306904/ [13:13:12] (03PS2) 10Jcrespo: prometheus exporter: avoid still existing precise hosts [puppet] - 10https://gerrit.wikimedia.org/r/307285 (https://phabricator.wikimedia.org/T126757) [13:13:27] I'm ready to test in mw1099 [13:13:36] (03Merged) 10jenkins-bot: Disable ORES for reverted, goodfaith and wp10 models [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306904 (https://phabricator.wikimedia.org/T143988) (owner: 10Ladsgroup) [13:14:08] !log Pulled https://gerrit.wikimedia.org/r/#/c/306904/ on mw1099 [13:14:10] Amir1: done :) [13:14:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:14:21] awesome [13:14:25] MatmaRex: going to do your https://gerrit.wikimedia.org/r/#/c/306951/ next [13:14:35] Amir1: let me know when it can be pushed everywhere [13:14:40] sure [13:14:43] alright [13:15:37] MatmaRex: pulled on mw1099 [13:15:45] !log Pulled https://gerrit.wikimedia.org/r/#/c/306951/ on mw1099 [13:15:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:16:24] hashar: deploy it everywhere please :) [13:17:21] should really let you all do the scap dance next time :D [13:17:21] hashar: Present, sorry for my lateness :) [13:17:24] but I am lazy today [13:17:28] Urbanecm: no problem :] [13:17:30] hashar: those were recently reimaged, I guess scap hasn't been installed there yet [13:17:51] thanks hashar :) [13:17:54] godog: or since puppet points to previous version, it can get it from apt.wm.o which has a more recent 
version ? [13:18:53] hashar: thanks, checking [13:19:44] !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Disable ORES for reverted, goodfaith and wp10 models T143988 (duration: 02m 43s) [13:19:47] (03CR) 10Hashar: [C: 032] Add throttling exception for UBC [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306903 (https://phabricator.wikimedia.org/T143951) (owner: 10Urbanecm) [13:19:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:19:57] hashar: this'll take a couple of minutes, i need to upload a large file, so please go on with other stuff in the meantime if possible [13:19:58] hashar: you are correct it is racy like that heh :( [13:20:27] MatmaRex: sure thing. take your time and let me know when it is all greenish :] [13:20:34] Urbanecm: doing your throttling exception patch [13:20:40] Okay. [13:20:41] thanks [13:20:49] bah merge conflict :( [13:20:54] https://gerrit.wikimedia.org/r/#/c/306903/ [13:20:59] (03PS2) 10Hashar: Add throttling exception for UBC [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306903 (https://phabricator.wikimedia.org/T143951) (owner: 10Urbanecm) [13:21:17] (03CR) 10Hashar: Add throttling exception for UBC [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306903 (https://phabricator.wikimedia.org/T143951) (owner: 10Urbanecm) [13:21:20] (03CR) 10Hashar: [C: 032] Add throttling exception for UBC [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306903 (https://phabricator.wikimedia.org/T143951) (owner: 10Urbanecm) [13:21:47] hashar: is it live now? 
[13:21:58] (03Merged) 10jenkins-bot: Add throttling exception for UBC [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306903 (https://phabricator.wikimedia.org/T143951) (owner: 10Urbanecm) [13:22:11] (03PS4) 10Hashar: Don't prepend protocol in missing.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/304689 (https://phabricator.wikimedia.org/T141208) (owner: 10TTO) [13:22:19] (03PS2) 10Filippo Giunchedi: prometheus: add to LVS [puppet] - 10https://gerrit.wikimedia.org/r/306672 (https://phabricator.wikimedia.org/T126785) [13:22:54] hashar: umm. looks like m1099 doesn't like file uploads. it gives me 413 Request Entity Too Large. same thing works with WikimediaDebug disabled [13:23:03] mw1099* [13:23:13] MatmaRex: eeek :/ [13:23:27] so i dunno, can we try in production? since i'm fairly sure my patch is not causing that… [13:23:38] Urbanecm: can you check on mw1099 that en.wikipedia.org is still all fine ? [13:24:13] MatmaRex: i think there is another server that can be used for testing [13:24:38] hashar: I doesn't see no that seems wrong :). [13:24:53] ;D [13:25:00] jynus: I'm assuming just installing the trusty packages on precise doesn't work heh? [13:25:03] It looks fine now [13:25:07] godog, nope [13:25:43] in fact, I wasn't able neither to remove it properly, pre and post rm fail [13:25:55] hashar: Is it live everywhere? [13:25:57] MatmaRex: I have no clue what that 413 would be about though. If it fails on mw1099 that might be a good indiaction the patch is at fault ? [13:26:03] !log hashar@tin Synchronized wmf-config/throttle.php: Add throttling exception for UBC T143951 (duration: 00m 46s) [13:26:04] not worth maintaining it for precise [13:26:06] MatmaRex: can try to revert the patch on mw1099 and see whether it fix it [13:26:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:26:12] Urbanecm: now it is [13:26:18] Thanks a lot hashar ! [13:26:32] hashar: how would the patch be at fault? 
i think this is from an earlier layer than MediaWiki [13:26:47] sounds like that comes from varnish potentially [13:27:07] 413 Request Entity Too Large [13:27:07] nginx/1.11.1 [13:27:39] you'll get the same error in production if you try to upload a 100 MB file in a single chunk [13:27:44] but i'm uploading a 5 MB chunk here [13:27:51] Urbanecm, I was about to ban your user from the replicas until I saw you [13:27:59] (03CR) 10Filippo Giunchedi: [C: 031] prometheus exporter: avoid still existing precise hosts [puppet] - 10https://gerrit.wikimedia.org/r/307285 (https://phabricator.wikimedia.org/T126757) (owner: 10Jcrespo) [13:28:08] jynus: ack, yeah doesn't make sense alright [13:28:12] 9-day long query, very unoptimized [13:28:19] jynus: Am I doing something wrong? [13:28:39] I am creating a ticket [13:29:21] MatmaRex: what about I revert the patch on mw1099 and we try again? [13:29:23] just to confirm [13:29:42] but yeah apparently that is nginx (which does terminate the TLS connection) that complains somehow [13:29:48] hashar: sure [13:30:14] MatmaRex: should be reverted on mw1099 now [13:30:53] tto: if you are still around. Sorry I kind of forgot about your patch https://gerrit.wikimedia.org/r/#/c/304689/ [13:31:02] Yes, I'm still here [13:31:14] (03PS2) 10Alexandros Kosiaris: nrpe: remove redundant ferm::rule [puppet] - 10https://gerrit.wikimedia.org/r/307279 [13:31:17] Is there still time to handle it? It's simple to test [13:31:19] tto: is that testable on mw1099 ? [13:31:30] hashar: i get the same error [13:31:35] jynus: I do not understand you. What am I doing wrong? I was only testing something so mysql console was running (it could run nine days but with no query). If this is wrong I can kill the console every time I leave my session. I've run ps x | grep mysql at dev and normal bastion, nothing shows (except grep of course). So no mysql process should run from my side.
[13:31:39] (03CR) 10Mobrovac: [C: 031] "Oh, right, RB is being accessed through the api util functions. LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/306674 (https://phabricator.wikimedia.org/T129284) (owner: 10KartikMistry) [13:31:40] I imagine so, but I can't be certain, not knowing the ins and outs of the WMF infrastructure [13:32:01] MatmaRex: so definitely nginx not loving whatever huge header / getting confused somehow :( [13:32:16] Urbanecm, check the ticket I created where I added you there, ask any questions there [13:32:19] tto: do you have steps to reproduce ? [13:32:28] (03CR) 10Hashar: [C: 032] Don't prepend protocol in missing.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/304689 (https://phabricator.wikimedia.org/T141208) (owner: 10TTO) [13:32:35] I am not baning you, but I killed the running queries [13:32:55] (03Merged) 10jenkins-bot: Don't prepend protocol in missing.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/304689 (https://phabricator.wikimedia.org/T141208) (owner: 10TTO) [13:32:55] tto: ah yeah there are in the task https://phabricator.wikimedia.org/T141208 :) [13:33:04] hashar: yes they are :) [13:33:09] godog: can you update the topic to list cmjohnson1 as Clinic person? (It seems you're the only one who still has the rights) [13:33:13] jynus: Okay... [13:33:32] tto: do you have the Wikimedia Debug chromium/firefox extension enabled? [13:33:37] Basically visit https://wuu.wikipedia.org/wiki/b:zh: and get redirected to the correct URL of Chinese Wikibooks, not a nonsense incubator URL [13:33:42] hashar: yes I do [13:33:52] tto: so your change is live on mw1099 [13:34:19] thanks, testing [13:34:50] MatmaRex: I have put your patch back on mw1099 [13:34:56] (03CR) 10Alexandros Kosiaris: nrpe: remove redundant ferm::rule (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/307279 (owner: 10Alexandros Kosiaris) [13:35:00] andrewbogott: {{done}} ! [13:35:03] hashar, seems all good! 
thanks [13:35:33] andrewbogott: you can op yourself too, /msg chanserv op #wikimedia-operations andrewbogott [13:35:39] hashar: hmm, let me try something [13:35:52] tto: pushing it to the whole cluster. Make sure to close the task if that solved it :] [13:36:07] Will do. Thanks hashar as always! [13:36:24] !log hashar@tin Synchronized wmf-config/missing.php: Do not prepend protocol in missing.php T141208 (duration: 00m 47s) [13:36:27] I thought we didn't even need op for topic if we're in the list? [13:36:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:36:34] godog: thanks! [13:36:37] !log upgrading cp4014 to Varnish 4 (T131502) [13:36:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:36:52] so SWAT is covered with the exception of https://gerrit.wikimedia.org/r/#/c/306951/ "ApiUpload: Better handle unreasonably large metadata in 'imageinfo'" [13:36:54] cmjohnson1: am pinged about this one https://phabricator.wikimedia.org/T143718 [13:36:56] 06Operations, 10Ops-Access-Requests, 06Research-and-Data, 10Research-management: Request access to data for reader research - https://phabricator.wikimedia.org/T143718#2590948 (10Ottomata) @Cmjohnson is on ops clinic duty this week. [13:37:06] !log ema@palladium conftool action : set/pooled=no; selector: cp4014.ulsfo.wmnet [13:37:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:37:29] MatmaRex: will we need a different test case? [13:37:31] hmm guess not :) [13:37:36] hashar: it looks like i can upload the test file on mw1099 in chunks of 100 KB [13:37:42] so i'm trying that. 
it'll take longer though :P [13:37:44] oh right channel mode +t [13:37:56] MatmaRex: take your time :] [13:38:09] the progress bar says 5 minutes left :) [13:38:30] :( [13:39:45] I have no idea about nginx limitation myself [13:40:37] (03CR) 10Muehlenhoff: [C: 031] "Ack, duplicated by 10_monitoring_all" [puppet] - 10https://gerrit.wikimedia.org/r/307279 (owner: 10Alexandros Kosiaris) [13:41:36] jynus: I've posted something to the ticket. [13:42:31] (03CR) 10KartikMistry: "Still, Beta is not able to load articles :~ Debugging further.." [puppet] - 10https://gerrit.wikimedia.org/r/306674 (https://phabricator.wikimedia.org/T129284) (owner: 10KartikMistry) [13:42:47] 06Operations, 10Traffic, 07Browser-Support-Internet-Explorer, 07HTTPS: Internet Explorer 6 can not reach https://*.wikipedia.org - https://phabricator.wikimedia.org/T143539#2590963 (10BBlack) I don't think it's just the SHA-1 problem. We also don't support SSLv3 (since back when the POODLE attack appeared... [13:43:03] Urbanecm, was those kind of queries originated due to code or were one-time custom executions? [13:43:27] One time custom executions jynus :) [13:43:39] the follow up is.: if they are one time executions, be careful next time, check how long they take etc. [13:43:47] if they are on code, change the code [13:43:52] 06Operations, 10Traffic, 07Browser-Support-Internet-Explorer, 07HTTPS: Internet Explorer 6 can not reach https://*.wikipedia.org - https://phabricator.wikimedia.org/T143539#2590964 (10BBlack) (also, it's possible there's non-default settings on IE6/XP under some conditions / service-packs / patches that al... [13:44:04] !log ema@palladium conftool action : set/pooled=yes; selector: cp4014.ulsfo.wmnet [13:44:07] I didn't had a proper look at the queries [13:44:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:44:19] but probably a join would be more eficient [13:44:41] Okay, thanks for your advice. I'll watch my queries next time. 
[13:44:42] and limiting a bit the scope [13:44:54] with that in mind, we can close the ticket [13:45:20] Okay. Could you close it with a summary from this discussion? [13:45:24] I was worried becuse we had very high cpu usage on labsdb1001 [13:45:26] yes [13:46:32] godog: ah! thank you [13:47:31] jynus: Thanks. BTW should I be in more WMF channels so everybody will be able to easily find me somewhere? Currently my IRC client auto-join me to #wikimedia-operations, #wikipedia-cs and #wikipedia-sk but I can add more channels if it'll be more easy for the others. [13:47:46] cmjohnson1 andrewbogott np! the access list seems to include most of ops too [13:47:57] labs is the #1 place I go to "talk" to labs and tool users [13:48:07] Okay. [13:48:13] #wikimedia-labs [13:48:30] Added to my auto-connect list :). [13:48:33] I thank you a lot for your reponse [13:48:40] You're welcome. [13:48:42] 06Operations, 10Traffic, 07Browser-Support-Internet-Explorer, 07HTTPS: Internet Explorer 6 can not reach https://*.wikipedia.org - https://phabricator.wikimedia.org/T143539#2590972 (10BBlack) @Florian - On a related note, we've recently created https://wikitech.wikimedia.org/wiki/HTTPS:_Browser_Recommendat... [13:48:49] hashar: sorry, i did a silly and i have to try again D: [13:48:59] (03Abandoned) 10Andrew Bogott: WIP: Aggregate instance root passwords on Labs puppet master [puppet] - 10https://gerrit.wikimedia.org/r/302834 (owner: 10Andrew Bogott) [13:49:25] (03PS3) 10Andrew Bogott: Nova: update api-paste.ini.erb to conform with Liberty defaults [puppet] - 10https://gerrit.wikimedia.org/r/303434 [13:50:51] 06Operations, 06Discovery, 06Discovery-Search, 10Elasticsearch, 10Wikimedia-Logstash: logstash - cron failing to optimize indices - https://phabricator.wikimedia.org/T140973#2590978 (10Gehel) 05Open>03Resolved a:03Gehel No further errors seen in logs. 
[13:51:54] (03CR) 10Andrew Bogott: [C: 032] Nova: update api-paste.ini.erb to conform with Liberty defaults [puppet] - 10https://gerrit.wikimedia.org/r/303434 (owner: 10Andrew Bogott) [13:52:48] MatmaRex: no worries :) [13:56:09] 06Operations, 10Beta-Cluster-Infrastructure, 10Traffic, 13Patch-For-Review, 07WorkType-Maintenance: On beta cluster varnish stats process points to production statsd - https://phabricator.wikimedia.org/T116898#2590998 (10hashar) 05Open>03Resolved Has been kindly reviewed and merged in by @BBlack . On... [13:56:13] hashar: ok, verified at last. patch works as expected :D [13:56:52] neat! [13:57:41] MatmaRex: ready to get it synced everywhere ? [13:58:41] hashar: yeah, go ahead [13:58:55] unleashed [13:59:32] !log hashar@tin Synchronized php-1.28.0-wmf.16/includes/api/ApiUpload.php: ApiUpload: Better handle unreasonably large metadata in 'imageinfo' T143993 (duration: 00m 46s) [13:59:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:59:38] thanks! just in time ;) [13:59:45] you are a pro! [14:00:12] godog: looks like you got scap installed :] It is still missing from mw2091.codfw.wmnet if you can get it added there :) [14:00:18] (03PS3) 10Rush: toollabs::static: Prune and gc git clone [puppet] - 10https://gerrit.wikimedia.org/r/307111 (owner: 10BryanDavis) [14:00:42] 06Operations, 06Discovery, 06Discovery-Search, 10Elasticsearch: Make elasticsearch actually uses shard allocation awareness - https://phabricator.wikimedia.org/T143571#2591010 (10Gehel) 05Resolved>03Open [14:02:04] 06Operations, 06Discovery, 06Discovery-Search, 10Elasticsearch: Make elasticsearch actually uses shard allocation awareness - https://phabricator.wikimedia.org/T143571#2572176 (10Gehel) I'm reopening this and changed the title. Previous title was just about checking configuration, but this should really be... 
[14:02:49] hashar: sure, {{done}} [14:03:15] (03PS3) 10Jcrespo: prometheus exporter: avoid still existing precise hosts [puppet] - 10https://gerrit.wikimedia.org/r/307285 (https://phabricator.wikimedia.org/T126757) [14:03:15] godog: danke :) [14:03:31] np! [14:03:41] !log European SWAT deploy is complete [14:03:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:06:29] !log restart elasticsearch on logstash1004 to validate fix for T142357 [14:06:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:12:43] (03CR) 10Jcrespo: [C: 032] prometheus exporter: avoid still existing precise hosts [puppet] - 10https://gerrit.wikimedia.org/r/307285 (https://phabricator.wikimedia.org/T126757) (owner: 10Jcrespo) [14:15:36] 06Operations, 06Labs: Connect secondary nic for labstore1004 and labstore1005 - https://phabricator.wikimedia.org/T144183#2591023 (10chasemp) [14:20:26] !log upgrading cp4015 to Varnish 4 (T131502) [14:20:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:20:41] !log ema@palladium conftool action : set/pooled=no; selector: cp4015.ulsfo.wmnet [14:20:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:25:40] (03PS1) 10Gilles: Upgrade to 0.1.12 upstream [debs/python-thumbor-wikimedia] - 10https://gerrit.wikimedia.org/r/307292 [14:27:29] 06Operations, 10Wikimedia-Logstash, 06Discovery-Search (Current work): Elasticsearch restarts are failing in the logstash cluster - https://phabricator.wikimedia.org/T142357#2591072 (10Gehel) 05Open>03Resolved Restart of elasticsearch on logstash1004 while a force merge is in progress took ~ 5 minutes. T... 
[14:28:17] !log ema@palladium conftool action : set/pooled=yes; selector: cp4015.ulsfo.wmnet [14:28:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:32:35] 06Operations, 13Patch-For-Review: Disable unprivileged user namespaces on trusty kernels - https://phabricator.wikimedia.org/T142567#2591089 (10MoritzMuehlenhoff) Done. (The labvirt* kernels will receive the change when they're rebooted for 4.4) [14:32:41] 06Operations, 13Patch-For-Review: Disable unprivileged user namespaces on trusty kernels - https://phabricator.wikimedia.org/T142567#2591090 (10MoritzMuehlenhoff) 05Open>03Resolved [14:34:44] (03PS1) 10Jcrespo: prometheus mysqld exporter: Add dbstore-eqiad hosts [puppet] - 10https://gerrit.wikimedia.org/r/307293 (https://phabricator.wikimedia.org/T126757) [14:44:35] (03PS2) 10Andrew Bogott: openstack: Delete old juno files from the repository [puppet] - 10https://gerrit.wikimedia.org/r/304751 (owner: 10Alex Monk) [14:46:28] (03CR) 10Andrew Bogott: [C: 032] openstack: Delete old juno files from the repository [puppet] - 10https://gerrit.wikimedia.org/r/304751 (owner: 10Alex Monk) [14:48:12] (03PS1) 10Jcrespo: prometheus mysqld exporter: disable labsdb1005 because "precise" [puppet] - 10https://gerrit.wikimedia.org/r/307298 (https://phabricator.wikimedia.org/T126757) [14:48:32] (03PS2) 10Jcrespo: prometheus: add misc eqiad hosts to mysqld exporter [puppet] - 10https://gerrit.wikimedia.org/r/307254 (https://phabricator.wikimedia.org/T126757) [14:51:31] 06Operations, 10DBA, 13Patch-For-Review: Decomission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#2591129 (10jcrespo) db1015, db1035 are starting to run low on available disk space. We should prioritize decomm'em soon. 
[14:52:27] (03CR) 10Jcrespo: [C: 032] prometheus: add misc eqiad hosts to mysqld exporter [puppet] - 10https://gerrit.wikimedia.org/r/307254 (https://phabricator.wikimedia.org/T126757) (owner: 10Jcrespo) [14:53:53] (03PS2) 10Jcrespo: prometheus mysqld exporter: Add dbstore-eqiad hosts [puppet] - 10https://gerrit.wikimedia.org/r/307293 (https://phabricator.wikimedia.org/T126757) [14:55:03] (03CR) 10Andrew Bogott: [C: 031] "I hate how the puppet compiler shows any hiera change as a 'change' even if it doesn't affect the host. But, this looks safe to me." [puppet] - 10https://gerrit.wikimedia.org/r/302835 (owner: 10Alex Monk) [14:55:05] (03PS3) 10Andrew Bogott: labnet: Merge site_address and network_public_ip in novaconfig [puppet] - 10https://gerrit.wikimedia.org/r/302835 (owner: 10Alex Monk) [14:55:11] godog: saw some backscroll in -releng. I'm around for a bit before meetings start if https://gerrit.wikimedia.org/r/#/c/307028/ is fine to go. [14:55:15] (03CR) 10Jcrespo: [C: 032] prometheus mysqld exporter: Add dbstore-eqiad hosts [puppet] - 10https://gerrit.wikimedia.org/r/307293 (https://phabricator.wikimedia.org/T126757) (owner: 10Jcrespo) [14:55:39] (03PS2) 10Jcrespo: prometheus mysqld exporter: disable labsdb1005 because "precise" [puppet] - 10https://gerrit.wikimedia.org/r/307298 (https://phabricator.wikimedia.org/T126757) [14:56:13] (03PS1) 10Alexandros Kosiaris: site.pp: Remove $ganglia_aggregator node scope variables [puppet] - 10https://gerrit.wikimedia.org/r/307301 [14:56:15] (03PS1) 10Alexandros Kosiaris: ganglia: Use ferm::service instead of ferm::rule [puppet] - 10https://gerrit.wikimedia.org/r/307302 [14:56:17] (03PS1) 10Alexandros Kosiaris: ganglia: Make the ferm statements conditional [puppet] - 10https://gerrit.wikimedia.org/r/307303 [14:56:19] (03PS1) 10Alexandros Kosiaris: ganglia: Remove the Corp OIT LDAP mirror cluster [puppet] - 10https://gerrit.wikimedia.org/r/307304 [14:56:19] thcipriani: thanks! 
yeah I'll merge [14:56:21] (03PS1) 10Alexandros Kosiaris: ganglia: Define install2001 by FQDN, not IP [puppet] - 10https://gerrit.wikimedia.org/r/307305 [14:56:23] (03PS1) 10Alexandros Kosiaris: ganglia: Remove nickel remnant [puppet] - 10https://gerrit.wikimedia.org/r/307306 [14:56:29] (03PS2) 10Filippo Giunchedi: Bump scap version to 3.2.4-1 [puppet] - 10https://gerrit.wikimedia.org/r/307028 (owner: 10Thcipriani) [14:56:40] godog: cool thanks :) [14:56:53] (03CR) 10Andrew Bogott: [C: 032] labnet: Merge site_address and network_public_ip in novaconfig [puppet] - 10https://gerrit.wikimedia.org/r/302835 (owner: 10Alex Monk) [14:56:58] (03PS4) 10Andrew Bogott: labnet: Merge site_address and network_public_ip in novaconfig [puppet] - 10https://gerrit.wikimedia.org/r/302835 (owner: 10Alex Monk) [14:58:53] (03CR) 10Jcrespo: [C: 032] prometheus mysqld exporter: disable labsdb1005 because "precise" [puppet] - 10https://gerrit.wikimedia.org/r/307298 (https://phabricator.wikimedia.org/T126757) (owner: 10Jcrespo) [14:59:38] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] Bump scap version to 3.2.4-1 [puppet] - 10https://gerrit.wikimedia.org/r/307028 (owner: 10Thcipriani) [14:59:43] (03PS3) 10Filippo Giunchedi: Bump scap version to 3.2.4-1 [puppet] - 10https://gerrit.wikimedia.org/r/307028 (owner: 10Thcipriani) [14:59:49] (03PS3) 10Jcrespo: es2001-4: add node exporter to this standalones hosts [puppet] - 10https://gerrit.wikimedia.org/r/306939 (https://phabricator.wikimedia.org/T126757) [15:00:07] (03CR) 10Filippo Giunchedi: [V: 032] Bump scap version to 3.2.4-1 [puppet] - 10https://gerrit.wikimedia.org/r/307028 (owner: 10Thcipriani) [15:03:30] 06Operations, 10Traffic: Better handling for one-hit-wonder objects - https://phabricator.wikimedia.org/T144187#2591156 (10BBlack) [15:03:55] (03PS2) 10Alexandros Kosiaris: logging::mediawiki: Remove redundant NRPE ferm::rule [puppet] - 10https://gerrit.wikimedia.org/r/307278 [15:04:00] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] 
logging::mediawiki: Remove redundant NRPE ferm::rule [puppet] - 10https://gerrit.wikimedia.org/r/307278 (owner: 10Alexandros Kosiaris) [15:04:17] (03PS3) 10Alexandros Kosiaris: nrpe: remove redundant ferm::rule [puppet] - 10https://gerrit.wikimedia.org/r/307279 [15:04:20] (03PS5) 10Andrew Bogott: labnet: Merge site_address and network_public_ip in novaconfig [puppet] - 10https://gerrit.wikimedia.org/r/302835 (owner: 10Alex Monk) [15:04:22] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] nrpe: remove redundant ferm::rule [puppet] - 10https://gerrit.wikimedia.org/r/307279 (owner: 10Alexandros Kosiaris) [15:04:31] 06Operations, 10Traffic: Better handling for one-hit-wonder objects - https://phabricator.wikimedia.org/T144187#2591183 (10BBlack) [15:04:57] (03PS1) 10RobH: Revert "robh on vacation next week, remove from paging" [puppet] - 10https://gerrit.wikimedia.org/r/307309 [15:05:04] (03PS2) 10RobH: Revert "robh on vacation next week, remove from paging" [puppet] - 10https://gerrit.wikimedia.org/r/307309 [15:06:20] Working on ORES deploy now [15:06:34] For our window this morning. [15:06:35] I'm in tin now [15:06:46] The current commit is 7aad8e9 [15:06:53] :-) [15:07:44] I've confirmed that get pull got me b8598dd and b8598dd is currently running on beta [15:09:09] Starting deployment to canary. [15:09:55] (03CR) 10RobH: [C: 032] Revert "robh on vacation next week, remove from paging" [puppet] - 10https://gerrit.wikimedia.org/r/307309 (owner: 10RobH) [15:10:24] (03PS1) 10Filippo Giunchedi: prometheus: add aggregation rules for ops [puppet] - 10https://gerrit.wikimedia.org/r/307310 [15:10:52] Logging into scb1002 to start test. [15:11:05] Looks like a curl to the test model works and reports the right version [15:11:29] halfak: log [15:11:37] Forgot thanks [15:12:08] (Also include phab number too) [15:12:16] have it handy? 
[15:12:16] T144101 [15:12:42] !log deploying ores b8598dd (see T144101) [15:12:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:15:07] stashbot: ping [15:15:22] it should log it in proper places [15:17:23] hashar: do you have link to the swat deploy handbook [15:17:24] ? [15:17:54] (03CR) 10EBernhardson: [C: 031] CirrusSearch BM25 A/B test config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306721 (https://phabricator.wikimedia.org/T143586) (owner: 10DCausse) [15:18:16] Amir1: I got a mail with stashbot comment add to the task [15:18:30] I think there was something else beside https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment and https://wikitech.wikimedia.org/wiki/How_to_deploy_code [15:18:37] https://wikitech.wikimedia.org/wiki/SWAT_deploys/Deployers [15:18:48] a guide written by th.cipriani [15:19:00] (with more contributions and additions by others) [15:19:10] OK. ORES deployment complete and my checks seem to suggest that we are good. [15:19:11] thanks, we are deploying the service and after that we will be deploying some config changes to wikimedia wiki [15:19:17] awesome [15:19:21] here starts my part [15:19:23] I'm going to declare victory and post an update to the mailing list. [15:19:49] I need to do the other deployment parts :) [15:21:31] 06Operations, 05Prometheus-metrics-monitoring: MySQL monitoring with prometheus - https://phabricator.wikimedia.org/T143896#2591210 (10jcrespo) [15:21:39] (03CR) 10Filippo Giunchedi: [C: 031] Upgrade to 0.1.12 upstream [debs/python-thumbor-wikimedia] - 10https://gerrit.wikimedia.org/r/307292 (owner: 10Gilles) [15:22:31] 06Operations, 10MediaWiki-JobQueue, 07Regression: Restore 30 minutes delayed list update to no waiting, to stop killing sandbox functionality - https://phabricator.wikimedia.org/T139893#2591211 (10Anomie) This has nothing to do with the API, ApiSandbox, or TemplateSandbox. The issue here is that a user's "ma... 
[15:22:57] Amir1: in SWAT, what we do to push config and which works well is: (1) CR +2 (2) on Tin in /srv/mediawiki-staging, get the merged change, for example with git fetch, git rebase origin/master (3) on mw1099, "scap pull", so config change is on staging (4) test with X-Wikimedia-Debug the change (5) scap sync-file it [15:23:25] The only remaining trap is if you modify InitialiseSettings.php and CommonSetting.php both in the same time. [15:23:32] One of your config change do that? [15:23:38] !log ladsgroup@terbium:~$ mwscript extensions/ORES/maintenance/CheckModelVersions.php on (fa|tr|pl|ru|pt|nl|wikidata|en)wiki [15:23:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:24:06] Dereckson: no, we only change InitialiseSettings.php [15:24:26] No trap in this case. [15:29:40] Celery isn't taking jobs on scb1002. [15:29:47] I was to restart the service manually [15:29:54] Amir1, what do you think? [15:30:09] Right now, 1001 is doing the whole load [15:30:18] let's restart to see if it works [15:30:24] Oh! I can't restart I don't have sudo [15:30:25] Hmm [15:30:39] akosiaris, can you restart the ores celery service on scb1002? [15:30:47] you can halfak [15:30:52] Oh... OK I'll try [15:31:11] * halfak tries to remember the service name [15:31:12] you don't have sudo but you for some commands (including this) you have the right using sudo [15:31:32] hmm, check model didn't add new versions to ores_model [15:31:39] didn't change it at all [15:31:53] Yes it did [15:32:05] I checked on the models. 
[15:32:08] (versions [15:32:41] I think there might be some caching in front of us [15:32:59] I see different version between a hard refresh of https://ores.wikimedia.org/v2/scores/enwiki/reverted/324231245 and https://ores.wikimedia.org/v2/scores/enwiki/reverted/324231245?features [15:33:01] no, I mean in the table for the ores review tool [15:33:49] !log restarted ores-celery-worker service on scb1002 [15:34:01] Looks like scb1002 has recovered and is taking jobs now [15:34:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:34:21] Amir1, overload errors are high. Maybe we need some limiting on these scripts. [15:34:52] Let's think about it later. [15:35:09] Right now, I need to find out why this thing doesn't work [15:35:15] (03PS1) 10BryanDavis: Log error and privnotice messages [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/307315 (https://phabricator.wikimedia.org/T144189) [15:35:45] Looks like overload errors went way back down now that scb1002 is up [15:36:20] Overload is back at zero :) [15:38:35] Dereckson: I need to run main. scripts at terbium, right? [15:38:52] Something like this should work, "mwscript extensions/ORES/maintenance/CheckModelVersions.php --wiki=enwiki" ? [15:39:05] Amir1: right for Terbium [15:39:23] strange [15:39:58] akosiaris, it looks like there is a cache in front of ores.wikimedia.org that shouldn't be there. [15:40:14] E.g. we just deployed new models and that caused our internal cache to invalidate. [15:40:19] (03PS4) 10Jcrespo: es2001-4: add node exporter to this standalones hosts [puppet] - 10https://gerrit.wikimedia.org/r/306939 (https://phabricator.wikimedia.org/T126757) [15:40:24] But externally, we're seeing old scores and model versions. 
[15:40:36] varnish [15:40:42] Amir1: looks good to me the mwscript --wiki= [15:40:48] 06Operations, 10Traffic, 13Patch-For-Review: Planning for phasing out non-Forward-Secret TLS ciphers - https://phabricator.wikimedia.org/T118181#2591288 (10BBlack) https://gerrit.wikimedia.org/r/#/c/306935/ probably should've linked here. This is a sort of temporary measure to start bugging users to upgrade... [15:40:52] varnish should not be getting in our way [15:41:04] This is breaking things at least a little bit. [15:41:31] akosiaris, compare model versions reported at https://ores.wikimedia.org/scores/enwiki/ ("0.1.1") vs. https://ores.wikimedia.org/scores/enwiki/?blah ("0.1.2" [15:41:41] yes, that's varnish [15:41:54] OK. We need to not have that :) [15:42:02] hehehe [15:42:05] curl "https://ores.wikimedia.org/scores/enwiki/" returns 0.1.1 [15:42:08] It's going to render the update scripts useless. [15:42:09] yeah, that's not really possible [15:42:12] in terbium [15:42:18] I mean not having varnish in front [15:42:31] akosiaris, oh. I mean either clearing the cache or turning it off. [15:42:38] But, of course, leaving varnish where it is. [15:43:03] I'm realizing that we are under-reporting our cache hit rate and our request rates then too :S [15:43:04] or sending the correct headers in the HTTP responses [15:43:12] Amir1: you could find useful to create a ~/logs folder on Terbium. So, you can run `script logs/.log`, then your maintenance scripts, then a logout to quit script [15:43:22] so that the cache age in varnish is set correctly and these problems don't show [15:43:29] benefit: you've already a file with the output when something is wrong [15:43:39] Dereckson: thansk [15:43:45] akosiaris, what does "correct" look like? [15:44:11] akosiaris: how we can invalidate varnish cache in terbium? [15:44:34] Amir1: what does terbium have to do with the varnish cache ? 
[15:44:44] Amir1: oh and of course, run them from inside a multiplexer, like tmux [15:45:00] (so if you're disconnected when it's still running you can reattach to it) [15:45:07] akosiaris: I'm trying to run a maintenance script there [15:45:09] Amir1: well, to answer your question, you can't. terbium is nothing special as far as varnish cache goes [15:45:36] Amir1: you can bypass it if you point your script to ores.svc.eqiad.wmnet [15:46:00] lemme find the port for ya [15:46:07] I'm afraid that's not possible. I need to change it in the extension and backport it [15:46:12] the port is 8081 [15:46:18] (03PS4) 10Madhuvishy: toollabs: Convert puppet clone of cdnjs to cron [puppet] - 10https://gerrit.wikimedia.org/r/306958 (https://phabricator.wikimedia.org/T143637) [15:46:32] Can we wait for some time? [15:46:45] Amir1: so the maintenance script uses http://ores.wikimedia.org/ ? [15:47:00] yes [15:47:06] then you can't bypass varnish [15:47:30] but internal jobs should try to not do that [15:47:43] that should be improved: an option in the script to set the address to use? or use an environment variable? [15:47:55] yeah [15:48:03] I can do it later [15:48:05] Dereckson: exactly. the former would be preferable [15:48:50] halfak: look at http://book.varnish-software.com/3.0/HTTP.html [15:49:11] I'm aiming to implement http://stackoverflow.com/questions/49547/making-sure-a-web-page-is-not-cached-across-all-browsers [15:49:33] the Cache related headers paragraph and below is useful to point out what headers need to be set in responses [15:50:32] so, how urgent is this ? [15:50:37] Ahh. I see that "Cache-Control: no-store" will be sufficient for varnish [15:51:04] yeah but it is not going to fix the current issue immediately [15:51:08] akosiaris, it is not actively causing problems, but it is preventing our deploy from continuing right now. We need to rebuild historical scores for ORES and they are hidden! 
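Note on the suggestion above (an option in the script to set the address, or an environment variable): a minimal sketch of that lookup order, assuming a hypothetical `--ores-url` flag and a hypothetical `ORES_BASE_URL` variable. Neither name comes from the ORES extension; only the public hostname and the internal `ores.svc.eqiad.wmnet` port 8081 are taken from the discussion.

```python
import argparse
import os

# Public endpoint, which sits behind Varnish (hostname from the discussion above).
PUBLIC_URL = "https://ores.wikimedia.org"
# Hypothetical environment variable name, chosen only for this sketch.
ENV_VAR = "ORES_BASE_URL"


def resolve_base_url(argv=None, environ=None):
    """Pick the base URL: --ores-url flag first, then the env var, then the public default."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(description="hypothetical maintenance script")
    # Hypothetical flag; the real maintenance script may expose nothing like it.
    parser.add_argument("--ores-url", default=None, help="base URL of the ORES service")
    args = parser.parse_args(argv)
    url = args.ores_url or environ.get(ENV_VAR) or PUBLIC_URL
    # Drop a trailing slash so callers can append paths uniformly.
    return url.rstrip("/")
```

Calling `resolve_base_url(["--ores-url", "http://ores.svc.eqiad.wmnet:8081"])` would point an internal job at the service directly, so it would not go through the public cache.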
[15:51:18] that's because the response is already cached [15:51:24] Historical scores are close to current scores, so I'm not *too* worried. [15:51:32] akosiaris, indeed. [15:51:52] Why don't you upgrade your script and backport it as a part of the deployment if it's something needed? [15:51:58] curl -H "Cache-Control: no-store" "https://ores.wikimedia.org/scores/enwiki/" [15:52:02] 06Operations, 10MediaWiki-extensions-CentralNotice, 10Traffic: Varnish-triggered CN campaign about browser security - https://phabricator.wikimedia.org/T144194#2591355 (10BBlack) [15:52:02] The maint script can solve this [15:52:04] still returns old data [15:52:23] cmjohnson1: hi. i'm trying to find out if resolving https://phabricator.wikimedia.org/T140419 is possible, or if i should just abandon hope and wait for us to update php/hhvm in a couple of years. [15:52:25] Amir1, curl https://ores.wikimedia.org/scores/enwiki/?blah [15:52:42] Amir1: you've misunderstood. That's not an HTTP request header, it's an HTTP response header [15:53:00] oh, sorry, [15:53:20] halfak: I need to add that to the main script [15:53:38] (03PS2) 10BryanDavis: Log error and privnotice messages [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/307315 (https://phabricator.wikimedia.org/T144189) [15:53:45] Amir1, I'd not make it "blah" but maybe the current timestamp. [15:53:56] Something that won't accidentally get matched :) [15:54:13] maybe I can issue a varnish "ban" for you [15:54:23] matmarex: I am going to defer that question to _joe_...he is more equipped to answer that question. [15:54:32] for a minute maybe? [15:55:10] (I also found another bug in check model script, will take care of it later) [15:56:58] akosiaris, would that allow us to quickly run these scripts and expect to get good data? [15:57:14] halfak: yes [15:57:27] it's kind of a tricky procedure though, will take some time [15:57:29] cool. I think that sounds good. Amir1, what do you think? 
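Note on the "?blah" / current-timestamp idea above: a minimal sketch of appending a unique query parameter, assuming the cache in front of the service keys objects on the full URL including the query string (which is what the "?blah" comparison suggests, though it is not confirmed here). The parameter name `_cb` is made up for illustration.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_cache_buster(url, token=None, param="_cb"):
    """Return url with one extra query parameter appended.

    token defaults to the current Unix timestamp, so repeated runs are
    unlikely to match an object that is already cached.
    """
    if token is None:
        token = str(int(time.time()))
    parts = urlsplit(url)
    # Keep existing parameters, including valueless ones such as "?features".
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, token))
    # Note: urlencode re-serialises a bare "features" as "features=".
    return urlunsplit(parts._replace(query=urlencode(query)))
```

This only sidesteps an entry that is already cached; it does not change what the cache stores for the plain URL, so the response-header fix discussed above is still needed.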
[15:57:43] akosiaris: how long? [15:58:00] about 5-10 minutes [15:58:20] okay, please [15:58:28] I need someone with +o in here to help figure out why jouncebot can't join the channel. [15:58:49] jouncebot? [15:59:06] yeah. its our pythoin bot that announces deploys [15:59:09] do you have the irc output? [15:59:28] yes, T144189 [15:59:42] ffs is stashbot quieted/banned too? [16:00:03] Coukd someone have quieted the bot? [16:00:05] could [16:00:18] a +q wouldn't block it from entering [16:00:21] just for talking [16:00:36] Oh [16:01:01] it should be able to come… [16:01:02] I'm not seeing any notices about a block either, but I can try joining with an interactive client to see if the logging is just crap [16:01:06] * Platonides looks on #wikimedia-bans [16:01:40] bd808 would setting https://phabricator.wikimedia.org/diffusion/GJOU/browse/master/DefaultConfig.yaml;f14640fcdcd0115ab8be1505cf045ea2cb8c7b44$30 to true help us debug more? [16:01:57] 435 - #wikimedia-bans: ban $~a [by c!charitwo@wikimedia/charitwo, 46692 secs ago] [16:02:06] no… [16:02:10] the bout is identified… [16:02:14] *the bot [16:02:33] Amir1: halfak ok done. you should be good to go now [16:02:42] but I have to be in a meeting for the next hour. bbl [16:02:42] are we now requiring auth in all channels?! [16:02:44] awesome [16:03:01] Thanks akosiaris [16:03:02] bd808 i woulden thingso, do you mean password? [16:03:09] thingso = thinkso [16:03:28] Amir1, I'm in the revscoring call, but do what you need to do. I'll placate the managers if they show up :) [16:05:52] okay, the check model version is done now [16:06:57] now it entered [16:07:02] is it yours? [16:07:05] Platonides: I think maybe it's an ip block/ban of some sort? 
I just joined using the account manually [16:07:05] !log mwscript extensions/ORES/maintenance/PurgeScoreCache.php --wiki=enwiki (and seven other wikis) (T144101) [16:07:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:07:34] is it possible that it attempts to join before it is authenticated? [16:07:46] perhaps, actually [16:08:09] * Platonides removes a 98 days old ban [16:08:10] but this channel really shouldn't be auth only [16:08:17] try now [16:08:50] charitwo required it for all channels 15 hours ago [16:09:14] that's not cool IMO [16:09:34] hey look! [16:09:53] akosiaris, could you have a look at https://github.com/wiki-ai/ores/pull/165 ? [16:09:59] jouncebot: next [16:09:59] In 0 hour(s) and 50 minute(s): Weekly Wikidata query service deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160829T1700) [16:10:01] (re. cache headers) [16:10:04] 06Operations, 10Wikimedia-Mailing-lists: deactivate maint-announce - https://phabricator.wikimedia.org/T143760#2591495 (10RobH) 05Resolved>03Open I am reopening this. Why are we using RT for maint-announce? The plan overall was to kill all mail relays into RT, and rely on the mailing list. The mailing l... 
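Note on the cache-header discussion above ("Cache-Control: no-store" as a response header): a minimal sketch of setting that header on every response, written as generic WSGI middleware. This is an illustration under assumptions, not the content of the pull request linked above; `no_store` and `demo_app` are hypothetical names, and whether every endpoint should be uncacheable is a separate decision.

```python
def no_store(app):
    """Wrap a WSGI app so every response carries Cache-Control: no-store."""
    def wrapped(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            # Drop any Cache-Control the app set, then add the no-store one.
            kept = [(k, v) for k, v in headers if k.lower() != "cache-control"]
            kept.append(("Cache-Control", "no-store"))
            return start_response(status, kept, exc_info)
        return app(environ, patched_start)
    return wrapped


def demo_app(environ, start_response):
    """Stand-in application that would otherwise be cached for an hour."""
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Cache-Control", "max-age=3600")])
    return [b'{"version": "0.1.2"}']
```

Wrapping is a single call, `application = no_store(demo_app)`. As noted earlier in the log, objects that are already cached stay cached until they expire or are banned.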
[16:11:07] Platonides: can you add some notes about what you fixed/changed on https://phabricator.wikimedia.org/T144189 [16:11:44] 07Puppet, 06Labs, 10Phabricator: Update phabricator puppet role to support use on labs - https://phabricator.wikimedia.org/T144112#2591508 (10mmodell) related {T131899} [16:12:24] !log mwscript extensions/ORES/maintenance/PopulateDatabase.php --wiki=fawiki (and seven other wikis) T144101 [16:12:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:13:12] (03CR) 10Eevans: [C: 031] cassandra: add ssl monitoring only for ssl-enabled hosts [puppet] - 10https://gerrit.wikimedia.org/r/307251 (https://phabricator.wikimedia.org/T120662) (owner: 10Filippo Giunchedi) [16:16:42] Platonides: who do I have to lobby to get the authentication requirement dropped from the shared ban list? That is really horribly unfriendly to casual irc users who may be trying to get help or report a problem [16:17:15] A few curses in broken spanish is worth enduring honestly [16:17:36] xD [16:18:17] bd808: /msg chanserv access #wikimedia-bans list [16:18:34] any of them can lift it [16:18:47] otherwise, curse charitwo for placing it and not removing [16:24:43] 06Operations, 06Community-Tech, 10wikidiff2, 13Patch-For-Review: Deploy new version of wikidiff2 package - https://phabricator.wikimedia.org/T140443#2591698 (10akosiaris) >>! In T140443#2590830, @MoritzMuehlenhoff wrote: > I just reimaged a jessie scaler and it fails to run puppet due to being able to find... 
[16:31:36] (03CR) 10Chad: [C: 031] "lgtm when you're ready to land it :)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/305675 (https://phabricator.wikimedia.org/T138778) (owner: 10Dduvall) [16:33:48] (03PS1) 10Rush: phab: ip bans for socket puppet accounts [puppet] - 10https://gerrit.wikimedia.org/r/307323 [16:33:50] (03PS2) 10Dduvall: beta: Configure storage cluster for migrated databases [mediawiki-config] - 10https://gerrit.wikimedia.org/r/305675 (https://phabricator.wikimedia.org/T138778) [16:34:59] (03PS2) 10Rush: phab: ip bans for sockpuppet accounts [puppet] - 10https://gerrit.wikimedia.org/r/307323 [16:36:44] (03CR) 10Rush: [C: 032] phab: ip bans for sockpuppet accounts [puppet] - 10https://gerrit.wikimedia.org/r/307323 (owner: 10Rush) [16:39:45] (03CR) 10Matěj Suchánek: "Abandon? T143100" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/304607 (https://phabricator.wikimedia.org/T143100) (owner: 10Addshore) [16:46:25] 06Operations, 06Labs, 13Patch-For-Review: Phase out the 'puppet' module with fire, make self hosted puppetmasters use the puppetmaster module - https://phabricator.wikimedia.org/T120159#2591854 (10yuvipanda) I moved striker, and @bd808 reports it is all good. the instance with this issue in the shinken proje... [16:47:05] (03PS1) 10Cmjohnson: Adding users flemmerich and psinger to analytics-privatedata-users group. [puppet] - 10https://gerrit.wikimedia.org/r/307325 [16:54:31] 06Operations, 10Ops-Access-Requests, 06Research-and-Data, 10Research-management: Request access to data for reader research - https://phabricator.wikimedia.org/T143718#2591895 (10Cmjohnson) https://gerrit.wikimedia.org/r/#/c/307325/ [16:58:08] (03PS10) 10Paladox: Fix phabricator expanding links [puppet] - 10https://gerrit.wikimedia.org/r/306413 (https://phabricator.wikimedia.org/T75997) [17:00:04] gehel: Dear anthropoid, the time has come. 
Please deploy Weekly Wikidata query service deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160829T1700). [17:00:45] nothing planned for WDQS deployment. SMalyshev ping me if you have a last minute addition... [17:07:00] mark, why do not send an email to ops-private; do you have questions about this, then *only* schedule it if you get some feedback? [17:07:28] sorry, wrong window [17:07:50] jynus: because that's not the point [17:07:59] ok, ok [17:09:28] 06Operations, 06Community-Tech, 10wikidiff2, 13Patch-For-Review: Deploy new version of wikidiff2 package - https://phabricator.wikimedia.org/T140443#2591944 (10akosiaris) 1.4.1 has been uploaded on jessie-wikimedia as well. So the part that @MoritzMuehlenhoff mentions should be fixed. I 've rescheduled th... [17:10:59] _joe_: hi. i'm trying to find out if resolving https://phabricator.wikimedia.org/T140419 is possible, or if i should just abandon hope and wait for us to update php/hhvm in a couple of years. cmjohnson1 said to ask you. [17:11:44] MatmaRex: _joe_ is probably not gonna be available this week. [17:13:07] ugh. okay, thanks [17:13:20] MatmaRex: FWIW it is definitely NOT gonna be a couple of years [17:13:30] but that patch has not been in a release HHVM version yet [17:13:34] released* [17:13:53] hhvm makes frequent releases… [17:14:24] still, that patch seems to be only in master.. 
[17:14:30] hhvm makes a new version every 2 weeks [17:14:37] and a release every 8 weeks [17:15:18] (03CR) 10Madhuvishy: toollabs: Convert puppet clone of cdnjs to cron (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/306958 (https://phabricator.wikimedia.org/T143637) (owner: 10Madhuvishy) [17:15:44] (03PS5) 10Madhuvishy: toollabs: Convert puppet clone of cdnjs to cron [puppet] - 10https://gerrit.wikimedia.org/r/306958 (https://phabricator.wikimedia.org/T143637) [17:16:11] (03CR) 10Madhuvishy: [C: 032 V: 032] toollabs: Convert puppet clone of cdnjs to cron [puppet] - 10https://gerrit.wikimedia.org/r/306958 (https://phabricator.wikimedia.org/T143637) (owner: 10Madhuvishy) [17:17:07] akosiaris: just ran puppet on mw209[01] (just reimaged) and wikidiff2 works fine, thanks! [17:19:43] (03PS1) 10BryanDavis: Pause after identify [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/307333 (https://phabricator.wikimedia.org/T144189) [17:19:58] elukey: I 'll try to find some time to help with the upgrades [17:20:05] (03CR) 10jenkins-bot: [V: 04-1] Pause after identify [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/307333 (https://phabricator.wikimedia.org/T144189) (owner: 10BryanDavis) [17:21:14] (03PS2) 10BryanDavis: Pause after identify [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/307333 (https://phabricator.wikimedia.org/T144189) [17:21:28] akosiaris: thanks! 
volans promised to write a complete automation that will reimage 240 servers in half a day :P [17:21:50] lol [17:23:38] * volans hides [17:23:48] (03PS1) 10Paladox: Disable $phabricator_active_server in labs since it is uneeded in labs [puppet] - 10https://gerrit.wikimedia.org/r/307335 (https://phabricator.wikimedia.org/T144112) [17:24:18] <_joe_> MatmaRex: I'll take a look when I'm not busy swimming in the sea, that is in a week [17:24:23] <_joe_> but please ping me again [17:24:52] (03PS2) 10Paladox: Disable $phabricator_active_server in labs since it is uneeded in labs [puppet] - 10https://gerrit.wikimedia.org/r/307335 (https://phabricator.wikimedia.org/T144112) [17:25:51] (03PS3) 10Paladox: Disable $phabricator_active_server in labs since it is uneeded in labs [puppet] - 10https://gerrit.wikimedia.org/r/307335 (https://phabricator.wikimedia.org/T144112) [17:26:52] (03PS4) 10Paladox: Disable $phabricator_active_server in labs since it is uneeded in labs [puppet] - 10https://gerrit.wikimedia.org/r/307335 (https://phabricator.wikimedia.org/T144112) [17:26:55] (03CR) 10jenkins-bot: [V: 04-1] Disable $phabricator_active_server in labs since it is uneeded in labs [puppet] - 10https://gerrit.wikimedia.org/r/307335 (https://phabricator.wikimedia.org/T144112) (owner: 10Paladox) [17:27:58] jouncebot next [17:27:58] In 0 hour(s) and 32 minute(s): Morning SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160829T1800) [17:28:09] (03PS2) 10Chad: Multiversion: delete deleteMediaWiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306462 [17:29:21] (03CR) 10Chad: [C: 032] Multiversion: delete deleteMediaWiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306462 (owner: 10Chad) [17:29:47] (03Merged) 10jenkins-bot: Multiversion: delete deleteMediaWiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306462 (owner: 10Chad) [17:31:18] !log demon@tin Synchronized multiversion/: delete deleteMediawiki (duration: 01m 09s) 
[17:31:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:38:00] (03PS1) 10Madhuvishy: toollabs: Remove puppet dependencies on git clone cdnjs [puppet] - 10https://gerrit.wikimedia.org/r/307337 (https://phabricator.wikimedia.org/T134896) [17:39:56] (03CR) 10Madhuvishy: [C: 032] toollabs: Remove puppet dependencies on git clone cdnjs [puppet] - 10https://gerrit.wikimedia.org/r/307337 (https://phabricator.wikimedia.org/T134896) (owner: 10Madhuvishy) [17:40:05] (03CR) 10Madhuvishy: [V: 032] toollabs: Remove puppet dependencies on git clone cdnjs [puppet] - 10https://gerrit.wikimedia.org/r/307337 (https://phabricator.wikimedia.org/T134896) (owner: 10Madhuvishy) [17:44:09] (03CR) 10Mobrovac: "How would one manage multiple deploy groups for the same target service with this patch? Each service can (and does) have at least two - c" [puppet] - 10https://gerrit.wikimedia.org/r/306431 (owner: 10Giuseppe Lavagetto) [17:46:25] (03CR) 10Dzahn: "I _think_ we want to avoid more "if $realm" checks and use only Hiera for this." 
[puppet] - 10https://gerrit.wikimedia.org/r/307335 (https://phabricator.wikimedia.org/T144112) (owner: 10Paladox) [17:49:37] (03PS4) 10Jforrester: On public wikis, show "Publish" rather than "Save" on edit pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306303 (https://phabricator.wikimedia.org/T131132) [17:52:25] (03CR) 10Jforrester: On public wikis, show "Publish" rather than "Save" on edit pages (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306303 (https://phabricator.wikimedia.org/T131132) (owner: 10Jforrester) [17:55:53] (03CR) 1020after4: [C: 031] Disable $phabricator_active_server in labs since it is uneeded in labs [puppet] - 10https://gerrit.wikimedia.org/r/307335 (https://phabricator.wikimedia.org/T144112) (owner: 10Paladox) [17:56:14] (03PS1) 10Madhuvishy: toollabs: Set timeout 0 on cdnjs git clone exec [puppet] - 10https://gerrit.wikimedia.org/r/307343 (https://phabricator.wikimedia.org/T134896) [17:56:37] (03CR) 10Paladox: "@dzahn I'm not sure how to do this. But this way it should not allow you to use the config if your in labs since it is unneeded." [puppet] - 10https://gerrit.wikimedia.org/r/307335 (https://phabricator.wikimedia.org/T144112) (owner: 10Paladox) [17:57:54] (03CR) 10Madhuvishy: [C: 032 V: 032] toollabs: Set timeout 0 on cdnjs git clone exec [puppet] - 10https://gerrit.wikimedia.org/r/307343 (https://phabricator.wikimedia.org/T134896) (owner: 10Madhuvishy) [18:00:05] anomie, ostriches, thcipriani, hashar, and twentyafterfour: Respected human, time to deploy Morning SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160829T1800). Please do the needful. [18:00:05] Niharika and ebernhardson: A patch you scheduled for Morning SWAT(Max 8 patches) is about to be deployed. Please be available during the process. 
[18:00:14] 06Operations, 10Ops-Access-Requests, 06Research-and-Data, 10Research-collaborations, 10Research-management: Request access to data for WDQS research - https://phabricator.wikimedia.org/T142780#2592128 (10leila) [18:00:23] \o [18:00:30] o/ [18:00:33] (03PS5) 10Jcrespo: es2001-4: add node exporter to this standalones hosts [puppet] - 10https://gerrit.wikimedia.org/r/306939 (https://phabricator.wikimedia.org/T126757) [18:01:06] heh, Morning SWAT totally sneaks up on me now. I can SWAT today. [18:01:10] 06Operations, 10Ops-Access-Requests, 06Research-and-Data, 10Research-collaborations, 10Research-management: Request access to data for WDQS research - https://phabricator.wikimedia.org/T142780#2546133 (10leila) Thanks @AlexKrauseTUD. @Cmjohnson can you help with this request as well? [18:01:41] (03CR) 10Cmjohnson: [C: 032] Adding users flemmerich and psinger to analytics-privatedata-users group. [puppet] - 10https://gerrit.wikimedia.org/r/307325 (owner: 10Cmjohnson) [18:02:11] (03PS2) 10Cmjohnson: Adding users flemmerich and psinger to analytics-privatedata-users group. [puppet] - 10https://gerrit.wikimedia.org/r/307325 [18:02:15] (03PS2) 10Thcipriani: Switch enwiki to uca-default collation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/307248 (https://phabricator.wikimedia.org/T136150) (owner: 10Niharika29) [18:02:24] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/307248 (https://phabricator.wikimedia.org/T136150) (owner: 10Niharika29) [18:02:46] jynus: feel free to add me to the prometheus/mysql code reviews to add hosts btw! 
I can + a aa+1 if needed [18:02:55] (03Merged) 10jenkins-bot: Switch enwiki to uca-default collation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/307248 (https://phabricator.wikimedia.org/T136150) (owner: 10Niharika29) [18:03:09] well, I stopped doing it to avoid spam [18:03:16] (03Abandoned) 10Gehel: CirrusSearch: disable saneitizer cron job [puppet] - 10https://gerrit.wikimedia.org/r/306639 (https://phabricator.wikimedia.org/T143862) (owner: 10Gehel) [18:04:06] (03CR) 10MaxSem: [C: 031] maps - grant privileges on sequences to all known users [puppet] - 10https://gerrit.wikimedia.org/r/306728 (owner: 10Gehel) [18:04:37] (03PS3) 10Gehel: maps - grant privileges on sequences to all known users [puppet] - 10https://gerrit.wikimedia.org/r/306728 [18:04:42] Niharika: your patch is live on mw1099, anything you'd like to test there before it goes out everywhere? [18:05:43] thcipriani: Sorry, but how do I test it out on mw1099? I want to make sure Category pages load okay. [18:05:51] kaldari: ^ [18:06:02] Niharika, thcipriani: We can check to make sure the category pages aren't blowing up at least... [18:06:03] Niharika: do you have the browser extension? [18:06:06] 06Operations, 10Ops-Access-Requests, 06Research-and-Data, 10Research-management: Request access to data for reader research - https://phabricator.wikimedia.org/T143718#2592175 (10Cmjohnson) 05Open>03Resolved Patchset has been merged for both users. You should have access at this time. It may take up to... [18:06:14] https://wikitech.wikimedia.org/wiki/Debugging_in_production [18:06:16] Niharika: ^ [18:06:22] 06Operations, 06Labs, 13Patch-For-Review: Phase out the 'puppet' module with fire, make self hosted puppetmasters use the puppetmaster module - https://phabricator.wikimedia.org/T120159#2592177 (10yuvipanda) servermon is done. I'll do the analytics ones next. 
[18:06:32] 06Operations, 10Ops-Access-Requests, 06Research-and-Data, 10Research-management: Request access to data for reader research - https://phabricator.wikimedia.org/T143718#2592182 (10Cmjohnson) [18:06:33] oops, wrong one: https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug [18:06:34] Niharika: with the X-Wikimedia-Debug browser extension. I'll check it real quick... [18:06:34] 06Operations, 10Ops-Access-Requests: Requesting access to the statistics host(s) for flemmerich - https://phabricator.wikimedia.org/T143881#2592180 (10Cmjohnson) 05Open>03Resolved Resolved with T143718 [18:06:37] 06Operations, 10MediaWiki-JobQueue, 07Regression: Restore 30 minutes delayed list update to no waiting, to stop killing sandbox functionality - https://phabricator.wikimedia.org/T139893#2592183 (10Aklapper) Thanks for explaining, @anomie! [18:06:40] Niharika: you can use the X-Wikimedia-Debug header or there's a Chrome plugin, see: https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug [18:06:53] greg-g: I'm a slow typist :P [18:06:55] (03CR) 10Gehel: [C: 032] maps - grant privileges on sequences to all known users [puppet] - 10https://gerrit.wikimedia.org/r/306728 (owner: 10Gehel) [18:07:19] thcipriani: I don't use words, just urls (mostly) [18:08:00] wiki page or it didn't happen [18:08:52] Niharika, thcipriani: category pages still seem to be loading fine when using mw1099. [18:09:17] kaldari: great, thanks for checking :) [18:09:38] Niharika: or kaldari are you set up to run the updateCollation script on terbium? [18:09:42] thcipriani, Niharika: We can't test the meat of the change however until it's synced since it mainly affects a maintenance script [18:09:55] indeed [18:09:56] I can run it. 
[18:10:04] okie doke, let me sync it live [18:10:49] (03PS5) 10Paladox: Disable $phabricator_active_server in labs since it is uneeded in labs [puppet] - 10https://gerrit.wikimedia.org/r/307335 (https://phabricator.wikimedia.org/T144112) [18:10:56] (03PS6) 10Paladox: Disable $phabricator_active_server in labs since it is uneeded in labs [puppet] - 10https://gerrit.wikimedia.org/r/307335 (https://phabricator.wikimedia.org/T144112) [18:11:49] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:307248|Switch enwiki to uca-default collation (T136150)]] (duration: 00m 47s) [18:11:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:11:58] ^ Niharika change is now live [18:12:12] thcipriani: Should I run the script now? [18:12:23] Niharika, thcipriani: looks good so far: https://en.wikipedia.org/wiki/Category:1983_songs [18:12:29] (03CR) 10Jcrespo: [C: 032] es2001-4: add node exporter to this standalones hosts [puppet] - 10https://gerrit.wikimedia.org/r/306939 (https://phabricator.wikimedia.org/T126757) (owner: 10Jcrespo) [18:12:34] (03PS6) 10Jcrespo: es2001-4: add node exporter to this standalones hosts [puppet] - 10https://gerrit.wikimedia.org/r/306939 (https://phabricator.wikimedia.org/T126757) [18:13:07] Niharika: yeah, you can run it now. Do you have a screen session started? [18:13:18] (it'll take a while to finish) [18:13:31] kaldari: I just started one. [18:13:54] And it gave me an intro box and then kinda cleared up the terminal history. [18:14:10] Niharika: Cool. You can go ahead and start it. [18:14:23] Niharika: It should show you progress as it rebuilds things [18:14:31] kaldari: It does. [18:15:26] cool :) Thanks kaldari and Niharika [18:15:35] Thank thcipriani! 
[18:15:39] Thanks* [18:15:40] (03PS4) 10Thcipriani: CirrusSearch BM25 A/B test config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306721 (https://phabricator.wikimedia.org/T143586) (owner: 10DCausse) [18:15:43] Niharika: Did it tell you how many rows will be updated? I bet it's a lot :) [18:16:00] kaldari: It didn't. It [18:16:03] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306721 (https://phabricator.wikimedia.org/T143586) (owner: 10DCausse) [18:16:08] It's at 8500 now. [18:16:30] (03Merged) 10jenkins-bot: CirrusSearch BM25 A/B test config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306721 (https://phabricator.wikimedia.org/T143586) (owner: 10DCausse) [18:17:54] ebernhardson: CirrusSearch BM25 A/B test config is live on mw1099 anything to test there? [18:17:55] (03PS2) 10Gehel: elasticsearch - check shards via the service, not via each individual node [puppet] - 10https://gerrit.wikimedia.org/r/305519 (https://phabricator.wikimedia.org/T133844) [18:18:40] thcipriani: yea, sec [18:18:45] ack [18:18:58] 07Puppet, 06Labs, 10Phabricator, 13Patch-For-Review: Update phabricator puppet role to support use on labs - https://phabricator.wikimedia.org/T144112#2592218 (10Paladox) [18:19:01] 07Puppet, 06Labs, 10Phabricator: Phabricator labs puppet role configures phabricator wrong - https://phabricator.wikimedia.org/T131899#2592220 (10Paladox) [18:19:29] 07Puppet, 06Labs, 10Phabricator, 13Patch-For-Review: Update phabricator puppet role to support use on labs - https://phabricator.wikimedia.org/T144112#2588887 (10Paladox) We will fix production role first and after that remove the labs role once production role works in labs :) [18:21:09] thcipriani: seems sane, no errors logged to logstash about it, seems good to go [18:21:21] ebernhardson: kk, going live everywhere [18:21:48] 06Operations, 10Ops-Access-Requests, 06Research-and-Data, 10Research-management: Request access to data for reader research - 
https://phabricator.wikimedia.org/T143718#2592239 (10leila) Thanks, @Cmjohnson for your help. [18:23:20] !log thcipriani@tin Synchronized wmf-config/CirrusSearch-common.php: SWAT: [[gerrit:306721|CirrusSearch BM25 A/B test config (T143586)]] (duration: 00m 46s) [18:23:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:23:28] ^ ebernhardson live everywhere [18:23:33] 06Operations, 06Labs, 13Patch-For-Review: Phase out the 'puppet' module with fire, make self hosted puppetmasters use the puppetmaster module - https://phabricator.wikimedia.org/T120159#2592244 (10yuvipanda) The analytics project doesn't actually seem to have any! These were just stale LDAP entries for insta... [18:24:13] thcipriani: looks to work there too, thanks! [18:25:23] ebernhardson: starting to see a few errors trickle in: https://logstash.wikimedia.org/goto/61503c22a1cee6fc9eb1ef6937cae90b [18:26:00] thcipriani: doh, i know what that would be. sec [18:26:01] Fatal error: Uncaught exception 'MWException' with message 'Variable 'wgCirrusSearchPageViewsW' is not set.' in /srv/mediawiki/php-1.28.0-wmf.16/maintenance/getConfiguration.php:105 [18:26:15] * ebernhardson thinks that function shouldn't fatal on it... 
[18:27:28] thcipriani: lets revert while i get a fix put together [18:27:35] ebernhardson: kk, doing [18:29:08] thcipriani: actually, its easier than i thought no need to revert, just one sec [18:29:09] !log thcipriani@tin Synchronized wmf-config/CirrusSearch-common.php: REVERT SWAT: [[gerrit:306721|CirrusSearch BM25 A/B test config (T143586)]] (duration: 00m 48s) [18:29:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:29:16] ok :) [18:29:24] ebernhardson: I'm quick on the revert :) [18:30:26] (03PS1) 10Thcipriani: Revert "CirrusSearch BM25 A/B test config" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/307348 [18:31:09] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/307348 (owner: 10Thcipriani) [18:31:14] (03PS1) 10EBernhardson: Define variables for AB test for all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/307349 [18:31:30] thcipriani: ^^ along with the original patch should make it work right [18:31:36] (03Merged) 10jenkins-bot: Revert "CirrusSearch BM25 A/B test config" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/307348 (owner: 10Thcipriani) [18:32:50] !log make greg a phab admin to fight surge of spam bots [18:32:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:33:03] ebernhardson: kk, so I can revert the revert, merge that, sync cirrussearch-common, sound right? 
[18:33:07] it didn't occur to me to test something in a foreign language to trigger our language detection code [18:33:10] thcipriani: right [18:33:19] kk, doing [18:33:19] thcipriani: if you sync it all to mw1099 i can trigger the language detection and make sure it doesn't bail [18:33:27] cool, will do [18:34:44] (03PS1) 10Thcipriani: Revert "Revert "CirrusSearch BM25 A/B test config"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/307350 [18:35:03] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/307350 (owner: 10Thcipriani) [18:35:29] (03Merged) 10jenkins-bot: Revert "Revert "CirrusSearch BM25 A/B test config"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/307350 (owner: 10Thcipriani) [18:36:13] (03PS2) 10Thcipriani: Define variables for AB test for all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/307349 (owner: 10EBernhardson) [18:36:31] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/307349 (owner: 10EBernhardson) [18:37:08] (03Merged) 10jenkins-bot: Define variables for AB test for all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/307349 (owner: 10EBernhardson) [18:37:47] ebernhardson: live on mw1099 [18:39:58] thcipriani: looks good, language detection is returning foreign results and not logging any problems [18:40:21] ebernhardson: nice, going live everywhere [18:42:48] !log thcipriani@tin Synchronized wmf-config/CirrusSearch-common.php: SWAT: [[gerrit:307349|Define variables for AB test for all wikis]] and [[gerrit:307350|Revert "Revert "CirrusSearch BM25 A/B test config""]] (duration: 00m 47s) [18:42:52] ^ ebernhardson live everywhere [18:42:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:44:06] doesn't look to be any errors this time around [18:44:49] ebernhardson: yup, error log looks fine. thanks for the quick fix! 
[18:48:53] (03PS6) 10Gehel: elasticsearch - cleanup roles [puppet] - 10https://gerrit.wikimedia.org/r/304067 [18:49:16] (03CR) 10Gehel: "rebase" [puppet] - 10https://gerrit.wikimedia.org/r/304067 (owner: 10Gehel) [18:56:33] thcipriani: if you are done with swat I'm going to take stashbot offline for a bit and see if I can get its account registration sorted out [18:56:59] jouncebot: next [18:57:00] In 0 hour(s) and 3 minute(s): Grants Review app update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160829T1900) [18:57:17] actually I'll wait until after I help Niharika [18:57:48] bd808: We can do it now. [18:58:15] bd808: yup, done [18:58:52] (03PS7) 10Paladox: Disable $phabricator_active_server in labs since it is uneeded in labs [puppet] - 10https://gerrit.wikimedia.org/r/307335 (https://phabricator.wikimedia.org/T144112) [18:59:41] (03CR) 1020after4: Disable $phabricator_active_server in labs since it is uneeded in labs (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/307335 (https://phabricator.wikimedia.org/T144112) (owner: 10Paladox) [19:00:04] bd808 and Niharika: Respected human, time to deploy Grants Review app update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160829T1900). Please do the needful. [19:00:04] Niharika: A patch you scheduled for Grants Review app update is about to be deployed. Please be available during the process. 
[19:00:17] (03CR) 10Paladox: Disable $phabricator_active_server in labs since it is uneeded in labs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/307335 (https://phabricator.wikimedia.org/T144112) (owner: 10Paladox) [19:00:24] (03PS8) 10Paladox: Disable $phabricator_active_server in labs since it is uneeded in labs [puppet] - 10https://gerrit.wikimedia.org/r/307335 (https://phabricator.wikimedia.org/T144112) [19:03:55] (03PS5) 10Jforrester: On public wikis, show "Publish" rather than "Save" on edit pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306303 (https://phabricator.wikimedia.org/T131132) [19:08:46] trying to remember trebuchet commands well enough to teach them to someone else is making my head hurt [19:09:09] * bd808 probably relies on bash history for too many things [19:14:48] 07Puppet, 06Labs, 10Phabricator: Phabricator labs puppet role configures phabricator wrong - https://phabricator.wikimedia.org/T131899#2592521 (10Paladox) 05duplicate>03Open Re opening for now since the main role will take a while to fix. [19:15:41] (03PS1) 10Chad: Phab: Remove config abstraction. Useless & confusing [puppet] - 10https://gerrit.wikimedia.org/r/307354 [19:15:47] A question, is there a way to run master sql queries against mediawiki dbs? I want to run a maintenance script but it's broken so I probably need to backport the fix [19:16:41] Amir1: use maintenance/eval.php and $dbw->query() ? [19:17:02] (03PS2) 10Chad: Phab: Remove config abstraction. Useless & confusing [puppet] - 10https://gerrit.wikimedia.org/r/307354 (https://phabricator.wikimedia.org/T144112) [19:17:08] hmm, If Ops are okay, I'm okay [19:18:30] (03CR) 10Paladox: [C: 031] Phab: Remove config abstraction. Useless & confusing [puppet] - 10https://gerrit.wikimedia.org/r/307354 (https://phabricator.wikimedia.org/T144112) (owner: 10Chad) [19:19:53] brion, can I delete the labs project 'embed-sandbox'? Current members are you, me, yuvi. 
[19:20:28] Amir1: `sql --write foowiki` [19:20:50] Errr, maybe I misread.... [19:21:48] (03CR) 10Chad: "Hmm not identical... https://puppet-compiler.wmflabs.org/3875/" [puppet] - 10https://gerrit.wikimedia.org/r/307354 (https://phabricator.wikimedia.org/T144112) (owner: 10Chad) [19:22:40] ostriches: that can work too [19:26:55] !log Updated iegreview to 29e98bb (Show all users in 'Manage Users' list (blocked and unblocked both)) [19:27:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:27:43] ^ Niharika's first prod deploy :) [19:28:12] Woohoo. \o/ [19:29:08] :) [19:29:13] (03PS1) 10Paladox: Move labs.pp and main.pp to /modules/phabricator/manifests [puppet] - 10https://gerrit.wikimedia.org/r/307357 (https://phabricator.wikimedia.org/T144112) [19:29:59] (03CR) 10jenkins-bot: [V: 04-1] Move labs.pp and main.pp to /modules/phabricator/manifests [puppet] - 10https://gerrit.wikimedia.org/r/307357 (https://phabricator.wikimedia.org/T144112) (owner: 10Paladox) [19:30:59] (03PS2) 10Paladox: Move labs.pp and main.pp to /modules/phabricator/manifests [puppet] - 10https://gerrit.wikimedia.org/r/307357 (https://phabricator.wikimedia.org/T144112) [19:32:31] (03Abandoned) 10Paladox: Move labs.pp and main.pp to /modules/phabricator/manifests [puppet] - 10https://gerrit.wikimedia.org/r/307357 (https://phabricator.wikimedia.org/T144112) (owner: 10Paladox) [19:40:21] (03PS3) 10ArielGlenn: abstract out code for adds/changes dumps generation, for general library [dumps] - 10https://gerrit.wikimedia.org/r/307257 (https://phabricator.wikimedia.org/T133547) [19:40:36] (03CR) 10jenkins-bot: [V: 04-1] abstract out code for adds/changes dumps generation, for general library [dumps] - 10https://gerrit.wikimedia.org/r/307257 (https://phabricator.wikimedia.org/T133547) (owner: 10ArielGlenn) [19:41:21] and too bad, I'm done for the night. 
[19:41:31] * apergos uses gerrit for code backups yet again [19:41:44] !log Taking stashbot offline (hopefully briefly) [19:41:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:42:20] apergos: your stash is gone ^ [19:42:33] ho ho [19:42:39] :) [19:44:10] LOL https://9to5mac.com/2016/08/29/reddit-jailbreak-dribbble-client-app-store/ [19:44:38] (03PS4) 10ArielGlenn: abstract out code for adds/changes dumps generation, for general library [dumps] - 10https://gerrit.wikimedia.org/r/307257 (https://phabricator.wikimedia.org/T133547) [19:44:47] luckly i am running ios 10 public beta :) [19:44:48] ok I lied. ocd and stuff. now I'm done for the night [19:44:59] lol [19:45:11] Do you really have ocd apergos? [19:45:16] well [19:45:18] no diagnosis [19:45:21] so who can say [19:45:33] paladox: I have iOS 10 public beta on my phone [19:45:33] Oh [19:45:33] but stuff like that just gets on my nerves and I will sit and stew [19:45:38] :) [19:45:40] so better to just fix and forget :-D [19:45:41] Works great! [19:45:44] Yep [19:45:59] Bsadowski1 i find favourite sites not working on the ipad [19:46:12] Oh, I don't have a compatible iPad. [19:46:14] Not sure if it is just broken on the ipad pro or all of them [19:46:16] Oh [19:46:31] I have three ipads, ipad 2, ipad 4, and the ipad pro 12.9inch [19:46:32] brb [19:47:05] it seems absurd that we live in a world where 'jailbreak your device' is a thing [19:47:27] ^ [19:48:28] Testing stashbot: T144189 [19:48:28] T144189: Jouncebot not joining #wikimedia-operations - https://phabricator.wikimedia.org/T144189 [19:48:36] w00t [19:50:05] \o/ [19:50:06] (03CR) 10Platonides: [C: 031] "Old "solution" to race condition, but… not broken enough +1" [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/307333 (https://phabricator.wikimedia.org/T144189) (owner: 10BryanDavis) [19:50:07] andrewbogott: if necessary yes, I can recreate it as a more modern system . 
It's one of my occasional experiments [19:51:19] im back [19:51:43] apergos lol, you can get viruses by jailbreaking [19:51:55] i would never ever jailbreak my iphone or my ipads [19:52:38] 06Operations, 06Discovery, 06Discovery-Search, 10Elasticsearch: Make elasticsearch actually uses shard allocation awareness - https://phabricator.wikimedia.org/T143571#2592635 (10debt) a:05debt>03Gehel Thanks! [19:52:56] you can get viruses by practically everything you do on a data-enabled phone [19:53:17] I'd much rather jailbreak and be able to shut things off and know they are off (for example) [19:55:08] apergos: you would, but 99% of the world doesn't (think of the uncle with 15 toolbars) [19:56:26] probably the uncle will get the phone jailbroken automatically by some malware [19:56:36] as opposed to jailbreaking it himself [19:56:58] Hm. I guess it is a form of security-by-obscurity, yes. [19:57:42] actually there will be an anti-surveillance security kit that will distribute itself via jailbreaking a phone and sending itself to all the contacts [19:57:44] LOL [19:58:01] in a matter of days the NSA will be frozen out of all phones :-P [19:58:13] except for those zero-day vulnerabilities they keep buying... [19:58:15] apergos what do you mean? [19:58:23] well that's how it would work [19:58:39] you want to make sure everyone has good security, you can't rely on them to do it [19:58:46] so you do it for them :-P [19:58:50] :P [19:58:52] Did you hear https://9to5mac.com/2016/08/22/nsa-hack-iphone-master-key-apple-fbi/ [19:58:55] yeah [19:58:58] indeed [19:59:02] :) [20:00:04] gwicke, cscott, arlolra, subbu, bearND, mdholloway, halfak, and Amir1: Dear anthropoid, the time has come. Please deploy Services – Parsoid / OCG / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160829T2000).
[20:00:28] I've got some changes to push out for striker [20:02:01] !log starting parsoid deploy [20:02:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:04:35] !log synced new parsoid code; restarted parsoid on wtp1001 as a canary [20:04:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:10:19] !log finished deploying parsoid sha 48cf803e [20:10:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:11:00] !log starting striker deploy [20:11:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:12:37] !log Updated striker to ffe13c1; see https://wikitech.wikimedia.org/wiki/Toolsadmin.wikimedia.org/Deployments#2016-08-29 [20:12:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:16:59] 06Operations, 06Labs, 13Patch-For-Review: Phase out the 'puppet' module with fire, make self hosted puppetmasters use the puppetmaster module - https://phabricator.wikimedia.org/T120159#2592701 (10yuvipanda) did toolsbeta, or at least the toolsbeta instances that are still sshable. Many didn't come back up a... [20:18:16] page on phone but nothing in here? [20:18:33] db1083 has problems [20:19:49] jynus: if you need anything I'm around [20:19:57] jynus: same here [20:20:12] this is probably software related [20:21:09] it would be nice to have someone from releng here, too [20:21:35] I'll third that but obv not a releng person :) [20:21:48] what do you need? [20:22:14] any recent update to code to enwiki? [20:22:15] what is "it" [20:22:18] btw where is icinga bot? 
[20:22:19] no [20:22:56] actually, https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160829T1800 [20:23:03] [config] 307248 Switch enwiki to uca-default collation [20:23:06] multiple 15-minute queries [20:23:19] from showIndirectLinks [20:23:43] using a horrible subquery [20:23:56] with temporary tables [20:24:01] If the uca-default collation is causing it, we can shoot the script easily enough [20:24:19] it is pagelinks, not categorylinks [20:24:46] Looks like someone is reindexing enwiki in Cirrus too? [20:24:46] jynus: no changes to that code since May afaict [20:24:55] interesting [20:25:04] ostriches: yes, but that shouldn't really be hitting the db; it's an in-place reindex [20:25:10] but it is not a server issue, because only that code is failing [20:25:17] no other queries [20:25:20] ebernhardson: Just spelunking :) [20:25:29] ostriches: :) [20:26:59] well, I would start by disabling that special page [20:27:18] Easy enough [20:27:18] well, I will start by killing those queries actually [20:27:25] Well, perhaps.... [20:27:27] then disabling that, then debug [20:27:28] kaldari, Niharika ^ [20:27:37] ostriches: whatlinkshere is pretty commonly used, might just disable the indirect links? [20:27:43] I'm curious if something's hitting the API for it. [20:27:43] yes [20:27:47] not the whole thing [20:28:01] only the specific feature, if possible [20:28:02] ebernhardson: I prefer 800lb sledgehammers, but you're right :p [20:28:08] :) [20:29:54] What an ugly fucking class.
[20:29:56] * ostriches fumes [20:30:37] (03PS6) 10Andrew Bogott: labnet: Merge site_address and network_public_ip in novaconfig [puppet] - 10https://gerrit.wikimedia.org/r/302835 (owner: 10Alex Monk) [20:30:41] volans: :(( icinga-wm seems to not be authed to freenode and a really lame voice ban is happening for all non-authed accounts thanks to a block added to #wikimedia-bans last night [20:30:51] * paladox guessing people who have seen the old willy wonkers films http://news.sky.com/story/willy-wonka-actor-gene-wilder-dies-aged-83-10557604 [20:31:00] paladox: Not the time. [20:31:07] ok [20:31:19] bd808: anything we can do about it? do we have a tracking task? [20:31:34] ostriches: seems par for the course on old special pages :P [20:32:00] Short of lowering MaxRedirectLinksRetrieved to something insanely low, I don't see how we can nerf this thing [20:32:31] bd808: it was removed already [20:32:44] ebernhardson: It's at 500 rn. [20:33:11] PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Puppet has 1 failures [20:33:32] !log restarted ircecho on neon.wikimedia.org (icinga-wm) [20:33:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:33:42] ebernhardson: Thoughts? [20:33:43] at least that part seems to work [20:33:53] yeah, thanks akosiaris [20:34:10] ostriches: i'm looking at it but not sure :S [20:34:26] ostriches: tbh that query is not what i had expected at all from special:whatlinkshere ... [20:35:29] (03PS22) 10Andrew Bogott: Horizon tab for modifying instance puppet config [puppet] - 10https://gerrit.wikimedia.org/r/294342 (https://phabricator.wikimedia.org/T91990) [20:36:36] ebernhardson: ya me either [20:36:50] (03CR) 10Alexandros Kosiaris: [C: 04-1] "conftool-data/nodes/{eqiad,codfw}.yaml will also need to be updated. 
That can happen in a different commit but this one will work as well" [puppet] - 10https://gerrit.wikimedia.org/r/306672 (https://phabricator.wikimedia.org/T126785) (owner: 10Filippo Giunchedi) [20:37:03] (03CR) 10Andrew Bogott: [C: 032] Horizon tab for modifying instance puppet config [puppet] - 10https://gerrit.wikimedia.org/r/294342 (https://phabricator.wikimedia.org/T91990) (owner: 10Andrew Bogott) [20:40:15] PROBLEM - puppet last run on ms-be2023 is CRITICAL: CRITICAL: puppet fail [20:40:33] I suppose showIndirectLinks would be owned by the core mediawiki team? [20:41:46] if there was one, probably [20:42:04] I leave it there https://phabricator.wikimedia.org/T144235 [20:42:51] bd808: groan *sigh* [20:44:00] let's add a watchdog there [20:44:48] (03PS1) 10Rush: openstack: remove old OpenDJ log file parser [puppet] - 10https://gerrit.wikimedia.org/r/307423 [20:44:56] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 725 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 4967040 keys - replication_delay is 725 [20:44:59] I agree, jynus [20:46:04] the problem is that trying to fix anything like this at the db level is stupid; by the time queries arrive at the db it is too late to do anything about them [20:48:59] (03PS6) 10Jforrester: On public wikis, show "Publish" rather than "Save" on edit pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306303 (https://phabricator.wikimedia.org/T131132) [20:49:50] (03PS1) 10Andrew Bogott: Install openstack::horizon::puppetpanel on labtestweb2001 [puppet] - 10https://gerrit.wikimedia.org/r/307424 [20:51:06] (03CR) 10jenkins-bot: [V: 04-1] Install openstack::horizon::puppetpanel on labtestweb2001 [puppet] - 10https://gerrit.wikimedia.org/r/307424 (owner: 10Andrew Bogott) [20:52:41] (03PS2) 10Andrew Bogott: Install openstack::horizon::puppetpanel on labtestweb2001 [puppet] - 10https://gerrit.wikimedia.org/r/307424 [20:52:45] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: 
REDIS on 10.192.48.44:6479 has 1 databases (db0) with 4957407 keys - replication_delay is 0 [20:54:26] (03CR) 10Andrew Bogott: [C: 032] Install openstack::horizon::puppetpanel on labtestweb2001 [puppet] - 10https://gerrit.wikimedia.org/r/307424 (owner: 10Andrew Bogott) [20:54:28] (03CR) 10Alexandros Kosiaris: [C: 031] Use DB_LOG_AUTOREMOVE for openldap database [puppet] - 10https://gerrit.wikimedia.org/r/305992 (https://phabricator.wikimedia.org/T143302) (owner: 10Muehlenhoff) [20:56:15] RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [21:00:04] dapatrick and bawolff: Respected human, time to deploy Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160829T2100). Please do the needful. [21:06:35] RECOVERY - puppet last run on ms-be2023 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [21:11:41] !log T143226: Perform major compaction on local_group_wikipedia_T_parsoid_html.data, restbase1007-b.eqiad.wmnet [21:11:43] T143226: Cluster-wide major compactions: parsoid-html - https://phabricator.wikimedia.org/T143226 [21:11:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:23:23] (03PS2) 10Alexandros Kosiaris: ganglia: Define install2001 by FQDN, not IP [puppet] - 10https://gerrit.wikimedia.org/r/307305 [21:23:25] (03PS2) 10Alexandros Kosiaris: ganglia: Remove the Corp OIT LDAP mirror cluster [puppet] - 10https://gerrit.wikimedia.org/r/307304 [21:23:27] (03PS2) 10Alexandros Kosiaris: ganglia: Remove nickel remnant [puppet] - 10https://gerrit.wikimedia.org/r/307306 [21:23:29] (03PS2) 10Alexandros Kosiaris: ganglia: Make the ferm statements conditional [puppet] - 10https://gerrit.wikimedia.org/r/307303 [21:23:31] (03PS2) 10Alexandros Kosiaris: ganglia: Use ferm::service instead of ferm::rule [puppet] - 10https://gerrit.wikimedia.org/r/307302 [21:29:02] !log T143226: Stopping 
restbase1015-a.eqiad.wmnet to remove repairedAt attribute on a table [21:29:04] T143226: Cluster-wide major compactions: parsoid-html - https://phabricator.wikimedia.org/T143226 [21:29:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:31:34] !log T143226: Starting restbase1015-a.eqiad.wmnet [21:31:35] T143226: Cluster-wide major compactions: parsoid-html - https://phabricator.wikimedia.org/T143226 [21:31:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:32:47] !log T143226: Perform major compaction on local_group_wikipedia_T_parsoid_html.data, restbase1015-a.eqiad.wmnet [21:32:48] T143226: Cluster-wide major compactions: parsoid-html - https://phabricator.wikimedia.org/T143226 [21:32:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:35:20] (03PS1) 10Rush: rabbitmq: add rabbitmqadmin for control via mgmt plugin [puppet] - 10https://gerrit.wikimedia.org/r/307428 [21:36:37] (03CR) 10Rush: [C: 032] openstack: remove old OpenDJ log file parser [puppet] - 10https://gerrit.wikimedia.org/r/307423 (owner: 10Rush) [21:36:44] (03PS2) 10Rush: openstack: remove old OpenDJ log file parser [puppet] - 10https://gerrit.wikimedia.org/r/307423 [21:36:46] (03CR) 10Rush: [V: 032] openstack: remove old OpenDJ log file parser [puppet] - 10https://gerrit.wikimedia.org/r/307423 (owner: 10Rush) [21:38:17] (03PS2) 10Rush: rabbitmq: add rabbitmqadmin for control via mgmt plugin [puppet] - 10https://gerrit.wikimedia.org/r/307428 [21:39:18] 06Operations, 10Ops-Access-Requests, 13Patch-For-Review: Requesting access to stat1003, stat1002 and fluorine for chelsyx - https://phabricator.wikimedia.org/T142648#2542048 (10Ottomata) For posterity: I just gave Chelsea a Hue account. 
[21:41:01] (03CR) 10Rush: [C: 032] rabbitmq: add rabbitmqadmin for control via mgmt plugin [puppet] - 10https://gerrit.wikimedia.org/r/307428 (owner: 10Rush) [21:50:16] (03CR) 10Catrope: [C: 032] On public wikis, show "Publish" rather than "Save" on edit pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306303 (https://phabricator.wikimedia.org/T131132) (owner: 10Jforrester) [21:50:47] (03Merged) 10jenkins-bot: On public wikis, show "Publish" rather than "Save" on edit pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306303 (https://phabricator.wikimedia.org/T131132) (owner: 10Jforrester) [21:54:38] RoanKattouw: LGTM on 1099. [21:55:00] OK, pushing [21:55:34] greg-g: Also, the tab completion for scap is broken in the way it completes directories :/ [21:55:55] #REDIRECT phabricator.wikimedia.org [21:56:03] :) [21:56:07] !log catrope@tin Synchronized wmf-config/InitialiseSettings.php: T131132 (duration: 00m 48s) [21:56:08] T131132: Re-label the "Save" button to be "Publish", to better indicate to users the outcomes of their action - https://phabricator.wikimedia.org/T131132 [21:56:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:00:19] (03PS1) 10Jdlrobson: End lazy loading reference experiments [mediawiki-config] - 10https://gerrit.wikimedia.org/r/307435 (https://phabricator.wikimedia.org/T144240) [22:01:43] Thanks, RoanKattouw. 
[22:04:20] (03PS1) 10Andrew Bogott: Horizon puppet panel: Clean up config and defaults [puppet] - 10https://gerrit.wikimedia.org/r/307436 [22:04:25] (03PS1) 10Ladsgroup: ores: Update thresholds [mediawiki-config] - 10https://gerrit.wikimedia.org/r/307437 (https://phabricator.wikimedia.org/T144101) [22:05:20] greg-g: Fair enough [22:05:26] (03CR) 10Alex Monk: [C: 032] Log error and privnotice messages [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/307315 (https://phabricator.wikimedia.org/T144189) (owner: 10BryanDavis) [22:05:46] (03Merged) 10jenkins-bot: Log error and privnotice messages [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/307315 (https://phabricator.wikimedia.org/T144189) (owner: 10BryanDavis) [22:06:09] (03CR) 10Alex Monk: "Ewww." [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/307333 (https://phabricator.wikimedia.org/T144189) (owner: 10BryanDavis) [22:06:59] (03CR) 10Alex Monk: "Relevant tickets: T143718, T143881" [puppet] - 10https://gerrit.wikimedia.org/r/307325 (owner: 10Cmjohnson) [22:07:22] 06Operations, 10MediaWiki-JobQueue, 07Regression: Restore 30 minutes delayed list update to no waiting, to stop killing sandbox functionality - https://phabricator.wikimedia.org/T139893#2593068 (10ManosHacker) This affects our user group's [[ https://meta.wikimedia.org/wiki/File:Fuse_Wikipedia_with_Education... 
[22:12:03] greg-g: https://phabricator.wikimedia.org/T144244 [22:15:15] (03CR) 10Alex Monk: [C: 031] Horizon puppet panel: Clean up config and defaults [puppet] - 10https://gerrit.wikimedia.org/r/307436 (owner: 10Andrew Bogott) [22:17:12] RoanKattouw: ty [22:39:07] !log Deployed patch for T125177 to 1.28.0-wmf.16 [22:39:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:52:06] 06Operations, 06Labs, 13Patch-For-Review: Phase out the 'puppet' module with fire, make self hosted puppetmasters use the puppetmaster module - https://phabricator.wikimedia.org/T120159#1846901 (10AlexMonk-WMF) So it seems the list is now: - deployment-prep - integration - wikidata-query - etcd >>!... [22:52:14] PROBLEM - puppet last run on mw2168 is CRITICAL: CRITICAL: puppet fail [22:59:57] (03PS1) 10Jforrester: EditSubmitButtonLabelPublish: Temporarily don't do this [mediawiki-config] - 10https://gerrit.wikimedia.org/r/307446 [23:00:05] RoanKattouw, ostriches, MaxSem, and Dereckson: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160829T2300). [23:00:05] Amir1 and Dereckson: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [23:00:15] hey [23:00:22] the backport can't be tested [23:00:34] the config one is hard to test but possible [23:00:42] no particular order to deploy [23:01:03] I've got a patch too. [23:03:14] Hi. I can SWAT this evening. [23:03:44] Amir1: oh, you backported the script to be able to run ir against eqiad wmnet? [23:03:52] nice [23:04:17] Dereckson: yes :D [23:05:19] I wonder if it's a good idea to use same Change-Id for both master and backport by the way. 
[23:06:05] PROBLEM - tools homepage -admin tool- on tools.wmflabs.org is CRITICAL: HTTP CRITICAL - No data received from host [23:07:00] (03CR) 10Dereckson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/307437 (https://phabricator.wikimedia.org/T144101) (owner: 10Ladsgroup) [23:07:36] (03Merged) 10jenkins-bot: ores: Update thresholds [mediawiki-config] - 10https://gerrit.wikimedia.org/r/307437 (https://phabricator.wikimedia.org/T144101) (owner: 10Ladsgroup) [23:08:00] Amir1: live on mw1099 [23:08:10] config? [23:08:18] yep [23:10:13] (03PS2) 10Dereckson: EditSubmitButtonLabelPublish: Temporarily don't do this [mediawiki-config] - 10https://gerrit.wikimedia.org/r/307446 (owner: 10Jforrester) [23:10:38] (03CR) 10Dereckson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/307446 (owner: 10Jforrester) [23:11:03] (03Merged) 10jenkins-bot: EditSubmitButtonLabelPublish: Temporarily don't do this [mediawiki-config] - 10https://gerrit.wikimedia.org/r/307446 (owner: 10Jforrester) [23:11:50] James_F: live on mw1099 [23:12:22] Dereckson: Ta. It's a no-op though. [23:12:28] (The code is in the train tomorrow.) [23:12:47] (03PS2) 10Dereckson: Add Əlavə namespace to az.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/307108 (https://phabricator.wikimedia.org/T143851) [23:13:22] Dereckson: Yeah, looks good to me. 
[23:13:43] PROBLEM - All k8s etcd nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/etcd/k8s - 341 bytes in 0.005 second response time
[23:13:49] PROBLEM - toolschecker service itself needs to return OK on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/self - 341 bytes in 0.004 second response time
[23:14:06] PROBLEM - Redis set/get on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/redis - 341 bytes in 0.018 second response time
[23:14:17] (CR) Dereckson: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/307108 (https://phabricator.wikimedia.org/T143851) (owner: Dereckson)
[23:14:49] (Merged) jenkins-bot: Add Əlavə namespace to az.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/307108 (https://phabricator.wikimedia.org/T143851) (owner: Dereckson)
[23:14:55] PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 341 bytes in 0.052 second response time
[23:15:45] I tested and it's fine
[23:15:50] PROBLEM - Test LDAP for query on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/ldap - 341 bytes in 0.036 second response time
[23:15:50] PROBLEM - Make sure enwiki dumps are not empty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/dumps - 341 bytes in 0.046 second response time
[23:15:50] PROBLEM - Start a job and verify on Trusty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/trusty - 341 bytes in 0.027 second response time
[23:15:50] PROBLEM - All Flannel etcd nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/etcd/flannel - 341 bytes in 0.059 second response time
[23:15:50] PROBLEM - All k8s worker nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/k8s/nodes/ready - 341 bytes in 0.040 second response time
[23:15:51] testing is really hard
[23:15:55] Amir1: ack'ed
[23:16:02] Add Əlavə namespace to az.wiktionary live too on mw1099
[23:16:24] PROBLEM - showmount succeeds on a labs instance on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/nfs/showmount - 341 bytes in 0.012 second response time
[23:16:24] PROBLEM - Start a job and verify on Precise on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/precise - 341 bytes in 0.026 second response time
[23:16:24] PROBLEM - Verify internal DNS from within Tools on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/labs-dns/private - 341 bytes in 0.029 second response time
[23:16:26] (meanwhile, Zuul merged ORES wmf16 backport)
[23:17:01] thanks!
[23:17:30] Amir1: your test procedure touched officewiki?
[23:17:47] James_F: ^
[23:18:16] Dereckson: What are you asking? I didn't touch officewiki.
[23:18:35] Okay, someone is probably browsing mw1099 with X-Wikimedia-Debug.
[23:18:39] PROBLEM - NFS read/writeable on labs instances on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/nfs/home - 341 bytes in 0.019 second response time
[23:18:48] Oh, that's probably me, sorry.
[23:19:29] PROBLEM - Redis set/get on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/redis - 341 bytes in 0.036 second response time
[23:20:26] Dereckson: I didn't touch office wiki
[23:20:43] Amir1: was James
[23:20:53] oh okay
[23:20:53] (some Echo warning noises)
[23:20:53] RECOVERY - puppet last run on mw2168 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[23:21:12] my patch is tested, works fine too
[23:21:20] okay let's send them to prod
[23:21:36] I was thinking if my contact can be added in https://office.wikimedia.org/wiki/Contact_list in case ORES blows up
[23:22:06] (I don't have account there, I'm not sure I'm eligible to have one, signed NDA but not a staff)
[23:22:45] !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: ores: Update thresholds (T144101) (duration: 00m 48s)
[23:23:54] !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: EditSubmitButtonLabelPublish: Temporarily don't do this (currently no-op) (duration: 00m 47s)
[23:24:54] !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Add Əlavə namespace to az.wiktionary (T143851) (duration: 00m 47s)
[23:25:44] Config done.
[23:27:02] !log restart nfs-kernel-server on labstore1003 and labstore1001
[23:27:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:27:50] !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Add Əlavə namespace to az.wiktionary (T143851) (duration: 00m 47s)
[23:27:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:28:03] PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.020 second response time
[23:28:09] (the two before were logged)
[23:28:45] RECOVERY - Make sure enwiki dumps are not empty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.027 second response time
[23:28:51] RECOVERY - Test LDAP for query on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.384 second response time
[23:28:51] RECOVERY - All Flannel etcd nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 2.018 second response time
[23:28:51] RECOVERY - All k8s worker nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 2.282 second response time
[23:29:12] RECOVERY - NFS read/writeable on labs instances on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.031 second response time
[23:29:35] RECOVERY - Verify internal DNS from within Tools on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.018 second response time
[23:29:35] RECOVERY - showmount succeeds on a labs instance on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.069 second response time
[23:29:36] RECOVERY - Start a job and verify on Precise on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.381 second response time
[23:29:43] RECOVERY - toolschecker service itself needs to return OK on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.057 second response time
[23:29:43] RECOVERY - All k8s etcd nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.977 second response time
[23:29:56] RECOVERY - tools homepage -admin tool- on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 3670 bytes in 0.041 second response time
[23:30:05] RECOVERY - Redis set/get on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.068 second response time
[23:31:41] Eurueuuwue
[23:32:15] Amir1: includes/Cache.php ORES update live on mw1099
[23:32:28] I LOVE DERECKSON
[23:32:34] Amir1: minimal test is "extension still loads and doesn't throw a fatal error"
[23:32:35] :)))
[23:32:38] robh:
[23:33:25] Dereckson: the minimal test works :)
[23:34:47] !log dereckson@tin Synchronized php-1.28.0-wmf.16/extensions/ORES/includes/Cache.php: Improvements to purging cache (T144216) (duration: 00m 47s)
[23:34:48] T144216: Purge model score should clean when there is no row is ores_model too - https://phabricator.wikimedia.org/T144216
[23:34:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:35:23] SWAT is done.
[23:43:43] RECOVERY - check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.010 second response time
[23:46:05] HELLO ALVAROMOLIN
[23:46:10] I WILL KILL YOU
[23:46:13] LALLALALA
[23:46:15] YOU ARE
[23:46:15] A
[23:46:18] BASTARD
[23:46:50] (CR) Dereckson: "Follow-up: I5387d504e06cd44649a4cb8b293d36fb00a01693." [mediawiki-config] - https://gerrit.wikimedia.org/r/306303 (https://phabricator.wikimedia.org/T131132) (owner: Jforrester)
[23:47:20] !log stop nfs server on labstore1001 to try to fix scratch mount
[23:47:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:49:36] RECOVERY - Start a job and verify on Trusty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.337 second response time
[23:53:03] Kwkss
[23:53:04] Dd
[23:53:04] F
[23:53:04] Fg
[23:53:04] g
[23:53:20] (PS1) Dereckson: Content namespaces configuration for lt.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/307455 (https://phabricator.wikimedia.org/T144118)
[23:54:49] GO TO HELL ALVAROMOMINA
[23:54:49] LALALALA
[23:54:49] TRASH
[23:57:03] (Draft2) Paladox: Add text/python to files.viewable-mime-types [puppet] - https://gerrit.wikimedia.org/r/307456
[23:57:07] (Draft1) Paladox: Add text/python to files.viewable-mime-types [puppet] - https://gerrit.wikimedia.org/r/307456
[23:57:39] (PS3) Paladox: Add text/python to files.viewable-mime-types [puppet] - https://gerrit.wikimedia.org/r/307456
[23:58:38] (PS4) Paladox: Add text/x-python to files.viewable-mime-types [puppet] - https://gerrit.wikimedia.org/r/307456