[00:07:44] AaronS: ok, rbf1001-1002 online =]
[00:15:32] (CR) coren: [C: 2] webnode.pp: Install LUA support for lighttpd (lighttpd-mod-magnet) [operations/puppet] - https://gerrit.wikimedia.org/r/150712 (https://bugzilla.wikimedia.org/68614) (owner: Hedonil)
[00:15:49] (CR) coren: [V: 2] "Why missing +2V?" [operations/puppet] - https://gerrit.wikimedia.org/r/150712 (https://bugzilla.wikimedia.org/68614) (owner: Hedonil)
[00:18:09] (PS1) Bsitu: Enable job queue to process notification on all wikis [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150742
[00:32:14] springle, yt? I noticed that every index on tag_summary is duplicated - it it intended?
[00:34:30] *in production, not tables.sql :)
[00:36:36] old and new?
[00:36:40] one just not deleted?
[00:39:50] MaxSem: on enwiki, yeah i'd noticed. needs to be fixed on a slave-by-slave basis when depooled, since no PK for online change
[01:32:53] PROBLEM - Puppet freshness on labsdb1004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 00:00:00 UTC
[02:06:28] !log labsdb1001 migrating to mariadb 10, expect read-only and downtime, see labs-l
[02:06:36] Logged the message, Master
[02:20:21] !log LocalisationUpdate completed (1.24wmf14) at 2014-07-31 02:19:17+00:00
[02:20:27] Logged the message, Master
[02:21:27] RECOVERY - mysqld processes on labsdb1001 is OK: PROCS OK: 1 process with command name mysqld
[02:36:33] !log LocalisationUpdate completed (1.24wmf15) at 2014-07-31 02:35:29+00:00
[02:36:38] Logged the message, Master
[02:39:20] PROBLEM - mysqld processes on labsdb1001 is CRITICAL: PROCS CRITICAL: 2 processes with command name mysqld
[02:54:08] (PS1) Withoutaname: Remove deprecated $wgHTCPMulticastAddress and replace with $wgHTCPRouting [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150759
[03:18:25] PROBLEM - Disk space on vanadium is CRITICAL: DISK CRITICAL - free space: / 4273 MB (3% inode=94%):
[03:19:35] !log LocalisationUpdate
ResourceLoader cache refresh completed at Thu Jul 31 03:18:07 UTC 2014 (duration 18m 6s)
[03:19:40] Logged the message, Master
[03:34:00] PROBLEM - Puppet freshness on labsdb1004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 00:00:00 UTC
[04:11:58] PROBLEM - Disk space on elastic1016 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 19517 MB (3% inode=99%):
[04:21:53] PROBLEM - Puppet freshness on labsdb1001 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 02:21:25 UTC
[04:28:51] wmflabs services that access the database seem to be experiencing MySQL errors.
[04:29:44] https://tools.wmflabs.org/sigma/editorinteract.py and other tools give errors like "Access denied for user 's51469'@'10.68.17.123' (using password: YES)"
[04:29:59] James_F|Away / andrewbogott_afk ping
[04:30:29] LFaraone: [19:06:28] !log labsdb1001 migrating to mariadb 10, expect read-only and downtime, see labs-l
[04:30:42] yes, these are... read operations, no?
[04:31:05] yes, he also said "downtime" though :P
[04:31:57] that was... several hours ago, though, right?
[04:33:47] About 2 and a half yeah, but I guess the migration will take some time
[04:47:44] PROBLEM - Disk space on elastic1016 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 19552 MB (3% inode=99%):
[04:52:50] (PS1) Springle: Remove mysql_multi_instance from labsdb1001 after migration. [operations/puppet] - https://gerrit.wikimedia.org/r/150764
[04:55:47] (CR) Springle: [C: 2] Remove mysql_multi_instance from labsdb1001 after migration. [operations/puppet] - https://gerrit.wikimedia.org/r/150764 (owner: Springle)
[05:02:44] PROBLEM - Disk space on elastic1016 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 19877 MB (3% inode=99%):
[05:05:23] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[05:08:30] RECOVERY - Puppet freshness on labsdb1001 is OK: puppet ran at Thu Jul 31 05:08:20 UTC 2014
[05:18:20] PROBLEM - Disk space on elastic1016 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 20164 MB (3% inode=99%):
[05:33:21] PROBLEM - Disk space on elastic1016 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 20049 MB (3% inode=99%):
[05:34:09] PROBLEM - Puppet freshness on labsdb1004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 00:00:00 UTC
[05:54:09] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 03:53:09 UTC
[06:29:39] RECOVERY - Disk space on vanadium is OK: DISK OK
[06:33:12] (PS6) Ori.livneh: Add HHVM module [operations/puppet] - https://gerrit.wikimedia.org/r/150506
[06:33:29] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Thu Jul 31 06:33:27 UTC 2014
[06:42:07] (PS7) Ori.livneh: Add HHVM module [operations/puppet] - https://gerrit.wikimedia.org/r/150506
[06:42:11] PROBLEM - puppet last run on virt0 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:50:11] !log labsdb1001 migration complete, should be all systems go
[06:50:17] Logged the message, Master
[06:50:25] LFaraone: ^
[06:56:00] PROBLEM - puppet last run on ssl1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:58:09] Cool
[06:59:19] RECOVERY - puppet last run on virt0 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures
[07:11:55] (CR) Giuseppe Lavagetto: [C: -1] "LGTM apart from a couple of comments - but you removed mediawiki::hhvm and I don't see references to it changed in the manifests."
(2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/150506 (owner: Ori.livneh)
[07:13:42] (CR) Ori.livneh: Add HHVM module (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/150506 (owner: Ori.livneh)
[07:14:09] RECOVERY - puppet last run on ssl1001 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[07:16:04] (CR) Giuseppe Lavagetto: [C: -1] "IMO, this is not something you can solve via puppet this way. The file is written by puppet itself and each time it gets rotated it will b" [operations/puppet] - https://gerrit.wikimedia.org/r/150273 (owner: Dzahn)
[07:19:36] (PS8) Ori.livneh: Add HHVM module [operations/puppet] - https://gerrit.wikimedia.org/r/150506
[07:20:53] (CR) Ori.livneh: "PS8 implements Giuseppe's suggestion and restores modules/mediawiki/manifests/hhvm.pp for now (I'll port it to use this module in a separa" [operations/puppet] - https://gerrit.wikimedia.org/r/150506 (owner: Ori.livneh)
[07:22:46] (PS4) Ori.livneh: Nutcracker: move declaration to role::mediawiki; parametrize [operations/puppet] - https://gerrit.wikimedia.org/r/149800
[07:35:09] PROBLEM - Puppet freshness on labsdb1004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 00:00:00 UTC
[07:43:08] (CR) Giuseppe Lavagetto: [C: 1] Add HHVM module [operations/puppet] - https://gerrit.wikimedia.org/r/150506 (owner: Ori.livneh)
[07:51:22] (PS3) Giuseppe Lavagetto: nginx - remove cipher kEDH+AESGCM [operations/puppet] - https://gerrit.wikimedia.org/r/146806 (owner: Dzahn)
[07:51:48] (CR) Giuseppe Lavagetto: [C: 2] nginx - remove cipher kEDH+AESGCM [operations/puppet] - https://gerrit.wikimedia.org/r/146806 (owner: Dzahn)
[07:56:50] PROBLEM - puppetmaster backend https on palladium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:57:39] RECOVERY - puppetmaster backend https on palladium is OK: HTTP OK: Status line output matched 400 - 335 bytes in
0.053 second response time
[08:03:09] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[08:07:36] (CR) Alexandros Kosiaris: [C: -2] "This is an old version of the README.Debian file. You probably are still on the debian branch which does not exist anymore as you pointed " [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/150713 (owner: Ottomata)
[08:14:32] <_joe_> sorry I forgot to merge that
[08:15:09] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge.
[08:15:21] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[08:31:39] RECOVERY - Disk space on elastic1016 is OK: DISK OK
[08:43:39] <_joe_> !log start rolling reload of nginx to catch up with the new ssl config
[08:43:45] Logged the message, Master
[09:04:07] (CR) Filippo Giunchedi: "the right solution would be to set the shell's umask in the cron invocation, but that might have other undesirable side effects if puppet " [operations/puppet] - https://gerrit.wikimedia.org/r/150273 (owner: Dzahn)
[09:04:30] <_joe_> godog: that was my thinking as well
[09:04:50] (CR) Filippo Giunchedi: restrict access to puppet logs to root users (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/150273 (owner: Dzahn)
[09:05:00] <_joe_> godog: and I don't think that would affect puppet in general, the "file" resource is managed explicitly
[09:05:14] <_joe_> but it needs testing
[09:07:41] true it might just behave, not sure it is worth the risk and I think it was discussed to just go ahead with the stopgap and move eventually to syslog
[09:31:29] PROBLEM - puppetmaster backend https on strontium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:32:19] RECOVERY - puppetmaster backend https on strontium is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.023 second response time
[09:32:46] <_joe_> no errors on strontium, just checked...
[09:36:09] PROBLEM - Puppet freshness on labsdb1004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 00:00:00 UTC
[09:45:24] (CR) Filippo Giunchedi: [C: 1] "note the per-API directory thing might be relevant to https://gerrit.wikimedia.org/r/#/c/150212 and related" (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/150506 (owner: Ori.livneh)
[10:11:11] (CR) Filippo Giunchedi: [C: -1] "I think we'd need to discuss this a bit also to find a solution to be put in debian (related perhaps to the per-api (or per-abi?) director" [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/150212 (owner: Giuseppe Lavagetto)
[10:16:59] <_joe_> godog: about that, I concur, I tried to write up a simplicistic solution but we should do something else. Our main problem being, we have no way to determine if API/ABI has changed, really. see https://github.com/facebook/hhvm/pull/3322 (and the comment within, meh)
[10:21:31] _joe_: indeed, yeah we'll have to find a solution but if the number isn't accurate not much else comes to mind ATM
[10:21:58] (PS1) Giuseppe Lavagetto: wmflib: add ssl_ciphersuite [operations/puppet] - https://gerrit.wikimedia.org/r/150781
[10:23:05] <_joe_> godog: we add a substvar manually and bump it when we know abi changes
[10:23:08] <_joe_> :/
[10:23:44] <_joe_> i don't see any other solution, and hhvm guys do not want to commit to abi stability, and I do understand that, sort of
[10:25:04] <_joe_> so, we have two ways to do that IMO: 1) we patch hhvm so that --version gives us some useful string for marking an ABI version 2) we hardcode the value in debian/rules
[10:25:21] <_joe_> the 1) is my preferred solution ATM
[10:25:33] <_joe_> it's a simple patch we can maintain without hassle
[10:25:43] <_joe_> and it can be scripted easily
[10:26:34] <_joe_> also, I hope in the future we can work with releases and not with the absolute bleeding edge
[10:27:19] it is two different things though, it is fine to not have abi stability but would be nice if we had an easy way to know when that happens
[10:28:17] yeah releases would make it less frequent perhaps
[10:29:45] (PS2) Filippo Giunchedi: swift: monitor object/container availability [operations/puppet] - https://gerrit.wikimedia.org/r/149019
[10:29:51] (CR) Filippo Giunchedi: [C: 2 V: 2] swift: monitor object/container availability [operations/puppet] - https://gerrit.wikimedia.org/r/149019 (owner: Filippo Giunchedi)
[10:31:59] !log Jenkins: tweaking jobs labels, that might eventually screw up Zuul/Jenkins entirely.
[10:32:05] Logged the message, Master
[10:49:12] !log Jenkins: attempting to poll a Trusty slave (integration-slave1004-trusty [10.68.17.148] with label UbuntuTrusty).
[10:49:17] Logged the message, Master
[10:53:09] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 08:52:54 UTC
[10:53:29] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Thu Jul 31 10:53:20 UTC 2014
[10:59:19] (CR) Alexandros Kosiaris: [C: -1] Mathoid configuration for beta labs (9 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/148836 (owner: Physikerwelt)
[11:25:12] (CR) Alexandros Kosiaris: [C: -1] "Liking the general idea for sure, comments inline" (4 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/150781 (owner: Giuseppe Lavagetto)
[11:30:53] (PS1) Hedonil: exec_environ.pp: Install libaio1 to enable asynchronous I/O system calls [operations/puppet] - https://gerrit.wikimedia.org/r/150787 (https://bugzilla.wikimedia.org/68615)
[11:37:09] PROBLEM - Puppet freshness on labsdb1004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 00:00:00 UTC
[11:37:37] !log Jenkins: upgrading almost all jobs to use a new label 'UbuntuPrecise' {{bug|68340}} {{gerrit|150785}}
[11:37:43] Logged the message, Master
[11:41:13] Reedy, will you be the one to
switch enwiki 14->15?
[12:04:31] !log reloading Jenkins configuration
[12:04:35] Logged the message, Master
[12:10:22] !log stopping Jenkins and restarting it
[12:10:28] Logged the message, Master
[12:14:20] snack lunch
[12:14:26] jenkins works
[12:37:03] (CR) coren: [C: 2] "+package" [operations/puppet] - https://gerrit.wikimedia.org/r/150787 (https://bugzilla.wikimedia.org/68615) (owner: Hedonil)
[12:37:12] (CR) coren: [V: 2] "+package" [operations/puppet] - https://gerrit.wikimedia.org/r/150787 (https://bugzilla.wikimedia.org/68615) (owner: Hedonil)
[12:38:04] * YuviPanda pokes Coren with https://gerrit.wikimedia.org/r/#/c/150425/
[12:38:52] Coren: also, we need the python-txstatsd package built and put into apt.wm.o for the graphite box. I can build the thing now, but don't think I've any rights to put things on apt.wm.o
[12:38:57] or even unsure what the process is?
[12:39:24] I'm not quite sure what the process is myself since I haven't had to do it yet. :-)
[12:39:41] Coren: heh
[12:39:54] Coren: I also take back what I said about packages being hard. at least for python... not so much
[12:40:27] (CR) coren: [C: 2] "If it breaks any toyes, it'll be those in your own pram.
:-)" [operations/puppet] - https://gerrit.wikimedia.org/r/150425 (owner: Yuvipanda)
[12:41:23] Coren: heh, yay :)
[12:42:11] (CR) Physikerwelt: Mathoid configuration for beta labs (9 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/148836 (owner: Physikerwelt)
[12:43:15] (PS15) Physikerwelt: Mathoid configuration for beta labs [operations/puppet] - https://gerrit.wikimedia.org/r/148836
[12:59:16] (CR) Giuseppe Lavagetto: wmflib: add ssl_ciphersuite (4 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/150781 (owner: Giuseppe Lavagetto)
[13:20:47] (CR) Alexandros Kosiaris: [C: 2] spamassassin: qualify vars [operations/puppet] - https://gerrit.wikimedia.org/r/149993 (owner: Matanya)
[13:22:03] (CR) Alexandros Kosiaris: [C: 2] swift:lint [operations/puppet] - https://gerrit.wikimedia.org/r/150505 (owner: Matanya)
[13:22:46] godog: merging this ^. Running through catalog-differ and it is ok
[13:23:05] akosiaris: awesome! tyvm sir
[13:23:32] thank you both godog and akosiaris
[13:25:15] matanya: thank you!
[13:29:27] <_joe_> matanya: the puppet masters are *really* grateful for all the work you've done on this
[13:30:06] thank you _joe_ :) this is much encouraging !
[13:30:23] <_joe_> btw whenever you have time, take a look at http://etherpad.wikimedia.org/p/Hiera and see if you have comments
[13:31:15] yay hiera
[13:34:16] (PS2) Giuseppe Lavagetto: wmflib: add ssl_ciphersuite [operations/puppet] - https://gerrit.wikimedia.org/r/150781
[13:34:24] (CR) Ottomata: "Weird! I was on the master branch, I must have some weird local branch crap going on. Will check it."
[operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/150713 (owner: Ottomata)
[13:34:35] <_joe_> YuviPanda: you as well
[13:37:10] _joe_: yeah, leaving some comments about labs
[13:37:46] PROBLEM - Puppet freshness on labsdb1004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 00:00:00 UTC
[13:38:50] <_joe_> YuviPanda: thanks a lot, I think we should actually start with labs
[13:39:00] _joe_: +1
[13:39:07] (Abandoned) Ottomata: Fix for debian/README.Debian [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/150713 (owner: Ottomata)
[13:39:22] bahhh
[13:39:28] hhvm is under mediawiki module ... :/
[13:42:12] <_joe_> hashar: yes but we're building a standalone module
[13:42:27] <_joe_> that was a first, quick and dirty class for releasing the jobrunners
[13:43:52] (PS3) Giuseppe Lavagetto: mediawiki::web: get rid of envvars.appserver [operations/puppet] - https://gerrit.wikimedia.org/r/147514 (owner: Ori.livneh)
[13:44:24] !log reedy Synchronized php-1.24wmf15/extensions/WikimediaMessages: (no message) (duration: 00m 14s)
[13:44:30] Logged the message, Master
[13:44:48] !log reedy Synchronized php-1.24wmf15/extensions/RelatedSites/: (no message) (duration: 00m 15s)
[13:44:54] Logged the message, Master
[13:45:06] <_joe_> mmmh I will not release an apache change during swat
[13:45:18] <_joe_> I promise I'll behave
[13:45:24] _joe_: added a note about labs heira probably should be a different repo
[13:45:29] unsure what that entails, though
[13:46:15] <_joe_> YuviPanda: I do agree, I was thinking about doing something like keeping hiera data in operations/puppet for production, and having a separate one for labs maybe
[13:46:27] (PS1) Hashar: hhvm: create module + list all dev dependencies [operations/puppet] - https://gerrit.wikimedia.org/r/150813 (https://bugzilla.wikimedia.org/63120)
[13:46:42] _joe_: hmm, keeping them in operations/heira/production and operations/heira/labs was what I was thinking
[13:46:51] in the future, we might even have operations/heira/vagrant :)
[13:47:54] !log reedy Started scap: Rebuild 1.24wmf15 l10n cache for WikimediaMessages updates
[13:48:00] Logged the message, Master
[13:48:10] (PS2) Reedy: Move RelatedSites config to wgExtraInterlanguageLinkPrefixes [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/148314 (https://bugzilla.wikimedia.org/41209) (owner: TTO)
[13:48:31] (CR) Reedy: [C: 1] "Dependencies merged, scap running. -1 removed" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/148314 (https://bugzilla.wikimedia.org/41209) (owner: TTO)
[13:51:39] (CR) Giuseppe Lavagetto: [C: -1] "Not sure if I got what you wanted to achieve here, but this doesn't probably work." (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/150813 (https://bugzilla.wikimedia.org/63120) (owner: Hashar)
[13:51:39] Error: Failed to apply catalog: Could not find dependent Package[apache2-mpm-worker] for Apache::Mod_conf[mpm_worker] at /etc/puppet/modules/apache/manifests/mpm.pp:46
[13:51:43] puppet is never ending
[13:51:53] <_joe_> wut>
[13:52:09] <_joe_> hashar: where is that?
[13:53:09] <_joe_> YuviPanda: yes that may work
[13:53:32] (CR) Hashar: hhvm: create module + list all dev dependencies (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/150813 (https://bugzilla.wikimedia.org/63120) (owner: Hashar)
[13:53:33] <_joe_> Reedy: please let me know when it's ok for me to fiddle with apache
[13:53:47] _joe_: on a Trusty instance integration-slave1004-trusty.eqiad.wmflabs
[13:54:11] it uses some contint:: classes
[13:55:00] <_joe_> hashar: http://packages.ubuntu.com/search?keywords=apache2-mpm-worker&searchon=names&suite=trusty&section=all
[13:55:17] <_joe_> oh wait
[13:55:23] <_joe_> that's saying something different
[13:55:26] it is in puppet
[13:55:37] Package[apache2-mpm-worker] isn't defined
[13:55:41] hurra
[13:55:56] <_joe_> correctly not
[13:56:23] <_joe_> ok seen the problem
[13:56:44] <_joe_> hashar: fixing
[13:56:48] <_joe_> that's my fault btw
[13:56:48] the labs project (integration) has a local puppetmaster so we can test it out :]
[13:57:54] (CR) Nemo bis: [C: 1] "Thanks" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/148314 (https://bugzilla.wikimedia.org/41209) (owner: TTO)
[14:03:36] (PS1) Giuseppe Lavagetto: apache: remove $rejected_pkgs [operations/puppet] - https://gerrit.wikimedia.org/r/150816
[14:03:45] <_joe_> hashar: ^^
[14:03:52] ;-D
[14:03:56] cherry picking
[14:04:06] <_joe_> take a look and see if it makes sense
[14:05:22] on Precise I now have: error ArgumentError: Invalid resource type monitor_group at /etc/puppet/manifests/facilities.pp:110 on node i-000001bd.eqiad.wmflabs
[14:05:38] probably unrelated
[14:05:55] !log removed labs-in4 and labs-in6 filters on vlan 1117 (labs-hosts1-a-eqiad) on cr[12]-eqiad
[14:06:01] Logged the message, Master
[14:06:49] yeah transient
[14:07:38] (CR) Hashar: "I have cherry picked the patch on 'integration' puppetmaster. Precise instance is happy."
[operations/puppet] - https://gerrit.wikimedia.org/r/150816 (owner: Giuseppe Lavagetto)
[14:09:32] _joe_: thanks :°
[14:09:47] <_joe_> hashar: eh, that was my error in the first place
[14:10:05] <_joe_> it's part of the optimization-OCD ori and I share
[14:10:20] <_joe_> I "perfected" his patch
[14:10:28] <_joe_> and screwed up
[14:10:34] <_joe_> well, only on trusty
[14:10:34] !log reedy Finished scap: Rebuild 1.24wmf15 l10n cache for WikimediaMessages updates (duration: 22m 40s)
[14:10:40] <_joe_> :P
[14:10:40] Logged the message, Master
[14:10:47] (CR) Reedy: [C: 2] Move RelatedSites config to wgExtraInterlanguageLinkPrefixes [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/148314 (https://bugzilla.wikimedia.org/41209) (owner: TTO)
[14:10:54] (Merged) jenkins-bot: Move RelatedSites config to wgExtraInterlanguageLinkPrefixes [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/148314 (https://bugzilla.wikimedia.org/41209) (owner: TTO)
[14:11:03] (PS2) Hashar: hhvm: create module + list all dev dependencies [operations/puppet] - https://gerrit.wikimedia.org/r/150813 (https://bugzilla.wikimedia.org/63120)
[14:11:06] (PS1) Reedy: Revert "Wikivoyages back to 1.24wmf14" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150818
[14:11:09] (PS2) Reedy: Revert "Wikivoyages back to 1.24wmf14" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150818
[14:11:14] (CR) Reedy: [C: 2] Revert "Wikivoyages back to 1.24wmf14" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150818 (owner: Reedy)
[14:11:19] (Merged) jenkins-bot: Revert "Wikivoyages back to 1.24wmf14" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150818 (owner: Reedy)
[14:11:37] akosiaris: i think my kafka change is wonky
[14:11:44] (CR) Hashar: "PS2 fix a typo in contint server: hvvm -> hhvm" [operations/puppet] - https://gerrit.wikimedia.org/r/150813
(https://bugzilla.wikimedia.org/63120) (owner: Hashar)
[14:11:45] it looks like it is trying to go on the debian branch, when it should be on master
[14:12:04] !log reedy Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 14s)
[14:12:08] i'm trying to figure out how to switch it to debian branch, but I might have to submit a new gerrit change and abandon that one...
[14:12:09] Logged the message, Master
[14:12:25] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikivoyages to 1.24wmf15
[14:12:30] Logged the message, Master
[14:12:37] ottomata: which one?
[14:12:49] debian branch is dead btw
[14:13:18] (Abandoned) Giuseppe Lavagetto: hhvm: provide hhvm-build-$VERSION [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/150212 (owner: Giuseppe Lavagetto)
[14:13:25] yeah, but it is still in gerrit? not sure
[14:13:28] this one
[14:13:29] https://gerrit.wikimedia.org/r/#/c/149889/
[14:13:34] Branch
[14:13:34] https://gerrit.wikimedia.org/r/#/q/status:open+project:operations/debs/kafka+branch:debian,n,z
[14:13:34] Topic
[14:13:34] master
[14:13:57] (PS1) Alexandros Kosiaris: Split kafka package into 3 separate packages [operations/debs/kafka] - https://gerrit.wikimedia.org/r/150819
[14:14:12] cherry-picked to master
[14:14:38] (PS2) Ottomata: Split kafka package into 3 separate packages [operations/debs/kafka] - https://gerrit.wikimedia.org/r/150819 (owner: Alexandros Kosiaris)
[14:14:45] oop
[14:15:31] should be mergeable now
[14:15:48] ok so, new change then, ok
[14:15:51] _joe_: Should be fine to do apache stuffs now
[14:16:01] the old one should be mergeable too
[14:16:12] gerrit did not let me delete the debian branch
[14:16:18] i think I just did the same you did, but via the CLI :p
[14:16:21] hm
[14:16:22] ok
[14:16:24] merging, thanks
[14:16:25] as long as there were unmerged changes
[14:16:42] (CR) Ottomata: [C: 2 V: 2] "This change was reviewed at
https://gerrit.wikimedia.org/r/#/c/149889/" [operations/debs/kafka] - https://gerrit.wikimedia.org/r/150819 (owner: Alexandros Kosiaris)
[14:16:51] aye
[14:16:56] abandoning the other
[14:17:11] (CR) Hashar: "Cherry picked on integration puppet master. But that needs a bit of work as Giuseppe suggested." [operations/puppet] - https://gerrit.wikimedia.org/r/150813 (https://bugzilla.wikimedia.org/63120) (owner: Hashar)
[14:17:17] (Abandoned) Ottomata: Split kafka package into 3 separate packages [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/149889 (owner: Ottomata)
[14:17:20] <_joe_> Reedy: thanks :)
[14:17:33] let's see if I can delete the gerrit debian branch then..
[14:17:38] ottomata: btw how do you feel about this one https://gerrit.wikimedia.org/r/#/c/147499/ ?
[14:17:46] ottomata: still 2 open
[14:18:00] the one I mentioned and https://gerrit.wikimedia.org/r/#/c/148287/
[14:18:06] which I suppose we can abandon ?
[14:18:07] yeah 14287
[14:18:20] we can abandon that I think, I think I fixed those issues in my change
[14:18:31] cool abandoning it then
[14:18:55] akosiaris: i've never used pbuilder, let's merge this and I will try to build the packages again using it and your readme instructions
[14:19:04] (Abandoned) Alexandros Kosiaris: Fix debian/bin/kafka [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/148287 (owner: Plucas)
[14:25:29] (CR) Ottomata: [C: 2] "Let's do it!" [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/147499 (owner: Alexandros Kosiaris)
[14:25:40] akosiaris: cherry pick that one too then?
[14:25:58] (btw, i probably won't see these changes unless you add me as reviewer)
[14:26:39] ottomata: I tried... it has merge conflicts...
resolving them now
[14:27:13] (CR) Giuseppe Lavagetto: hhvm: lintian fixes (1 comment) [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/150213 (owner: Giuseppe Lavagetto)
[14:27:54] (PS1) Alexandros Kosiaris: Use pbuilder by default [operations/debs/kafka] - https://gerrit.wikimedia.org/r/150825
[14:28:02] (PS1) Giuseppe Lavagetto: hhvm: lintian fixes [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/150826
[14:28:04] heh... new change :-(
[14:28:25] (Abandoned) Alexandros Kosiaris: Use pbuilder by default [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/147499 (owner: Alexandros Kosiaris)
[14:28:37] (CR) Giuseppe Lavagetto: [C: 2 V: 2] hhvm: lintian fixes [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/150826 (owner: Giuseppe Lavagetto)
[14:28:51] (CR) Alexandros Kosiaris: [C: 2 V: 2] "Already reviewed in https://gerrit.wikimedia.org/r/147499" [operations/debs/kafka] - https://gerrit.wikimedia.org/r/150825 (owner: Alexandros Kosiaris)
[14:29:25] <_joe_> I have a gerrit question
[14:29:38] ah... everyone's favourite toy
[14:29:41] <_joe_> oh nevermind
[14:29:47] ahahahaha
[14:29:48] <_joe_> it seems I got it right
[14:29:50] A first build: https://integration.wikimedia.org/ci/job/php-FastStringSearch-hhvm-build/1/console
[14:29:50] 00:00:05.874 cc1plus: error: /root/hhvm/joe/hhvm: Permission denied
[14:29:50] :-D
[14:29:58] <_joe_> I don't know why
[14:29:59] poor FastStringSearch
[14:30:08] <_joe_> hashar: that's fucking hphpize
[14:30:16] anyway gotta move.
But we have a Jenkins Trusty slave pooled
[14:30:19] <_joe_> retaining somewhere the path it's been built into
[14:30:32] <_joe_> next time I'll build in /tmp
[14:30:33] which runs some experimental compilation of wikidiff2 luasandbox and FastStringSearch
[14:30:36] details on https://bugzilla.wikimedia.org/show_bug.cgi?id=63120
[14:30:38] yeah
[14:30:49] I am off for now
[14:30:57] <_joe_> hashar: filippo discovered that, it sucks
[14:31:10] bug fill it ! :-D
[14:31:26] I will continue tomorrow
[14:31:55] ottomata: done, thanks!!!
[14:32:38] (PS1) Reedy: Allow sysops and 'crats on wikimania2014wiki to grant confirmed [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150829
[14:33:12] (CR) Reedy: [C: 2] Allow sysops and 'crats on wikimania2014wiki to grant confirmed [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150829 (owner: Reedy)
[14:33:17] (Merged) jenkins-bot: Allow sysops and 'crats on wikimania2014wiki to grant confirmed [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150829 (owner: Reedy)
[14:33:56] !log reedy Synchronized wmf-config/InitialiseSettings.php: Allow sysops and 'crats on wikimania2014wiki to grant confirmed (duration: 00m 15s)
[14:34:01] Logged the message, Master
[14:38:52] (PS1) BBlack: add labs-hosts1-a-eqiad to dhcpd.conf [operations/puppet] - https://gerrit.wikimedia.org/r/150832
[14:39:36] (CR) BBlack: [C: 2 V: 2] add labs-hosts1-a-eqiad to dhcpd.conf [operations/puppet] - https://gerrit.wikimedia.org/r/150832 (owner: BBlack)
[14:57:19] !log added labstore1003 to filter labs-in4 terms allow-labstore-(udp|tcp)4 on cr[12]-eqiad
[14:57:24] Logged the message, Master
[15:00:05] manybubbles, anomie, Reedy: Respected human, time to deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140731T1500). Please do the needful.
[15:00:25] * anomie liked "the time is nigh" better
[15:00:32] * anomie also observes no patches for SWAT
[15:08:45] anomie: YuviPanda got all fancy with the messages and made them random -- https://github.com/wikimedia/wikimedia-bots-jouncebot/blob/master/DefaultConfig.yaml
[15:09:18] (Restored) Giuseppe Lavagetto: hhvm: provide hhvm-build-$VERSION [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/150212 (owner: Giuseppe Lavagetto)
[15:09:26] bd808: oh, did it say something now?
[15:09:29] * YuviPanda lost scrollback
[15:10:12] YuviPanda: It was "Respected human" this time but I saw "Dear anthropoid" yesterday
[15:10:24] bd808: heh :)
[15:10:41] sidenote - this makes my ping for the message harder :(
[15:11:14] My client apparently won't let me set a watch on everything said by a given user
[15:11:19] hmm, bah
[15:11:44] * YuviPanda is unsure what to do / fix
[15:11:55] meh not a big deal
[15:12:49] bd808: I could add a 'notify users' feature, that'll PM you everytime there's a deployment
[15:13:08] ew no thanks
[15:13:15] bd808: :D
[15:13:38] * YuviPanda instead makes icinga-wm PM bd808 every time something's critical
[15:14:23] the new jouncebot message seems very discrimantory towards humans undeserving of respect :P
[15:14:35] I actually liked it best when it was using /notice
[15:14:55] I didn't like /notice, since that didn't actually ping me
[15:15:15] bblack: "Lowly worker bee, now is the time to get to work"
[15:15:31] * anomie has some sort of crossed meaning in his head between "anthropoid" and "arthropod"
[15:15:34] bblack: it'll descriminate against anthropoids who don't want to be called 'dear' 50% of the time
[15:15:56] anomie: Ah. My client flashes a desktop announcement for all /notices in all channels
[15:29:34] LFaraone: Bah, back now; sorry you had issues overnight. :-(
[15:29:51] LFaraone: (DB seems back now.)
[15:33:49] (PS9) Ori.livneh: Add HHVM module [operations/puppet] - https://gerrit.wikimedia.org/r/150506
[15:34:11] (CR) Ori.livneh: [C: 2 V: 2] Add HHVM module [operations/puppet] - https://gerrit.wikimedia.org/r/150506 (owner: Ori.livneh)
[15:36:55] (PS1) Alexandros Kosiaris: Setup labsdb1006, labsdb1007 as osmdbs [operations/puppet] - https://gerrit.wikimedia.org/r/150837
[15:37:45] (PS1) Ori.livneh: hhvm: set hhvm.log.header = true [operations/puppet] - https://gerrit.wikimedia.org/r/150838
[15:38:47] PROBLEM - Puppet freshness on labsdb1004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 00:00:00 UTC
[15:47:11] (PS1) Alexandros Kosiaris: Bump up postgresql max_connections to 120 [operations/puppet] - https://gerrit.wikimedia.org/r/150841
[15:48:11] (CR) Alexandros Kosiaris: [C: 2] Bump up postgresql max_connections to 120 [operations/puppet] - https://gerrit.wikimedia.org/r/150841 (owner: Alexandros Kosiaris)
[15:53:22] (CR) Alexandros Kosiaris: [C: 2] Setup labsdb1006, labsdb1007 as osmdbs [operations/puppet] - https://gerrit.wikimedia.org/r/150837 (owner: Alexandros Kosiaris)
[15:57:56] RECOVERY - Puppet freshness on labsdb1004 is OK: puppet ran at Thu Jul 31 15:57:48 UTC 2014
[15:58:06] PROBLEM - puppet last run on labsdb1004 is CRITICAL: CRITICAL: Puppet last ran 719156 seconds ago, expected 14400
[15:59:06] RECOVERY - puppet last run on labsdb1004 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[16:02:27] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet).
[16:02:36] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet).
[16:02:58] shame on me
[16:03:27] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[16:03:36] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge.
[16:03:46] PROBLEM - puppet last run on labsdb1005 is CRITICAL: CRITICAL: Puppet has 1 failures
[16:03:53] * _joe_ throws rocks to akosiaris
[16:04:06] I was sure I had merged btw
[16:06:37] (PS1) Giuseppe Lavagetto: hhvm: provide hhvm-api-$VERSION [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/150845
[16:06:46] PROBLEM - puppet last run on labsdb1005 is CRITICAL: CRITICAL: Puppet has 1 failures
[16:07:09] (Abandoned) Giuseppe Lavagetto: hhvm: provide hhvm-build-$VERSION [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/150212 (owner: Giuseppe Lavagetto)
[16:08:46] (PS2) Giuseppe Lavagetto: hhvm: provide hhvm-api-$VERSION [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/150845
[16:09:37] (CR) Ori.livneh: [C: 1] mediawiki: get rid of envvars files in puppet. [operations/puppet] - https://gerrit.wikimedia.org/r/150492 (owner: Giuseppe Lavagetto)
[16:10:20] chasemp: I'm adding a new group to admins. Does the gid derive from ldap such that I should create a group in ldap first?
[16:14:11] Not working today but saw this :). they are meant to match but so far it hasn't been worked out. So anything 700 unused should work
[16:14:23] ok
[16:14:43] I guess the next couple of days are going to be lonesome, what with everyone on their way to Paris :)
[16:14:43] andrewbogott: I reply to the RT ticket regarding me btw :)
[16:14:51] JohnLewis: great! thank you.
[16:15:13] *replied for meta correction-ness :p
[16:15:41] JohnLewis: I'm not sure what our RT policies are, I will check in with mutante if/when he comes to work :)
[16:16:24] andrewbogott: alright - it was just a 'if I'm here doing this, RT may be useful but I'll let you guys decide whether its worth the effort your side' :)
[16:17:00] andrewbogott: paris?
[16:17:40] Oh, um, London
[16:17:46] I'm confusing wikimania w/OpenStack
[16:17:51] (neither of which I'm going to)
[16:18:24] hm, I guess I missed the chance to convince YuviPanda that he's in the wrong country :(
[16:18:32] heh
[16:22:34] (PS1) coren: labstore1003: Give up on Trusty for the time being [operations/puppet] - https://gerrit.wikimedia.org/r/150849
[16:22:44] andrewbogott: Can you +2 ^^ plz?
[16:22:57] (PS1) Andrew Bogott: New admin group for eventlogging troubleshooting. [operations/puppet] - https://gerrit.wikimedia.org/r/150850
[16:23:15] (CR) Andrew Bogott: [C: 2] ":(" [operations/puppet] - https://gerrit.wikimedia.org/r/150849 (owner: coren)
[16:25:22] (PS1) Filippo Giunchedi: add ssh-based uploads to releases.wikimedia.org [operations/puppet] - https://gerrit.wikimedia.org/r/150851
[16:26:53] andrewbogott: Did you already merge on carbon as well or not?
[16:27:04] just on palladium
[16:28:09] andrewbogott: I think I'll be able to sneak one boot in before carbon grabs it; wanna revert in the meantime?
[16:30:53] (PS1) Andrew Bogott: Revert "labstore1003: Give up on Trusty for the time being" [operations/puppet] - https://gerrit.wikimedia.org/r/150853
[16:30:59] :(
[16:31:16] ebernhardson: why :(
[16:31:22] :(
[16:31:25] (CR) Andrew Bogott: [C: 2] Revert "labstore1003: Give up on Trusty for the time being" [operations/puppet] - https://gerrit.wikimedia.org/r/150853 (owner: Andrew Bogott)
[16:31:44] YuviPanda: i read wrong :P I thought that said give up on trusty, instead of revert give up on trusty
[16:31:49] ebernhardson: heh :)
[16:32:20] (CR) Filippo Giunchedi: [C: 1] mediawiki::web: get rid of envvars.appserver [operations/puppet] - https://gerrit.wikimedia.org/r/147514 (owner: Ori.livneh)
[16:32:31] ori: I believe we're good ^
[16:32:53] with the code review that is, also as a general statement too
[16:33:11] godog: should i merge it?
[16:34:09] _joe_: what do you think?
re: https://gerrit.wikimedia.org/r/147514
[16:35:04] ori: I would but I'm not overly familiar with testing/rolling out changes to the appservers yet
[16:35:15] <_joe_> godog: I can show you
[16:35:28] <_joe_> or explain, which is even simpler
[16:35:29] <_joe_> :)
[16:35:35] <_joe_> in 5 minutes
[16:35:44] sure
[16:41:03] _joe_: were you ever able to determine if HHVM handles logrotation gracefully? it appears to use SIGHUP for graceful-stop, i don't see anything special for SIGUSR1/2..
[16:41:36] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[16:41:49] <_joe_> ori: HHVM doesn't give a shit about signals, it just dies
[16:42:06] <_joe_> I think I opened an issue
[16:42:10] _joe_: there's https://github.com/facebook/hhvm/commit/91e8609
[16:42:24] maybe it doesn't work, but at least it claims to graceful on sighup
[16:42:36] it appears to work for me but it's hard to test
[16:42:49] sorry, i'll let you focus on the convo with godog
[16:45:27] matanya: regarding https://gerrit.wikimedia.org/r/#/c/150521/ -- the ticket requests access to ocg-render-admin, which does not exist; your attached patch adds them to pdf-render-admin
[16:45:29] can you explain?
[16:46:20] <_joe_> ori: yes gimme a minute
[16:46:49] <_joe_> you're allowing me to procrastinate some very important but tedious accounting work
[16:46:59] <_joe_> the joys of being a contractor...
[16:46:59] heh
[16:47:52] mark: what's a reasonable time to wait on an app server to graceful-stop before sending SIGKILL? upstart's default is 5 seconds
[16:48:17] i'd say a bit more, but I'd have to test
[16:48:41] (CR) Filippo Giunchedi: [C: -1] hhvm: lintian fixes (3 comments) [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/150213 (owner: Giuseppe Lavagetto)
[16:49:04] hmm. maybe 10 to start, and we adjust from there?
[16:50:24] i'll test on osmium
[16:51:18] ori: I'll probably merge/deploy first thing tomorrow, too late/tired now
[16:51:33] (CR) Mark Bergsma: [C: 1] seting the scs-[a|c]1-codfw.mgmt.codfw.wmnet entries [operations/dns] - https://gerrit.wikimedia.org/r/150629 (owner: RobH)
[16:51:38] godog: +1. thanks!
[16:51:49] godog: and thanks for the other reviews too!
[16:52:29] <_joe_> ori: I think the value of Timeout we have in our apache config is a sensible time to wait :)
[16:52:54] 200?
[16:53:05] <_joe_> we have 200? ouch
[16:53:10] <_joe_> that's incredibly long
[16:53:32] the default is 300
[16:53:36] but yeah
[16:53:54] <_joe_> yes default is for the webserver I run on my mediacenter :P
[16:54:02] graceful stop is a bit more urgent tho
[16:54:09] <_joe_> I usually set it to 20 or less
[16:54:12] <_joe_> ori: yes
[16:54:20] <_joe_> so set it to 10
[16:54:21] if you just want to leisurely decom a server then we just depool it and wait for the connections to drain
[16:54:26] (CR) Manybubbles: "I'm a bit concerned about the extra IO required to pump through 100MB/sec. Right now requiring the cluster to recover causes some nasty I" [operations/puppet] - https://gerrit.wikimedia.org/r/150586 (owner: Chad)
[16:54:37] * _joe_ nods
[16:54:50] <_joe_> ori: I just thought we had a much shorter timeout
[16:54:57] i'm going to leave it unset for now (which will make it default to 5)
[16:55:10] if i set it at 10 it will become "magical", i.e.
people may mistake that for a scientifically determined value
[16:55:14] so better to leave it unset for now
[16:55:20] (CR) RobH: [C: 2] seting the scs-[a|c]1-codfw.mgmt.codfw.wmnet entries [operations/dns] - https://gerrit.wikimedia.org/r/150629 (owner: RobH)
[16:55:35] <_joe_> eheh
[16:55:56] <_joe_> or, you comment your choice declaring it's a guesswork
[16:56:43] (PS2) Ori.livneh: Tweaks to HHVM module [operations/puppet] - https://gerrit.wikimedia.org/r/150838
[16:56:59] <_joe_> so, now I should really get to do my accounting work sorry
[16:57:12] kk. i'll merge that one, it's trivial
[16:57:19] unless you object
[16:57:23] <_joe_> ori: I may be around later, but I thought I can do a bunch of apache merges tomorrow
[16:57:33] _joe_: no worries, take care of your accounting stuff!
[16:57:38] thanks for the reviews
[16:57:55] <_joe_> +2 for log.header True
[16:58:16] oh there is one thing that's possibly controversial
[16:58:16] (CR) Giuseppe Lavagetto: [C: 1] Tweaks to HHVM module [operations/puppet] - https://gerrit.wikimedia.org/r/150838 (owner: Ori.livneh)
[16:58:20] respawn limit unlimited
[16:58:28] how do you feel about that?
[16:58:41] <_joe_> that it does not make sense in general
[16:58:57] <_joe_> we should give up eventually if hhvm doesn't respawn
[16:59:10] <_joe_> but we can live with that
[16:59:28] well, these boxes don't have much to do without hhvm anyway :P
[16:59:30] <_joe_> also, our experience is showing hhvm fails in less obvious ways than just crashing
[17:00:11] ok, let's think about this more.
i'll amend the patch to drop that directive
[17:00:25] (PS1) Reedy: Add symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150858
[17:00:27] (PS1) Reedy: testwiki to 1.24wmf16 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150859
[17:00:29] (PS1) Reedy: Wikipedias to 1.24wmf15 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150860
[17:00:31] (PS1) Reedy: group0 to 1.24wmf16 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150861
[17:00:48] (CR) Reedy: [C: 2] Add symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150858 (owner: Reedy)
[17:00:51] (Merged) jenkins-bot: Add symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150858 (owner: Reedy)
[17:00:51] <_joe_> MaxSem: that's not the reason. If there is some external reason making hhvm fail (like say, a db refusing connections due to a network partition) restarting it furiously can only make things worse
[17:01:20] surely a db refusing connections won't make hhvm server die?
[17:01:20] (PS3) Ori.livneh: Tweaks to HHVM module [operations/puppet] - https://gerrit.wikimedia.org/r/150838
[17:01:23] <_joe_> that's my only doubt
[17:01:35] (I think I'm in favor of unlimited respawn)
[17:02:13] we can add a brief sleep to the post-stop stanza
[17:02:19] Hi ops! Quick question: does anything about caching or production setup make $_SESSION (and consequently WebRequest::getSessionData()/setSessionData()) unreliable? I have a bug in code that uses it, and the bug only shows up on production (on Meta), works fine on my local install
[17:02:38] <_joe_> godog: I may recall wrong, but you usually tell hhvm not to respawn if more than N respawn attempts have happened in M seconds
[17:02:59] <_joe_> which is a way to avoid spawn-crash-respawn loops
[17:03:02] AndyRussG: what exactly is in $_SESSION ? A cookie only for logged-in users?
[17:03:04] (PS3) Reedy: Add export-0.9.xsd [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149643 (https://bugzilla.wikimedia.org/68686)
[17:03:10] AndyRussG, we use session for keeping 100,000s of users logged in
[17:03:12] <^demon|away> AndyRussG: No, if you're hitting PHP you're already past varnish.
[17:03:15] <_joe_> in any other situation, it will respawn like in the unlimited case
[17:03:17] (CR) Reedy: [C: 2] Add export-0.9.xsd [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149643 (https://bugzilla.wikimedia.org/68686) (owner: Reedy)
[17:03:21] _joe_: but what's the benefit of staying down?
[17:03:21] (Merged) jenkins-bot: Add export-0.9.xsd [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149643 (https://bugzilla.wikimedia.org/68686) (owner: Reedy)
[17:03:41] as MaxSem said these servers don't have much else to do
[17:03:41] if there was a general problem, we would've already had complaints
[17:03:43] <^demon|away> bblack: $_SESSION is the PHP superglobal that contains data you've saved to the user's session.
[17:03:49] ah
[17:03:57]