[00:00:05] RoanKattouw, ^d, marktraceur, MaxSem: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141210T0000). Please do the needful. [00:02:19] I'll take it [00:06:41] (03PS1) 10EBernhardson: Turn on subpages for Archive_talk on officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178718 [00:07:12] RoanKattouw: sneaking in a late patch: https://gerrit.wikimedia.org/r/178718 (updating wikitech too) [00:09:16] (03CR) 10Catrope: [C: 032] Turn on subpages for Archive_talk on officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178718 (owner: 10EBernhardson) [00:09:28] (03Merged) 10jenkins-bot: Turn on subpages for Archive_talk on officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178718 (owner: 10EBernhardson) [00:16:13] !log catrope Synchronized wmf-config/InitialiseSettings.php: Subpages for Archive talk on officewiki (duration: 00m 06s) [00:16:18] Logged the message, Master [00:16:19] ebernhardson: ---^^ [00:16:37] RoanKattouw: tested and works, thanks! [00:17:54] !log catrope Synchronized php-1.25wmf11/includes/api/ApiOpenSearch.php: SwAT: fix empty LinkBatch in opensearch (duration: 00m 05s) [00:17:56] Logged the message, Master [00:19:55] Reedy: ---^^ [00:20:13] should mean the dberror logs aren't spammed to high hell [00:20:15] * Reedy checks [00:20:55] thanks RoanKattouw [00:20:57] AaronSchulz: ^^ [00:30:05] (03PS1) 10Springle: upgrade db1039 to trusty and mariadb 10 [puppet] - 10https://gerrit.wikimedia.org/r/178726 [00:31:30] (03CR) 10Springle: [C: 032] upgrade db1039 to trusty and mariadb 10 [puppet] - 10https://gerrit.wikimedia.org/r/178726 (owner: 10Springle) [00:32:13] springle: is 10.64.0.19 lagged or something? 
[00:32:46] * AaronSchulz keeps seeing slave wait timeouts in logs [00:33:52] ori: https://gerrit.wikimedia.org/r/178724 [00:34:19] Lag: 0 [00:34:21] hrm [00:34:25] * AaronSchulz keeps an eye on that [00:34:34] !log catrope Synchronized php-1.25wmf10/extensions/WikimediaEvents: SWAT: sendBeacon experiment (duration: 00m 06s) [00:34:38] Logged the message, Master [00:34:40] !log catrope Synchronized php-1.25wmf11/extensions/WikimediaEvents: SWAT: sendBeacon experiment (duration: 00m 05s) [00:34:43] Logged the message, Master [00:37:26] AaronSchulz: db1015 isn't currently lagged, but it has been spiking a bit. something jumped s6 load around 18:40, but havn't dug into it yet [00:37:48] ori: puppet changes unmerged; ok to do so? [00:38:01] springle: yes, thanks [00:38:15] tnx [00:38:17] timeout is like 10 seconds...which is an eternety [00:38:42] *eternity [00:38:44] AaronSchulz: can you rebase that so it doesn't depend on the abstract module patch? [00:39:00] superm401: Your WME thing is live now BTW [00:39:07] I didn't wan't to reference a method that didn't exist in the base class [00:39:08] ori: That's the sendBeacon thing ---^^ [00:39:15] nuria__: You probably care too --^^ [00:39:28] RoanKattouw, thanks. :) [00:39:36] RoanKattouw: NICE! [00:41:07] (03PS1) 10Ori.livneh: Set $wgWMEStatsdBaseUri for WikimediaEvents / statsv [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178729 [00:41:12] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge. 
[00:41:21] ori: Good call [00:41:33] (03CR) 10Ori.livneh: [C: 032] Set $wgWMEStatsdBaseUri for WikimediaEvents / statsv [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178729 (owner: 10Ori.livneh) [00:41:42] (03Merged) 10jenkins-bot: Set $wgWMEStatsdBaseUri for WikimediaEvents / statsv [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178729 (owner: 10Ori.livneh) [00:42:36] !log ori Synchronized wmf-config/CommonSettings.php: Set $wgWMEStatsdBaseUri for WikimediaEvents / statsv (duration: 00m 07s) [00:42:40] Logged the message, Master [00:46:53] RoanKattouw: we will let you know what we find, we need to make sure we get data from a variety of browsers. [00:47:05] Thanks [00:47:08] !log ori Synchronized php-1.25wmf10/extensions/Math: Ic438b307a3b46: Fix for fatal caused by static call to MathRenderer::getError (duration: 00m 07s) [00:47:11] Logged the message, Master [00:54:39] !log upgrade db1039 trusty [00:54:43] Logged the message, Master [00:54:43] !log catrope Synchronized php-1.25wmf11/extensions/WikimediaEvents: SWAT (duration: 00m 05s) [00:54:47] Logged the message, Master [00:54:50] !log catrope Synchronized php-1.25wmf11/extensions/VisualEditor: SWAT (duration: 00m 06s) [00:54:53] Logged the message, Master [01:12:54] !log ori Synchronized php-1.25wmf11/extensions/WikimediaEvents: Ie9ca5d3: Update WikimediaEvents for cherry-picks (duration: 00m 07s) [01:13:01] Logged the message, Master [01:18:55] (03PS1) 10Ori.livneh: mediawiki::monitoring::webserver: provision `apachetop` [puppet] - 10https://gerrit.wikimedia.org/r/178740 [01:19:18] (03PS2) 10Ori.livneh: mediawiki::monitoring::webserver: provision `apachetop` [puppet] - 10https://gerrit.wikimedia.org/r/178740 [01:30:37] (03PS1) 10Springle: repool db1039 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178743 [01:31:00] (03CR) 10Springle: [C: 032] repool db1039 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178743 (owner: 10Springle) [01:31:09] (03Merged) 10jenkins-bot: repool db1039 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/178743 (owner: 10Springle) [01:31:55] !log springle Synchronized wmf-config/db-eqiad.php: repool db1039, warm up (duration: 00m 05s) [01:32:02] Logged the message, Master [01:32:29] (03PS1) 10Yurik: Zero: Temporary removal of several XCS IDs from analytics [puppet] - 10https://gerrit.wikimedia.org/r/178745 [01:32:34] bblack, around ^ [01:56:24] (03PS1) 10Springle: depool db1015 RT 9027 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178751 [01:57:04] (03CR) 10Springle: [C: 032] depool db1015 RT 9027 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178751 (owner: 10Springle) [01:57:13] (03Merged) 10jenkins-bot: depool db1015 RT 9027 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178751 (owner: 10Springle) [01:57:31] PROBLEM - Disk space on fluorine is CRITICAL: DISK CRITICAL - free space: /a 75934 MB (3% inode=99%): [01:58:34] !log springle Synchronized wmf-config/db-eqiad.php: depool db1015 RT 9027 (duration: 00m 06s) [01:58:41] Logged the message, Master [01:59:01] !log manually ran /etc/cron.daily/logrotate on fluorine [01:59:05] Logged the message, Master [02:19:06] !log l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 03s) [02:19:10] !log LocalisationUpdate completed (1.25wmf10) at 2014-12-10 02:19:10+00:00 [02:19:11] Logged the message, Master [02:19:14] Logged the message, Master [02:26:34] !log l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 01s) [02:26:38] !log LocalisationUpdate completed (1.25wmf11) at 2014-12-10 02:26:38+00:00 [02:26:38] Logged the message, Master [02:26:41] Logged the message, Master [02:48:23] (03Abandoned) 10Tim Starling: Move idiosyncratic gdbinit to /home/ori [puppet] - 10https://gerrit.wikimedia.org/r/176307 (owner: 10Tim Starling) [03:26:19] OK, odd tech issue. 
[03:27:04] On commons, I (and other people) can edit a specific file page, but adding the text “{{duplicate|Ham the chimp (cropped).jpg}}” gives a 503 error. [03:27:28] Request: POST http://commons.wikimedia.org/w/index.php?title=File:Ham_the_chimp.jpg&action=submit, from 10.64.0.102 via cp1054 cp1054 ([10.64.32.106]:3128), Varnish XID 606578045 [04:06:39] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Dec 10 04:06:39 UTC 2014 (duration 6m 38s) [04:06:44] Logged the message, Master [04:34:16] revent: File a ticket at https://phabricator.wikimedia.org please. [04:34:34] Fiona: kk [05:08:10] (03PS1) 10Springle: depool db1037 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178766 [05:08:33] (03CR) 10Springle: [C: 032] depool db1037 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178766 (owner: 10Springle) [05:09:32] !log springle Synchronized wmf-config/db-eqiad.php: depool db1037 (duration: 00m 06s) [05:09:36] Logged the message, Master [05:13:36] PROBLEM - Disk space on fluorine is CRITICAL: DISK CRITICAL - free space: /a 76228 MB (3% inode=99%): [05:19:26] !log s6 xtrabackup clone db1015 to db1037 [05:19:31] Logged the message, Master [05:34:56] RECOVERY - Disk space on fluorine is OK: DISK OK [05:38:25] (03PS1) 10KartikMistry: WIP: Fix feature flag wmgUseContentTranslationCluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178770 [05:38:33] (03CR) 10jenkins-bot: [V: 04-1] WIP: Fix feature flag wmgUseContentTranslationCluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178770 (owner: 10KartikMistry) [05:45:42] (03PS1) 10KartikMistry: Beta: Fix spacing and indentation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178772 [05:50:23] (03PS2) 10KartikMistry: WIP: Fix feature flag wmgUseContentTranslationCluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178770 [05:50:31] (03CR) 10jenkins-bot: [V: 04-1] WIP: Fix feature flag wmgUseContentTranslationCluster [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/178770 (owner: 10KartikMistry) [05:51:41] bah. Need coffee. [06:29:04] (03PS1) 10Springle: raise tcp back log for mariadb 10 [puppet] - 10https://gerrit.wikimedia.org/r/178777 [06:30:38] (03CR) 10Springle: [C: 032] raise tcp back log for mariadb 10 [puppet] - 10https://gerrit.wikimedia.org/r/178777 (owner: 10Springle) [06:31:54] (03PS1) 10Springle: upgrade db1037 to trusty and mariadb 10 [puppet] - 10https://gerrit.wikimedia.org/r/178778 [06:33:08] (03CR) 10Springle: [C: 032] upgrade db1037 to trusty and mariadb 10 [puppet] - 10https://gerrit.wikimedia.org/r/178778 (owner: 10Springle) [06:33:37] PROBLEM - puppet last run on dbproxy1001 is CRITICAL: CRITICAL: puppet fail [06:34:16] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: Puppet has 1 failures [06:34:33] PROBLEM - puppet last run on mw1235 is CRITICAL: CRITICAL: Puppet has 1 failures [06:35:22] PROBLEM - puppet last run on mw1177 is CRITICAL: CRITICAL: puppet fail [06:35:45] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 2 failures [06:36:25] PROBLEM - puppet last run on db1018 is CRITICAL: CRITICAL: Puppet has 2 failures [06:36:42] PROBLEM - puppet last run on gallium is CRITICAL: CRITICAL: Puppet has 1 failures [06:41:56] PROBLEM - SSH on searchidx1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:45:27] RECOVERY - SSH on searchidx1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [06:45:31] RECOVERY - puppet last run on mw1235 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [06:47:17] RECOVERY - puppet last run on db1018 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:47:36] RECOVERY - puppet last run on gallium is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [06:48:00] RECOVERY - puppet last run on dbproxy1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:48:51] RECOVERY - 
puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:49:46] RECOVERY - puppet last run on mw1177 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:56:37] PROBLEM - puppet last run on mw1042 is CRITICAL: CRITICAL: Puppet has 1 failures [06:56:37] PROBLEM - puppet last run on mw1144 is CRITICAL: CRITICAL: Puppet has 1 failures [06:56:47] PROBLEM - puppet last run on search1001 is CRITICAL: CRITICAL: Puppet has 1 failures [06:57:22] PROBLEM - puppet last run on cp1061 is CRITICAL: CRITICAL: Puppet has 1 failures [06:57:57] PROBLEM - puppet last run on search1007 is CRITICAL: CRITICAL: Puppet has 1 failures [06:58:03] PROBLEM - puppet last run on plutonium is CRITICAL: CRITICAL: Puppet has 1 failures [06:58:20] PROBLEM - puppet last run on amssq55 is CRITICAL: CRITICAL: Puppet has 1 failures [06:58:28] PROBLEM - puppet last run on antimony is CRITICAL: CRITICAL: Puppet has 1 failures [06:58:31] <_joe_> good morning [06:58:38] <_joe_> hi puppet [06:59:07] PROBLEM - puppet last run on ms-be3001 is CRITICAL: CRITICAL: Puppet has 1 failures [06:59:29] PROBLEM - puppet last run on virt1004 is CRITICAL: CRITICAL: Puppet has 1 failures [06:59:33] PROBLEM - puppet last run on sodium is CRITICAL: CRITICAL: Puppet has 1 failures [06:59:38] PROBLEM - puppet last run on db1003 is CRITICAL: CRITICAL: Puppet has 1 failures [07:00:14] PROBLEM - puppet last run on mw1238 is CRITICAL: CRITICAL: Puppet has 1 failures [07:00:14] PROBLEM - puppet last run on mw1079 is CRITICAL: CRITICAL: Puppet has 1 failures [07:00:30] PROBLEM - puppet last run on mw1111 is CRITICAL: CRITICAL: Puppet has 1 failures [07:00:53] PROBLEM - puppet last run on db2004 is CRITICAL: CRITICAL: Puppet has 1 failures [07:02:45] PROBLEM - puppet last run on mw1029 is CRITICAL: CRITICAL: Puppet has 1 failures [07:03:04] <_joe_> (this is an apt failure) [07:03:41] PROBLEM - puppet last run on db1057 is CRITICAL: CRITICAL: Puppet 
has 1 failures [07:06:36] RECOVERY - puppet last run on mw1144 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [07:06:36] RECOVERY - puppet last run on mw1042 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [07:06:51] RECOVERY - puppet last run on search1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:06:52] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [07:07:18] RECOVERY - puppet last run on cp1061 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:08:02] RECOVERY - puppet last run on search1007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:08:13] RECOVERY - puppet last run on plutonium is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [07:08:14] RECOVERY - puppet last run on amssq55 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [07:08:37] hi _joe_ [07:08:40] RECOVERY - puppet last run on antimony is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [07:08:54] RECOVERY - puppet last run on virt1004 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [07:08:55] RECOVERY - puppet last run on ms-be3001 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [07:08:59] darn it, "/nick puppet" didn't work [07:09:17] RECOVERY - puppet last run on sodium is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [07:09:19] RECOVERY - puppet last run on db1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:09:55] RECOVERY - puppet last run on mw1238 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [07:09:55] RECOVERY - puppet last run on mw1079 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 
failures [07:10:04] bblack, around? when you have a chnace. Thx! :) https://gerrit.wikimedia.org/r/#/c/178745/ [07:10:21] RECOVERY - puppet last run on mw1111 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [07:10:36] RECOVERY - puppet last run on db2004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:10:38] ah crap [07:11:02] Any op around to delete a file for me? [07:11:11] (03CR) 10Nikerabbit: [C: 04-1] "I don't get what we are trying to fix here." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178770 (owner: 10KartikMistry) [07:12:33] RECOVERY - puppet last run on mw1029 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:12:38] _joe_: ^ if you have a secnd [07:13:25] <_joe_> hoo: yeah [07:13:33] RECOVERY - puppet last run on db1057 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:13:34] :) [07:13:37] on snapshot1003 [07:14:19] <_joe_> hoo: which file? [07:14:47] /mnt/data/xmldatadumps/public/other/wikidata/20141209.json.gz [07:14:59] For some reason that dump seems to be incomplete [07:15:13] Haven't yet come around to debug that [07:15:15] <_joe_> {{done}} [07:15:20] \o/ Thanks [07:15:30] <_joe_> hoo: well this was easy :P [07:15:42] (03Abandoned) 10KartikMistry: WIP: Fix feature flag wmgUseContentTranslationCluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178770 (owner: 10KartikMistry) [07:16:16] <_joe_> Nikerabbit: so, still seeing bad things(TM) with HHVM on translatewiki? 
[07:18:42] !log upgrade db1007 trusty [07:18:47] Logged the message, Master [07:23:48] (03PS1) 10Springle: upgrade db1007 to trusty and mariadb 10 [puppet] - 10https://gerrit.wikimedia.org/r/178781 [07:24:22] (03PS1) 10Hoo man: Log stderr of dumpwikidatajson.sh [puppet] - 10https://gerrit.wikimedia.org/r/178782 [07:25:32] (03CR) 10Springle: [C: 032] upgrade db1007 to trusty and mariadb 10 [puppet] - 10https://gerrit.wikimedia.org/r/178781 (owner: 10Springle) [07:25:43] (03PS2) 10Hoo man: Log stderr of dumpwikidatajson.sh [puppet] - 10https://gerrit.wikimedia.org/r/178782 [07:32:45] Can I redirect multiple outputs into the same file, simultaneously [07:33:06] Multiple parallel process all being >> foo eg. [07:33:10] I think I can [07:35:11] (03PS1) 10KartikMistry: Beta: set wmgContentTranslationCluster to false [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178783 [07:38:05] (03PS3) 10Hoo man: Log stderr of dumpwikidatajson.sh [puppet] - 10https://gerrit.wikimedia.org/r/178782 [07:38:18] _joe_: yeah after a time it just stops responding and nginx complains about timeouts [07:39:00] ori: Could you have a look at https://gerrit.wikimedia.org/r/178782 please? [07:39:58] (03CR) 10Ori.livneh: [C: 032] Log stderr of dumpwikidatajson.sh [puppet] - 10https://gerrit.wikimedia.org/r/178782 (owner: 10Hoo man) [07:40:30] (03CR) 10Nikerabbit: [C: 032] "Let's try this." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/178783 (owner: 10KartikMistry) [07:40:38] Thanks :) [07:40:41] (03Merged) 10jenkins-bot: Beta: set wmgContentTranslationCluster to false [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178783 (owner: 10KartikMistry) [07:41:12] :P These log files will be owned by datasets, so I'll have to ask someone to copy them for me, if I want to have a look [07:42:45] ah crap [07:42:49] that wont work [07:42:58] *sigh* [07:44:23] _joe_: in fact it's currently in that state [07:44:53] <_joe_> Nikerabbit: whenever you like, I can try to help, but I'd need shell access [07:46:24] _joe_: I'm still getting occasional error pages from the API with the x-analytics header indicating php=zend [07:46:59] (03PS1) 10Springle: repool db1007 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178785 [07:48:09] I suspect that the header is bogus though [07:48:23] (03CR) 10Springle: [C: 032] repool db1007 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178785 (owner: 10Springle) [07:48:30] (03Merged) 10jenkins-bot: repool db1007 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178785 (owner: 10Springle) [07:48:49] (03PS1) 10Hoo man: Log stderr of dumpwikidatajson to /var/log/wikidatadump [puppet] - 10https://gerrit.wikimedia.org/r/178786 [07:48:55] ori: ^ [07:48:59] I think that will work [07:49:09] wait [07:49:15] !log springle Synchronized wmf-config/db-eqiad.php: repool db1007 (duration: 00m 05s) [07:49:17] $ groups datasets [07:49:17] datasets : datasets apache systemusers [07:49:19] Logged the message, Master [07:49:25] so maybe the old one would have worked as well [07:49:26] mh [07:49:37] This aint fun [07:50:58] drwxr-xr-x 2 apache wikidev 4096 Jun 3 2014 mediawiki [07:51:07] ok, so we actually need another folder [07:51:16] datasets wont be able to create files there AFAIS [07:51:30] <_joe_> gwicke: yeah I should fix that [07:51:54] <_joe_> gwicke: when the backend doesn't respond, varnish falls back to its 
default, which was php=zend [07:52:07] <_joe_> gwicke: is it an occasional api error? [07:52:30] <_joe_> I'd be interested in how parsoid sees the switch to hhvm [07:53:56] ori: _joe_: Could some please have a look at https://gerrit.wikimedia.org/r/178786 ? [07:54:09] I can't test that, but we really need some form of debug output [07:54:28] s/debug/log/ [08:00:04] _joe_: I'm mostly testing with restbase right now, which in turn hits parsoid for anything it doesn't have in storage [08:01:03] parsoid is not too well instrumented currently, but from what I see the parsoid response times seem to have improved quite a bit since the switch to HHVM [08:01:46] mean parsoid response time for misses is currently around 500ms [08:02:27] not long ago that was closer to 2s [08:02:50] <_joe_> gwicke: whoa [08:03:11] it's not much data to make a call on just yet [08:03:21] but so far it looks promising [08:03:52] we did see a slow-down in VE response times as the API cluster load rose [08:04:08] sadly that dashboard is currently broken: https://gdash.wikimedia.org/dashboards/ve/ [08:04:24] but you see the trend there improving recently [08:05:23] I'm curious what the null request time for HHVM API requests is [08:05:36] with zend it was quite high, something like 40ms [08:06:50] waiting for https://gdash.wikimedia.org/dashboards/apimethods/ to load [08:07:50] apergos: If you're around https://gerrit.wikimedia.org/r/178786 [08:09:26] morning [08:09:34] hi paravoid [08:09:40] _joe_: those api stats look a bit strange right now [08:09:45] you surely want to review something... 
https://gerrit.wikimedia.org/r/178786 [08:09:51] morning [08:10:05] !log ori Synchronized php-1.25wmf11/extensions/CommonsMetadata/TemplateParser.php: Update CommonsMetadata for cherry-picks (duration: 00m 05s) [08:10:08] Logged the message, Master [08:11:05] _joe_: one very noticeable change though is that I can now push the parsoid cluster to 50+% load without overloading the API cluster ;) [08:11:30] <_joe_> gwicke: eheh [08:11:36] <_joe_> hhvm >> node [08:12:24] are we keeping both of those clusters at their current capacity? [08:12:30] hhvm is catching up ;) [08:12:31] 10% and 20% respectively, I think [08:12:32] gwicke: don't do that just yet [08:12:41] it's kinda of a waste [08:13:41] the api cluster hasn't been on hhvm for 24 hours yet [08:13:51] let it settle, folks :P [08:14:02] afraid we're gonna jinx it? [08:14:21] I wasn't suggesting to do anything now [08:14:21] only because the last ten months have been a series of "OMG IT'S READY" followed by "oh, fuck" [08:14:23] hey, you'll see any issues in less time ;) [08:14:55] but yes, I'm happy to throttle to account for time of day etc [08:15:06] I think at some point I predicted something like "end of 2014" [08:15:15] ;) [08:15:16] I'm not pushing very hard though [08:15:27] https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&s=by+name&c=Parsoid%2520eqiad&tab=m&vn= [08:15:43] btw, _joe_/ori: don't forget we have to do an ICU transition after this too [08:15:45] <_joe_> paravoid: I did the same prediction btw, in june maybe? [08:15:56] <_joe_> paravoid: ICU? [08:16:00] libicu [08:16:06] <_joe_> oh ok [08:16:24] oh, crap [08:16:29] i forgot to walk the dog [08:16:32] <_joe_> ? [08:16:37] <_joe_> ahahahah [08:16:42] you have a dog too? [08:16:52] who do you think does my deploys? 
[08:17:00] lol [08:17:01] <_joe_> he's called "I don't want to mess with unicode" [08:17:10] but seriously, bbiab [08:17:16] <_joe_> eheh [08:18:04] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 26.67% of data above the critical threshold [500.0] [08:18:18] <_joe_> mh [08:18:34] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 26.67% of data above the critical threshold [500.0] [08:20:19] I'm seeing a bunch of errors in logstash now [08:20:37] Request: POST http://de.wikipedia.org/w/api.php, from 10.64.0.102 via cp1052 cp1052 ([10.64.32.104]:3128), Varnish XID 2961626817
Forwarded for: 10.64.0.200, 10.64.0.102
Error: 503, Service Unavailable at Wed, 10 Dec 2014 08:17:22 GMT [08:21:04] not too common just yet, similar rates as with zend so far [08:22:27] <_joe_> gwicke: well, we just had a 503 spike [08:22:55] *nod* [08:23:19] * hoo still looks for someone to review https://gerrit.wikimedia.org/r/178786 [08:24:21] <_joe_> hoo: sorry, got in a few other things in the meanwhile, I'll take a look now [08:24:31] Thanks [08:24:56] <_joe_> hoo: why the change of path? [08:25:01] <_joe_> so that you can access it? [08:25:17] _joe_: That and so that datasets is able to actually create the log file [08:25:20] :P [08:25:26] <_joe_> lol [08:25:47] I realized it wont be able to use /var/log/mediawiki [08:26:03] <_joe_> why is that? [08:26:17] <_joe_> I know almost zero about dumps, sorry [08:26:55] <_joe_> hoo: I'll merge it, but lemme take a look at the situation [08:27:06] drwxr-xr-x 2 apache wikidev 4096 Jun 3 2014 mediawiki [08:27:11] that /var/log/mediawiki [08:27:26] $ groups datasets datasets : datasets apache systemusers [08:27:38] <_joe_> ok [08:27:57] <_joe_> so making it group writable and of group apache would allow you to write there, right? [08:28:19] _joe_: It would allow datasets to write there and me to read it (as I can sudo -u apache) [08:28:22] <_joe_> is there any particular reason why you wanted to write to /var/log/mediawiki? dumps are kinda separate right? [08:28:31] Yeah, they are [08:28:46] but still the scripts are part of a mediawiki extension [08:28:52] so one could argue either way [08:30:28] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [08:30:58] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [08:34:12] _joe_: so, you're not going to like this solution [08:34:21] in fact, you may even say 'eww' [08:34:27] but i think it's a reasonable solution, so think it over [08:34:32] (path incoming) [08:34:35] *patch [08:34:44] <_joe_> ori: for what? 
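Note on the /var/log/mediawiki permissions exchange above: the reasoning can be checked with a small sketch of the classic Unix owner/group/other test. This is a simplified model, not the kernel's full logic (it ignores root, ACLs, and the execute bit that a directory also needs); the function name `can_write` is hypothetical, and the 0o775 / group apache variant is only the change _joe_ floats in the chat, not something the log shows being applied.

```python
import stat


def can_write(mode, owner, group, user, user_groups):
    """Simplified Unix write check: owner bits, then group bits, then other bits."""
    if user == owner:
        return bool(mode & stat.S_IWUSR)
    if group in user_groups:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


# From the log: "drwxr-xr-x 2 apache wikidev ... mediawiki" is mode 0o755,
# and "groups datasets" prints: datasets apache systemusers
DATASETS_GROUPS = {"datasets", "apache", "systemusers"}

# Current state: datasets is neither the owner nor in group wikidev.
current = can_write(0o755, "apache", "wikidev", "datasets", DATASETS_GROUPS)

# Hypothetical state floated in the chat: group apache, group-writable.
proposed = can_write(0o775, "apache", "apache", "datasets", DATASETS_GROUPS)

print(current, proposed)
```

Under this model datasets falls through to the "other" bits today, which is why a separate log directory (or a group change) is needed.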
[08:35:01] the gradual repo accretion [08:35:19] <_joe_> ori: looks like your patch worked earlier, but right now https://commons.wikimedia.org/wiki/File:DeEenzamenPost.gif is a 503 again [08:35:23] <_joe_> how's that possible? [08:35:29] i could have been wrong [08:35:32] i'll look [08:35:52] <_joe_> no I have the url in front of me rendered correctly earlier [08:38:54] !log ori Synchronized php-1.25wmf11/extensions/CommonsMetadata: (no message) (duration: 00m 07s) [08:38:56] Logged the message, Master [08:40:21] I'll be back in a bit, shouled there be more questions [08:40:39] (03CR) 10Giuseppe Lavagetto: [C: 032] Log stderr of dumpwikidatajson to /var/log/wikidatadump [puppet] - 10https://gerrit.wikimedia.org/r/178786 (owner: 10Hoo man) [08:40:47] <_joe_> actually no :P [08:43:50] <_joe_> ori: actually all tables in the sqlite database have a 'schema version' attached to the table name [08:44:01] <_joe_> so we could just delete the old ones [08:48:28] <_joe_> and this is a first order of business to do :) [08:49:59] <_joe_> hoo: merged [08:50:18] Awesome, thanks [08:50:46] _joe_: Once puppet has run on snapshot1003, could you possibly manually start the shell script? [08:51:03] Third or fourth time this fails this week [08:51:05] <_joe_> hoo: as root? 
[08:51:18] _joe_: no, datasets [08:51:44] it will run for ~12h [08:51:47] <_joe_> ok, gimme 10 minutes [08:56:30] (03PS1) 10Ori.livneh: HHVM: on service stop, 1-in-10 odds to clear repo [puppet] - 10https://gerrit.wikimedia.org/r/178794 [08:56:52] _joe_: i told you you weren't going to like it [08:56:55] but think it over :) [08:57:45] <_joe_> ori: 1 sec [08:58:08] (03PS2) 10Ori.livneh: HHVM: on service stop, 1-in-10 odds to clear repo [puppet] - 10https://gerrit.wikimedia.org/r/178794 [09:00:01] pros: solves the problem, cons: makes start-up performance somewhat non-deterministic (i.e., slower in 10% of cases) [09:00:22] <_joe_> !log cleaning and vacuuming the hhvm repo on mw1030 [09:00:27] Logged the message, Master [09:01:22] <_joe_> ori: mw1030 just went from 700 M to 110 [09:01:29] <_joe_> by just cleaning old schemas [09:01:45] <_joe_> without needing a restart [09:02:02] interesting [09:02:15] <_joe_> we should definitely remove it in the pre-install of the hhvm package, I guess [09:02:29] <_joe_> but how do we know where the file is? [09:02:51] well, because we do [09:03:13] <_joe_> well, we do, the package doesnt' necessarily know [09:03:24] i like that we try to be generic but there's no need to go full retard [09:03:26] <_joe_> now I just need to find a reliable way to find the current schema version [09:03:41] sorry, bad choice of words [09:03:55] it's from that stupid movie [09:03:56] <_joe_> which is not 'I happened to install a server yesterday, so I know the schema version' [09:04:00] <_joe_> ? [09:04:08] https://www.youtube.com/watch?v=X6WHBO_Qc-Q [09:04:52] <_joe_> oh never seen that [09:05:27] did you file this on hhvm's github? [09:05:36] <_joe_> paravoid: what? 
[09:05:39] we're not going to be the only ones with this problem [09:05:49] so they might be interested in either fixing it or documenting it somewhere [09:06:29] <_joe_> paravoid: it's not something that needs "fixing" per se, I'll prepare some docs when I figured this out completely [09:06:30] oh, wait [09:06:44] you were able to remove just the entries for the old schemas? [09:06:47] but leave the file in place? [09:06:51] <_joe_> yes [09:06:59] <_joe_> it's sqlite after all [09:06:59] ok, that's nice [09:07:25] anyways, you have curl localhost:9002/repo-schema [09:07:29] _joe_: of course it is, "running hhvm fills your /run" :) [09:07:33] <_joe_> ori: meh [09:07:43] btw, /run is probably a poort choice for a bytecode cache [09:07:46] poor* [09:07:51] from an FHS perspective [09:07:51] how come? [09:07:54] oh [09:07:54] that would be /var/cache [09:08:25] just be happy it's not /a/common/cache [09:08:31] <_joe_> paravoid: yes, I did that because it was in-memory and it looked like it would do a lot of I/O, which is not the case [09:08:53] /run is really for pidfiles, lockfiles, sockets etc. 
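Note on the repo cleanup discussed above: a minimal sketch of dropping old-schema tables from an sqlite file, assuming only what the chat states (every table name carries a schema version, and the current schema id can be fetched with `curl localhost:9002/repo-schema`). The `_<schema>` suffix convention, the table names, and the function name `drop_stale_tables` are illustrative assumptions, not HHVM's documented layout.

```python
import sqlite3


def drop_stale_tables(conn, current_schema):
    """Drop tables whose name does not end in _<current_schema>, then VACUUM."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    stale = [
        name
        for (name,) in rows
        if not name.startswith("sqlite_")  # leave sqlite's internal tables alone
        and not name.endswith("_" + current_schema)
    ]
    for name in stale:
        # Quote the identifier; table names cannot be bound as parameters.
        conn.execute('DROP TABLE "{}"'.format(name.replace('"', '""')))
    conn.commit()
    conn.execute("VACUUM")  # reclaim the freed pages
    return sorted(stale)


# Demo on an in-memory database with made-up table names and schema ids.
conn = sqlite3.connect(":memory:")
for table in ("Unit_aaa111", "FileMd5_aaa111", "Unit_bbb222", "FileMd5_bbb222"):
    conn.execute('CREATE TABLE "{}" (id INTEGER)'.format(table))

removed = drop_stale_tables(conn, "bbb222")
remaining = sorted(
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )
)
print(removed, remaining)
```

On a real repo file the schema id would come from the admin endpoint named in the chat rather than being hard-coded, and the drop would need to be coordinated with the running server.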
[09:08:54] <_joe_> and I told myself a few times already I should move it to /var/cache [09:09:02] <_joe_> yeah [09:09:04] or `hhvmadm repo-schema` [09:09:10] <_joe_> it's not like I don't know that [09:10:06] <_joe_> ori: ok, thanks [09:10:19] it's just a wrapper for curl though [09:11:16] <_joe_> ori: I'll do a small python script to allow this maintenance activity [09:11:44] <_joe_> then I think I'll move the cache location for a couple of servers to see if the performance hit is non-existent as I hope it is [09:12:30] (03PS2) 10Faidon Liambotis: Convert rsyslog config to RainerScript-based filters [puppet] - 10https://gerrit.wikimedia.org/r/177432 (owner: 10BryanDavis) [09:12:37] (03CR) 10Faidon Liambotis: [C: 032] Convert rsyslog config to RainerScript-based filters [puppet] - 10https://gerrit.wikimedia.org/r/177432 (owner: 10BryanDavis) [09:14:43] (03PS4) 10Faidon Liambotis: Redirect wikimedia.community to www.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/176508 (owner: 10Glaisher) [09:14:56] (03CR) 10Faidon Liambotis: [C: 032] Redirect wikimedia.community to www.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/176508 (owner: 10Glaisher) [09:19:40] (03CR) 10Faidon Liambotis: [C: 04-1] "I haven't looked at the code but I have some considerations over using misc-web for this purpose. Misc-web was envisioned as a Varnish clu" [puppet] - 10https://gerrit.wikimedia.org/r/178419 (owner: 10Catrope) [09:20:09] _joe_: how about now? [09:24:14] (03PS2) 10Faidon Liambotis: ganglia_new: switch to stock upstart script [puppet] - 10https://gerrit.wikimedia.org/r/178512 [09:33:45] (03CR) 10Faidon Liambotis: [C: 032] ganglia_new: switch to stock upstart script [puppet] - 10https://gerrit.wikimedia.org/r/178512 (owner: 10Faidon Liambotis) [09:35:26] PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 685 [09:35:51] _joe_: do you mind if i write it? 
it'll be fun for me [09:36:06] <_joe_> for me too :P [09:36:12] you? python? [09:36:18] but you never do that *g* [09:36:19] ok, go for it then [09:36:27] it's your idea, it's only fair [09:37:22] good night! [09:39:50] <_joe_> ori: night [09:40:20] RECOVERY - check_mysql on db1008 is OK: Uptime: 4824805 Threads: 71 Questions: 118352028 Slow queries: 32110 Opens: 92515 Flush tables: 2 Open tables: 64 Queries per second avg: 24.529 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [09:41:47] greetings [09:42:29] PROBLEM - puppet last run on es2006 is CRITICAL: CRITICAL: puppet fail [09:42:56] PROBLEM - puppet last run on bast2001 is CRITICAL: CRITICAL: puppet fail [09:43:57] PROBLEM - puppet last run on pollux is CRITICAL: CRITICAL: puppet fail [09:43:57] PROBLEM - puppet last run on lvs2005 is CRITICAL: CRITICAL: puppet fail [09:44:09] PROBLEM - puppet last run on db2017 is CRITICAL: CRITICAL: puppet fail [09:44:20] PROBLEM - puppet last run on db2010 is CRITICAL: CRITICAL: puppet fail [09:44:27] that would be me [09:44:30] PROBLEM - puppet last run on lvs2002 is CRITICAL: CRITICAL: puppet fail [09:44:36] fixing [09:46:11] PROBLEM - puppet last run on ms-be2010 is CRITICAL: CRITICAL: puppet fail [09:46:23] PROBLEM - puppet last run on ms-be2015 is CRITICAL: CRITICAL: puppet fail [09:46:23] PROBLEM - puppet last run on ms-be2009 is CRITICAL: CRITICAL: puppet fail [09:47:08] PROBLEM - puppet last run on db2035 is CRITICAL: CRITICAL: puppet fail [09:48:00] (03PS1) 10Faidon Liambotis: ganglia_new: fix broken dependency [puppet] - 10https://gerrit.wikimedia.org/r/178797 [09:48:02] (03PS1) 10Faidon Liambotis: ganglia: fix check-gmond-restart across distros [puppet] - 10https://gerrit.wikimedia.org/r/178798 [09:48:23] PROBLEM - puppet last run on ms-be2002 is CRITICAL: CRITICAL: puppet fail [09:48:47] PROBLEM - puppet last run on ms-be2003 is CRITICAL: CRITICAL: puppet fail [09:48:51] (03CR) 10Faidon Liambotis: [C: 032 V: 032] ganglia_new: fix broken dependency 
[puppet] - 10https://gerrit.wikimedia.org/r/178797 (owner: 10Faidon Liambotis) [09:49:02] (03CR) 10Faidon Liambotis: [C: 032 V: 032] ganglia: fix check-gmond-restart across distros [puppet] - 10https://gerrit.wikimedia.org/r/178798 (owner: 10Faidon Liambotis) [09:49:38] PROBLEM - puppet last run on ms-be2013 is CRITICAL: CRITICAL: puppet fail [09:50:22] PROBLEM - puppet last run on es2008 is CRITICAL: CRITICAL: puppet fail [09:50:43] PROBLEM - puppet last run on db2002 is CRITICAL: CRITICAL: puppet fail [09:50:50] stupid ubuntu [09:50:53] PROBLEM - puppet last run on db2034 is CRITICAL: CRITICAL: puppet fail [09:50:53] PROBLEM - puppet last run on db2005 is CRITICAL: CRITICAL: puppet fail [09:51:01] PROBLEM - puppet last run on labstore2001 is CRITICAL: CRITICAL: puppet fail [09:51:16] PROBLEM - puppet last run on db2019 is CRITICAL: CRITICAL: puppet fail [09:51:23] PROBLEM - puppet last run on db2009 is CRITICAL: CRITICAL: puppet fail [09:51:32] PROBLEM - puppet last run on ms-fe2004 is CRITICAL: CRITICAL: puppet fail [09:51:36] PROBLEM - puppet last run on ms-be2004 is CRITICAL: CRITICAL: puppet fail [09:51:51] PROBLEM - puppet last run on ms-fe2001 is CRITICAL: CRITICAL: puppet fail [09:51:52] (it's fixed, these are old errors) [09:52:03] PROBLEM - puppet last run on db2039 is CRITICAL: CRITICAL: puppet fail [09:52:04] PROBLEM - puppet last run on ms-be2006 is CRITICAL: CRITICAL: puppet fail [09:52:41] PROBLEM - puppet last run on db2018 is CRITICAL: CRITICAL: puppet fail [09:53:10] PROBLEM - puppet last run on lvs2001 is CRITICAL: CRITICAL: puppet fail [09:53:35] PROBLEM - puppet last run on db2038 is CRITICAL: CRITICAL: puppet fail [09:53:36] RECOVERY - puppet last run on es2008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:53:45] PROBLEM - puppet last run on es2001 is CRITICAL: CRITICAL: puppet fail [09:54:12] PROBLEM - puppet last run on lvs2004 is CRITICAL: CRITICAL: puppet fail [09:54:34] PROBLEM - puppet last run on 
db2029 is CRITICAL: CRITICAL: puppet fail [09:54:46] PROBLEM - puppet last run on lvs2006 is CRITICAL: CRITICAL: puppet fail [09:54:50] PROBLEM - puppet last run on db2036 is CRITICAL: CRITICAL: puppet fail [09:54:51] PROBLEM - puppet last run on labcontrol2001 is CRITICAL: CRITICAL: puppet fail [09:55:00] PROBLEM - puppet last run on db2001 is CRITICAL: CRITICAL: puppet fail [09:55:13] PROBLEM - puppet last run on ms-fe2003 is CRITICAL: CRITICAL: puppet fail [09:55:14] PROBLEM - puppet last run on install2001 is CRITICAL: CRITICAL: puppet fail [09:55:32] PROBLEM - puppet last run on ms-be2005 is CRITICAL: CRITICAL: puppet fail [09:55:46] PROBLEM - puppet last run on ms-be2008 is CRITICAL: CRITICAL: puppet fail [09:55:58] PROBLEM - puppet last run on ms-be2011 is CRITICAL: CRITICAL: puppet fail [09:56:28] PROBLEM - puppet last run on db2007 is CRITICAL: CRITICAL: puppet fail [09:56:28] PROBLEM - puppet last run on db2004 is CRITICAL: CRITICAL: puppet fail [09:57:16] RECOVERY - puppet last run on db2017 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [09:57:17] PROBLEM - puppet last run on db2016 is CRITICAL: CRITICAL: puppet fail [09:57:28] RECOVERY - puppet last run on db2010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:57:28] RECOVERY - puppet last run on lvs2002 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [09:57:55] PROBLEM - puppet last run on ms-be2001 is CRITICAL: CRITICAL: puppet fail [09:57:58] PROBLEM - puppet last run on db2023 is CRITICAL: CRITICAL: puppet fail [09:58:17] PROBLEM - puppet last run on db2037 is CRITICAL: CRITICAL: puppet fail [09:58:45] PROBLEM - puppet last run on ms-be2012 is CRITICAL: CRITICAL: puppet fail [09:58:45] RECOVERY - puppet last run on es2006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:58:45] PROBLEM - puppet last run on amssq42 is CRITICAL: CRITICAL: puppet fail [09:59:36] RECOVERY 
- puppet last run on bast2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:59:38] RECOVERY - puppet last run on ms-be2010 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [09:59:38] RECOVERY - puppet last run on ms-be2015 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:59:38] RECOVERY - puppet last run on ms-be2009 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:00:21] RECOVERY - puppet last run on pollux is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [10:00:22] RECOVERY - puppet last run on lvs2005 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:02:40] RECOVERY - puppet last run on ms-be2013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:03:20] RECOVERY - puppet last run on db2035 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:03:52] RECOVERY - puppet last run on db2034 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [10:03:52] RECOVERY - puppet last run on db2005 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [10:03:52] RECOVERY - puppet last run on labstore2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:04:20] RECOVERY - puppet last run on db2019 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:04:32] RECOVERY - puppet last run on db2009 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:04:41] RECOVERY - puppet last run on ms-fe2004 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [10:04:46] RECOVERY - puppet last run on ms-be2004 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [10:04:46] RECOVERY - puppet last run on ms-be2002 is OK: OK: Puppet is currently 
enabled, last run 2 minutes ago with 0 failures [10:04:53] RECOVERY - puppet last run on db2039 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:05:00] RECOVERY - puppet last run on ms-be2003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:05:06] RECOVERY - puppet last run on ms-be2006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:06:32] RECOVERY - puppet last run on es2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:06:41] RECOVERY - puppet last run on db2002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:07:08] RECOVERY - puppet last run on lvs2004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:07:31] RECOVERY - puppet last run on db2029 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [10:07:40] RECOVERY - puppet last run on db2036 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:07:47] RECOVERY - puppet last run on labcontrol2001 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [10:08:03] RECOVERY - puppet last run on ms-fe2003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:08:04] RECOVERY - puppet last run on ms-fe2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:08:05] RECOVERY - puppet last run on install2001 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [10:08:30] RECOVERY - puppet last run on ms-be2011 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:08:37] RECOVERY - puppet last run on db2018 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:09:24] RECOVERY - puppet last run on lvs2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:09:25] RECOVERY - 
puppet last run on db2007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:09:44] RECOVERY - puppet last run on db2038 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:10:03] RECOVERY - puppet last run on db2016 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:10:44] RECOVERY - puppet last run on ms-be2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:10:57] RECOVERY - puppet last run on lvs2006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:10:57] RECOVERY - puppet last run on db2023 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:11:16] RECOVERY - puppet last run on db2037 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:11:16] RECOVERY - puppet last run on db2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:11:40] RECOVERY - puppet last run on ms-be2005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:11:42] RECOVERY - puppet last run on ms-be2012 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:11:45] RECOVERY - puppet last run on amssq42 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [10:11:51] RECOVERY - puppet last run on ms-be2008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:12:30] RECOVERY - puppet last run on db2004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:17:40] PROBLEM - puppet last run on netmon1001 is CRITICAL: CRITICAL: puppet fail [10:19:00] godog: lucid/statsd? 
:) [10:19:08] (nescio is still broken) [10:23:42] paravoid: heh, os_version used with || doesn't seem to DTRT [10:24:01] RECOVERY - puppet last run on netmon1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:24:44] it doesn't? [10:25:27] (03PS1) 10Hashar: contint: +hhvm [puppet] - 10https://gerrit.wikimedia.org/r/178799 [10:25:46] (03PS3) 10Christopher Johnson (WMDE): adds Sprint extension (0.6.1.2) to role::phabricator::main [puppet] - 10https://gerrit.wikimedia.org/r/178194 [10:25:54] hashar: that's a bit of a useless commit message [10:26:11] paravoid: let me wip it [10:26:37] (03PS2) 10Hashar: (WIP IGNORE) contint: +hhvm [puppet] - 10https://gerrit.wikimedia.org/r/178799 [10:26:55] gerrit has draft changes too btw, won't spam with messages [10:26:58] (03CR) 10Hashar: [C: 04-1 V: 04-1] "Work in progress to stage change on the integration puppetmaster" [puppet] - 10https://gerrit.wikimedia.org/r/178799 (owner: 10Hashar) [10:27:05] indeed [10:27:54] godog: what was the problem you saw? [10:28:19] paravoid: when I merged the change with || puppet failed saying it couldn't find the reference to Class[packages::python-stats] and I'm assuming it was because require_package didn't get evaluated which means os_version was false [10:28:23] let me double check now [10:28:57] (03CR) 10Alexandros Kosiaris: [C: 04-1] "I like the approach, -1 for minor code issues, otherwise LGTM" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/178470 (owner: 10Giuseppe Lavagetto) [10:29:17] <_joe_> akosiaris: thanks! [10:29:34] _joe_: :-) [10:30:34] <_joe_> akosiaris: I love when people point out how I can enhance my code [10:30:36] _joe_: will you be around this afternoon for some hhvm related questions I mighthave? [10:30:44] <_joe_> hashar: yes [10:31:07] <_joe_> although I'll probably take a longer break given yesterday I reimaged servers for 12 hours straight basically [10:31:07] _joe_: oh, and I will show up with some hiera... 
[10:31:35] (03PS1) 10Faidon Liambotis: Correct os_version's lucid/precise version numbers [puppet] - 10https://gerrit.wikimedia.org/r/178800 [10:31:38] _joe_: I am doing the basic grunt work now, if you are resting I will just drop you an email :] [10:31:51] <_joe_> akosiaris: since you're the best ruby coder around, can you take a look at https://gerrit.wikimedia.org/r/#/c/176334/ [10:32:06] <_joe_> hashar: not resting, maybe just not around all the time :) [10:32:07] i am what ? [10:32:10] <_joe_> ahahahah [10:32:15] <_joe_> :D [10:33:11] * _joe_ throws some gems to akosiaris [10:33:19] :P [10:33:21] godog: [10:33:21] package { [ 'python-diamond', 'python-configobj' ]: [10:33:21] ensure => present, [10:33:22] require => Class['packages::python_statsd'], [10:33:22] } [10:35:32] (03PS3) 10Hashar: (WIP IGNORE) contint: +hhvm [puppet] - 10https://gerrit.wikimedia.org/r/178799 [10:35:46] godog: ^^^ that one is a draft but still notify :-( [10:36:37] hashar: 178799? doesn't look draft to me? [10:36:46] who is the "flow talk page manager" user? [10:36:49] godog: PS3 is ! [10:37:02] (03PS1) 10Faidon Liambotis: diamond: install python-statsd on >= precise [puppet] - 10https://gerrit.wikimedia.org/r/178801 [10:37:06] godog: ^ [10:37:39] (03CR) 10Faidon Liambotis: [C: 032] Correct os_version's lucid/precise version numbers [puppet] - 10https://gerrit.wikimedia.org/r/178800 (owner: 10Faidon Liambotis) [10:37:49] (03CR) 10Faidon Liambotis: [C: 032] diamond: install python-statsd on >= precise [puppet] - 10https://gerrit.wikimedia.org/r/178801 (owner: 10Faidon Liambotis) [10:38:05] godog: https://phabricator.wikimedia.org/F20461 :D [10:39:07] paravoid: hah okay, that makes more sense [10:39:46] hashar: ah indeed, what if you update the draft now? [10:39:46] btw, > matches is not currently a good idea [10:40:04] os_version assumes wheezy is e.g. 
"7.0", but stable updates bump the minor number [10:40:18] (wheezy is currently 7.7 with 7.8 coming out in january) [10:40:25] so > wheezy is going to match right now [10:42:31] (03PS3) 10Dzahn: icinga: use ssl_ciphersuite [puppet] - 10https://gerrit.wikimedia.org/r/178487 [10:42:38] RECOVERY - puppet last run on nescio is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [10:42:56] paravoid: good to know, worth a disclaimer in os_version docs I think [10:43:57] (03PS1) 10Giuseppe Lavagetto: mediawiki: hhvm_cleanup_cache - housekeeping tool for hhvm cache [puppet] - 10https://gerrit.wikimedia.org/r/178802 [10:43:59] <_joe_> or, we could make it work correctly :) [10:44:17] <_joe_> s/correctly/as most people would expect/ [10:44:53] RECOVERY - puppet last run on ms1004 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [10:47:22] it's hard to do so without hardcoding stuff [10:47:31] (03PS2) 10Giuseppe Lavagetto: mediawiki: hhvm_cleanup_cache - housekeeping tool for hhvm cache [puppet] - 10https://gerrit.wikimedia.org/r/178802 [10:47:36] (03CR) 10Dzahn: [C: 032] icinga: use ssl_ciphersuite [puppet] - 10https://gerrit.wikimedia.org/r/178487 (owner: 10Dzahn) [10:47:39] woody was 3.0 and sarge was 3.1 [10:47:50] otoh, we could remove all these ancient versions [10:47:57] I doubt anyone is going to run puppet on sarge :P [10:48:11] (03PS4) 10Hashar: (WIP IGNORE) contint: +hhvm [puppet] - 10https://gerrit.wikimedia.org/r/178799 [10:48:19] godog: same deal, drafts are not ignored :D [10:48:36] we could also remove the versioncmp deal altogether [10:48:49] and just make these arrays [10:48:57] regarding os_version, I attempted to write some rspec to test the puppet function but eventually got blocked with some error :-/ I can push the change if there is any interest [10:49:05] what the.. 
so that's why we did not switch icinga to use ssl_ciphersuite yet [10:49:21] undefined method `join' for nil:NilClass [10:50:49] hashar: I think because it started as a non-draft, I've just created https://gerrit.wikimedia.org/r/#/c/178803/ with git review -D and it didn't notify [10:52:43] (03CR) 10Dzahn: "undefined method `join' for nil:NilClass - can't parse template - where do i need to put the settings line then??" [puppet] - 10https://gerrit.wikimedia.org/r/178487 (owner: 10Dzahn) [10:55:09] (03PS1) 10Dzahn: icinga: move ssl_settings to role class [puppet] - 10https://gerrit.wikimedia.org/r/178805 [10:57:00] integration.wm is down - hashar, that's you working? [10:57:08] PROBLEM - puppet last run on neon is CRITICAL: CRITICAL: puppet fail [10:57:29] (03CR) 10Hashar: "There is a sorting order issue with redirects.conf. The apache lint job complains that the proposed redirects.conf does not match the outp" [puppet] - 10https://gerrit.wikimedia.org/r/176508 (owner: 10Glaisher) [10:57:40] Host integration.wikimedia.org not found: 3(NXDOMAIN) [10:57:47] paravoid: redirects.conf seems to be inconsistent now :( https://integration.wikimedia.org/ci/job/operations-apache-config-lint/5102/console [10:58:13] mutante: DNS issue on your end? [10:58:15] strange [10:58:38] hashar: yes, DNS issue on my end it looks indeed [10:58:58] mutante: it is a CNAME to the misc varnish ( misc-web-lb.eqiad.wikimedia.org. 
) [10:59:04] <_joe_> mmmh my script fails on mw1018 because well, /run is already full and sqlite3 vacuum function sucks [11:00:00] (03Abandoned) 10Hashar: (WIP IGNORE) contint: +hhvm [puppet] - 10https://gerrit.wikimedia.org/r/178799 (owner: 10Hashar) [11:00:20] (03PS1) 10Faidon Liambotis: apache: refresh redirects, were inconsistent [puppet] - 10https://gerrit.wikimedia.org/r/178807 [11:00:50] (03CR) 10Faidon Liambotis: [C: 032] apache: refresh redirects, were inconsistent [puppet] - 10https://gerrit.wikimedia.org/r/178807 (owner: 10Faidon Liambotis) [11:01:09] paravoid: the apache lint job is non voting for now since I am not sure whether it properly catch potential issues [11:01:12] _joe_ (or our excellent ruby coder, akosiaris): maybe we should revive https://gerrit.wikimedia.org/r/#/c/138292/ [11:01:33] hashar: we have jenkins tests that actually find errors!?! [11:01:37] <_joe_> paravoid: yeah I thought about that [11:01:48] in my experience, jenkins catches about 10-20% of errors [11:01:59] and consumes about 50% of my time when pushing [11:02:22] <_joe_> paravoid: well *if* we wrote unit tests for puppet modules [11:02:29] <_joe_> it *would* be able to catch some more [11:02:55] paravoid: yeah we did on the old operations/apache-config.git repo but they got lost when the content got migrated to puppet. 
Related task is https://phabricator.wikimedia.org/T72068 [11:03:09] it { should catch.stuff } [11:04:53] yes, we did have jenkins testing before it was moved out of apache-config repo [11:06:05] hashar: and here were some tests btw https://gerrit.wikimedia.org/r/#/c/176871/ but they were not touching redirects.conf [11:08:17] (03Abandoned) 10Dzahn: broken edit to test jenkins check [puppet] - 10https://gerrit.wikimedia.org/r/176871 (owner: 10Dzahn) [11:08:23] (03PS1) 10Hashar: Basic rspec setup [puppet] - 10https://gerrit.wikimedia.org/r/178810 [11:08:54] (03CR) 10Dzahn: [C: 032] icinga: move ssl_settings to role class [puppet] - 10https://gerrit.wikimedia.org/r/178805 (owner: 10Dzahn) [11:09:30] (03PS3) 10Giuseppe Lavagetto: mediawiki: hhvm_cleanup_cache - housekeeping tool for hhvm cache [puppet] - 10https://gerrit.wikimedia.org/r/178802 [11:09:58] (03PS4) 10Giuseppe Lavagetto: mediawiki: hhvm_cleanup_cache - housekeeping tool for hhvm cache [puppet] - 10https://gerrit.wikimedia.org/r/178802 [11:11:27] <_joe_> I'm merging this ^^ given it works and I kinda need it :) [11:12:02] (03CR) 10Hashar: "During summer 2013, Alexandros Andrew and I attempted to add some rspec based test for the puppet repository. Since rspec-puppet ends up " [puppet] - 10https://gerrit.wikimedia.org/r/178810 (owner: 10Hashar) [11:12:51] mutante: that is the idea, we need some broken config to verify the job catch them [11:12:53] _joe_: now icinga also uses ssl_ciphersuite [11:12:58] I am off, be back later [11:13:04] <_joe_> mutante: great! 
[11:13:24] (03CR) 10Giuseppe Lavagetto: [C: 032] "Merging as it works (tested on mw1017) and it should help cleaning up hhvm opcode databases" [puppet] - 10https://gerrit.wikimedia.org/r/178802 (owner: 10Giuseppe Lavagetto) [11:16:05] RECOVERY - puppet last run on neon is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:23:34] gah who moved base into a module [11:23:41] (03CR) 10Dzahn: [C: 031] "sounds reasonable, syntax looks good" [puppet] - 10https://gerrit.wikimedia.org/r/177876 (owner: 10Cscott) [11:26:20] (03PS4) 10Dzahn: labsproxy: switch to use ssl_ciphersuite [puppet] - 10https://gerrit.wikimedia.org/r/178493 [11:26:25] (03PS1) 10Giuseppe Lavagetto: hhvm_cleanup_cache: fix typo [puppet] - 10https://gerrit.wikimedia.org/r/178814 [11:26:42] (03CR) 10jenkins-bot: [V: 04-1] hhvm_cleanup_cache: fix typo [puppet] - 10https://gerrit.wikimedia.org/r/178814 (owner: 10Giuseppe Lavagetto) [11:26:52] (03PS2) 10Giuseppe Lavagetto: hhvm_cleanup_cache: fix typo [puppet] - 10https://gerrit.wikimedia.org/r/178814 [11:27:43] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] hhvm_cleanup_cache: fix typo [puppet] - 10https://gerrit.wikimedia.org/r/178814 (owner: 10Giuseppe Lavagetto) [11:28:25] (03PS1) 10Faidon Liambotis: Kill base::platform with fire [puppet] - 10https://gerrit.wikimedia.org/r/178815 [11:28:27] (03PS1) 10Faidon Liambotis: Remove class base::packages::emacs [puppet] - 10https://gerrit.wikimedia.org/r/178816 [11:28:39] (03CR) 10Dzahn: "changed to use "compat"" [puppet] - 10https://gerrit.wikimedia.org/r/178493 (owner: 10Dzahn) [11:29:11] (03CR) 10Faidon Liambotis: [C: 032] Kill base::platform with fire [puppet] - 10https://gerrit.wikimedia.org/r/178815 (owner: 10Faidon Liambotis) [11:29:21] (03CR) 10Giuseppe Lavagetto: [C: 031] Remove class base::packages::emacs [puppet] - 10https://gerrit.wikimedia.org/r/178816 (owner: 10Faidon Liambotis) [11:29:38] <_joe_> we should move emacs to base::packages in fact :P [11:29:42] (03CR) 10Dzahn: 
[C: 031] Remove class base::packages::emacs [puppet] - 10https://gerrit.wikimedia.org/r/178816 (owner: 10Faidon Liambotis) [11:29:46] (03CR) 10Faidon Liambotis: [C: 032] Remove class base::packages::emacs [puppet] - 10https://gerrit.wikimedia.org/r/178816 (owner: 10Faidon Liambotis) [11:29:59] we could if you need it [11:31:27] YuviPanda: https://gerrit.wikimedia.org/r/#/c/178493/ [11:32:00] base is such a crappy module [11:32:05] half of it should be a "labs" module anyway [11:33:04] wouldn't a "labs" module be a guarantee that labs is unlike production though [11:33:25] yes, it would gurantee that production doesn't need "ec2id" or "projectgid" facts :P [11:33:41] :p heh, right [11:34:04] see base::environment [11:34:38] just stupid stuff like that [11:34:45] stupid as in trivial [11:34:46] anyway [11:34:50] another fight another day :) [11:34:54] if( $::instanceproject ) .. i see, yea [11:35:38] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 26.67% of data above the critical threshold [500.0] [11:35:50] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 26.67% of data above the critical threshold [500.0] [11:35:52] bit 5xx spike [11:35:54] that you, _joe_? [11:36:37] (03CR) 10Faidon Liambotis: "What is this? The name is very non-descriptive and the commit message doesn't help." [puppet] - 10https://gerrit.wikimedia.org/r/173251 (owner: 10Hashar) [11:37:43] <_joe_> paravoid: I don't think so [11:38:47] <_joe_> paravoid: but I do see hhvm: SlowTimer [10000ms] at runtime/ext_mysql: slow query: SELECT MASTER_POS_WAIT('db1038-bin.000997', 773956533, 10) [11:39:15] <_joe_> which looks like a good candidate for being the reason of the 503 spike [11:39:17] mutante: lgtm, let me merge and supervise runs / restarts [11:39:22] <_joe_> springle: ^^ [11:40:02] mutante: actually [11:40:36] (03CR) 10Yuvipanda: [C: 04-1] "Hmm, so this makes the templates in a module depend on a variable defined in the role, which feels icky. 
Can you make ssl_ciphersuites a p" [puppet] - 10https://gerrit.wikimedia.org/r/178493 (owner: 10Dzahn) [11:40:38] mutante: ^ [11:40:48] just realized it was in the role and not in the class [11:40:53] (03CR) 10Dzahn: "i think we should move the ssh override to base or elsewhere as ori suggested and yes to removing it from mediawiki module" [puppet] - 10https://gerrit.wikimedia.org/r/177863 (owner: 10Alexandros Kosiaris) [11:41:45] YuviPanda: i think it should be in the role class because it's a variable [11:42:07] i think "templates in a module depend on a variable defined in the role" is how we want it [11:42:09] _joe_: it would be unusual for slave lag to cause a 503 (afaik), unless it was shard-wide [11:42:15] mutante: sure, I agree, but it should be explicitly defined as a paramter to the class. [11:42:27] _joe_: where are you seeing those, btw? [11:42:30] mutante: otherwise, if it were used in any other role there’s no indication that this template depends on that parameter. [11:42:30] <_joe_> springle: if a query takes 10 seconds to execute [11:42:41] <_joe_> springle: in the hhvm log in at least 2 servers [11:42:49] <_joe_> lemme see the timings [11:43:07] _joe_: it is SELECT MASTER_POS_WAIT. the LB code uses that to temporarily let a slave catch up. it isn't running a query [11:43:12] <_joe_> 11:12:24 no it doesn't check with that [11:43:17] mutante: so we can just make it a required parameter in the class and have it be set in the role [11:43:19] <_joe_> ok nevermind [11:43:30] <_joe_> springle: right, ok [11:45:11] YuviPanda: ok, it does make sense, it's just also "do it different in all places" then [11:45:27] mutante: yeah, we shouldn’t be implicitly depending on variables like that... 
[11:46:41] <_joe_> !log cleaning and vacuuming the HHVM cache on a few hosts [11:46:45] Logged the message, Master [11:48:15] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [11:48:28] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [11:49:56] (03PS5) 10Dzahn: labsproxy: switch to use ssl_ciphersuite [puppet] - 10https://gerrit.wikimedia.org/r/178493 [11:52:24] (03PS1) 10Dzahn: dynamicproxy - require parameters before optional [puppet] - 10https://gerrit.wikimedia.org/r/178824 [11:52:44] <_joe_> I'd like to depool a few of the smaller appservers from the pool, to see what is the size we should really have (I'd say 30% cpu at peak for starters). Any objections? [11:52:48] (03PS2) 10Dzahn: dynamicproxy: required parameters before optional [puppet] - 10https://gerrit.wikimedia.org/r/178824 [11:53:09] <_joe_> so, not depooling /now/ [11:53:15] mutante: looking [11:54:23] _joe_: sounds good to me, the daily cycle doesn't seem to change much at the moment [11:54:24] (03PS2) 10Faidon Liambotis: mediawiki: remove ssh.override/nice -10 for SSH [puppet] - 10https://gerrit.wikimedia.org/r/177863 (owner: 10Alexandros Kosiaris) [11:54:24] wtf of the day [11:54:47] (03CR) 10Yuvipanda: [C: 04-1] "Parameter should be set in tools::proxy class as well, which uses this." 
[puppet] - 10https://gerrit.wikimedia.org/r/178493 (owner: 10Dzahn) [11:54:50] mutante: sorry, didn’t catch this before [11:55:37] (03CR) 10JanZerebecki: [C: 031] labsproxy: switch to use ssl_ciphersuite [puppet] - 10https://gerrit.wikimedia.org/r/178493 (owner: 10Dzahn) [11:55:46] (03CR) 10JanZerebecki: [C: 031] dynamicproxy: required parameters before optional [puppet] - 10https://gerrit.wikimedia.org/r/178824 (owner: 10Dzahn) [12:00:21] (03CR) 10Yuvipanda: [C: 031] dynamicproxy: required parameters before optional [puppet] - 10https://gerrit.wikimedia.org/r/178824 (owner: 10Dzahn) [12:02:56] (03PS6) 10Dzahn: labsproxy: switch to use ssl_ciphersuite [puppet] - 10https://gerrit.wikimedia.org/r/178493 [12:03:04] YuviPanda: i might be confused by the proxies, but you meant that ?^ [12:03:29] in modules/toollabs? [12:03:34] mutante: yup, Idid [12:03:43] ok, final check then merge and apply [12:03:45] cool [12:03:46] * YuviPanda goes through it once more [12:03:49] thanks [12:04:22] (03PS7) 10Yuvipanda: labsproxy: switch to use ssl_ciphersuite [puppet] - 10https://gerrit.wikimedia.org/r/178493 (owner: 10Dzahn) [12:04:27] mutante: lgtm, gonna merge and run. [12:04:31] run puppet, that is. [12:04:40] but then i would have to say that modules/toollabs/manifests/proxy.pp [12:04:47] should be a role file ?:p [12:05:00] YuviPanda: :) [12:05:46] (03CR) 10JanZerebecki: "This surprised me. I saw you then moved it to manifests/role/icinga.pp , perhaps the problem was the order in the file. Did you try moving" [puppet] - 10https://gerrit.wikimedia.org/r/178487 (owner: 10Dzahn) [12:06:11] mutante: there’s a role class for that as well. 
[12:06:32] mutante: it has other software set up (proxy listener), isn’t just applying other classes [12:06:51] (03PS3) 10Faidon Liambotis: mediawiki: remove ssh.override/nice -10 for SSH [puppet] - 10https://gerrit.wikimedia.org/r/177863 (owner: 10Alexandros Kosiaris) [12:06:52] ah, ok [12:07:08] (03CR) 10Yuvipanda: [C: 032] labsproxy: switch to use ssl_ciphersuite [puppet] - 10https://gerrit.wikimedia.org/r/178493 (owner: 10Dzahn) [12:07:12] (03CR) 10Faidon Liambotis: [C: 032 V: 032] mediawiki: remove ssh.override/nice -10 for SSH [puppet] - 10https://gerrit.wikimedia.org/r/177863 (owner: 10Alexandros Kosiaris) [12:07:47] paravoid: puppet merged yours too, hope that’s ok. [12:08:02] well [12:08:09] other way around :) [12:08:10] or you did mine [12:08:12] yeah [12:08:13] :) [12:08:15] paravoid: heh. [12:09:39] puppet-merge should say something about "mid-air collision" in honor of Bugzilla [12:10:50] mutante: run and checked and restarted on both tools and general proxy \o/ [12:11:08] mutante: thanks [12:11:31] checking with ssllabs now [12:13:46] mutante: hmm, still B https://www.ssllabs.com/ssltest/analyze.html?d=tools.wmflabs.org and https://www.ssllabs.com/ssltest/analyze.html?d=wdq.wmflabs.org [12:13:53] mutante: did the change to remove RC4 from compat get merged? [12:14:04] <_joe_> YuviPanda: I hope not [12:14:08] ah, heh :) [12:14:13] well, that explains that [12:14:13] <_joe_> YuviPanda: it will break IE8 on XP [12:14:19] aaah, I see. [12:14:34] looks like we’ll have a B for a while then. [12:14:36] <_joe_> so not something we want to do now without some consulting with other people in the community [12:14:44] yeah, definitely [12:14:51] YuviPanda: yea, not yet, this was just making it use ciphersuite [12:14:59] <_joe_> and I honestly don't have the bandwidth to do this now [12:15:17] mutante: yeah, cool. now I don’t have to worry about cipher suites being out of date, so yay [12:15:19] the B is also for another reason than just RC4 ? 
[12:15:37] ok, let's see [12:15:48] mutante: maybe the SHA1 instead of SHA2? [12:16:32] that means i broke stats.wm for IE8 on XP [12:16:35] fwiw [12:16:39] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Overall approach seems sound, some minor comments. The big one is that I think soundly supporting the" (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/176334 (owner: 10Giuseppe Lavagetto) [12:17:02] because stats is like the last thing that doesnt use ciphersuite now [12:17:24] <_joe_> akosiaris: there is no way in the world to support that [12:17:44] <_joe_> akosiaris: it's just how puppet works, I revolved its internals a few times already [12:18:06] <_joe_> if you have two role1 and role2, and both include ::ssh::server (say) [12:18:28] <_joe_> hiera automatic lookups for ::ssh::server will happen in the role1 context alone [12:18:55] <_joe_> and you have no way around that with anything that's not patching puppet internals [12:19:31] <_joe_> as any parser function is evaluated in-place [12:20:49] <_joe_> and most of your comments are for things I actually copied from puppet's own source code :P [12:21:22] YuviPanda: _joe_: technically there is one drawback to using ciphersuite, it's outweighed by the good reasons to do it, but now you can't say "on labs it's ok to break IE and while in prod it's not" [12:21:38] mutante: I’m going to disagree on ‘ok to break IE’ on labs :) [12:21:45] <_joe_> mutante: that would be the 'strong' chiphersuite [12:22:04] the reason for the "B" is just RC4, it caps it to B as a maximum [12:22:49] mutante: oh, wait, I misread your statement. [12:22:56] we actually agree, so yay. [12:24:12] _joe_: yes, heh, i used strong in the first PS, but then [12:24:20] got convinced by "RC4 will also be removed in compat and strong does have TLS1 disabled which results in quite many browsers not being supported. I think compat is the better choice for this service." 
[12:24:56] YuviPanda: ^ so we could go back to "strong" like in PS1 against that comment from Jan, and then it would be an A again :p [12:25:14] _joe_: I was afraid of that :-( [12:25:15] mutante: and would also break for IE users :D [12:25:30] _joe_: I do like the RSpec tests btw :-) [12:25:49] <_joe_> akosiaris: so, given we have rarely any case where we need more than one role per node [12:26:02] <_joe_> and the fact we can use it only at the node scope [12:26:17] <_joe_> I thought it's a nice tradeoff [12:26:34] true that... [12:26:56] YuviPanda: yea, so it was just if the "IE question" is answered the same in labs and in prod [12:26:58] <_joe_> akosiaris: the only way to do this correctly would be to create a new resource type for the puppet parser, etc etc [12:27:14] <_joe_> which would mean maintaing our own puppet fork [12:27:45] mutante: indeed, I think they should be, whenever someone decides to actually ask and sit through all the accusations/spam/drama [12:28:49] (03CR) 10Dzahn: "Jan, yea, moving it to role fixed that unexpected issue. i did not try moving it up within the file. i thought it may be related to this: " [puppet] - 10https://gerrit.wikimedia.org/r/178487 (owner: 10Dzahn) [12:29:26] YuviPanda: it's easier that way, yea. :p [12:29:33] yup. [12:31:41] (03CR) 10Dzahn: "..that said i think it should be in the role anyways, because setting variables. or become an actual class parameter as we did with dynami" [puppet] - 10https://gerrit.wikimedia.org/r/178487 (owner: 10Dzahn) [12:32:21] (03PS3) 10Dzahn: dynamicproxy: required parameters before optional [puppet] - 10https://gerrit.wikimedia.org/r/178824 [12:33:06] (03CR) 10Dzahn: [C: 032] dynamicproxy: required parameters before optional [puppet] - 10https://gerrit.wikimedia.org/r/178824 (owner: 10Dzahn) [12:33:40] (03CR) 10Giuseppe Lavagetto: "Sadly, while I agree that the multi-line form would be better, it's going to make puppet act weird. 
Also, being limited at node scope some" (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/176334 (owner: 10Giuseppe Lavagetto) [12:35:50] YuviPanda: how do i know when virt1000 is ready to be scap'ed [12:36:11] * _joe_ flips a coin [12:36:19] <_joe_> mutante: it's not ready! [12:36:28] haha, ok :) thanks [12:36:46] <_joe_> well, the coin knows more about that than me [12:36:54] that reminds me of that wheel of blame thing we had [12:37:11] (03PS7) 10Giuseppe Lavagetto: hiera: role-based backend, role keyword [puppet] - 10https://gerrit.wikimedia.org/r/176334 [12:37:20] !blame is http://en.wikipedia.org/wiki/User:MZMcBride/Blame_wheel [12:37:20] Unable to modify db, access denied, link to database isn't valid [12:37:22] <_joe_> ahahah I've seen it [12:37:38] wm-bot: sudo !blame [12:37:47] <_joe_> mutante: we should probably change the wheel, it's outdated [12:37:57] hehe, yes [12:38:16] needs more options [12:38:24] <_joe_> but blaming domas is a good idea anyway [12:38:59] !access [12:39:18] !access is root [12:39:18] Unable to modify db, access denied, link to database isn't valid [12:39:19] mutante: asking andrewbogott_afk is the way to do it, really [12:39:52] * YuviPanda should write a proper ES / SQL backed irc logger bot so we don’t have to grep through logs to search for conversations [12:40:02] YuviPanda: alright, will do [12:40:33] what's the real issue "link to database isn't valid" or "access denied" [12:40:41] for wm-bot [12:41:41] <_joe_> while I don't like locking out users, I don't see how using SSL on win XP has any meaning [12:41:49] nutcracker alerts [12:41:57] also: move apache helper scripts: https://gerrit.wikimedia.org/r/#/c/177080/ so you can kill module "apachesync" [12:41:58] probably nrpe alerts? 
[12:42:09] Cannot assign requested address [12:42:09] <_joe_> paravoid: not sure, checking [12:42:20] <_joe_> yeah I've seen that [12:43:40] <_joe_> telnet localhost 11212 works on mw1190 [12:43:46] <_joe_> so now I need to check nrpe [12:44:53] (03CR) 10Alexandros Kosiaris: [C: 031] "I was afraid of that. OK, +1 then" [puppet] - 10https://gerrit.wikimedia.org/r/176334 (owner: 10Giuseppe Lavagetto) [12:45:26] <_joe_> sudo -u nagios /usr/lib/nagios/plugins/check_tcp -H 127.0.0.1 -p 11212 --timeout=2 also works [12:47:11] (03CR) 10Alexandros Kosiaris: "Thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/177863 (owner: 10Alexandros Kosiaris) [12:48:01] ? all nutcracker process in icinga web ui appear green [12:48:36] <_joe_> mutante: now yes [12:49:28] <_joe_> paravoid: netstat -tunap | fgrep 11212 | fgrep TIME_WAIT | wc -l on mw1190 is 26149, looks like a possible cause of the problem [12:49:31] ah, how about icinga-wm though [12:49:51] <_joe_> mutante: it was a one-off issue it appears [12:49:55] i see [12:50:49] <_joe_> mh nevermind, that number is pretty normal on api servers it appears [12:51:30] PROBLEM - puppet last run on cp3003 is CRITICAL: CRITICAL: puppet fail [12:54:02] (03CR) 10Dzahn: add salt grain 'mediawiki-installation' in mw role (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/177801 (owner: 10Dzahn) [12:54:39] (03CR) 10Giuseppe Lavagetto: "I don't honestly see how we may think supporting HTTPS on an unsupported, unpatched OS like XP can make any sense - we'd just give a FAKE " [puppet] - 10https://gerrit.wikimedia.org/r/178488 (owner: 10Giuseppe Lavagetto) [12:55:45] _joe_: https://gerrit.wikimedia.org/r/#/c/178555/ [12:55:48] (03CR) 10Dzahn: "checked existing salt grains on mw1033 as random appserver. we have "deployment_target: scap/scap". 
the question is, is that the identical" [puppet] - 10https://gerrit.wikimedia.org/r/177801 (owner: 10Dzahn) [12:55:58] <_joe_> paravoid: I'm going to -1 that [12:56:02] <_joe_> in fact [12:56:28] <_joe_> I removed 3DES and TLS_DHE_RSA_WITH_AES_128_CBC_SHA for performance reasons at the time [12:59:28] (03CR) 10Dzahn: "salt 'mw1033.eqiad.wmnet' grains.items" [puppet] - 10https://gerrit.wikimedia.org/r/177801 (owner: 10Dzahn) [12:59:56] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "We explicitly killed some ciphers because they would easly starve our resources in production and expose us to DoS attacks - all non-EC DH" [puppet] - 10https://gerrit.wikimedia.org/r/178555 (owner: 10BBlack) [13:03:49] (03CR) 10Dzahn: "that, or maybe the answer is to just match the existing grain we already have called: "deployment_target: scap/scap"" [puppet] - 10https://gerrit.wikimedia.org/r/160953 (owner: 10Alexandros Kosiaris) [13:07:01] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "If we want to have a grain based on the role, we should make this general using a 'roles: ' grain _everywhere_ that would be more accurate" [puppet] - 10https://gerrit.wikimedia.org/r/177801 (owner: 10Dzahn) [13:07:14] RECOVERY - puppet last run on cp3003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:11:03] (03CR) 10Dzahn: "I tried to do that before in https://gerrit.wikimedia.org/r/#/c/107831/ and also had to revert it in https://gerrit.wikimedia.org/r/#/c/12" [puppet] - 10https://gerrit.wikimedia.org/r/177801 (owner: 10Dzahn) [13:15:11] PROBLEM - mysqld processes on db1037 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [13:16:51] oh [13:16:57] still me ^ [13:17:43] (03CR) 10Dzahn: [C: 032] "/usr/share/libthai comes from libthai-data" [puppet] - 10https://gerrit.wikimedia.org/r/177876 (owner: 10Cscott) [13:17:46] ACKNOWLEDGEMENT - mysqld processes on db1037 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld Sean Pringle almost done [13:18:44] 
(03PS1) 10Faidon Liambotis: reprepro: add component thirdparty [puppet] - 10https://gerrit.wikimedia.org/r/178830 [13:18:46] (03PS1) 10Faidon Liambotis: Replace "non-free" with "thirdparty" in apt/d-i [puppet] - 10https://gerrit.wikimedia.org/r/178831 [13:20:00] (03CR) 10Faidon Liambotis: [C: 032 V: 032] reprepro: add component thirdparty [puppet] - 10https://gerrit.wikimedia.org/r/178830 (owner: 10Faidon Liambotis) [13:22:20] (03CR) 10Dzahn: [C: 031] "the technical part looks good to me, should have a link to a ticket though please" [puppet] - 10https://gerrit.wikimedia.org/r/152724 (owner: 10Hoo man) [13:24:07] (03CR) 10Dzahn: "reopened RT #8286 and linked here" [puppet] - 10https://gerrit.wikimedia.org/r/152724 (owner: 10Hoo man) [13:24:23] (03PS9) 10Dzahn: Allow "hoo" to sudo into datasets [puppet] - 10https://gerrit.wikimedia.org/r/152724 (owner: 10Hoo man) [13:36:55] (03PS1) 10Dzahn: stats.wm.org: use ssl_ciphersuite [puppet] - 10https://gerrit.wikimedia.org/r/178833 [13:38:15] (03CR) 10Dzahn: "this should be the very last SSL config that doesn't use ssl_ciphersuite (grep -ri SSLCipher * in puppet repo etc)" [puppet] - 10https://gerrit.wikimedia.org/r/178833 (owner: 10Dzahn) [13:40:44] (03PS2) 10Dzahn: stats.wm.org: use ssl_ciphersuite [puppet] - 10https://gerrit.wikimedia.org/r/178833 [13:44:12] (03CR) 10Faidon Liambotis: [C: 032] Replace "non-free" with "thirdparty" in apt/d-i [puppet] - 10https://gerrit.wikimedia.org/r/178831 (owner: 10Faidon Liambotis) [13:50:21] (03PS1) 10Dzahn: labsdebrepo: move out of misc [puppet] - 10https://gerrit.wikimedia.org/r/178835 [13:53:38] (03PS1) 10Faidon Liambotis: Revert "reprepro: drop architecture i386" [puppet] - 10https://gerrit.wikimedia.org/r/178836 [13:57:34] (03PS2) 10Faidon Liambotis: Revert "reprepro: drop architecture i386" [puppet] - 10https://gerrit.wikimedia.org/r/178836 [13:58:15] (03PS1) 10Dzahn: outreach/contacts: move out of misc into role [puppet] - 10https://gerrit.wikimedia.org/r/178837 [13:58:48] 
(03CR) 10Faidon Liambotis: [C: 032] Revert "reprepro: drop architecture i386" [puppet] - 10https://gerrit.wikimedia.org/r/178836 (owner: 10Faidon Liambotis) [13:59:27] (03PS2) 10Dzahn: outreach/contacts: move out of misc into role [puppet] - 10https://gerrit.wikimedia.org/r/178837 [14:00:26] !log running dpkg --remove-architecture i386 (trusty); rm /etc/dpkg/dpkg.cfg.d/multiarch (precise) across the whole fleet with the exception of gallium/lanthanum [14:00:28] Logged the message, Master [14:01:48] (03CR) 10Dzahn: [C: 032] outreach/contacts: move out of misc into role [puppet] - 10https://gerrit.wikimedia.org/r/178837 (owner: 10Dzahn) [14:10:01] (03CR) 10JanZerebecki: "The version of nginx we were using back then worked with my generated dhparams. ( I used that one for testing: https://gerrit.wikimedia.or" [puppet] - 10https://gerrit.wikimedia.org/r/178555 (owner: 10BBlack) [14:10:01] PROBLEM - DPKG on db1037 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:11:31] (03PS3) 10Hashar: (WIP IGNORE) contint: +hhvm [puppet] - 10https://gerrit.wikimedia.org/r/178806 [14:14:04] (03PS1) 10Faidon Liambotis: Add "standard" to ssl[13]xxx hosts [puppet] - 10https://gerrit.wikimedia.org/r/178838 [14:14:33] (03CR) 10Faidon Liambotis: [C: 032 V: 032] Add "standard" to ssl[13]xxx hosts [puppet] - 10https://gerrit.wikimedia.org/r/178838 (owner: 10Faidon Liambotis) [14:16:15] (03PS3) 10Dzahn: Change ru.wikinews.org to HTTPS only. [puppet] - 10https://gerrit.wikimedia.org/r/178676 (owner: 10JanZerebecki) [14:17:17] (03PS4) 10Dzahn: Change ru.wikinews.org to HTTPS only. 
[puppet] - 10https://gerrit.wikimedia.org/r/178676 (owner: 10JanZerebecki) [14:18:21] I trust: petan|w.*wikimedia/Petrb (2admin), .*@wikimedia/.* (2trusted), .*@mediawiki/.* (2trusted), .*@mediawiki/Catrope (2admin), .*@wikimedia/RobH (2admin), .*@wikimedia/Ryan-lane (2admin), petan!.*@wikimedia/Petrb (2admin), .*@wikimedia/Krinkle (2admin), [14:18:21] @trusted [14:18:57] bblack: ^^ ( https://gerrit.wikimedia.org/r/178838 ) fwiw [14:19:39] (03PS4) 10Hashar: (WIP IGNORE) contint: +hhvm [puppet] - 10https://gerrit.wikimedia.org/r/178806 [14:22:47] RECOVERY - DPKG on db1037 is OK: All packages OK [14:23:16] hashar: apt has a newer jenkins version [14:23:54] (03PS5) 10Hashar: (WIP IGNORE) contint: +hhvm [puppet] - 10https://gerrit.wikimedia.org/r/178806 [14:24:41] (03CR) 10Ottomata: [C: 031] "Cool w me. BTW, I need to move this behind misc-web-lb very soon anyway. https://rt.wikimedia.org/Ticket/Display.html?id=8085 This need" [puppet] - 10https://gerrit.wikimedia.org/r/178833 (owner: 10Dzahn) [14:24:55] ottomata: I've added a "thirdparty" component to the repository [14:25:00] so CDH5 is there as well now [14:25:15] I didn't remove it with main yet, but I'd like to do so [14:25:24] same with elasticsearch and logstash too [14:26:16] someone has included logstash manually grr [14:26:24] i was about to ask that! [14:26:29] i saw all the reprepro updates [14:26:46] cool, I like it :) [14:27:40] (03PS1) 10Rush: phab update tag for upgrade per T77082 [puppet] - 10https://gerrit.wikimedia.org/r/178840 [14:27:51] (03PS1) 10Ottomata: Bump version to 0.0.9 [debs/logster] - 10https://gerrit.wikimedia.org/r/178841 [14:28:01] paravoid: thanks. 
Will upgrade Jenkins eventually tomorrow [14:28:11] (03CR) 10Ottomata: [C: 032 V: 032] Bump version to 0.0.9 [debs/logster] - 10https://gerrit.wikimedia.org/r/178841 (owner: 10Ottomata) [14:29:25] (03PS1) 10Faidon Liambotis: reprepro: use logstash 1.4 source [puppet] - 10https://gerrit.wikimedia.org/r/178842 [14:29:38] ottomata: about to reuse logster? [14:29:44] I don't like logster all that much tbh [14:29:50] (the concept of tailing logs for statistics) [14:29:58] (03CR) 10Faidon Liambotis: [C: 032 V: 032] reprepro: use logstash 1.4 source [puppet] - 10https://gerrit.wikimedia.org/r/178842 (owner: 10Faidon Liambotis) [14:30:29] (03CR) 10JanZerebecki: "You said in the other RC4 disable patch:" [puppet] - 10https://gerrit.wikimedia.org/r/178555 (owner: 10BBlack) [14:30:38] (03PS1) 10RobH: setting analytics1001/1002 mgmt ip info [dns] - 10https://gerrit.wikimedia.org/r/178843 [14:31:07] paravoid, yes, i was [14:31:25] was going to use it to send varnishkafka stats to statsd [14:31:34] it never worked right (because it runs in a cron) for ganglia [14:31:37] but should be fine for statsd [14:32:01] its running in a screen on cp1056 right now, godog wants me to hold off on sending all those stats to graphite right now [14:32:06] while he is improving it [14:32:10] ok.. [14:32:11] (03PS2) 10Rush: phab update tag for upgrade per T77082 [puppet] - 10https://gerrit.wikimedia.org/r/178840 [14:32:12] also, paravoid...it doesn't tail [14:32:14] i mean [14:32:18] constantly tail [14:32:31] it runs periodically [14:32:38] ha, well [14:32:39] (03CR) 10RobH: [C: 032] setting analytics1001/1002 mgmt ip info [dns] - 10https://gerrit.wikimedia.org/r/178843 (owner: 10RobH) [14:32:42] i suppose that is 'tailing' [14:32:45] but it isn't like tail -f :) [14:33:01] but...why?! :) [14:33:21] y u no like? 
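[editor's note] The distinction ottomata draws above (a job that runs periodically from cron, as opposed to a continuous tail -f) can be sketched roughly as follows. This is a minimal illustration only, not logster's or logtail2's actual code: the function name read_new_lines and the plain-text state file holding a byte offset are hypothetical, and the rotation check (file smaller than the saved offset) is an assumed simplification.

```python
import os


def read_new_lines(log_path, state_path):
    """Return complete lines appended to log_path since the previous call.

    Hypothetical helper: the byte offset of the last read is kept in
    state_path so that each periodic (e.g. cron-driven) run resumes there.
    """
    # Load the offset saved by the previous run (0 on the first run).
    try:
        with open(state_path) as f:
            offset = int(f.read().strip() or 0)
    except FileNotFoundError:
        offset = 0

    # Assumption: if the file is now smaller than the saved offset,
    # it was rotated or truncated, so start again from the beginning.
    if os.path.getsize(log_path) < offset:
        offset = 0

    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()

    # Consume only complete lines; a partial trailing line is left
    # for the next run to pick up once it has been finished.
    end = data.rfind(b"\n") + 1
    with open(state_path, "w") as f:
        f.write(str(offset + end))

    return data[:end].decode("utf-8", errors="replace").splitlines()
```

Each run does a bounded amount of work and then exits, so nothing stays resident between runs; the trade-off is that statistics are only as fresh as the cron interval.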
[14:33:41] (03CR) 10Rush: [C: 032] phab update tag for upgrade per T77082 [puppet] - 10https://gerrit.wikimedia.org/r/178840 (owner: 10Rush) [14:33:52] RECOVERY - mysqld processes on db1037 is OK: PROCS OK: 1 process with command name mysqld [14:34:06] actually tail -f is quite smart [14:34:35] it uses inotify nowadays [14:35:24] probably more efficient than logster ) [14:35:25] :) [14:35:42] ? logster doesn't constantly tail though [14:35:48] it runs whenever you run it via cron [14:35:52] and just reads from the place it last left off [14:36:20] it uses logtail2 to do that [14:36:57] hi, there are three 503 on Varnish tickets in Phab (two refer to not being able to log in, one to editing a template) [14:37:12] anybody aware of recent issues, or who could investigate and ask the "right questions" on those tickets? [14:37:16] https://phabricator.wikimedia.org/maniphest/?ids=75462,78116,78028#R [14:37:33] (03PS4) 10Rush: adds Sprint extension (0.6.1.2) to role::phabricator::main [puppet] - 10https://gerrit.wikimedia.org/r/178194 (owner: 10Christopher Johnson (WMDE)) [14:38:13] ottomata: re: varnishkafka statsd is there a tracking ticket? [14:39:23] hm, godog, not a good one [14:39:24] i'm using this right now [14:39:25] https://phabricator.wikimedia.org/T76342 [14:40:23] ottomata: yep that'll do, I'll followup there thanks! [14:40:35] basically re-stating the options for statsd [14:41:20] (03PS5) 10Rush: phab aSprint extension (0.6.1.2) [puppet] - 10https://gerrit.wikimedia.org/r/178194 (owner: 10Christopher Johnson (WMDE)) [14:41:33] ottomata: so, can I remove cloudera packages from main? [14:42:12] (03PS6) 10Rush: phab add Sprint extension (0.6.1.2) [puppet] - 10https://gerrit.wikimedia.org/r/178194 (owner: 10Christopher Johnson (WMDE)) [14:42:14] (03CR) 10RobH: [C: 04-2] "I have set this to -2, do not submit, as a stall until the next Operations meeting. 
As this request is for sudo access, it must be specif" [puppet] - 10https://gerrit.wikimedia.org/r/152724 (owner: 10Hoo man) [14:42:47] (03PS6) 10Hashar: (WIP IGNORE) contint: +hhvm [puppet] - 10https://gerrit.wikimedia.org/r/178806 [14:43:32] paravoid, yes, don't see why not. [14:43:46] do you run CDH in Labs with a local puppetmaster? [14:44:18] (03CR) 10Rush: [C: 032] phab add Sprint extension (0.6.1.2) [puppet] - 10https://gerrit.wikimedia.org/r/178194 (owner: 10Christopher Johnson (WMDE)) [14:45:51] paravoid, yes, but none that I currently maintain [14:46:04] ok [14:46:05] i haven't logged into any of them in a long time, and if I need to again, i'd likely recreate them [14:46:06] I found differences [14:46:12] ? [14:46:18] we have a newer bigtop [14:46:21] oh [14:46:25] interesting [14:46:58] actually [14:47:06] main's bigtop is cdh4 [14:47:17] ah, ok. then i don't think we use that. [14:47:19] precise-wikimedia|main|amd64: bigtop-jsvc 1.0.10-1.cdh4.3.1.p0.71~precise-cdh4.3.1 [14:47:22] precise-wikimedia|main|source: bigtop-jsvc 1.0.10-1.cdh4.3.1.p0.71 [14:47:24] precise-wikimedia|thirdparty|amd64: bigtop-jsvc 0.6.0+cdh5.0.2+432-1.cdh5.0.2.p0.18~precise-cdh5.0.2 [14:47:27] precise-wikimedia|thirdparty|source: bigtop-jsvc 0.6.0+cdh5.0.2+432-1.cdh5.0.2.p0.18 [14:47:30] etc. [14:47:48] an10 has [14:47:48] Package: bigtop-utils [14:47:49] State: installed [14:47:49] Automatically installed: yes [14:47:49] Version: 0.7.0+cdh5.0.2+0-1.cdh5.0.2.p0.23~precise-cdh5.0.2 [14:47:58] yes bigtop-utils is fine [14:48:02] oh! [14:48:05] bigtop-jvc & bigtop-tomcat aren't [14:48:05] jsvc [14:48:06] weird [14:48:11] Package: bigtop-jsvc [14:48:11] State: installed [14:48:11] Automatically installed: yes [14:48:11] Version: 1.0.10-1.cdh4.3.1.p0.71~precise-cdh4.3.1 [14:48:14] right [14:48:17] it is installed. strange. [14:48:22] it's cdh4 [14:48:34] you can apt-get upgrade that if you want [14:49:24] huh, so main didn't have the cdh5 one? 
[14:49:47] nope [14:50:09] (03Draft2) 10Filippo Giunchedi: import debirf 0.34 [software/rescue-pxe] - 10https://gerrit.wikimedia.org/r/178845 [14:50:35] (03PS3) 10Filippo Giunchedi: import debirf 0.34 "rescue" configuration [software/rescue-pxe] - 10https://gerrit.wikimedia.org/r/178845 [14:51:06] weird. ok. well, i don't think i need to apt-get upgrade these, as I plan on upgrading to cdh 5.2 someitme soon anyway [14:51:27] i will get this new bigtop version on the namenodes when I get new ones and set those up [14:51:36] i'll make sure to watch for any weirdness from that [14:53:50] (03Draft3) 10Filippo Giunchedi: debirf.conf: adjust options for this environment [software/rescue-pxe] - 10https://gerrit.wikimedia.org/r/178846 [14:54:09] (03Draft3) 10Filippo Giunchedi: add hwraid repository and some utilities [software/rescue-pxe] - 10https://gerrit.wikimedia.org/r/178847 [14:57:38] (03Draft2) 10Filippo Giunchedi: add utility Makefile [software/rescue-pxe] - 10https://gerrit.wikimedia.org/r/178848 [14:57:51] (03Draft3) 10Filippo Giunchedi: ignore build artifacts [software/rescue-pxe] - 10https://gerrit.wikimedia.org/r/178849 [14:58:55] paravoid: ^ took a stab at debirf in T78135 (or anyone else who wants to take a look) [15:00:04] chasemp: Dear anthropoid, the time has come. Please deploy Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141210T1500). [15:01:10] Phabricator down? [15:01:37] PROBLEM - nutcracker port on mw1194 is CRITICAL: Cannot assign requested address [15:01:37] <^d> marktraceur: Planned window... [15:01:44] Okay [15:02:58] (03PS4) 10Faidon Liambotis: reprepro: add jessie-wikimedia to reprepro [puppet] - 10https://gerrit.wikimedia.org/r/178165 [15:03:28] (03PS1) 10Springle: repool db1037 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178852 [15:03:34] <^d> marktraceur: You should install Phabricator on your dev machine. 
Then you never have to stop the fun :D [15:03:41] (03CR) 10Faidon Liambotis: [C: 032 V: 032] reprepro: add jessie-wikimedia to reprepro [puppet] - 10https://gerrit.wikimedia.org/r/178165 (owner: 10Faidon Liambotis) [15:04:00] (03CR) 10Springle: [C: 032] repool db1037 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178852 (owner: 10Springle) [15:04:15] (03PS1) 10Filippo Giunchedi: rename graphite1001 to graphite1002 [puppet] - 10https://gerrit.wikimedia.org/r/178853 [15:04:32] "fun" [15:04:46] RECOVERY - nutcracker port on mw1194 is OK: TCP OK - 0.000 second response time on port 11212 [15:04:48] ^d: I have an instance on marktraceur.info that's basically just my personal to-do stuff. [15:05:23] In the middle of my backswing! Edit failed. [15:05:26] !log springle Synchronized wmf-config/db-eqiad.php: repool db1037, warm up (duration: 00m 09s) [15:05:30] Logged the message, Master [15:05:40] yay phab maintenance, k, be right back :D [15:05:53] * Krinkle needed a reason for a break [15:06:01] <^d> marktraceur: Does it use Elasticsearch? :) [15:06:07] Hell naw. [15:06:51] <^d> marktraceur: Every server should have Elasticsearch on it :) [15:07:14] Given my server has trouble with the load I already put on it, no thanks [15:07:43] <^d> You only need a few gigs of ram for the jvm :p [15:08:18] (03PS1) 10Filippo Giunchedi: rename graphite1001 to graphite1002 [dns] - 10https://gerrit.wikimedia.org/r/178854 [15:09:15] robh: ^ 178854 and 178853 to rename graphite1001 if you spare 5min [15:10:45] godog: cool, did you put in a ticket for label update or shall i? 
[15:10:55] (03CR) 10RobH: [C: 031] rename graphite1001 to graphite1002 [dns] - 10https://gerrit.wikimedia.org/r/178854 (owner: 10Filippo Giunchedi) [15:10:57] robh: ah not yet, doing it now [15:11:07] ok, i'll let you do it then =] [15:11:18] so for the ticket, pull up racktables and rename the 'name' field for the server [15:11:22] but not the 'visible label' field [15:11:31] that gets updated when cmjohnson updates the physical label [15:11:52] usually in the ticket to change the name, i'll also list the wmf asset tag to reduce confusion [15:12:03] asset tags are 99.9999% permanent [15:12:09] (unless we swap a chassis) [15:12:35] godog: i got back the quotes to upgrade the disk controllers on the future graphite servers [15:12:39] don't ever recall doing that [15:12:40] and escalated for purchase approvals already [15:12:42] cmjohnson: ? [15:12:48] swapping chassis [15:12:48] you never update the visible label field? [15:12:50] oh [15:12:56] i had it happen once to a switch [15:13:07] so the asset tag sticker was gone [15:13:21] i even emailed accounting about it when it happened cuz asset tags are not supposed to change normally [15:13:42] 'i am not stealing ex4200s, i have no need for a switch that sounds like a jet engine' [15:14:15] (03PS5) 10Faidon Liambotis: autoinstall: apt components based on Debian/Ubuntu [puppet] - 10https://gerrit.wikimedia.org/r/178167 [15:14:17] (03PS5) 10Faidon Liambotis: apt: switch components based on $::lsbdistid [puppet] - 10https://gerrit.wikimedia.org/r/178166 [15:15:32] (03PS6) 10Faidon Liambotis: autoinstall: apt components based on Debian/Ubuntu [puppet] - 10https://gerrit.wikimedia.org/r/178167 [15:15:59] (03CR) 10Faidon Liambotis: [C: 032] apt: switch components based on $::lsbdistid [puppet] - 10https://gerrit.wikimedia.org/r/178166 (owner: 10Faidon Liambotis) [15:16:03] robh: ack, thanks (related rt is #9038) [15:16:11] (03CR) 10Faidon Liambotis: [C: 032] autoinstall: apt components based on Debian/Ubuntu [puppet] - 
10https://gerrit.wikimedia.org/r/178167 (owner: 10Faidon Liambotis) [15:20:57] PROBLEM - Kafka Broker Messages In on analytics1021 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 592.404935724 [15:21:41] RECOVERY - Host d-i-test is UP: PING OK - Packet loss = 0%, RTA = 2.01 ms [15:27:51] PROBLEM - check configured eth on d-i-test is CRITICAL: Connection refused by host [15:28:11] PROBLEM - check if dhclient is running on d-i-test is CRITICAL: Connection refused by host [15:28:14] PROBLEM - check if salt-minion is running on d-i-test is CRITICAL: Connection refused by host [15:28:17] PROBLEM - puppet last run on d-i-test is CRITICAL: Connection refused by host [15:28:29] !log initiated replica election since analytics1021 timed out zk connection again (I had hoped we were done with this :( ) [15:28:32] Logged the message, Master [15:29:36] PROBLEM - DPKG on d-i-test is CRITICAL: Connection refused by host [15:29:58] PROBLEM - Disk space on d-i-test is CRITICAL: Connection refused by host [15:30:02] PROBLEM - RAID on d-i-test is CRITICAL: Connection refused by host [15:31:08] RECOVERY - Kafka Broker Messages In on analytics1021 is OK: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate OKAY: 7404.07476122 [15:32:08] (03CR) 10JanZerebecki: [C: 031] "I don't like it when variables used in the template are not defined where it is used, but its already guilty of that..." 
[puppet] - 10https://gerrit.wikimedia.org/r/178833 (owner: 10Dzahn) [15:32:35] aw analytics1021 :( [15:56:40] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] rename graphite1001 to graphite1002 [dns] - 10https://gerrit.wikimedia.org/r/178854 (owner: 10Filippo Giunchedi) [15:57:34] (03PS2) 10Filippo Giunchedi: rename graphite1001 to graphite1002 [puppet] - 10https://gerrit.wikimedia.org/r/178853 [15:57:47] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] rename graphite1001 to graphite1002 [puppet] - 10https://gerrit.wikimedia.org/r/178853 (owner: 10Filippo Giunchedi) [16:00:04] manybubbles, anomie, ^d, marktraceur: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141210T1600). [16:00:19] * anomie sees nothing for SWAT this morning [16:00:36] <^d> go team. [16:02:00] Good job andre__afk [16:02:03] Errr... anomie [16:07:39] PROBLEM - NTP on d-i-test is CRITICAL: NTP CRITICAL: No response from NTP server [16:11:18] (03PS1) 10coren: Add reverse for toolserver mail relay [dns] - 10https://gerrit.wikimedia.org/r/178860 [16:17:55] PROBLEM - puppet last run on mw1163 is CRITICAL: CRITICAL: Puppet has 1 failures [16:19:02] PROBLEM - swift-account-reaper on ms-be2014 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [16:19:26] PROBLEM - swift-account-auditor on ms-be2014 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [16:19:54] PROBLEM - swift-object-replicator on ms-be2014 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [16:20:19] PROBLEM - swift-account-server on ms-be2014 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [16:20:19] PROBLEM - swift-object-server on ms-be2014 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python 
/usr/bin/swift-object-server [16:20:32] PROBLEM - swift-container-auditor on ms-be2014 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [16:20:57] PROBLEM - swift-container-replicator on ms-be2014 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [16:21:11] PROBLEM - swift-container-server on ms-be2014 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [16:21:37] godog: that you? [16:22:21] PROBLEM - swift-account-replicator on ms-be2014 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [16:22:47] paravoid: whoops, yes, silencing [16:22:54] also !log :) [16:22:58] PROBLEM - swift-container-updater on ms-be2014 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-updater [16:23:03] PROBLEM - swift-object-auditor on ms-be2014 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [16:23:28] RECOVERY - check configured eth on d-i-test is OK: NRPE: Unable to read output [16:23:29] robh: When you get a minute, https://gerrit.wikimedia.org/r/#/c/178860 [16:23:33] hehe true [16:23:56] !log swapping sdm on ms-be2013 / ms-be2014 / ms-be2015 [16:23:58] Logged the message, Master [16:24:09] RECOVERY - DPKG on d-i-test is OK: All packages OK [16:24:25] RECOVERY - Disk space on d-i-test is OK: DISK OK [16:24:52] RECOVERY - check if dhclient is running on d-i-test is OK: PROCS OK: 0 processes with command name dhclient [16:24:52] RECOVERY - check if salt-minion is running on d-i-test is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [16:24:52] RECOVERY - RAID on d-i-test is OK: OK: no RAID installed [16:25:10] godog: I need python-diamond on jessie, so two questions [16:25:29] a) will you upload to Debian? 
(changes whether I'll include it to "backports" or "main") [16:25:38] _joe_: btw, disregard what I said last night about the parsoid speed-up; I was looking at a random mix of pages, while the 2 second number is for post-edit parses, which tend to favor larger pages [16:25:41] b) do you think a straight copy from trusty will work for now? [16:25:43] (03CR) 10RobH: [C: 031] "syntax looks right (thought i cannot evaluate if openstack knows to not allocate these IP addresses, it out of scope of this patchset ;)" [dns] - 10https://gerrit.wikimedia.org/r/178860 (owner: 10coren) [16:26:09] like my little cover my ass statement there ;D [16:26:17] Heh. [16:27:18] (03CR) 10coren: [C: 032] "It can't. Those IP /are/ allocated there, and we kinda-sorta hope nobody's going to mess with them since we can't sync up." [dns] - 10https://gerrit.wikimedia.org/r/178860 (owner: 10coren) [16:27:30] RECOVERY - NTP on d-i-test is OK: NTP OK: Offset 0.05508410931 secs [16:27:39] a.k.a.: Yeah, yeah, we know. [16:28:16] !log authdns-update to merge in https://gerrit.wikimedia.org/r/178860 [16:28:19] Logged the message, Master [16:28:34] we don't !log authdns-updates any more [16:28:43] I mean, feel free to, but don't feel obligated to :) [16:28:51] Ah, okay. :-) [16:29:07] since they stopped crashing the pdns process [16:29:12] since its not pdns now for them =] [16:29:33] (well, non recursors atleast) [16:29:55] (03PS1) 10Faidon Liambotis: Remove provider => upstart from Service[rsyslog] [puppet] - 10https://gerrit.wikimedia.org/r/178863 [16:30:18] (03CR) 10Faidon Liambotis: [C: 032 V: 032] Remove provider => upstart from Service[rsyslog] [puppet] - 10https://gerrit.wikimedia.org/r/178863 (owner: 10Faidon Liambotis) [16:30:22] paravoid: a) is "not sure" so I'd go for main now b) I think so, yeah, happy to rebuild it for jessie in case [16:30:25] (03PS7) 10Hashar: contint: provision hhvm on CI slaves [puppet] - 10https://gerrit.wikimedia.org/r/178806 [16:30:40] how come you're not sure? 
[16:32:06] python-statsd isn't in jessie [16:32:14] hashar hasn't been doing a very good job maintaining it [16:32:16] RECOVERY - puppet last run on mw1163 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:32:16] bad hashar [16:32:50] (03CR) 10Hashar: "ori, _joe_ : I have deployed this patch on the contint puppetmaster and crafted a job running mw/core PHPUnit tests with Zend: https://int" [puppet] - 10https://gerrit.wikimedia.org/r/178806 (owner: 10Hashar) [16:36:45] RECOVERY - puppet last run on d-i-test is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [16:37:30] ^^^^ :) [16:37:34] guess what that is :) [16:37:37] a clean puppet run [16:39:23] paravoid: it is a fair bit of a python codebase, perhaps team-maintained [16:39:52] there's a pkg-monitoring group [16:41:41] ah that fits too, was thinking of python-apps [16:53:24] !log reinstalling graphite1001 as graphite1002 [16:53:39] Logged the message, Master [17:01:06] what the hell phabricator [17:02:09] <^d> hoo: Diffusion e-mails? Or something else? [17:02:40] ^d: Yeah [17:02:41] wtf [17:02:53] <^d> See wikitech-l :( [17:02:55] <^d> I'm sorry! [17:02:58] <^d> You can turn them off [17:03:05] <^d> In your e-mail prefs. [17:03:11] <^d> (they should not have gone out) [17:03:51] ^d: Ok... which pref is that? "A commit is created"? [17:04:02] <^d> Yep. [17:05:35] ok, done [17:11:13] <_joe_> paravoid: \o/ [17:11:20] <_joe_> (the clean puppet run on debian) [17:23:02] I would so love hhvm on the snapshot hosts...
[17:27:49] PROBLEM - uWSGI web apps on graphite1002 is CRITICAL: Connection refused by host [17:28:38] hoo: File an RT ticket :) [17:28:49] PROBLEM - RAID on graphite1002 is CRITICAL: Connection refused by host [17:29:03] yeah yeah icinga-wm [17:29:23] mh, I might actually do that [17:30:14] (03CR) 10BBlack: [C: 04-1] "Just an administrative -1 to temporarily block merging: we probably shouldn't turn on HSTS for any of the public, primary sites until some" [puppet] - 10https://gerrit.wikimedia.org/r/178676 (owner: 10JanZerebecki) [17:32:52] robh: Who can I ask about mail logs? https://phabricator.wikimedia.org/T1077 [17:32:57] mutante: ?? ^^ [17:33:12] for phabricator? [17:33:17] yep [17:34:17] i'll try to take a look at it today [17:34:30] dont wanna shift focus midtask [17:34:32] =] [17:34:32] (03PS2) 10BBlack: Zero: Temporary removal of several XCS IDs from analytics [puppet] - 10https://gerrit.wikimedia.org/r/178745 (owner: 10Yurik) [17:34:42] (03CR) 10BBlack: [C: 032 V: 032] Zero: Temporary removal of several XCS IDs from analytics [puppet] - 10https://gerrit.wikimedia.org/r/178745 (owner: 10Yurik) [17:34:58] robh: ok, np. just fyi, I emails and put their log id info in the task [17:35:19] I *sent* emails... [17:36:57] <_joe_> godog: so we're losing all the data? [17:37:10] <_joe_> (re graphite being reinstalled) [17:38:09] _joe_: haha nono, the machine reinstalled isn't in service yet [17:38:34] Maybe there's a FAQ for this, but I noticed I don't have permissions to view the link included in "Diffusion" emails, e.g. https://phabricator.wikimedia.org/rECNO48782f4cd73a [17:38:36] <_joe_> ugh sorry, graphite1001, meh [17:38:44] <_joe_> ok, *off* [17:38:57] <^d> awight: You do. See wikitech-l. [17:39:08] <^d> Commits show up weird until they're done importing. [17:39:13] ^d: ok thx [17:39:15] <^d> "Give it time. 
And sorry for the e-mails :(" [17:39:27] (03CR) 1020after4: [C: 031] contint: provision hhvm on CI slaves [puppet] - 10https://gerrit.wikimedia.org/r/178806 (owner: 10Hashar) [17:39:40] * awight sorts through spam to find spam with explanation :p [17:43:58] RECOVERY - RAID on graphite1002 is OK: OK: optimal, 2 logical, 4 physical [17:46:00] RECOVERY - uWSGI web apps on graphite1002 is OK: OK: All defined uWSGI apps are runnning. [17:46:47] (03CR) 10Hashar: "I would need the hhvm Jenkins job to inject in the env:" [puppet] - 10https://gerrit.wikimedia.org/r/178806 (owner: 10Hashar) [17:57:12] ^d: thanks for mentioning the email preferences workaround! [17:57:17] <^d> yw [17:58:53] Mostly I was complaining earlier, cos I was excited to play in the new code browser and was denied [18:02:52] PROBLEM - puppet last run on ssl3003 is CRITICAL: CRITICAL: Puppet last ran 4 hours ago [18:10:46] (03PS1) 10Dzahn: move mediawiki maintenance scripts to module [puppet] - 10https://gerrit.wikimedia.org/r/178873 [18:15:55] PROBLEM - puppet last run on ssl3001 is CRITICAL: CRITICAL: Puppet last ran 4 hours ago [18:25:41] PROBLEM - swift-object-updater on ms-be2014 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-updater [18:26:45] !log replacing disk 0 db1010 [18:26:51] Logged the message, Master [18:28:04] (03CR) 10John F. Lewis: [C: 031] "Looks good - can't see any errors or nitpicks." 
[puppet] - 10https://gerrit.wikimedia.org/r/178873 (owner: 10Dzahn) [18:30:05] RECOVERY - swift-container-auditor on ms-be2014 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [18:30:29] RECOVERY - swift-container-replicator on ms-be2014 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [18:30:29] RECOVERY - swift-container-updater on ms-be2014 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater [18:30:46] RECOVERY - swift-container-server on ms-be2014 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [18:31:07] RECOVERY - swift-object-replicator on ms-be2014 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [18:31:35] RECOVERY - swift-object-server on ms-be2014 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [18:31:51] RECOVERY - swift-object-updater on ms-be2014 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [18:31:51] RECOVERY - swift-object-auditor on ms-be2014 is OK: PROCS OK: 3 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [18:31:52] that's me ^ [18:32:15] RECOVERY - swift-account-auditor on ms-be2014 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [18:32:23] (03CR) 10John F. 
Lewis: [C: 031] remove slauerhoff and slauerhoff-array [dns] - 10https://gerrit.wikimedia.org/r/176868 (owner: 10Dzahn) [18:32:29] RECOVERY - swift-account-reaper on ms-be2014 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [18:32:42] !log replacing disk slot 4 db1015 [18:32:46] Logged the message, Master [18:32:47] RECOVERY - swift-account-replicator on ms-be2014 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [18:33:07] RECOVERY - swift-account-server on ms-be2014 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [18:41:02] PROBLEM - RAID on db1015 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) [18:55:11] (03CR) 10Hoo man: "Thanks for keeping my file header about not using mwdeploy, btw :)" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/178873 (owner: 10Dzahn) [18:57:17] (03CR) 10Hoo man: move mediawiki maintenance scripts to module (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/178873 (owner: 10Dzahn) [19:00:04] Reedy, greg-g: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141210T1900). [19:01:34] Reedy: If feasible, could you make sure https://gerrit.wikimedia.org/r/#/c/178883/ is included? [19:01:45] If not, that's fine, I'll submit it for the SWAT [19:05:38] Reedy: did you make the branch yet? [19:06:18] not yet says gitblit [19:06:44] (03CR) 10Catrope: "Sounds reasonable to me." [puppet] - 10https://gerrit.wikimedia.org/r/178419 (owner: 10Catrope) [19:13:43] RECOVERY - RAID on db1015 is OK: OK: optimal, 1 logical, 2 physical [19:22:27] ^d: I just received another bunch of emails [19:22:46] <^d> Did you disable the preference? [19:22:57] " A commit is created." is "Ignore" [19:23:02] <^d> Boo!! 
:( [19:23:38] it might take a minute for the backlog in gmail to drain away [19:24:22] <^d> I also don't know exactly how it checks that preference. [19:24:31] <^d> If it does it at queue time we might be a little late for some of them. [19:27:27] aude: Hah! Now that we have logging the dump creation went totally fine :D [19:31:11] PROBLEM - gdash.wikimedia.org on graphite1002 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 398 bytes in 0.001 second response time [19:35:41] Why do I get the feeling that make-wmf-branch is going to barf at visualeditor branch already existing [19:35:53] Reedy: We checked the script. [19:35:59] Reedy: It doesn't. Or… shouldn't. [19:36:02] lol [19:36:10] It's easy enough to hack around if I need to [19:36:25] Reedy: It's mostly a test to see if we can pre-emptively pin extensions if we know breaking changes are a'coming. [19:36:37] Reedy: It's not actually vital at this point if you do re-create. :-) [19:37:02] Your .gitreview is going to be wrong ;) [19:37:12] Reedy: Ha. Point. [19:37:23] * James_F fixes. [19:37:28] That's easily remedied though [19:37:31] Ours is always wrong :P [19:37:37] hoo: :-) [19:37:37] Ours as in Wikidata/Wikibase [19:37:39] * Reedy runs the script and see's what damage he can cause [19:37:44] :-D [19:41:37] Reedy: let me know when train deployment to en.wiki is finished. Thanks! [19:42:46] Reedy: https://gerrit.wikimedia.org/r/178892 done. 
[19:43:17] (03PS1) 10Faidon Liambotis: grub: strip quiet even if splash is not there [puppet] - 10https://gerrit.wikimedia.org/r/178895 [19:43:19] (03PS1) 10Faidon Liambotis: grub: show the menu on the VGA console as well [puppet] - 10https://gerrit.wikimedia.org/r/178896 [19:43:21] (03PS1) 10Faidon Liambotis: grub: use augeas to modify the config [puppet] - 10https://gerrit.wikimedia.org/r/178897 [19:43:55] (03CR) 10Faidon Liambotis: [C: 032 V: 032] grub: strip quiet even if splash is not there [puppet] - 10https://gerrit.wikimedia.org/r/178895 (owner: 10Faidon Liambotis) [19:45:06] (03CR) 10Faidon Liambotis: [C: 032] grub: show the menu on the VGA console as well [puppet] - 10https://gerrit.wikimedia.org/r/178896 (owner: 10Faidon Liambotis) [19:46:18] (03CR) 10Faidon Liambotis: [V: 032] grub: show the menu on the VGA console as well [puppet] - 10https://gerrit.wikimedia.org/r/178896 (owner: 10Faidon Liambotis) [19:46:51] James_F: yup, it borks [19:46:52] To ssh://gerrit.wikimedia.org:29418/mediawiki/extensions/VisualEditor.git [19:46:52] ! [rejected] wmf/1.25wmf12 -> wmf/1.25wmf12 (non-fast-forward) [19:46:52] error: failed to push some refs to 'ssh://gerrit.wikimedia.org:29418/mediawiki/extensions/VisualEditor.git' [19:46:52] To prevent you from losing history, non-fast-forward updates were rejected [19:46:52] Merge the remote changes (e.g. 'git pull') before pushing again. See the [19:46:54] 'Note about fast-forwards' section of 'git push --help' for details. [19:46:56] sleeping for 5s [19:46:58] git exit with status 1 [19:48:31] Reedy: Ick. [19:48:37] Will be back in a bit [19:48:40] Reedy: Which script broke? [19:48:42] * Reedy files in phab [19:48:45] make-wmf-branch [19:48:50] Hmmmmm. [19:49:19] https://phabricator.wikimedia.org/T78188 [19:49:45] ottomata: can we add superm401 to reserach group so he can access data in EL on 1003? [19:49:52] *research [19:50:55] Reedy: I swore RoanKattouw and I checked make-wmf-branch. 
[19:51:11] ja, need RT ticket and usual process :/ [19:51:52] ottomata, I had access before, not sure why it was lost (maybe unintentionally). [19:52:00] I'll send a request. [19:52:10] oh [19:52:11] you did [19:52:12] hm, ok [19:52:23] hm [19:52:52] ah you are matt! [19:52:56] James_F: FWIW, it's the least of my issues for today :) [19:53:05] matt, you are in the researchers group [19:53:19] superm401: ^ [19:54:03] ottomata, it seems I was already in that. I'll try logging out and back on. [19:55:13] ottomata, I am in these groups: [19:55:19] wikidev researchers statistics-users [19:55:20] I get: [19:55:21] Reedy: Kk. [19:55:35] yup, that is the right group superm401 [19:55:36] ERROR 1045 (28000): Access denied for user 'research'@'208.80.154.82' (using password: YES) [19:55:50] Reedy: Oooh. We looked at https://github.com/wikimedia/mediawiki-tools-release/blob/master/make-extension-branches/make-extension-branches which does skip. [19:56:12] heh [19:56:42] Shouldn't be much work to fix it [19:57:10] Somehow my my.cnf was interfering, even though it didn't mention that host. [20:01:14] PROBLEM - Host d-i-test is DOWN: PING CRITICAL - Packet loss = 100% [20:04:01] hi, there are three 503 on Varnish tickets in Phab (two refer to not being able to log in, one to editing a template). Anybody aware of recent issues, or who could investigate and ask the "right questions" on those tickets? https://phabricator.wikimedia.org/maniphest/?ids=75462,78116,78028#R [20:05:42] andre__: I looked at another one not on your list the other day with chasemp [20:06:08] bblack, ah. got a number? was afk for the last hours [20:06:11] and thanks for taking a look!
[20:06:23] (03CR) 10Dduvall: "I get a different error when executing the specs:" [puppet] - 10https://gerrit.wikimedia.org/r/178810 (owner: 10Hashar) [20:06:49] the issue there was a phab bug was causing phab to die with an Exception, and for whatever reason in our apache/php/phab configuration, those Exceptions cause the connection from varnish->apache to abort rather than give e.g. 500 Internal Server Error, so thus varnish gives the user a 503 because the backend is acting like it's dead. [20:07:05] so my basic assumption is that all phab 503s are similar in nature [20:07:40] andre__: the one I looked at when debugging that was: https://phabricator.wikimedia.org/T77998 [20:08:04] oh, I see, interesting [20:08:22] but the ones I linked above are on Commons etc, not Phab related [20:09:12] oh sorry, I thought you meant "503 on Varnish tickets in Phab" to mean 503s specifically with the phab service [20:09:16] RECOVERY - Host d-i-test is UP: PING OK - Packet loss = 0%, RTA = 0.44 ms [20:12:52] andre__: looking at those tickets now, but it will take me a while to dig through this [20:13:07] bblack, sure, no problem. Thanks a lot for checking! 
[20:15:38] PROBLEM - puppet last run on ms-be2001 is CRITICAL: CRITICAL: puppet fail [20:15:46] PROBLEM - puppet last run on amssq51 is CRITICAL: CRITICAL: puppet fail [20:18:40] (03PS1) 10Faidon Liambotis: grub: check for the existence of /e/default/grub [puppet] - 10https://gerrit.wikimedia.org/r/178907 [20:18:57] (03CR) 10Faidon Liambotis: [C: 032 V: 032] grub: check for the existence of /e/default/grub [puppet] - 10https://gerrit.wikimedia.org/r/178907 (owner: 10Faidon Liambotis) [20:20:34] bblack: not sure if you saw https://gerrit.wikimedia.org/r/#/c/178419/ [20:22:04] (03PS2) 10Faidon Liambotis: grub: use augeas to modify the config [puppet] - 10https://gerrit.wikimedia.org/r/178897 [20:24:12] (03PS1) 10Reedy: Add symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178908 [20:24:14] (03PS1) 10Reedy: testwiki to 1.25wmf12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178909 [20:24:16] (03PS1) 10Reedy: wikipedias to 1.25wmf11 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178910 [20:24:18] (03PS1) 10Reedy: group0 to 1.25wmf12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178911 [20:24:29] (03CR) 10Reedy: [C: 032] Add symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178908 (owner: 10Reedy) [20:25:28] (03CR) 10Reedy: [C: 032] testwiki to 1.25wmf12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178909 (owner: 10Reedy) [20:27:16] Is jenkins deaded? 
[20:27:19] (03CR) 10BBlack: "Well, the parsoid varnishes are "special" with how they recurse into themselves for caching with if-only-cached, which means their configu" [puppet] - 10https://gerrit.wikimedia.org/r/178419 (owner: 10Catrope) [20:27:27] !log reedy Started scap: testwiki to 1.25wmf12 [20:27:33] Logged the message, Master [20:29:59] Reedy: i don't think so, it seems to be a lot of patches in queue :P [20:30:45] RECOVERY - puppet last run on ms-be2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:30:53] RECOVERY - puppet last run on amssq51 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:39:39] (03Merged) 10jenkins-bot: Add symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178908 (owner: 10Reedy) [20:39:45] (03Merged) 10jenkins-bot: testwiki to 1.25wmf12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178909 (owner: 10Reedy) [20:39:57] PROBLEM - https://phabricator.wikimedia.org on iridium is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - string Wikimedia and MediaWiki not found on https://phabricator.wikimedia.org:443https://phabricator.wikimedia.org/ - 10130 bytes in 0.257 second response time [20:39:57] Nice of you to join us Jenkins [20:41:04] (03CR) 10GWicke: [C: 031] "The Parsoid varnishes were a stop-gap in themselves. 
They have huge disks as a form of ghetto storage, which does not turn out to be such " [puppet] - 10https://gerrit.wikimedia.org/r/178419 (owner: 10Catrope) [20:42:01] (03PS3) 10Faidon Liambotis: grub: use augeas to modify the config [puppet] - 10https://gerrit.wikimedia.org/r/178897 [20:44:34] (03CR) 10BBlack: "Regardless, Faidon's point is still critical: The "misc" varnish service is not meant to host production services, only internal toolsy th" [puppet] - 10https://gerrit.wikimedia.org/r/178419 (owner: 10Catrope) [20:49:59] (03PS1) 10Faidon Liambotis: pybal: use Apache instead of os_version [puppet] - 10https://gerrit.wikimedia.org/r/178917 [20:50:38] ori: any reason you didn't do this in the first place? [20:51:17] pretty sure you know about IfVersion, considering you apache::mod::version is yours [20:53:15] bblack: ping [20:53:45] (03PS1) 10Faidon Liambotis: contint: use Apache instead of os_version [puppet] - 10https://gerrit.wikimedia.org/r/178919 [20:55:21] jgage: do you have any plans to upgrade logstash to trusty? [20:55:45] We were kind of talking about it [20:55:56] no date set at this point [20:56:13] bblack: do you know if your patch has fixed the issue? I'm looking to close this ticket: https://phabricator.wikimedia.org/T67683 [20:56:28] paravoid: We were talking about new hardware too, so it might wait for that [20:56:45] it should be relatively easy, right? [20:56:48] same elasticsearch version [20:56:53] It should be yeah [20:56:54] same java version too I think [20:57:16] and we have the data on separate disk now [20:57:52] If we had the debs in apt we could try updating in beta pretty easily [20:57:57] we have them [20:58:02] sweet [20:58:03] oh logstash debs you mean? [20:58:11] yes [20:58:45] sec [20:58:51] (03CR) 10Faidon Liambotis: [C: 032] pybal: use Apache instead of os_version [puppet] - 10https://gerrit.wikimedia.org/r/178917 (owner: 10Faidon Liambotis) [21:00:04] gwicke, cscott, arlolra, subbu: Dear anthropoid, the time has come. 
Please deploy Parsoid/OCG (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141210T2100). [21:01:27] (03CR) 10Faidon Liambotis: [C: 032] contint: use Apache instead of os_version [puppet] - 10https://gerrit.wikimedia.org/r/178919 (owner: 10Faidon Liambotis) [21:02:07] chrismcmahon: I really can't confirm any better than reporters can tbh. I think it fixed the issue, though. the cause was pretty straightforward and obvious [21:02:08] (03PS1) 10Faidon Liambotis: reprepro: add source logstash to trusty-wikimedia [puppet] - 10https://gerrit.wikimedia.org/r/178921 [21:02:17] gwicke: pong [21:02:23] bblack: thanks! [21:02:24] (03CR) 10Faidon Liambotis: [C: 032] reprepro: add source logstash to trusty-wikimedia [puppet] - 10https://gerrit.wikimedia.org/r/178921 (owner: 10Faidon Liambotis) [21:02:31] (03CR) 10Faidon Liambotis: [V: 032] reprepro: add source logstash to trusty-wikimedia [puppet] - 10https://gerrit.wikimedia.org/r/178921 (owner: 10Faidon Liambotis) [21:03:06] bblack: re public end points for services -- we'll need one for restbase soon as well [21:03:13] paravoid: good timing to move to jessie: http://www.markshuttleworth.com/archives/1434 [21:03:29] bblack: just writing up the ticket, moment.. [21:03:31] "ubuntu core" [21:03:39] bd808: done [21:03:53] paravoid: neat. I'll open a task to try it out in beta [21:05:28] ubuntu core seems pretty neat actually, for docker-y things [21:05:48] it's a bit weird [21:05:54] I booted it [21:05:59] it's a regular vivid system [21:06:06] 21:05:27 sudo -u mwdeploy -n -- /srv/deployment/scap/scap/bin/scap-rebuild-cdbs on mw1176 returned [255]: Error reading response length from authentication socket. [21:06:06] Permission denied (publickey). [21:06:12] bblack: https://phabricator.wikimedia.org/T78194 [21:06:13] it even says so on getty/motd [21:06:23] it runs all the regular stuff a Ubuntu base install runs [21:06:27] but then has no apt-get [21:06:30] it has this snappy thing [21:06:35] but... why? 
[21:06:40] why not have them coexist [21:06:58] !log got 21:05:27 sudo -u mwdeploy -n -- /srv/deployment/scap/scap/bin/scap-rebuild-cdbs on mw1176 returned [255]: Error reading response length from authentication socket. Permission denied (publickey). from mw1176 [21:07:04] Logged the message, Master [21:07:21] yeah that part I don't really grok (lack of apt-get) [21:07:35] Reedy: It lets me in. Shared agent key missing there? [21:07:43] my naive assumption would have been that the best way to build "ubuntu core" would be to build it up via apt and snapshot it as "core" releases. [21:07:58] just lacking most packages, and updateable via diff of the whole core snapshot for users. [21:07:59] bd808: Not sure. It worked fine for sync-apaches, it just borked on rebuild-cdbs [21:08:10] (but still apt-based on the side that generates those core snapshots) [21:08:15] (03PS2) 10Hashar: Basic rspec setup [puppet] - 10https://gerrit.wikimedia.org/r/178810 [21:09:02] PROBLEM - puppet last run on palladium is CRITICAL: CRITICAL: puppet fail [21:09:11] (03CR) 10Hashar: "Ah sorry Dan, I should have written a TESTING.md documentation or something similar." [puppet] - 10https://gerrit.wikimedia.org/r/178810 (owner: 10Hashar) [21:09:17] (03CR) 10GWicke: "We'll also need an entry point for restbase as discussed in https://phabricator.wikimedia.org/T78194. Maybe we could share some infrastruc" [puppet] - 10https://gerrit.wikimedia.org/r/178419 (owner: 10Catrope) [21:09:53] coreos did the same, building from a gentoo/chromiumos base [21:10:08] (03CR) 10Ori.livneh: [C: 04-1] "Let's make the package present (rather than latest) on CI as well." [puppet] - 10https://gerrit.wikimedia.org/r/178806 (owner: 10Hashar) [21:10:48] !log reedy Finished scap: testwiki to 1.25wmf12 (duration: 43m 21s) [21:10:53] Logged the message, Master [21:11:50] bblack: that would make too much sense [21:12:26] jamesofur: someone is looking for emergency, did you get it ? 
[21:13:07] (03CR) 10Hashar: "> Let's make the package present (rather than latest) on CI as well." [puppet] - 10https://gerrit.wikimedia.org/r/178806 (owner: 10Hashar) [21:13:45] * jamesofur looks [21:13:50] (03CR) 10Reedy: [C: 032] wikipedias to 1.25wmf11 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178910 (owner: 10Reedy) [21:13:56] finally :) [21:14:05] (03Merged) 10jenkins-bot: wikipedias to 1.25wmf11 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178910 (owner: 10Reedy) [21:14:06] jamesofur: pm me if need help [21:14:06] matanya: not since 2am [21:14:14] pming [21:14:44] RECOVERY - puppet last run on ssl3001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:15:31] !log manually ran scap-rebuild-cdbs on mw1176 [21:15:35] Logged the message, Master [21:16:30] RECOVERY - puppet last run on ssl3003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [21:16:30] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.25wmf11 [21:16:37] Logged the message, Master [21:18:20] RECOVERY - puppet last run on palladium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:19:41] (03CR) 10Reedy: [C: 032] group0 to 1.25wmf12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178911 (owner: 10Reedy) [21:19:50] (03Merged) 10jenkins-bot: group0 to 1.25wmf12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178911 (owner: 10Reedy) [21:20:14] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf12 [21:20:20] Logged the message, Master [21:22:02] Reedy: I wonder if sometimes the ssh-agent on tin gets overloaded by the request volume. That might explain the intermittent failures during a scap. 
[22:04:44] (03PS3) 10Dduvall: Basic rspec setup [puppet] - 10https://gerrit.wikimedia.org/r/178810 (owner: 10Hashar) [22:06:29] (03PS1) 10BBlack: Reclaim ssl[13]00x [puppet] - 10https://gerrit.wikimedia.org/r/178973 [22:09:48] (03CR) 10Dduvall: "Thanks for getting me over that hurdle, Antoine." [puppet] - 10https://gerrit.wikimedia.org/r/178810 (owner: 10Hashar) [22:18:37] (03CR) 10BBlack: [C: 032] Reclaim ssl[13]00x [puppet] - 10https://gerrit.wikimedia.org/r/178973 (owner: 10BBlack) [22:27:06] (03PS1) 10BryanDavis: beta: change monolog config to dynamically generated [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178977 [22:27:08] (03PS1) 10BryanDavis: Configure logging to use MWLoggerMonologSpi [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178978 [22:27:44] (03CR) 10BryanDavis: [C: 04-2] "I76d9953 needs to be tested in beta first" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178978 (owner: 10BryanDavis) [22:34:27] (03PS1) 10BBlack: remove ssl[13]00x from DNS [dns] - 10https://gerrit.wikimedia.org/r/178985 [22:36:08] (03CR) 10BBlack: [C: 032] remove ssl[13]00x from DNS [dns] - 10https://gerrit.wikimedia.org/r/178985 (owner: 10BBlack) [22:36:46] (03CR) 10BryanDavis: [C: 032] "beta only change" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178977 (owner: 10BryanDavis) [22:36:56] (03Merged) 10jenkins-bot: beta: change monolog config to dynamically generated [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178977 (owner: 10BryanDavis) [22:38:05] !log bd808 Synchronized wmf-config/logging-labs.php: Beta monolog config (I76d9953) (duration: 00m 05s) [22:38:12] Logged the message, Master [22:38:33] greg-g, we have some deployment to do today, if bd808 and gwicke are done, any objections? 
[22:38:51] * bd808 is {{done}} [22:39:12] jouncebot: next [22:39:12] In 1 hour(s) and 20 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141211T0000) [22:40:18] cscott, see yurikR ^^ [22:41:47] ok, should be enough time for me [22:47:34] (03PS1) 10MaxSem: Enable MobileFrontend on wikidata.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178990 [22:54:16] (03CR) 10BryanDavis: "I76d9953 working fine in beta. Will propose for SWAT tomorrow." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178978 (owner: 10BryanDavis) [22:55:17] PROBLEM - puppet last run on mw1098 is CRITICAL: CRITICAL: Puppet has 1 failures [22:57:12] yurikR: parsoid's not deploying today. i took too long reviewing changes. so go ahead. [22:57:27] greg-g: i'll probably try to squeeze in a parsoid deploy tomorrow. but i'll deal with that tomorrow. [22:57:29] * yurikR noted [23:07:22] (03PS2) 10BryanDavis: Configure logging to use MWLoggerMonologSpi [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178978 [23:10:16] RECOVERY - puppet last run on mw1098 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:12:14] PROBLEM - puppet last run on amssq53 is CRITICAL: CRITICAL: Puppet has 1 failures [23:12:23] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [23:12:41] (03CR) 10Aude: [C: 031] "would be nice to have Special:Nearby (it's been requested)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178990 (owner: 10MaxSem) [23:13:23] PROBLEM - HTTP 5xx req/min on graphite1002 is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [23:14:03] !log yurik Synchronized php-1.25wmf11/extensions/ZeroBanner/: updatidng ZeroBanner to master (duration: 00m 06s) [23:14:08] Logged the message, Master [23:19:27] !log yurik Synchronized php-1.25wmf12/extensions/ZeroBanner/: updatidng ZeroBanner to master (duration: 00m 05s) 
[23:19:35] Logged the message, Master [23:19:50] !log yurik Synchronized php-1.25wmf12/extensions/ZeroPortal/: updatidng ZeroPortal to master (duration: 00m 06s) [23:19:55] Logged the message, Master [23:21:50] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [23:23:02] RECOVERY - HTTP 5xx req/min on graphite1002 is OK: OK: Less than 1.00% above the threshold [250.0] [23:26:55] bd808, do you know if logstash is working? [23:26:59] fatalmon is emty [23:27:02] empty [23:27:25] RECOVERY - puppet last run on amssq53 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:27:26] yurikR: ?? https://logstash.wikimedia.org/#/dashboard/elasticsearch/fatalmonitor [23:27:34] has lots of stuff in my view [23:28:25] bd808, oh, it's because somehow the "open" box has hidden the fatalmonitor from the list [23:28:51] it only shows N things in the list. you have to type to see other things [23:37:01] something is being weird with the update, i will try scaping. [23:38:19] greg-g, bd808, csteipp - are there any issues with scaping now? [23:38:36] not a biggie for us, just in case [23:38:48] no issues that I'm aware of other than time [23:38:56] bd808, eta? [23:39:55] swat starts in 20 minutes. scap could take 20-30m [23:40:11] Hopefully shorter since it just happened this morning [23:40:19] oh, hmm, ok [23:40:27] scaping ... [23:40:55] !log yurik Started scap: ZeroBanner had some i18n changes, plus bits seems to be out of sync for it [23:41:02] Logged the message, Master [23:49:31] (03PS1) 10Hoo man: Compute $wgWBClientSettings['excludeNamespaces'] on demand [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179008 [23:54:29] (03PS1) 10MaxSem: Reenable WikiGrok UI on enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179010