[00:01:20] (03CR) 10Dzahn: [C: 031] Tools: Install libcgi-fast-perl [operations/puppet] - 10https://gerrit.wikimedia.org/r/147866 (https://bugzilla.wikimedia.org/68269) (owner: 10Tim Landscheidt) [00:04:50] (03CR) 10Dzahn: [C: 04-1] "background image URLs are 404 – File not found" (033 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/144640 (owner: 10ArielGlenn) [00:20:14] (03PS1) 10Dzahn: bugzilla - raise max-age for STS to 1 year [operations/puppet] - 10https://gerrit.wikimedia.org/r/148285 [00:20:47] (03PS2) 10Dzahn: bugzilla - raise max-age for STS to 1 year [operations/puppet] - 10https://gerrit.wikimedia.org/r/148285 [00:24:55] (03CR) 10Ori.livneh: [C: 032] mediawiki::jobrunner: deduplicate job groups [operations/puppet] - 10https://gerrit.wikimedia.org/r/147871 (owner: 10Ori.livneh) [00:26:46] (03PS1) 10Plucas: Fix debian/bin/kafka [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/148287 [00:28:25] (03PS1) 10Dzahn: phab - small lint fixes [operations/puppet] - 10https://gerrit.wikimedia.org/r/148288 [00:30:59] (03PS1) 10Dzahn: OTRS - raise max-age for STS to 1 year [operations/puppet] - 10https://gerrit.wikimedia.org/r/148289 [00:34:50] (03PS1) 10Dzahn: wikitech - raise max-age for STS to 1 year [operations/puppet] - 10https://gerrit.wikimedia.org/r/148290 [00:37:02] (03CR) 10Dzahn: [C: 031] "that being said, still " Max-age limits should be carefully considered as infrequent visitors may find your site inaccessible if you relax" [operations/puppet] - 10https://gerrit.wikimedia.org/r/148289 (owner: 10Dzahn) [00:38:16] PROBLEM - Puppet freshness on db1009 is CRITICAL: Last successful Puppet run was Mon 21 Jul 2014 22:37:25 UTC [00:41:40] (03PS1) 10Dzahn: swift - retab [operations/puppet] - 10https://gerrit.wikimedia.org/r/148293 [00:45:26] (03PS1) 10Dzahn: the last tab char in any .pp file !? 
[operations/puppet] - https://gerrit.wikimedia.org/r/148295
[00:52:17] (CR) Dzahn: "i know it seems stupid, but had to search once to see if we're finally done.. it seems we are after role/swift.pp" [operations/puppet] - https://gerrit.wikimedia.org/r/148295 (owner: Dzahn)
[00:56:38] !log tungsten,fluorine, search1001-1006 - upgraded libssl
[00:56:43] Logged the message, Master
[01:06:05] oh noes, why people hate tabs in .pp so much? :P
[01:08:17] because if we remove them all we can finally let puppet-lint vote via jenkins
[01:08:31] puppet-lint hates them
[01:09:15] so fix one file instead of 1000 ;)
[01:09:27] it's the last damn one
[01:09:39] but you have somebody unhappy no matter what, either it's "hate tabs" or "hate lint changes" or "hate the mix of spaces and tabs"
[01:09:55] haters gonna hate
[01:10:19] lets indent with 3 spaces
[01:10:20] now to the next repo .. hehe
[01:10:22] DNS
[01:10:31] ah
[01:10:32] hashar: groar
[01:10:39] i thought you are sleeping :) *g*
[01:10:44] so i could sneak something in
[01:11:12] oh I should
[01:11:23] only had a nap from 00:30 to 2:15am
[01:11:38] going out now? :)
[01:11:43] that is when my daughter started crying because she "could not keep her eyes closed"
[01:11:52] and now I don't feel like sleeping:/
[01:11:55] "are you sleeping?"
"yes" [01:12:59] hashar: drink a hot beer [01:13:45] or you can review https://gerrit.wikimedia.org/r/#/c/147168/4/templates/wikimedia.org until you you fall asleep :) [01:14:11] ahah [01:15:39] meanwhile Mac OS is upgraded [01:15:41] "Improves the reliability of waking from sleep" [01:15:57] for some reason my CPU skyrocket when resuming from sleep :-: [01:16:12] now you are talking about the computer resuming from sleep ?:) [01:16:31] yeah [01:16:34] applying it :D [01:16:36] :) [01:16:38] brb [01:17:36] RECOVERY - Puppet freshness on db1009 is OK: puppet ran at Tue Jul 22 01:17:33 UTC 2014 [01:22:16] mutante: speaking of puppet, I wish we had a nicer documentation [01:22:44] the puppet doc command is not very nice (i.e. https://doc.wikimedia.org/puppet/ ) [01:24:57] what do you mean? the doc command itself seems fine, but not enough actual comments in source [01:25:08] and many have a new line that prevents them from being found [01:25:12] mutante: it is ugly :-] [01:25:17] the frameset is [01:25:47] i would want deep links to a specific module [01:26:00] but it's not really possible to link [01:26:26] is that also created by puppet doc though? [01:26:58] well with frames there is not much we can do [01:27:50] hashar: we should have started a graph about the number of lint errors (per line) [01:27:55] it would look nice by now [01:28:27] If mode is not 'rdoc', then this command generates a Markdown document [01:28:28] ah [01:29:06] hashar: mediawiki extension that creates doc on wiki [01:29:18] it could find new classes, then make a page for each [01:29:33] potentially we could use "puppet doc" to generate a markdown document [01:29:37] then post process that [01:29:43] to have a nicer layout [01:29:55] http://puppetlabs.com/blog/automated-ebook-generation-convert-markdown-epub-mobi-pandoc-kindlegen [01:30:24] make OCG do it ? 
heh [01:31:07] :D [02:14:43] !log LocalisationUpdate completed (1.24wmf13) at 2014-07-22 02:13:40+00:00 [02:14:53] Logged the message, Master [02:37:30] !log LocalisationUpdate completed (1.24wmf14) at 2014-07-22 02:36:27+00:00 [02:37:35] Logged the message, Master [02:42:06] PROBLEM - puppet last run on neon is CRITICAL: CRITICAL: Puppet has 1 failures [03:00:06] RECOVERY - puppet last run on neon is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [03:02:48] (03PS1) 10Springle: Labs MariaDB 10 all-shards replica configuration, post migration. [operations/puppet] - 10https://gerrit.wikimedia.org/r/148311 [03:02:59] hi springle [03:03:21] hi :) [03:05:16] springle: I was talking with Coren earlier about https://bugzilla.wikimedia.org/68356, which he said could be done after the mariadb 10 migration...do you have an ETA on how long that would take? It's blocking a monitoring tool for GlobalRename I'd like to have running by Friday, if that's not gonna happen, I'll look into alternative solutions [03:13:16] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Jul 22 03:12:10 UTC 2014 (duration 12m 9s) [03:13:16] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Tue 22 Jul 2014 01:12:40 UTC [03:13:22] Logged the message, Master [03:13:45] legoktm: theoretically, the first mariadb 10 labsdb replica instance should be un by friday. it's actually up now, but still cloning upstream shards and soon importing user data [03:14:18] whether that helps your tool specifcially, i don't know. we're starting with s5 [03:14:49] I just need the centralauth.renameuser_status table available somehow, which is on s7 [03:15:32] which labsdb replica are you using? or rather, to which IP or name do you connect? [03:15:54] um, $ sql centralauth_p [03:16:42] _p not _f_p ... 
i guess that sends you directly to s7 [03:17:00] I think so [03:17:12] * springle checks s7 migration prospects [03:20:41] legoktm: is there data to migrate for your tool, or can it start afresh easily? [03:22:11] there's no previous data, the table should be empty on prod 99% of the time [03:22:59] basically CentralAuth uses the table store state and account locks, and I'm just writing a simple script to email me if the table isn't cleared after a certain amount of time [03:26:02] legoktm: then we can definitely do it before friday [03:26:21] :D thank you [03:26:23] (03PS1) 10TTO: Move RelatedSites config to wgExtraInterlanguageLinkPrefixes [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/148314 (https://bugzilla.wikimedia.org/41209) [03:27:36] (03CR) 10TTO: [C: 04-1] "Do not merge for now." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/148314 (https://bugzilla.wikimedia.org/41209) (owner: 10TTO) [03:27:41] legoktm: s/before friday/by friday/ [03:27:56] works for me :) [03:28:15] excellent :) [03:29:56] PROBLEM - puppet last run on mw1039 is CRITICAL: CRITICAL: Puppet has 1 failures [03:32:24] (03CR) 10Springle: [C: 032] Labs MariaDB 10 all-shards replica configuration, post migration. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/148311 (owner: 10Springle) [03:33:36] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Tue Jul 22 03:33:31 UTC 2014 [03:47:57] RECOVERY - puppet last run on mw1039 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [03:50:36] (03CR) 10Chmarkine: [C: 031] bugzilla - raise max-age for STS to 1 year [operations/puppet] - 10https://gerrit.wikimedia.org/r/148285 (owner: 10Dzahn) [03:55:08] (03CR) 10Chmarkine: [C: 031] wikitech - raise max-age for STS to 1 year [operations/puppet] - 10https://gerrit.wikimedia.org/r/148290 (owner: 10Dzahn) [03:58:07] (03CR) 10Chmarkine: [C: 031] OTRS - raise max-age for STS to 1 year [operations/puppet] - 10https://gerrit.wikimedia.org/r/148289 (owner: 10Dzahn) [04:03:30] (03Abandoned) 10Chmarkine: blog -- update cipher suite list to support PFS [operations/puppet] - 10https://gerrit.wikimedia.org/r/147739 (https://bugzilla.wikimedia.org/53259) (owner: 10Chmarkine) [05:31:14] !log authdns servers (mexia, rubidium, eeden) updated to gdnsd-1.11.4~precise1 [05:31:19] Logged the message, Master [05:33:16] PROBLEM - puppet last run on rubidium is CRITICAL: CRITICAL: Puppet has 1 failures [05:33:53] ^ that's me, apparently it ran while the package install was going on :P [05:35:16] RECOVERY - puppet last run on rubidium is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [05:53:15] (03PS1) 10BBlack: role::ocg::production -> add LVS config + better icinga check [operations/puppet] - 10https://gerrit.wikimedia.org/r/148318 [05:53:17] (03PS1) 10BBlack: node config for ocg100[123] [operations/puppet] - 10https://gerrit.wikimedia.org/r/148319 [05:56:37] (03CR) 10BBlack: [C: 032] role::ocg::production -> add LVS config + better icinga check [operations/puppet] - 10https://gerrit.wikimedia.org/r/148318 (owner: 10BBlack) [05:57:05] (03CR) 10BBlack: [C: 032] node config for ocg100[123] [operations/puppet] - 
10https://gerrit.wikimedia.org/r/148319 (owner: 10BBlack) [06:00:27] PROBLEM - Disk space on ocg1002 is CRITICAL: Timeout while attempting connection [06:00:36] PROBLEM - puppet last run on ocg1001 is CRITICAL: Timeout while attempting connection [06:00:36] PROBLEM - DPKG on ocg1001 is CRITICAL: Timeout while attempting connection [06:00:37] PROBLEM - Disk space on ocg1001 is CRITICAL: Timeout while attempting connection [06:00:41] icinga you're awesome :P [06:00:46] PROBLEM - check configured eth on ocg1002 is CRITICAL: Timeout while attempting connection [06:00:46] PROBLEM - RAID on ocg1001 is CRITICAL: Timeout while attempting connection [06:00:47] PROBLEM - check if dhclient is running on ocg1002 is CRITICAL: Timeout while attempting connection [06:00:56] PROBLEM - RAID on ocg1002 is CRITICAL: Timeout while attempting connection [06:00:56] PROBLEM - puppet disabled on ocg1002 is CRITICAL: Timeout while attempting connection [06:00:57] PROBLEM - DPKG on ocg1002 is CRITICAL: Timeout while attempting connection [06:01:06] PROBLEM - check configured eth on ocg1001 is CRITICAL: Timeout while attempting connection [06:01:06] PROBLEM - check if dhclient is running on ocg1001 is CRITICAL: Timeout while attempting connection [06:01:07] PROBLEM - puppet last run on ocg1002 is CRITICAL: Timeout while attempting connection [06:01:07] PROBLEM - puppet disabled on ocg1001 is CRITICAL: Timeout while attempting connection [06:02:00] PROBLEM - they're brand new, give me a break :P [06:03:57] RECOVERY - check configured eth on ocg1001 is OK: NRPE: Unable to read output [06:03:57] RECOVERY - check if dhclient is running on ocg1001 is OK: PROCS OK: 0 processes with command name dhclient [06:04:06] RECOVERY - puppet disabled on ocg1001 is OK: OK [06:04:06] RECOVERY - puppet last run on ocg1002 is OK: OK: Puppet is currently enabled, last run 1564 seconds ago with 0 failures [06:04:26] RECOVERY - Disk space on ocg1002 is OK: DISK OK [06:04:27] RECOVERY - puppet last run on ocg1001 
is OK: OK: Puppet is currently enabled, last run 1399 seconds ago with 0 failures [06:04:36] RECOVERY - Disk space on ocg1001 is OK: DISK OK [06:04:36] RECOVERY - check configured eth on ocg1002 is OK: NRPE: Unable to read output [06:04:46] RECOVERY - check if dhclient is running on ocg1002 is OK: PROCS OK: 0 processes with command name dhclient [06:04:46] RECOVERY - RAID on ocg1001 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [06:04:46] RECOVERY - RAID on ocg1002 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [06:04:46] RECOVERY - puppet disabled on ocg1002 is OK: OK [06:05:36] RECOVERY - DPKG on ocg1001 is OK: All packages OK [06:05:56] RECOVERY - DPKG on ocg1002 is OK: All packages OK [06:18:47] (03PS2) 10BBlack: beta::natfix removal step 2 [operations/puppet] - 10https://gerrit.wikimedia.org/r/146091 [06:19:55] (03CR) 10BBlack: [C: 031] put contacts.wm.org behind misc. varnish [operations/puppet] - 10https://gerrit.wikimedia.org/r/146823 (owner: 10Dzahn) [06:21:15] (03PS4) 10BBlack: naggen2: only pick up resources older than 1 hour by default [operations/puppet] - 10https://gerrit.wikimedia.org/r/145315 [06:23:11] (03CR) 10BBlack: [C: 032] naggen2: only pick up resources older than 1 hour by default [operations/puppet] - 10https://gerrit.wikimedia.org/r/145315 (owner: 10BBlack) [06:28:12] (03PS1) 10QChris: Fix typo in log aggregation settings [operations/puppet/cdh] - 10https://gerrit.wikimedia.org/r/148321 [06:29:03] PROBLEM - puppet last run on lvs1005 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:03] PROBLEM - puppet last run on ms-fe1004 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:03] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:13] PROBLEM - puppet last run on cp1056 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:34] PROBLEM - puppet last run on holmium is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:43] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 1 
failures [06:29:54] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:04] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Puppet has 1 failures [06:41:13] PROBLEM - puppet last run on db1019 is CRITICAL: CRITICAL: Puppet has 1 failures [06:45:13] PROBLEM - puppet last run on db1017 is CRITICAL: CRITICAL: Puppet has 1 failures [06:45:43] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [06:45:53] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [06:46:03] RECOVERY - puppet last run on lvs1005 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [06:46:13] RECOVERY - puppet last run on cp1056 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [06:46:34] RECOVERY - puppet last run on holmium is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [06:47:03] RECOVERY - puppet last run on ms-fe1004 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [06:47:04] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [06:48:04] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [06:56:03] (03CR) 10Matanya: [C: 031] the last tab char in any .pp file !? [operations/puppet] - 10https://gerrit.wikimedia.org/r/148295 (owner: 10Dzahn) [06:56:50] (03CR) 10Matanya: "partial duplicate of https://gerrit.wikimedia.org/r/#/c/140654/" [operations/puppet] - 10https://gerrit.wikimedia.org/r/148293 (owner: 10Dzahn) [06:58:10] * bblack puts random tabs in his next puppet commit [06:59:13] RECOVERY - puppet last run on db1019 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [07:01:38] bblack: I dare you! 
:) [07:03:13] RECOVERY - puppet last run on db1017 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [07:05:58] (03PS1) 10BBlack: fix missing comma in naggen2 [operations/puppet] - 10https://gerrit.wikimedia.org/r/148323 [07:06:04] PROBLEM - puppet last run on neon is CRITICAL: CRITICAL: Complete puppet failure [07:07:24] (03PS2) 10BBlack: fix missing comma in naggen2 [operations/puppet] - 10https://gerrit.wikimedia.org/r/148323 [07:07:55] (03CR) 10BBlack: [C: 032 V: 032] fix missing comma in naggen2 [operations/puppet] - 10https://gerrit.wikimedia.org/r/148323 (owner: 10BBlack) [07:12:04] RECOVERY - puppet last run on neon is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [07:12:16] (03PS2) 10Ori.livneh: apache: ensure 'Include' wildcard globs always match [operations/puppet] - 10https://gerrit.wikimedia.org/r/148230 [07:14:16] (03CR) 10Matanya: mediawiki: lint (033 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/148010 (owner: 10Ori.livneh) [07:19:13] (03CR) 10Ori.livneh: [C: 04-1] mediawiki: use mods-enabled, prepare for HAT (034 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/148099 (owner: 10Giuseppe Lavagetto) [07:23:57] (03CR) 10Ori.livneh: "Why would we throw in a new mpm into the mix? fcgi w/ mpm worker isn't even faster than fcgi w / prefork in some cases. Prefork is safer, " [operations/puppet] - 10https://gerrit.wikimedia.org/r/148099 (owner: 10Giuseppe Lavagetto) [07:41:37] <_joe_> morning [07:46:03] <_joe_> ori: still around? [07:49:08] <_joe_> I was guessing where you got that apache's mpm worker isn't faster that prefork - there is clearly people claiming that and since in 8 years I've never seen that to be the case when I did transition, but quite the opposite, I was curious on what was your source of information. It may well be I don't know some details. 
[07:53:51] (03CR) 10Giuseppe Lavagetto: "mpm worker is the most developed AFAIK, works better with newer CPUs (threading is definitely more efficient that forking) and using it is" [operations/puppet] - 10https://gerrit.wikimedia.org/r/148099 (owner: 10Giuseppe Lavagetto) [08:09:53] PROBLEM - check google safe browsing for wikimedia.org on google is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:09:53] PROBLEM - check google safe browsing for wikinews.org on google is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:09:53] PROBLEM - check google safe browsing for wikisource.org on google is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:09:54] PROBLEM - check google safe browsing for wikibooks.org on google is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:09:54] PROBLEM - check google safe browsing for wikiversity.org on google is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:10:03] PROBLEM - check google safe browsing for wikipedia.org on google is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:10:13] PROBLEM - check google safe browsing for wikiquotes.org on google is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:10:13] PROBLEM - check google safe browsing for wiktionary.org on google is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:10:43] RECOVERY - check google safe browsing for wikimedia.org on google is OK: HTTP OK: HTTP/1.1 200 OK - 4145 bytes in 0.096 second response time [08:10:43] RECOVERY - check google safe browsing for wikiversity.org on google is OK: HTTP OK: HTTP/1.1 200 OK - 3848 bytes in 0.092 second response time [08:10:44] RECOVERY - check google safe browsing for wikibooks.org on google is OK: HTTP OK: HTTP/1.1 200 OK - 3918 bytes in 0.115 second response time [08:10:44] RECOVERY - check google safe browsing for wikisource.org on google is OK: HTTP OK: HTTP/1.1 200 OK - 3845 bytes in 4.367 second response time [08:10:53] RECOVERY - check google safe browsing for 
wikinews.org on google is OK: HTTP OK: HTTP/1.1 200 OK - 3913 bytes in 6.962 second response time [08:10:53] RECOVERY - check google safe browsing for wikipedia.org on google is OK: HTTP OK: HTTP/1.1 200 OK - 3991 bytes in 0.095 second response time [08:11:04] RECOVERY - check google safe browsing for wikiquotes.org on google is OK: HTTP OK: HTTP/1.1 200 OK - 3485 bytes in 0.152 second response time [08:11:04] RECOVERY - check google safe browsing for wiktionary.org on google is OK: HTTP OK: HTTP/1.1 200 OK - 3923 bytes in 0.143 second response time [08:16:16] (03PS1) 10Hashar: beta: sort Parsoid localsettings entries [operations/puppet] - 10https://gerrit.wikimedia.org/r/148329 [08:16:18] (03PS1) 10Hashar: beta: update Parsoid localsettings entries [operations/puppet] - 10https://gerrit.wikimedia.org/r/148330 (https://bugzilla.wikimedia.org/65939) [08:19:34] (03CR) 10Hashar: [C: 031] "cherry picked on beta cluster puppetmaster." [operations/puppet] - 10https://gerrit.wikimedia.org/r/148329 (owner: 10Hashar) [08:19:44] (03CR) 10Hashar: "cherry picked on beta cluster puppetmaster." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/148330 (https://bugzilla.wikimedia.org/65939) (owner: 10Hashar) [08:20:37] (03PS8) 10Giuseppe Lavagetto: jobrunner: create hhvm-only jobrunners [operations/puppet] - 10https://gerrit.wikimedia.org/r/147086 [08:27:15] (03CR) 10Alexandros Kosiaris: [C: 032] puppetmaster: qualify var [operations/puppet] - 10https://gerrit.wikimedia.org/r/147900 (owner: 10Matanya) [08:27:53] (03CR) 10Alexandros Kosiaris: [C: 032] nginx: qualify vars [operations/puppet] - 10https://gerrit.wikimedia.org/r/147902 (owner: 10Matanya) [08:34:07] (03PS9) 10Giuseppe Lavagetto: jobrunner: create hhvm-only jobrunners [operations/puppet] - 10https://gerrit.wikimedia.org/r/147086 [08:34:17] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Typos here and there" (034 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/148035 (owner: 10Matanya) [08:34:31] (03PS10) 10Giuseppe Lavagetto: jobrunner: create hhvm-only jobrunners [operations/puppet] - 10https://gerrit.wikimedia.org/r/147086 [08:34:51] <_joe_> hi akosiaris [08:34:58] hey [08:35:13] hello guys :-D [08:35:49] akosiaris: while you are at triaging puppet patches. I have two simple one to tweak Parsoid configuration on the beta cluster : https://gerrit.wikimedia.org/r/148329 https://gerrit.wikimedia.org/r/148330 [08:35:56] both cherry picked on the local puppetmaster already :D [08:36:32] (03PS13) 10Hashar: sanity test for refreshWikiversionsCDB [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/105698 [08:37:16] (03CR) 10Alexandros Kosiaris: [C: 032] beta: sort Parsoid localsettings entries [operations/puppet] - 10https://gerrit.wikimedia.org/r/148329 (owner: 10Hashar) [08:37:18] (03CR) 10Giuseppe Lavagetto: [C: 032] jobrunner: create hhvm-only jobrunners [operations/puppet] - 10https://gerrit.wikimedia.org/r/147086 (owner: 10Giuseppe Lavagetto) [08:37:46] <_joe_> akosiaris: should I merge your change as well? 
[08:37:49] (03CR) 10Alexandros Kosiaris: [C: 032] beta: update Parsoid localsettings entries [operations/puppet] - 10https://gerrit.wikimedia.org/r/148330 (https://bugzilla.wikimedia.org/65939) (owner: 10Hashar) [08:37:56] hashar: simple enough [08:38:10] _joe_: both :-) [08:38:22] <_joe_> akosiaris: ok [08:38:30] thanks [08:39:40] (03PS2) 10Matanya: ldap: qualify vars [operations/puppet] - 10https://gerrit.wikimedia.org/r/148035 [08:40:02] sorry for the typos akosiaris [08:40:07] copy-paste fail [08:42:34] matanya: no worries. Thanks to all those patches the output rate of deprecation messages on puppetmaster is getting smaller every day :-) [08:43:03] kudos matanya ! :-] [08:43:14] :) [08:48:12] _joe_: will you get any time this week for my Zuul puppet patches or should I bother another person ? :-] [08:49:33] <_joe_> hashar: errr, as you may have guessed, I am a little short on time [08:55:46] _joe_: noticed that :-] [09:01:02] (03PS9) 10Hashar: zuul: migrate settings to role::zuul::configuration [operations/puppet] - 10https://gerrit.wikimedia.org/r/144709 [09:01:43] (03CR) 10Hashar: "Since this patch change the Gearman IP address to attach to (from 127.0.0.1 to gallium public IP), PS9 is adjusting the ferm rule." [operations/puppet] - 10https://gerrit.wikimedia.org/r/144709 (owner: 10Hashar) [09:01:52] will poke ops list [09:06:15] (03CR) 10Giuseppe Lavagetto: "@ori: Thanks for the comments, I catched a couple of things that will make the module much easier." (034 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/148099 (owner: 10Giuseppe Lavagetto) [09:23:23] (03CR) 10Alexandros Kosiaris: "I am also up for being able to use other mpm's as well. 
worker is indeed quite stable and performs better (as in less memory usage - not r" [operations/puppet] - 10https://gerrit.wikimedia.org/r/148099 (owner: 10Giuseppe Lavagetto) [09:31:44] PROBLEM - puppet last run on mw1202 is CRITICAL: CRITICAL: Puppet has 1 failures [09:37:33] (03CR) 10Hashar: [C: 04-1] "Thanks for the cleanup!" (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/146091 (owner: 10BBlack) [09:40:33] (03PS3) 10Hashar: beta::natfix removal step 2 [operations/puppet] - 10https://gerrit.wikimedia.org/r/146091 (owner: 10BBlack) [09:41:16] (03CR) 10Hashar: [C: 031] "PS3 move the ferm rule used on beta cluster from role::ci::* to role::beta::scap_target" [operations/puppet] - 10https://gerrit.wikimedia.org/r/146091 (owner: 10BBlack) [09:49:44] RECOVERY - puppet last run on mw1202 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [10:25:04] (03PS1) 10Alexandros Kosiaris: Stabilize dnsmasq-nova hash [operations/puppet] - 10https://gerrit.wikimedia.org/r/148345 [10:25:06] (03PS1) 10Alexandros Kosiaris: ganglia_view.json.erb variable qualification [operations/puppet] - 10https://gerrit.wikimedia.org/r/148346 [10:27:27] (03PS1) 10Matanya: ircecho: qualify var [operations/puppet] - 10https://gerrit.wikimedia.org/r/148347 [10:28:09] (03CR) 10Ricordisamoa: "@Reedy @Hoo man: is this ready now?" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129464 (owner: 10Ricordisamoa) [10:28:35] apergos: https://gerrit.wikimedia.org/r/148094 want to merge? It's a trivial follow-up [10:28:52] we're going to backport the wikibase change today, so it would be nice to get that merged :) [10:30:22] (03CR) 10Hoo man: [C: 031] "I think so, but don't have enough time to properly review this now. 
You might just want to schedule it for a SWAT" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129464 (owner: 10Ricordisamoa) [10:31:55] (03PS1) 10Matanya: memcached: qualify vars [operations/puppet] - 10https://gerrit.wikimedia.org/r/148348 [10:45:04] hoo: I assume you have tested this with a small run? [10:45:24] (03CR) 10Aude: [C: 031] Make use of new lines more consistent within wikidata json dumps [operations/puppet] - 10https://gerrit.wikimedia.org/r/148094 (owner: 10Hoo man) [10:54:02] hmm, _joe_ memorysize is a puppet fact, right ? [10:54:59] <_joe_> matanya: I guess so, but I'd have to check [10:55:08] <_joe_> btw, the compiler is working again [10:59:05] thanks! [10:59:41] (03CR) 10JanZerebecki: [C: 031] bugzilla - raise max-age for STS to 1 year [operations/puppet] - 10https://gerrit.wikimedia.org/r/148285 (owner: 10Dzahn) [11:02:29] (03PS3) 10JanZerebecki: bugzilla - raise max-age for STS to 1 year [operations/puppet] - 10https://gerrit.wikimedia.org/r/148285 (https://bugzilla.wikimedia.org/38516) (owner: 10Dzahn) [11:02:38] (03PS2) 10Giuseppe Lavagetto: mediawiki: use mods-enabled, prepare for HAT [operations/puppet] - 10https://gerrit.wikimedia.org/r/148099 [11:02:40] (03PS2) 10Giuseppe Lavagetto: mediawiki/apache: use ports.conf [operations/puppet] - 10https://gerrit.wikimedia.org/r/148098 [11:03:39] * YuviPanda is experimenting with git subtrees [11:03:48] (03PS2) 10JanZerebecki: OTRS - raise max-age for STS to 1 year [operations/puppet] - 10https://gerrit.wikimedia.org/r/148289 (https://bugzilla.wikimedia.org/38516) (owner: 10Dzahn) [11:03:56] <_joe_> subtrees as a replacement of submodules? [11:07:56] _joe_: found it - http://docs.puppetlabs.com/facter/2.0/core_facts.html#memorysize [11:08:12] puppet docs are good, but so cluttered. 
[11:09:58] (03PS1) 10Matanya: mysql_wmf: qualify vars [operations/puppet] - 10https://gerrit.wikimedia.org/r/148356 [11:10:08] (03CR) 10JanZerebecki: [C: 031] "In this context relax would mean not offering HTTPS that the browser trusts anymore. Yes that is exactly what we want to prevent. To remov" [operations/puppet] - 10https://gerrit.wikimedia.org/r/148289 (https://bugzilla.wikimedia.org/38516) (owner: 10Dzahn) [11:10:50] (03PS2) 10JanZerebecki: wikitech - raise max-age for STS to 1 year [operations/puppet] - 10https://gerrit.wikimedia.org/r/148290 (https://bugzilla.wikimedia.org/38516) (owner: 10Dzahn) [11:11:40] _joe_: yup [11:11:44] (03CR) 10JanZerebecki: [C: 031] wikitech - raise max-age for STS to 1 year [operations/puppet] - 10https://gerrit.wikimedia.org/r/148290 (https://bugzilla.wikimedia.org/38516) (owner: 10Dzahn) [11:12:02] _joe_: so I wanted to subtree the ops repo into another repo to play with, but of course the ops repo itself uses submodules, so submodules in subtrees don't work too well [11:12:27] but outside of that, I like what I'm seeing about submodules [11:12:34] err [11:12:35] subtrees [11:13:07] <_joe_> bbl, lunch [11:42:04] PROBLEM - puppet last run on neon is CRITICAL: CRITICAL: Puppet has 1 failures [12:00:04] RECOVERY - puppet last run on neon is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [12:06:39] (03CR) 10Filippo Giunchedi: [C: 04-1] "-1 on bundling mpm switch with hhvm switch, minor comments" (035 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/148099 (owner: 10Giuseppe Lavagetto) [12:38:33] PROBLEM - Puppet freshness on db1009 is CRITICAL: Last successful Puppet run was Tue 22 Jul 2014 10:37:46 UTC [12:40:33] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Tue 22 Jul 2014 10:39:38 UTC [12:51:18] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "I agree in general, just doubtful of how puppet will behave." 
(031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/148230 (owner: 10Ori.livneh) [12:53:00] (03PS1) 10Tim Landscheidt: gridengine::master::monitoring: Add dependencies [operations/puppet] - 10https://gerrit.wikimedia.org/r/148367 [12:57:33] RECOVERY - Puppet freshness on db1009 is OK: puppet ran at Tue Jul 22 12:57:28 UTC 2014 [13:00:23] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Tue Jul 22 13:00:14 UTC 2014 [13:11:47] (03CR) 10Alexandros Kosiaris: [C: 032] beta: New script to restart apaches [operations/puppet] - 10https://gerrit.wikimedia.org/r/125888 (https://bugzilla.wikimedia.org/36422) (owner: 10BryanDavis) [13:30:46] (03CR) 10Ottomata: [C: 032 V: 032] "Cool! Good catch. This made me look up this property. I didn't know there were yarn CLI commands to get the app logs!" [operations/puppet/cdh] - 10https://gerrit.wikimedia.org/r/148321 (owner: 10QChris) [13:31:58] (03PS1) 10Ottomata: Update cdh module with yarn.log-aggregation-enable fix [operations/puppet] - 10https://gerrit.wikimedia.org/r/148369 [13:32:15] (03CR) 10Ottomata: [C: 032 V: 032] Update cdh module with yarn.log-aggregation-enable fix [operations/puppet] - 10https://gerrit.wikimedia.org/r/148369 (owner: 10Ottomata) [13:32:42] akosiaris: ottomata: you are my heroes :-] Will add you as reviewers to my patches [13:34:08] (03PS1) 10Giuseppe Lavagetto: HAT: reinstall mw1053 with trusty [operations/puppet] - 10https://gerrit.wikimedia.org/r/148370 [13:34:28] (03PS2) 10Giuseppe Lavagetto: HAT: reinstall mw1053 with trusty [operations/puppet] - 10https://gerrit.wikimedia.org/r/148370 [13:40:25] (03PS1) 10Hashar: Beta: fill missing $lvs_service_ips['ocg'] [operations/puppet] - 10https://gerrit.wikimedia.org/r/148371 [13:41:34] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). 
[13:41:52] <_joe_> ottomata: sudo -i when puppet-merging ;) [13:42:21] hmm, _joe_, i think I am still logging in as root, mainly because I have a little local script that does this for me [13:42:30] so I don't have to manually log into palladium all the time [13:42:36] i SUPPOSE i should adjust my script... [13:42:55] let's see... [13:43:25] (03CR) 10Giuseppe Lavagetto: "http://puppet-compiler.wmflabs.org/170/change/148370/html/" [operations/puppet] - 10https://gerrit.wikimedia.org/r/148370 (owner: 10Giuseppe Lavagetto) [13:43:31] so, _joe_, I can't just sudo puppet-merge? [13:43:38] I actually have to change to root? [13:43:55] ssh palladium; sudo -i; puppet-merge; [13:43:55] ? [13:44:11] I can't: sudo puppet-merge; ? [13:44:20] hmm, i think I remember saying I would try to figure out why not... [13:44:54] <_joe_> ottomata: no you can't [13:45:03] <_joe_> it has to do with env variables [13:45:56] (03PS3) 10Giuseppe Lavagetto: HAT: reinstall mw1053 with trusty [operations/puppet] - 10https://gerrit.wikimedia.org/r/148370 [13:46:28] yeah it has something to do with the value of $USER when the hook runs or something [13:46:31] hmm, ok _joe_, I might try to fix this...I just ran sudo puppet-merge, which seems to work ok if there are no changes to merge [13:46:34] hmmm oh the git hooks [13:46:34] hmmm [13:46:42] (03CR) 10Hashar: "bits got broken on beta cluster because of https://gerrit.wikimedia.org/r/#/c/146860/ which adds OCG has a varnish backend causing:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/148371 (owner: 10Hashar) [13:46:46] <_joe_> ottomata: the git hooks, exactly [13:47:11] <_joe_> why is beta so different from prod? 
[13:47:19] * _joe_ sighs [13:47:29] oh nice, I broke something in beta (again) [13:47:41] <_joe_> bblack: you're in very good company [13:47:49] ottomata: I suspect the hook can be fixed by getting rid of the $USER check and replacing it with something based in `id` [13:52:31] bblack: well there is nothing set up to prevent you from breaking beta :-D [13:52:40] bblack: as long as prod is happy. I guess it is fine [13:54:22] <_joe_> hashar: yes the point is we should _not_ break beta [13:54:31] <_joe_> it's not necessary [13:54:49] <_joe_> so, why is beta so easy to break with prod changes? [13:55:02] <_joe_> because our puppet is not properly modularized I'd say [13:55:14] brandon added a new lvs service IP for something named 'ocg' [13:55:33] <_joe_> oh ok, data in manifests [13:55:33] $lvs_service_ips is a long hash that provide the backend IPs and is partionned by realm [13:55:37] it would be better with heira, or at least more-obvious that the only necessary fix was to make beta data too [13:55:38] in this case 'labs' was not correct [13:55:38] <_joe_> nevermind [13:55:40] <_joe_> yes yes [13:55:42] hiera :) [13:55:44] <_joe_> 'hiera' [13:55:47] <_joe_> :) [13:55:55] then sometime varnish conf is updated in puppet which is applied on beta automatically by beta [13:56:01] (03CR) 10Alexandros Kosiaris: [C: 032] HAT: reinstall mw1053 with trusty [operations/puppet] - 10https://gerrit.wikimedia.org/r/148370 (owner: 10Giuseppe Lavagetto) [13:56:02] BUT varnish packages are not automatically updated [13:56:09] so there is sometime some oddities :-] [13:57:34] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. 
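Note on the `$USER` vs `id` suggestion above: `$USER` is an ordinary environment variable, so whether it reflects the account a process actually runs as depends on how sudo or su was invoked. The effective UID does not have that problem. A minimal Python sketch of an `id`-style check follows; it is an illustration only, not the actual puppet-merge hook. The function names are hypothetical, and `gitpuppet` is simply the account name mentioned in this log.

```python
import os
import pwd


def effective_username() -> str:
    """Return the login name for the effective UID (roughly what `id -un` prints)."""
    uid = os.geteuid()
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # No passwd entry for this UID; fall back to the numeric id as text.
        return str(uid)


def running_as(expected_user: str) -> bool:
    """True when the effective user matches, whatever $USER happens to say."""
    return effective_username() == expected_user


if __name__ == "__main__":
    # The two values can differ, e.g. under sudo without a login shell.
    print("env $USER     :", os.environ.get("USER"))
    print("effective user:", effective_username())
    print("is gitpuppet  :", running_as("gitpuppet"))  # hypothetical expected account
```

A hook written this way would give the same answer under `sudo -i` and plain `sudo`, which is the property the suggestion is after.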
[13:57:51] _joe_: and yeah hiera() would be nice [13:58:01] mark: make hiera happens pleaaaaase :D [13:58:18] it is, but the person who was working on that got assigned to HHVM ;) [13:58:21] so it's a bit delayed [13:58:45] happy to know it is on the roadmap [13:58:50] sure is [13:59:16] bblack: got some nice varnish error [13:59:16] WARNING: (-spersistent) file size reduced to 19770736640 (80% of available disk space) [13:59:16] Could not mmap SILO (/srv/vdb/varnish.main2) at target 1556389888, was mapped at 3231932416 instead [13:59:24] bblack: I though you one day told me to poke whenever that happened [13:59:45] just keep restarting it [13:59:51] ahh [13:59:54] or if it was the result of an apt-get install upgrade [14:00:04] that is an apt upgrade [14:00:05] "apt-get -f install" to retry the restart [14:00:07] http://paste.debian.net/110993/ [14:00:13] (so that it knows it finished its task) [14:00:44] <_joe_> !log reinstalling mw1053 in 5 minutes, downtime on icinga, puppet disabled, setting to 'false' everywhere in pybal [14:00:49] Logged the message, Master [14:01:06] sometimes it will take several attempts. if it's really bad, you can try doing something else first to bump some memory addresses around (run some other commands that consume memory and then exit) [14:01:19] (03CR) 10Andrew Bogott: "Can you explain in the commit message about why we need the client installed alongside the master?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/148367 (owner: 10Tim Landscheidt) [14:02:36] bblack: sounds easy. [14:03:16] yeah it's a really lame bug, and upstream will never fix it because they're deprecating the persistent engine altogether in 4.x ... 
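Note on "just keep restarting it": the advice above amounts to a bounded retry loop around a start attempt that fails nondeterministically. A minimal Python sketch, assuming the start attempt can be wrapped in a callable that returns True on success; `retry_until_ok` and the simulated `flaky_start` are hypothetical names, and no real varnish command is invoked here.

```python
import time
from typing import Callable


def retry_until_ok(attempt: Callable[[], bool],
                   max_attempts: int = 10,
                   delay_s: float = 1.0) -> int:
    """Call attempt() until it returns True; return how many attempts were used.

    Raises RuntimeError when every attempt fails.
    """
    for n in range(1, max_attempts + 1):
        if attempt():
            return n
        if n < max_attempts:
            time.sleep(delay_s)  # brief pause before the next try
    raise RuntimeError(f"still failing after {max_attempts} attempts")


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_start() -> bool:
        # Stand-in for a service start that only succeeds on the third try.
        state["calls"] += 1
        return state["calls"] >= 3

    used = retry_until_ok(flaky_start, max_attempts=5, delay_s=0.0)
    print(f"started after {used} attempts")
```

Capping the attempts matters: as reported later in this log, the beta instance kept failing even after a reboot, so an unbounded loop would never have returned.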
[14:03:31] <_joe_> oh yea [14:04:17] (03CR) 10Andrew Bogott: [C: 032] wikitech - raise max-age for STS to 1 year [operations/puppet] - 10https://gerrit.wikimedia.org/r/148290 (https://bugzilla.wikimedia.org/38516) (owner: 10Dzahn) [14:05:19] (03PS2) 10Andrew Bogott: Tools: Install libcgi-fast-perl [operations/puppet] - 10https://gerrit.wikimedia.org/r/147866 (https://bugzilla.wikimedia.org/68269) (owner: 10Tim Landscheidt) [14:14:45] (03CR) 10Andrew Bogott: [C: 032] Tools: Install libcgi-fast-perl [operations/puppet] - 10https://gerrit.wikimedia.org/r/147866 (https://bugzilla.wikimedia.org/68269) (owner: 10Tim Landscheidt) [14:20:16] bblack: got varnish fixed by deleting the persistent storage files :-] [14:20:53] that works too :) [14:21:01] did it just keep failing over and over? [14:21:24] in prod it's not really an option, you have to keep restarting till it randomly gets the right memory address :( [14:21:44] bblack: yeah it kept failing :/ [14:21:51] even after a reboot [14:21:54] nice [14:22:00] may be that VMs are worse at it [14:22:27] I have a potential long-term fix for it when I get around to it [14:22:29] (03CR) 10Manybubbles: [C: 031] Added mrwiki to commonsuploads.dblist [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147906 (https://bugzilla.wikimedia.org/68292) (owner: 10Vogone) [14:22:52] (which is to add a commandline parameter to set the memory address explicitly, and we know some good ranges that are never taken on x86-64) [14:23:09] but then even after that rolls out, each machine will only be fixed after wiping and recreating its cache [14:23:19] but then if persistent storage is deprecated in varnish 4.x what will happen? [14:23:27] do they have a new system to replace it? 
[14:23:39] well, barring a better plan arising, we'll just keep maintaining it in our fork [14:23:50] I don't think they'll actually remove it for a while, they just don't care if future changes break it [14:25:45] (03PS2) 10Hoo man: Set Wikibase client's allowArbitraryDataAccess to false [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147129 [14:25:56] aude: ^ fyi [14:26:01] no also covers testwikidata [14:26:05] * now [14:26:48] (03CR) 10Alexandros Kosiaris: [C: 032] mediawiki::web: use floor/min instead of inline_template [operations/puppet] - 10https://gerrit.wikimedia.org/r/147511 (owner: 10Ori.livneh) [14:27:58] hmmmm [14:28:09] i'll have to rebase on top of that [14:28:16] dobut that [14:28:26] your other change is in an other point of the file AFAIR [14:28:28] really even in the cases where the -spersistent cache has good hitrate, we don't *need* the persistence 99% of the time. Even on varnish restarts for upgrades, we could pace them out slow enough that the impact isn't huge. [14:28:30] * place [14:28:42] so they should both merge clenaly [14:28:44] * cleany [14:28:46] nope [14:28:48] * cleanly :P [14:28:48] the one case that makes it worth it is if we lost power to a datacenter and all the caches rebooted and wiped at once [14:28:52] mh :S [14:30:35] aude: we're talking about https://gerrit.wikimedia.org/r/147888 right? [14:30:54] nope [14:31:02] (03PS1) 10Aude: Add config to allow test.wikidata to be a Wikibase client [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/148382 [14:31:07] i can rebase [14:31:28] we'll want that patch also though [14:36:11] (03PS2) 10BBlack: Beta: fill missing $lvs_service_ips['ocg'] [operations/puppet] - 10https://gerrit.wikimedia.org/r/148371 (owner: 10Hashar) [14:36:13] akosiaris: would 9am UTC tomorrow works for you? 
[14:36:37] (03CR) 10BBlack: [C: 031] Beta: fill missing $lvs_service_ips['ocg'] [operations/puppet] - 10https://gerrit.wikimedia.org/r/148371 (owner: 10Hashar) [14:37:57] hashar: yes, sounds fine [14:40:13] (03CR) 10Hoo man: [C: 04-1] Add config to allow test.wikidata to be a Wikibase client (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/148382 (owner: 10Aude) [14:40:45] (03PS2) 10Aude: Add config to allow test.wikidata to be a Wikibase client [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/148382 [14:41:26] (03PS2) 10Tim Landscheidt: gridengine::master::monitoring: Add dependencies [operations/puppet] - 10https://gerrit.wikimedia.org/r/148367 [14:44:05] <_joe_> 1log removed old, unused puppet 2.7 packages from reprepro for trusty [14:45:48] _joe_: that uh didn't work out :) [14:46:48] hm, _joe_, I'm not so sure about the $USER git hook thing, as far as I can tell, su - $git_user sets $USER properly [14:47:03] hmmm [14:48:03] <_joe_> !log removed old, unused puppet 2.7 packages from reprepro for trusty [14:48:08] Logged the message, Master [14:48:09] doing more testing... 
[14:48:16] (03PS2) 10Rush: phab - small lint fixes [operations/puppet] - 10https://gerrit.wikimedia.org/r/148288 (owner: 10Dzahn) [14:48:22] (03CR) 10Rush: [C: 031] "nice" [operations/puppet] - 10https://gerrit.wikimedia.org/r/148288 (owner: 10Dzahn) [14:51:25] (03PS1) 10BryanDavis: beta: Fix signal sent by beta-apaches script [operations/puppet] - 10https://gerrit.wikimedia.org/r/148386 [14:52:58] (03CR) 10BryanDavis: beta: New script to restart apaches (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/125888 (https://bugzilla.wikimedia.org/36422) (owner: 10BryanDavis) [14:58:00] (03CR) 10Andrew Bogott: [C: 032] gridengine::master::monitoring: Add dependencies [operations/puppet] - 10https://gerrit.wikimedia.org/r/148367 (owner: 10Tim Landscheidt) [14:58:06] (03PS3) 10Aude: Add config to allow test.wikidata to be a Wikibase client [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/148382 [14:58:14] Is someone actually doing the SWAT? :) [14:58:51] marktraceur: me [14:59:02] Wow, cool. [14:59:06] Vogone: are you around to verify your SWAT? [14:59:38] I thought I'd be on vacation today - but I'm having to swap this morning with friday afternoon so I can drive [14:59:47] I haven't asked robla about it yet, but I imagine it'll be ok [15:00:04] The time is nigh to deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140722T1500) [15:01:11] manybubbles: the change seems simple [15:01:22] marktraceur: yeah -I'm not worried [15:01:27] K [15:01:32] Yesterday was sort of in the air [15:01:35] Figured I'd check in. [15:01:37] Nemo_bis is probably also around to verify? 
(as he filed the bug) [15:01:52] MatmaRex: I'll do my change first to give someone a chance to show [15:01:53] James_F|Away|Awa had said I should sign up to help out for the 08:00 slot [15:02:06] Least I can do is make sure someone's actually around for it :) [15:02:26] * Nemo_bis shrugs [15:04:33] ok, _joe_, any more insight as to why puppet-merge fails with sudo? I'm doing some tests on palladium with a separate repo and a modified puppet-merge script [15:04:36] as far as I can tell [15:04:43] the $USER in git hooks is properly gitpuppet [15:04:55] PROBLEM - MySQL Processlist on db1002 is CRITICAL: CRIT 0 unauthenticated, 0 locked, 0 copy to table, 199 statistics [15:06:40] !log manybubbles Synchronized php-1.24wmf14/extensions/CirrusSearch/: SWAT small cirrus fixes (duration: 00m 08s) [15:06:45] Logged the message, Master [15:06:55] RECOVERY - MySQL Processlist on db1002 is OK: OK 0 unauthenticated, 0 locked, 0 copy to table, 16 statistics [15:09:23] (03CR) 10Manybubbles: [C: 032] Added mrwiki to commonsuploads.dblist [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147906 (https://bugzilla.wikimedia.org/68292) (owner: 10Vogone) [15:09:30] ok - my change is in and looks good [15:09:32] (03Merged) 10jenkins-bot: Added mrwiki to commonsuploads.dblist [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147906 (https://bugzilla.wikimedia.org/68292) (owner: 10Vogone) [15:09:38] now for this next one [15:09:52] (03CR) 10Ottomata: "Is there a use case where the zuul user will need to exist without the main zuul (init.pp) class included? 
If not, then there probably is" [operations/puppet] - 10https://gerrit.wikimedia.org/r/145278 (owner: 10Hashar) [15:10:44] !log manybubbles Synchronized commonsuploads.dblist: SWAT add mrwiki to commonsuploads list (duration: 00m 08s) [15:10:49] Logged the message, Master [15:11:10] (03CR) 10Ottomata: [C: 032] admin: contint-admins can now sudo as 'zuul' [operations/puppet] - 10https://gerrit.wikimedia.org/r/145289 (owner: 10Hashar) [15:11:25] !log manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT - touching InitializeSettings.php to make dblist change go (duration: 00m 06s) [15:11:30] Logged the message, Master [15:12:39] !log done with SWAT [15:12:44] (03PS4) 10Aude: Add config to allow test.wikidata to be a Wikibase client [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/148382 [15:12:44] Logged the message, Master [15:12:46] (03CR) 10Hashar: "Yup later on we might not need a zuul user. Zuul has three components:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/145278 (owner: 10Hashar) [15:15:36] (03CR) 10Ottomata: "This change looks good, but I'm not sure about the copying of ssh keys. It seems to me that it woudl be better to generate a new ssh key " [operations/puppet] - 10https://gerrit.wikimedia.org/r/145290 (owner: 10Hashar) [15:16:26] (03CR) 10Ottomata: "Ok, cool, LGTM. Alex, you ok with this?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/145278 (owner: 10Hashar) [15:16:29] akosiaris: ^ [15:17:53] (03CR) 10Ottomata: [C: 031] "Ha, ok with me. I find it crazy that you got away with being allowed to use pip in the first place (instead of a .deb!). Lucky you!!! :D" [operations/puppet] - 10https://gerrit.wikimedia.org/r/145300 (owner: 10Hashar) [15:18:08] ottomata: sorry the patches are a bit crazy [15:18:19] ottomata: some prepare the field for a followup patch [15:18:27] ja its cool [15:19:06] oooo, role::cache::configuration! 
[15:19:21] hashar, q: is there a use case where you will need to know labs config from prod, or prod config from labs? [15:19:29] e.g. if $::realm == 'production' [15:19:33] would you ever try to access: [15:19:53] $role::zuul::configuration['labs']['some_config_key'] [15:19:54] ? [15:20:26] So I guess operations/puppet.git doesn't set salt grains for each applied role anymore? [15:21:26] Apparently not for some time -- https://github.com/wikimedia/operations-puppet/commit/0c9f2b5e8fb27af01f5b8866ef71920220d61042 [15:21:38] ottomata: so yeah I reused the technique used for varnish caches to vary configuration between prod and labs [15:22:38] ottomata: the whole point (and I should have explained it) is to move from realm based class to role based class [15:23:01] ottomata: so instead of zuul::production and zuul::labs I will end up with zuul::server and zuul::merger and the config is looked up via ::realm [15:23:08] sounds easier to maintain [15:24:04] (03CR) 10BryanDavis: "Do we have any reasonable path to bring this functionality back? Being able to target salt commands by system role was quite handy in beta" [operations/puppet] - 10https://gerrit.wikimedia.org/r/123834 (owner: 10Dzahn) [15:24:16] and to reply, no I will never have to use labs settings on prod or prod settings on labs [15:24:32] the conf would just be deplicated in the config hash [15:24:57] yes, hashar i like this setup too [15:25:06] i just personally prefer not setting the values for the alternate env at all [15:25:07] e.g. 
[15:25:08] this: [15:25:11] https://github.com/wikimedia/operations-puppet/blob/production/manifests/role/analytics/hadoop.pp [15:25:46] that completely removes the needs for any realm named classes [15:26:14] yeah same result :-} [15:26:25] I found editing a hash easier to figure out what is going on [15:26:43] but yeah that is very similar [15:29:33] manybubbles: yes, I'm here … sorry for answering so late, but I see it's already been merged :) [15:29:59] Vogone: no problem in this case:) it was pretty simple so we figured we'd just merge and deploy [15:30:09] can you verify that it took? [15:30:31] (03CR) 10BryanDavis: "Cherry-picked to beta and tested:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/148386 (owner: 10BryanDavis) [15:30:52] hashar, do you like using ::labs and ::production classes over not having to think about it :p? [15:31:29] ottomata: I just want to apply role::zuul::server regardless of the realm :-] [15:31:30] i'm fine with it if you like it better, as I think it is a personal preference, but it seems much cleaner to me to not have to worry about what realm is as a user of a class [15:31:55] one of the patch get rid of role::zuul::production and role::zuul::labs entirely [15:32:03] in favor of applying role::zuul::server and role::zuul::merger [15:32:07] (03CR) 10Ori.livneh: "Puppet does the right thing: the almost-empty source directory does not clobber any additional file resources declared to be in it. 
It's a" [operations/puppet] - 10https://gerrit.wikimedia.org/r/148230 (owner: 10Ori.livneh) [15:32:24] (03PS2) 10BryanDavis: beta: Fix signal sent by beta-apaches script [operations/puppet] - 10https://gerrit.wikimedia.org/r/148386 [15:33:36] oh oh, i see that [15:33:38] ok cool, i like that [15:33:50] cool cool [15:34:00] (03CR) 10Filippo Giunchedi: [C: 04-1] apache: ensure 'Include' wildcard globs always match (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/148230 (owner: 10Ori.livneh) [15:34:07] mutante, are you the king of mailman? I'm wondering if we can change a default setting... [15:34:22] ottomata: yeah sorry I should have given a bit more context in my ops list [15:34:25] ops mail [15:34:27] (03CR) 10BryanDavis: [C: 04-1] "The code is right now, but it won't actually work until we have a reasonable way to register grains with the salt master via puppet runs. " [operations/puppet] - 10https://gerrit.wikimedia.org/r/148386 (owner: 10BryanDavis) [15:34:36] (03CR) 10Ottomata: [C: 032] "I like this change, especially since a follow up commit (https://gerrit.wikimedia.org/r/#/c/145047/) replaces the ::labs and ::production " [operations/puppet] - 10https://gerrit.wikimedia.org/r/144708 (owner: 10Hashar) [15:35:45] (03CR) 10Filippo Giunchedi: [C: 031] mediawiki/apache: use ports.conf [operations/puppet] - 10https://gerrit.wikimedia.org/r/148098 (owner: 10Giuseppe Lavagetto) [15:35:54] (03CR) 10Ottomata: [C: 032] zuul: migrate settings to role::zuul::configuration [operations/puppet] - 10https://gerrit.wikimedia.org/r/144709 (owner: 10Hashar) [15:35:55] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:36:16] (03CR) 10Ottomata: [C: 032] zuul: remove $zuul_url from zuul::server [operations/puppet] - 10https://gerrit.wikimedia.org/r/144997 (owner: 10Hashar) [15:36:44] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.003 second response time 
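Note on the realm-keyed configuration hash discussed above: the idea is one role, with its settings looked up by realm, instead of separate `::production` and `::labs` classes. A rough Python analogue follows, not the Puppet code itself. The keys and values are hypothetical placeholders, and `settings_for` is an illustrative name.

```python
# Settings partitioned by realm, mirroring the shape of a hash such as
# $role::zuul::configuration[<realm>][<key>]. All values are placeholders.
CONFIGURATION = {
    "production": {"server_url": "https://zuul.example.org", "workers": 8},
    "labs": {"server_url": "https://zuul.beta.example.org", "workers": 2},
}


def settings_for(realm: str) -> dict:
    """Return the settings block for one realm, failing loudly if it is missing."""
    try:
        return CONFIGURATION[realm]
    except KeyError:
        raise KeyError(f"no configuration for realm {realm!r}") from None


def setting(realm: str, key: str):
    """Look up a single key; a key missing in one realm is an explicit error."""
    block = settings_for(realm)
    if key not in block:
        raise KeyError(f"realm {realm!r} has no setting {key!r}")
    return block[key]


if __name__ == "__main__":
    print(setting("labs", "workers"))
    print(setting("production", "server_url"))
```

The explicit error for a key missing in one realm corresponds to the beta breakage earlier in this log, where the `ocg` entry had not been filled in for labs.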
[15:37:05] (03CR) 10Ottomata: [C: 032] zuul: phase out zuulwikimedia [operations/puppet] - 10https://gerrit.wikimedia.org/r/145047 (owner: 10Hashar) [15:37:27] (03CR) 10Ottomata: [C: 032] zuul: introduce 'zuul' system user [operations/puppet] - 10https://gerrit.wikimedia.org/r/145278 (owner: 10Hashar) [15:37:46] (03CR) 10Ottomata: [C: 031] zuul: switch to run as 'zuul' user BREAKING CHANGE [operations/puppet] - 10https://gerrit.wikimedia.org/r/145290 (owner: 10Hashar) [15:38:12] ok, I'm off to this wikilead thing, ping chrismcmahon if things go sideways [15:38:21] ok, hashar, I went through them all, and except for my comments about your migration process (chown, etc.), its cool w me [15:38:33] I'd really like to get akosiaris to at least +1 the ::user class [15:39:48] ottomata: do comment on Gerrit [15:40:00] I did, ja? [15:40:35] akosiaris: user change is this one: https://gerrit.wikimedia.org/r/#/c/145278 [15:40:40] (03PS3) 10Ori.livneh: apache: ensure 'Include' wildcard globs always match [operations/puppet] - 10https://gerrit.wikimedia.org/r/148230 [15:40:44] godog: ^ [15:41:06] hashar: migration plan comments here: https://gerrit.wikimedia.org/r/#/c/145290/ [15:41:51] (03CR) 10Ottomata: "i.e." [operations/puppet] - 10https://gerrit.wikimedia.org/r/145290 (owner: 10Hashar) [15:42:11] greg-g: have fun [15:43:59] all ok on mr.wiki: You do not have permission to upload this file, for the following reason: The action you have requested is limited to users in the group: Administrators. [15:44:14] <_joe_> ori: hey! 
[15:44:20] hey _joe_ [15:44:21] (03PS2) 10Ori.livneh: mediawiki::web: use floor/min instead of inline_template [operations/puppet] - 10https://gerrit.wikimedia.org/r/147511 [15:44:43] _joe_: i don't mind switching to mpm worker, i just want to separate that migration from this one [15:44:57] <_joe_> ori: ok look at my last patch [15:44:59] _joe_: don't forget tim is really familiar with prefork and that's something to consider too [15:45:01] * ori does [15:45:02] ottomata: great thanks a ton :-) [15:45:06] hey, if anybody has some puppet merges to puppet-merge [15:45:09] let me know [15:45:14] <_joe_> it's very easy to do that this way [15:45:15] ottomata: will catch up with Alexandros tomorrow probably get everything merged tomorrow [15:45:15] i'd like to do them so I can do some sudo experiements [15:45:17] ottomata: i do [15:45:25] (03CR) 10Ori.livneh: [C: 032 V: 032] mediawiki::web: use floor/min instead of inline_template [operations/puppet] - 10https://gerrit.wikimedia.org/r/147511 (owner: 10Ori.livneh) [15:45:28] that one ^ [15:45:34] ah ok [15:45:34] cool [15:45:53] <_joe_> once you assume you don't need non-hhvm trustys [15:46:26] manybubbles: see Nemo's comment above :) [15:46:38] sweet [15:46:40] thanks [15:46:47] (03CR) 10Filippo Giunchedi: [C: 031] apache: ensure 'Include' wildcard globs always match [operations/puppet] - 10https://gerrit.wikimedia.org/r/148230 (owner: 10Ori.livneh) [15:46:54] ori: aye! TYVM sir [15:47:07] huh [15:47:09] thank you [15:47:11] ori, done [15:47:17] _joe_, i just did sudo puppet-merge on palladium [15:47:20] everything looked fine to me... [15:47:35] <_joe_> ... [15:47:47] the post-merge hook ran fine too [15:47:52] checked strontium puppet clone [15:47:56] looks like it got updated fine [15:48:02] the change is bad [15:48:05] blech. 
[15:48:12] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: comparison of String with 98 failed at /etc/puppet/modules/mediawiki/manifests/web.pp:16 on node mw1041.eqiad.wmnet [15:48:19] i'll revert in a sec [15:48:25] k [15:48:27] ottomata: was your ssh agent forwared by any chance? [15:48:51] nope [15:49:09] <_joe_> ori: deactivate puppet on mw* [15:49:16] ottomata: bah :( [15:49:17] <_joe_> I'll do that [15:49:30] godog: ? [15:49:48] _joe_: done [15:50:03] _joe_, godog: i'm going to email ops list and ask if anyone remembers why sudo puppet-merge is purported to not work [15:50:18] <_joe_> ottomata: just observation [15:50:29] haha, k would love to reproduce myself! :) [15:50:33] ottomata: oh ok so nothing changed but it worked? sadface because I couldn't understand why :) [15:50:55] <_joe_> godog: eh. [15:51:15] <_joe_> ottomata: what did you merge exactly? [15:51:21] <_joe_> a submodule change? [15:51:41] no, just a simple change from ori [15:51:57] were they submodule changes that casued problems before? [15:53:14] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Tue 22 Jul 2014 13:52:54 UTC [15:54:38] <_joe_> ottomata: nope [15:55:02] k [15:55:07] sent email. [15:55:08] thanks [15:55:20] _joe_, try just doing sudo puppet-merge for a while [15:55:27] maybe you'll reproduce and be able to tell me how [15:55:35] <_joe_> ok [15:55:40] bah will try to reproduce if I need to merge as well [15:55:45] PROBLEM - check if dhclient is running on virt1008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
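Note on the catalog error above: "comparison of String with 98 failed" reads like a minimum being taken over a mix of a string and an integer. That is an interpretation of the message, not something the log confirms. Python 3 shows the same class of failure, so the sketch below uses it as an analogue. It is not the stdlib patch that was merged, and `to_number` and `numeric_min` are hypothetical helpers.

```python
def to_number(value):
    """Coerce ints, floats and numeric strings to a number; reject anything else."""
    if isinstance(value, (int, float)):
        return value
    text = str(value).strip()
    try:
        return int(text)
    except ValueError:
        return float(text)  # raises ValueError for non-numeric text


def numeric_min(*values):
    """min() over values that may arrive as numeric strings."""
    return min(to_number(v) for v in values)


if __name__ == "__main__":
    try:
        min("150", 98)  # mixed str/int comparison
    except TypeError as exc:
        print("plain min() fails:", exc)
    print("numeric_min:", numeric_min("150", 98))
```

Coercing before comparing keeps the comparison numeric, so "150" and 98 are ordered as numbers rather than failing.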
[15:56:00] cool thanks [15:56:44] RECOVERY - check if dhclient is running on virt1008 is OK: PROCS OK: 0 processes with command name dhclient [15:57:49] (03PS3) 10Hashar: zuul: switch to run as 'zuul' user BREAKING CHANGE [operations/puppet] - 10https://gerrit.wikimedia.org/r/145290 [15:58:05] (03CR) 10Hashar: "Updated commit message for the chown command" [operations/puppet] - 10https://gerrit.wikimedia.org/r/145290 (owner: 10Hashar) [15:58:27] (03PS1) 10Ori.livneh: Fix stdlib's min() [operations/puppet] - 10https://gerrit.wikimedia.org/r/148391 [15:58:49] <_joe_> lol [15:58:57] <_joe_> "fix puppet" [15:59:07] +1? [15:59:27] ottomata: what happens when you sudo puppet-merge? [15:59:38] Or, actually, is it just that you're doing puppet-merge vs. puppet merge? [16:00:04] The time is nigh to deploy Wikidata (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140722T1600) [16:00:09] _joe_: going to merge, don't want to leave puppet broken any longer, can revert if you find an issue [16:00:22] <_joe_> ori: ehy [16:00:23] (03CR) 10Ori.livneh: [C: 032] Fix stdlib's min() [operations/puppet] - 10https://gerrit.wikimedia.org/r/148391 (owner: 10Ori.livneh) [16:00:37] <_joe_> I see there is a removal of mediawiki::web::sites there [16:00:45] <_joe_> did you forget to do that before? [16:00:53] <_joe_> you moved it to the role, right? 
[16:01:01] <_joe_> I'm a little bit worried by that [16:01:20] <_joe_> try to run puppet on one api appserver and on one appserver first [16:01:21] i did [16:01:44] <_joe_> eh let me check one thing :) [16:02:44] PROBLEM - puppet last run on mw1041 is CRITICAL: CRITICAL: Complete puppet failure [16:02:58] applied correctly [16:03:02] on mw1041 [16:03:06] dunno why it's complaining [16:03:14] Info: Caching catalog for mw1041.eqiad.wmnet [16:03:14] Info: Applying configuration version '1406044890' [16:03:14] Notice: /Stage[first]/Apt::Update/Exec[/usr/bin/apt-get update]/returns: executed successfully [16:03:16] Notice: /Stage[main]/Base::Puppet/Exec[neon puppet snmp trap]/returns: executed successfully [16:03:18] Notice: Finished catalog run in 48.12 seconds [16:03:35] <_joe_> ori: don't worry about that [16:03:44] RECOVERY - puppet last run on mw1041 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [16:05:30] (03PS4) 10Ori.livneh: apache: ensure 'Include' wildcard globs always match [operations/puppet] - 10https://gerrit.wikimedia.org/r/148230 [16:07:36] (03PS3) 10Filippo Giunchedi: releases: add reprepro repository [operations/puppet] - 10https://gerrit.wikimedia.org/r/146826 [16:08:22] "Complete puppet failure" is amusing, I giggle every time [16:08:36] we should make it say "epic puppet fail" [16:09:17] hehehe, it also reminds me of the "malformed mime header" http://www2.b3ta.com/fp-archive/host/10124499-1.jpg [16:10:26] why did jouncebot not say my name? 
[16:10:52] (03CR) 10Alexandros Kosiaris: [C: 031] "yes, LGTM" [operations/puppet] - 10https://gerrit.wikimedia.org/r/145278 (owner: 10Hashar) [16:10:54] aude: It's been made "polite" [16:11:01] grrrr [16:11:08] which for some reason made it forget names [16:11:13] oooo [16:11:15] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] releases: add reprepro repository [operations/puppet] - 10https://gerrit.wikimedia.org/r/146826 (owner: 10Filippo Giunchedi) [16:11:16] m.walker knows and will fix when/if he has time [16:11:21] ok [16:11:37] or, take a look: https://github.com/mattofak/jouncebot :) [16:12:00] (03CR) 10Giuseppe Lavagetto: [C: 031] role::cache on beta: switch over appservers to HHVM [operations/puppet] - 10https://gerrit.wikimedia.org/r/148263 (owner: 10Ori.livneh) [16:12:13] hoo: want me to deploy the config changes? [16:12:16] (just ran sudo puppet-merge on palladium btw) [16:12:21] then we can update submodule [16:12:22] (03PS2) 10Ori.livneh: role::cache on beta: switch over appservers to HHVM [operations/puppet] - 10https://gerrit.wikimedia.org/r/148263 [16:12:29] (03CR) 10Ori.livneh: [C: 032 V: 032] role::cache on beta: switch over appservers to HHVM [operations/puppet] - 10https://gerrit.wikimedia.org/r/148263 (owner: 10Ori.livneh) [16:12:33] aude: sounds good to me [16:12:35] * aude making submodule patch [16:12:40] (03PS5) 10Ori.livneh: apache: ensure 'Include' wildcard globs always match [operations/puppet] - 10https://gerrit.wikimedia.org/r/148230 [16:12:44] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Tue Jul 22 16:12:42 UTC 2014 [16:12:46] (03CR) 10Ori.livneh: [C: 032 V: 032] apache: ensure 'Include' wildcard globs always match [operations/puppet] - 10https://gerrit.wikimedia.org/r/148230 (owner: 10Ori.livneh) [16:13:25] (03PS1) 10Andrew Bogott: Reword failure message [operations/puppet] - 10https://gerrit.wikimedia.org/r/148394 [16:13:36] (03CR) 10Aude: [C: 032] Set Wikibase client's allowArbitraryDataAccess to false 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147129 (owner: 10Hoo man) [16:13:50] (03PS1) 10Reedy: Non wikipedias to 1.24wmf14 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/148395 [16:14:17] (03Merged) 10jenkins-bot: Set Wikibase client's allowArbitraryDataAccess to false [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147129 (owner: 10Hoo man) [16:14:42] (03CR) 10Ori.livneh: [C: 031] Reword failure message [operations/puppet] - 10https://gerrit.wikimedia.org/r/148394 (owner: 10Andrew Bogott) [16:14:48] Reedy: i see non-wikipedias to 1.24-wmf14 on tin [16:14:57] don't think i'm supposed to deploy that [16:15:12] aude: It should've been reverted after I pushed it... [16:15:16] ok [16:15:18] (03CR) 10Andrew Bogott: [C: 032] Reword failure message [operations/puppet] - 10https://gerrit.wikimedia.org/r/148394 (owner: 10Andrew Bogott) [16:15:24] ah yes [16:15:27] I see Merge "Added mrwiki to commonsuploads.dblist" at the top of mine [16:15:29] race condition :) [16:15:31] * aude too [16:17:02] (03CR) 10Aude: [C: 032] Add config to allow test.wikidata to be a Wikibase client [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/148382 (owner: 10Aude) [16:17:09] !log reedy Purged l10n cache for 1.24wmf10 [16:17:14] Logged the message, Master [16:17:34] PROBLEM - puppet last run on cp4012 is CRITICAL: CRITICAL: Puppet has 1 failures [16:17:45] PROBLEM - puppet last run on caesium is CRITICAL: CRITICAL: Puppet has 2 failures [16:18:10] !log reedy Purged l10n cache for 1.24wmf11 [16:18:16] Logged the message, Master [16:18:34] PROBLEM - puppet last run on db1045 is CRITICAL: CRITICAL: Puppet has 1 failures [16:18:35] PROBLEM - puppet last run on cp4009 is CRITICAL: CRITICAL: Puppet has 1 failures [16:18:35] PROBLEM - puppet last run on ssl3001 is CRITICAL: CRITICAL: Puppet has 1 failures [16:18:37] (03Merged) 10jenkins-bot: Add config to allow test.wikidata to be a Wikibase client 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/148382 (owner: 10Aude) [16:18:44] PROBLEM - puppet last run on cp1068 is CRITICAL: CRITICAL: Puppet has 3 failures [16:18:45] !log reedy Purged l10n cache for 1.24wmf12 [16:18:51] Logged the message, Master [16:18:52] (03PS2) 10Aude: set useRedirectTargetColumn setting to false for Wikibase [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147888 [16:18:55] puppet failures are puppetmaster 500s [16:18:55] PROBLEM - puppet last run on db60 is CRITICAL: CRITICAL: Puppet has 1 failures [16:18:55] PROBLEM - puppet last run on wtp1001 is CRITICAL: CRITICAL: Puppet has 1 failures [16:18:58] (03CR) 10Giuseppe Lavagetto: [C: 031] "LGTM" [operations/puppet] - 10https://gerrit.wikimedia.org/r/148231 (owner: 10Ori.livneh) [16:19:14] PROBLEM - puppet last run on cp1045 is CRITICAL: CRITICAL: Puppet has 1 failures [16:19:20] (03PS3) 10Ori.livneh: beta: load site configs from /usr/local/apache/conf/all.conf [operations/puppet] - 10https://gerrit.wikimedia.org/r/148231 [16:19:34] PROBLEM - puppet last run on strontium is CRITICAL: CRITICAL: Puppet has 3 failures [16:19:45] (03CR) 10Ori.livneh: [C: 032 V: 032] beta: load site configs from /usr/local/apache/conf/all.conf [operations/puppet] - 10https://gerrit.wikimedia.org/r/148231 (owner: 10Ori.livneh) [16:19:55] PROBLEM - puppet last run on mw1131 is CRITICAL: CRITICAL: Puppet has 1 failures [16:20:12] (03CR) 10Aude: [C: 032] set useRedirectTargetColumn setting to false for Wikibase [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147888 (owner: 10Aude) [16:20:19] (03Merged) 10jenkins-bot: set useRedirectTargetColumn setting to false for Wikibase [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147888 (owner: 10Aude) [16:20:55] PROBLEM - puppet last run on cp4010 is CRITICAL: CRITICAL: Puppet has 1 failures [16:22:34] RECOVERY - puppet last run on strontium is OK: OK: Puppet is currently enabled, last run 58 seconds ago 
with 0 failures [16:23:52] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "LGTM, but see comments for improvements" (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/148248 (owner: 10Ori.livneh) [16:25:41] !log aude Synchronized wmf-config/Wikibase.php: add settings for enabling WikibaseClient on test wikidata (duration: 00m 04s) [16:25:47] Logged the message, Master [16:26:06] !log aude Synchronized wmf-config/InitialiseSettings.php: enable WikibaseClient on test wikidata (duration: 00m 07s) [16:26:12] Logged the message, Master [16:26:30] That sounds quite recursive [16:26:34] alright, so it should be *not* enabled on wikidata and enabled on test wikidata [16:26:41] heh [16:27:21] (03PS1) 10Filippo Giunchedi: releases: add repository public key [operations/puppet] - 10https://gerrit.wikimedia.org/r/148398 [16:27:39] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] releases: add repository public key [operations/puppet] - 10https://gerrit.wikimedia.org/r/148398 (owner: 10Filippo Giunchedi) [16:27:43] Special:Version looks ok [16:27:48] looks correct [16:28:21] it will be necessary to purge items on test.wikidata to see the new site link section [16:28:38] and it will be missing messages (until thursday) but not a big deal [16:28:49] I can run scap in my window if you want [16:28:59] no [16:29:10] relevant changes are not merged and submodule not updated [16:29:15] * aude lazy and can wait :) [16:29:45] RECOVERY - puppet last run on caesium is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [16:29:57] now to update wikidata submodule for some bug fixes [16:30:02] then i am done [16:30:23] ottomata: thanks for the reviews :-] [16:31:51] (03PS2) 10Ori.livneh: mediawiki: add a trusty check to apache envvars [operations/puppet] - 10https://gerrit.wikimedia.org/r/148248 [16:32:27] (03PS3) 10Ori.livneh: mediawiki: add a trusty check to apache envvars [operations/puppet] - 10https://gerrit.wikimedia.org/r/148248 [16:32:38] (03CR) 
10Ori.livneh: [C: 032 V: 032] mediawiki: add a trusty check to apache envvars [operations/puppet] - 10https://gerrit.wikimedia.org/r/148248 (owner: 10Ori.livneh) [16:35:11] * aude waits [16:35:14] RECOVERY - puppet last run on cp1045 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [16:35:54] RECOVERY - puppet last run on db60 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [16:35:55] RECOVERY - puppet last run on wtp1001 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [16:36:34] RECOVERY - puppet last run on cp4012 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [16:36:35] RECOVERY - puppet last run on db1045 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [16:36:35] RECOVERY - puppet last run on cp4009 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [16:36:35] RECOVERY - puppet last run on ssl3001 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [16:36:44] RECOVERY - puppet last run on cp1068 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [16:36:54] RECOVERY - puppet last run on mw1131 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [16:37:09] (03CR) 10Ori.livneh: [C: 031] Beta: fill missing $lvs_service_ips['ocg'] [operations/puppet] - 10https://gerrit.wikimedia.org/r/148371 (owner: 10Hashar) [16:37:54] RECOVERY - puppet last run on cp4010 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [16:41:50] where is jenkins? 
[16:42:49] oh, it doesn't report +2 in irc now [16:42:53] (03PS1) 10Mwalker: Moving Petition to cluster [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/148400 [16:43:42] (03PS1) 10Ori.livneh: mediawiki: lint [operations/puppet] - 10https://gerrit.wikimedia.org/r/148402 [16:43:59] (03Abandoned) 10Ori.livneh: mediawiki: lint [operations/puppet] - 10https://gerrit.wikimedia.org/r/148402 (owner: 10Ori.livneh) [16:44:26] (03PS3) 10Ori.livneh: mediawiki: lint [operations/puppet] - 10https://gerrit.wikimedia.org/r/148010 [16:46:07] (03CR) 10Ori.livneh: [C: 032] mediawiki: lint [operations/puppet] - 10https://gerrit.wikimedia.org/r/148010 (owner: 10Ori.livneh) [16:47:56] (03CR) 10Reedy: [C: 04-1] "Adding it to wmf-config/extension-list won't work if you only have it in one deployment branch." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/148400 (owner: 10Mwalker) [16:48:29] !log aude Synchronized php-1.24wmf14/extensions/Wikidata: Update Wikidata: js and json dump fixes (duration: 00m 11s) [16:48:33] hoo: [16:48:33] Logged the message, Master [16:48:41] looking [16:48:51] (03CR) 10Reedy: "I see you're branching for both. Removing -1" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/148400 (owner: 10Mwalker) [16:52:31] aude: lib/jquery.ui/jquery.ui.suggester.js [16:52:35] in Valueview [16:52:40] ok [16:53:36] Reedy, does everything else look OK with my prep to deploy petition? [16:53:55] mwalker: let me check [16:54:18] mwalker: Can you add it to mediawiki/tools/release.git make-wmf-branch/default.conf if you haven't already? 
[16:54:43] that would be https://gerrit.wikimedia.org/r/#/c/148401/ [16:55:02] sweet [16:55:05] !log aude Synchronized php-1.24wmf14/extensions/Wikidata/extensions/ValueView/lib/jquery.ui/jquery.ui.suggester.js: touch jquery.ui.suggester.js for Wikidata (duration: 00m 05s) [16:55:19] there's some caching weirdness but otherwise things work in debug mode [16:55:55] also i can add commons link in the wikidata section :/ [16:56:10] this is why we have this on test.wikidata :) [17:00:04] The time is nigh to deploy Petition (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140722T1700) [17:00:07] (03CR) 10Reedy: [C: 031] Moving Petition to cluster (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/148400 (owner: 10Mwalker) [17:00:25] mwalker: LGTM. Just add the database tables before running scap [17:00:54] PROBLEM - puppet last run on db1058 is CRITICAL: CRITICAL: Puppet has 1 failures [17:01:04] (03CR) 10Mwalker: Moving Petition to cluster (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/148400 (owner: 10Mwalker) [17:02:16] Reedy, awesome; thanks for the double check [17:02:45] !log aude Synchronized php-1.24wmf14/extensions/Wikidata/extensions/Wikibase/lib/resources/wikibase.js: touch wikibase.js for test wikidata only, fix caching issues (duration: 00m 05s) [17:02:49] Logged the message, Master [17:02:54] hope that works [17:03:34] * aude rage [17:03:48] TOUCH ALL THE THINGS [17:03:53] missed updating our submodule! [17:03:59] haha [17:04:07] wondering why the bug fixes don't fix [17:06:36] last time [17:06:38] !log aude Synchronized php-1.24wmf14/extensions/Wikidata: Update Wikidata submodule for test wikidata, for real! 
(duration: 00m 06s) [17:06:44] Logged the message, Master [17:06:49] aude, let me know when you're done playing [17:06:59] think i am done, but will let hoo check [17:10:56] need to touch one more time [17:10:57] :( [17:11:14] I [17:11:17] m going to do a scap [17:11:21] if that's going to help you at all [17:11:32] probably not [17:11:47] aude: recursively touch the lot :D [17:12:15] !log aude Synchronized php-1.24wmf14/extensions/Wikidata/extensions/ValueView/lib/jquery.ui/jquery.ui.suggester.js: touch jquery.ui.suggester.js for Wikidata (duration: 00m 05s) [17:12:15] Reedy: find -name '*js' -exec touch {} \ [17:12:17] last one and then we give up / are done [17:12:18] :P [17:15:22] mwalker: go ahead [17:16:25] hokay! [17:16:37] (03CR) 10Mwalker: [C: 032] Moving Petition to cluster [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/148400 (owner: 10Mwalker) [17:16:43] (03Merged) 10jenkins-bot: Moving Petition to cluster [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/148400 (owner: 10Mwalker) [17:16:52] :) [17:17:54] RECOVERY - puppet last run on db1058 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures [17:19:34] !log mwalker Started scap: Deploying Petition extension to the cluster [17:19:39] Logged the message, Master [17:20:45] (03CR) 10Gilles: "Done: https://gerrit.wikimedia.org/r/148414" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/145132 (https://bugzilla.wikimedia.org/67525) (owner: 10Gergő Tisza) [17:39:52] (03CR) 10Physikerwelt: [C: 031] Describe Math related packages in a class [operations/puppet] - 10https://gerrit.wikimedia.org/r/115133 (https://bugzilla.wikimedia.org/61090) (owner: 10Hashar) [17:43:03] (03PS3) 10BBlack: Beta: fill missing $lvs_service_ips['ocg'] [operations/puppet] - 10https://gerrit.wikimedia.org/r/148371 (owner: 10Hashar) [17:43:10] (03CR) 10BBlack: [C: 032 V: 032] Beta: fill missing $lvs_service_ips['ocg'] [operations/puppet] - 
10https://gerrit.wikimedia.org/r/148371 (owner: 10Hashar) [17:48:01] !log mwalker Finished scap: Deploying Petition extension to the cluster (duration: 28m 27s) [17:48:07] Logged the message, Master [17:48:12] (03PS1) 10Jgreen: add tmpfs for ocg production role, minor ocg manifest cleanup [operations/puppet] - 10https://gerrit.wikimedia.org/r/148418 [17:49:13] (03PS4) 10BBlack: beta::natfix removal step 2 [operations/puppet] - 10https://gerrit.wikimedia.org/r/146091 [17:51:35] (03CR) 10BBlack: [C: 031] add tmpfs for ocg production role, minor ocg manifest cleanup [operations/puppet] - 10https://gerrit.wikimedia.org/r/148418 (owner: 10Jgreen) [17:53:31] (03CR) 10Mwalker: [C: 031] add tmpfs for ocg production role, minor ocg manifest cleanup (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/148418 (owner: 10Jgreen) [17:53:58] Nikerabbit: does MessageGroupStats::clear() often delete 0 rows? [17:59:55] AaronSchulz: sometimes (which reminds me... I still haven't looked into that bug) [18:00:04] The time is nigh to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140722T1800) [18:00:34] (03PS1) 10Ori.livneh: wmflib: add convenience funcs require_realm() and require_ubuntu() [operations/puppet] - 10https://gerrit.wikimedia.org/r/148422 [18:00:36] mwalker: all done? [18:01:13] Reedy, all done [18:01:35] (03PS2) 10Reedy: Non wikipedias to 1.24wmf14 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/148395 [18:01:36] AaronSchulz: it is called every time someone makes a translation, so if the numbers haven't been repopulated in the between.... [18:01:41] (03CR) 10Reedy: [C: 032] Non wikipedias to 1.24wmf14 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/148395 (owner: 10Reedy) [18:02:11] (03Merged) 10jenkins-bot: Non wikipedias to 1.24wmf14 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/148395 (owner: 10Reedy) [18:03:01] bblack: do you feel like reviewing ? 
(i tested all cases in vagrant) [18:03:13] (03CR) 10Jgreen: [C: 032 V: 031] add tmpfs for ocg production role, minor ocg manifest cleanup [operations/puppet] - 10https://gerrit.wikimedia.org/r/148418 (owner: 10Jgreen) [18:04:22] (03CR) 10Ori.livneh: "Should it perhaps be 'requires_ubuntu()' instead of 'require_ubuntu()'?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/148422 (owner: 10Ori.livneh) [18:05:24] Nikerabbit: in one case I see DELETE and INSERT on tgs_group = 'page-Help:CirrusSearch', tgs_lang = 'en' coming in around the same time [18:05:29] (03CR) 10BBlack: [C: 031] wmflib: add convenience funcs require_realm() and require_ubuntu() [operations/puppet] - 10https://gerrit.wikimedia.org/r/148422 (owner: 10Ori.livneh) [18:05:40] thank you [18:05:48] the job runners are in auto-commit mode, but it still deadlocks...which is annoying [18:06:08] of course that's still possible since mysql does not atomically require internal locks [18:06:33] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.24wmf14 [18:06:39] Logged the message, Master [18:08:44] (03PS1) 10Ori.livneh: graphite::web: bump uWSGI workers from 4 to 8 [operations/puppet] - 10https://gerrit.wikimedia.org/r/148427 [18:10:24] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias back to 1.24wmf13 due to Wikidata and Cirrus fatals [18:10:29] Logged the message, Master [18:11:24] (03PS2) 10Ori.livneh: wmflib: add funcs requires_realm() and requires_ubuntu() [operations/puppet] - 10https://gerrit.wikimedia.org/r/148422 [18:11:48] (03CR) 10Ori.livneh: [C: 032 V: 032] wmflib: add funcs requires_realm() and requires_ubuntu() [operations/puppet] - 10https://gerrit.wikimedia.org/r/148422 (owner: 10Ori.livneh) [18:13:57] !log Running sync-common on mw1081 [18:14:02] Logged the message, Master [18:14:05] (03PS1) 10Ori.livneh: rcstream: require ubuntu >= trusty [operations/puppet] - 10https://gerrit.wikimedia.org/r/148429 
[18:14:37] (03CR) 10Ori.livneh: [C: 032 V: 032] "(typo fix for If91d79178.)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/148429 (owner: 10Ori.livneh) [18:15:34] PROBLEM - puppet last run on rcs1001 is CRITICAL: CRITICAL: Epic puppet fail [18:16:34] RECOVERY - puppet last run on rcs1001 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [18:17:25] hmm, happens with two inserts too [18:17:32] (03PS1) 10RobH: labmon1001 dns assignment, rename from neodymium [operations/dns] - 10https://gerrit.wikimedia.org/r/148430 [18:17:38] andrewbogott: :)) (re: epic fail) [18:17:59] w00t labmon [18:18:01] Reedy: https://gerrit.wikimedia.org/r/#/c/148428/2/includes/Search/Result.php [18:18:14] for cirrus [18:19:17] chasemp: feedback svp re: https://gerrit.wikimedia.org/r/#/c/148427/ [18:19:33] (03CR) 10RobH: [C: 032] labmon1001 dns assignment, rename from neodymium [operations/dns] - 10https://gerrit.wikimedia.org/r/148430 (owner: 10RobH) [18:19:43] YuviPanda: yea im working on its install now [18:19:44] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /a/common/). [18:19:49] (03CR) 10Rush: [C: 031] "please" [operations/puppet] - 10https://gerrit.wikimedia.org/r/148427 (owner: 10Ori.livneh) [18:19:55] surprised it's only 4 now :) [18:19:59] RobH: :D ty [18:20:10] chasemp: should it be even higher? [18:20:26] than 8, i mean [18:20:32] I think 4->8 is a significant step atm? [18:20:40] yes, wise. ok. [18:20:41] I'm a fan of let's try it and see [18:20:44] * ori nods. [18:20:59] (03CR) 10Ori.livneh: [C: 032] graphite::web: bump uWSGI workers from 4 to 8 [operations/puppet] - 10https://gerrit.wikimedia.org/r/148427 (owner: 10Ori.livneh) [18:30:35] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet). 
[18:30:47] (03PS1) 10RobH: setting labmon1001 install parameters [operations/puppet] - 10https://gerrit.wikimedia.org/r/148434 [18:30:49] hrmm [18:30:53] someone didnt merge strontium [18:31:02] perhaps the sudo non -i issue? [18:31:14] ottomata: ^ you made list mention of that [18:31:22] and i backed you up, but we may have been wrong ;] [18:31:35] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [18:31:38] ... [18:31:47] (someone do that or on its own?) [18:32:01] <_joe_> ori: tungsten will die badly if we add more workers [18:32:11] OoooOoo [18:32:15] RobH, interesting [18:32:16] hey [18:32:21] ottomata, RobH: i did that [18:32:28] here's what happened: [18:32:50] i have puppet-merge on iron aliased to 'ssh -A palladium -t -- puppet-merge' [18:33:05] urgh [18:33:08] you are forwarding keys? [18:33:13] (you shouldnt do that ;) [18:33:20] <_joe_> EWWW [18:33:20] can't deploy mediawiki otherwise [18:33:22] <_joe_> :) [18:33:29] yea, but you can merge puppet changes iwthout ;] [18:33:35] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures [18:33:40] ok, so, the strontium thing right now is not related to sudo puppet-merge? [18:33:47] no, it may be [18:33:56] cuz ori does the merge a bit differently than standard though no tusre [18:34:02] wow, typos. [18:34:06] my ~/.ssh/config on iron sets User = root [18:34:13] for palladium, that is [18:34:31] since I need root for the two things I do on that box (salt, puppet-merge) [18:34:39] should I do things differently? 
[18:34:42] ottomata: well, im about to do a merge and i'll do it as sudo puppet-merge on palladium, without the -i flag [18:34:47] ok cool [18:34:50] ja check it [18:35:04] ori: well, i dont alias anything at all cuz im paranoid about unintended operation, heh [18:35:07] _joe_: tungsten is doing better ever since i hacked mwprof to discard query metrics [18:35:09] but i ssh as myself into palladium [18:35:17] and run sudo things directly there, not via a forward [18:35:23] <_joe_> ori: oh great! [18:35:29] in fact, most of ops made a large push to eliminate key forwarding for all ops infrastructure [18:35:35] and other than mediawiki/apache, i think we've done it. [18:35:49] though we may be bringing it back for the new isntall key for labs host,s but its a special case [18:35:58] RobH: i only do it for palladium, and i can stop if it's an anti-pattern [18:36:03] in fact, i will [18:36:16] I'd suggest conforming to what most of ops is doing, only so you get the strange same errors we do =] [18:36:23] yes, my thought exactly [18:36:29] annoying, but we all had to deal with the annoyance a month or so back [18:36:41] i've adapted! 
[18:36:42] whenever chase and daniel finished the major push for admins refactor [18:36:44] cool [18:36:56] the alias has '-A' because i disable forwarding except for that circumstance and mw deploy [18:37:15] yea, same, except for just the mediawiki/apache deploy stuff [18:37:43] andrewbogott: i really don't know about changing the default settings so far, but TheHelpfulOne is back and he does tons of mailman stuff [18:37:43] it just flagged to our attention when strontium was out of sync [18:37:56] because we have an ongoing discussoin to track down exactly what sudo flags we need to make it work right [18:38:20] (03CR) 10RobH: [C: 032] setting labmon1001 install parameters [operations/puppet] - 10https://gerrit.wikimedia.org/r/148434 (owner: 10RobH) [18:38:32] ok, merging my change on palladium with just sudo puppet-merge [18:38:35] lets see how it goes [18:39:14] ottomata: So yea, my puppet-merge sees it sync to strontium and seems legit [18:39:26] So I'll reply back to your thread confirming what you put forth [18:40:23] (03CR) 10Ori.livneh: [C: 031] "+1, but going to make a couple of tiny changes" (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/148099 (owner: 10Giuseppe Lavagetto) [18:41:57] ok cool [18:42:41] mwalker: daily test please ? [18:43:23] matanya, ok; what do you have for me today? [18:43:32] come and see [18:44:32] http://208.80.155.185:5080/openmeetings/#room/ [18:44:48] mwalker: http://208.80.155.185:5080/openmeetings/#room/2 [18:45:11] !log reedy Synchronized php-1.24wmf14/extensions/CirrusSearch/: Fix fatal (duration: 00m 15s) [18:45:17] Logged the message, Master [18:45:28] maybe we should make trusty the default installer now so we only have to add stuff when it's NOT supposed to be trusty [18:45:31] I see... webex clone [18:45:36] in install-server i mean [18:45:59] mutante: has mw servers adapted to trusty? 
[18:46:17] historically we tend to roll the default to match that, dunno if its intentional or merely due to that being the largest mass of service group servers [18:46:34] matanya, I tried to join public room 2 using chrome -- all I get is a white screen [18:46:38] or intentional due to the latter, whatevs heh =] [18:46:40] RobH: no, i think not. you're right [18:46:44] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. [18:46:50] mwalker: it takes time [18:46:53] java crap [18:46:58] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.24wmf14 again [18:47:04] Logged the message, Master [18:47:06] mutante: but i agree these days every misc server is trusty [18:47:09] matanya, ah... let me use a java enabled browser then [18:47:15] it seems [18:48:47] greg-g, want to join us in a webex clone? it only requires flash? [18:52:06] (03CR) 10Ottomata: RT 7858: datasets Apache and Puppet edits. (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/147226 (owner: 10Scottlee) [18:53:05] (03CR) 10Ottomata: "Hi you two!" [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/127804 (owner: 10CSteipp) [18:54:12] Coren and/or RobH, come join us? http://208.80.155.185:5080/openmeetings/#room/ <- takes a long time to load [18:54:51] (03CR) 10Dzahn: [C: 032] phab - small lint fixes [operations/puppet] - 10https://gerrit.wikimedia.org/r/148288 (owner: 10Dzahn) [19:01:38] andrewbogott: i click and it wants me to register, whats this for? [19:01:52] hangout alternative for 16+ participants? 
[19:01:54] RobH: this is a labs-hosted videoconf solution that Matanya set up [19:02:01] yeah, trying a bit of a stress test [19:02:04] although we won't make it to 16 [19:02:05] never loads for me [19:02:11] http://208.80.155.185:5080/openmeetings/#room/15 [19:02:15] registered [19:02:16] chasemp: it takes like ~1 minute of staring at a blank screen [19:02:18] before anything happens [19:02:24] (03PS1) 10Dzahn: wikimedia.org - retab only [operations/dns] - 10https://gerrit.wikimedia.org/r/148437 [19:02:25] urgh, due to being on vm or ? [19:02:28] I gave up several times [19:02:37] I'm not sure if the delay is client or server. [19:02:42] But once you're in perf isn't too bad [19:02:48] seems server from here [19:04:03] are ppl in a room? [19:04:05] 15 is empty [19:04:17] this ui is poop :) [19:04:59] they are spamming RT.. grr [19:05:12] chasemp: there are 5 of us here... [19:05:20] which room? [19:05:25] http://208.80.155.185:5080/openmeetings/#room/2 [19:05:26] merges all those tickets into one [19:07:29] (03PS1) 10Dzahn: switch contacts.wm to misc-web-lb.eqiad [operations/dns] - 10https://gerrit.wikimedia.org/r/148438 [19:08:35] (03PS2) 10Dzahn: switch contacts.wm to misc-web-lb.eqiad [operations/dns] - 10https://gerrit.wikimedia.org/r/148438 [19:09:02] (03CR) 10Dzahn: "needs Ia87c36051af66" [operations/puppet] - 10https://gerrit.wikimedia.org/r/146823 (owner: 10Dzahn) [19:09:30] chasemp: what's happening now? [19:09:52] just kept trying to make it reload [19:09:56] maybe I wasn't waiting long enough? [19:13:34] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [19:15:11] (03CR) 10Ottomata: [C: 04-1] "researchers and statistics-users should be fine for this, I don't think anything else is necessary." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/144994 (owner: 10JanZerebecki) [19:18:14] PROBLEM - Puppet freshness on db1009 is CRITICAL: Last successful Puppet run was Tue 22 Jul 2014 17:17:14 UTC [19:19:12] (03CR) 10Dzahn: "confirmed UID is LDAP user, but we need to confirm the key some other way, would you mind signing with gpg pub 4096R/9A6B07CC ? expires " [operations/puppet] - 10https://gerrit.wikimedia.org/r/144994 (owner: 10JanZerebecki) [19:25:55] ori: would it be possible to add some indexes to the NavigationTiming tables? [19:26:32] tgr: yes; file a bug? [19:26:34] i'd be happy to do it [19:26:45] ok, thanks [19:36:48] (03PS1) 10Ottomata: Install jq on analytics clients [operations/puppet] - 10https://gerrit.wikimedia.org/r/148443 [19:36:55] RECOVERY - Puppet freshness on db1009 is OK: puppet ran at Tue Jul 22 19:36:53 UTC 2014 [19:43:35] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures [19:43:35] andrewbogott: I saw a ping, something about default settings and mailman? [19:44:12] Thehelpfulone: Yeah, not sure if this is a mailman thing or a gmail thing. Basically when I send an email to a mailing list I want to get it back from the list. [19:44:29] So that when I browse the label assigned to that list, it's not missing the bits of the conversation that are from me. [19:44:38] I might venture that /everyone/ wants that :) [19:45:56] I think Pine was mentioning some problems with getting that to work too [19:46:16] when you view the thread in Gmail, doesn't it just show your post? it usually does for me.. [19:46:44] Thehelpfulone: I use Thunderbird. Lemme look in the web gui and see if it's different [19:46:50] sure [19:48:20] Thehelpfulone: you're right, the behavior in the web interface is correct. 
[19:48:44] So that suggests that not everyone would want the self-mail feature, only people who use an imap client rather than gmail [19:48:47] (03CR) 10Ottomata: [C: 032 V: 032] Install jq on analytics clients [operations/puppet] - 10https://gerrit.wikimedia.org/r/148443 (owner: 10Ottomata) [19:48:47] :( [19:49:42] I think there's another setting that allows you to mail it to yourself too on a list per list basis [19:49:45] I have a filter in gmail which has a label for the mailing list, and if I send to the list, it'll go into the right folder which shows up in Thunderbird too [19:50:11] https://lists.wikimedia.org/mailman/options/wikimedia-l - Receive your own posts to the list? [19:50:11] Ordinarily, you will get a copy of every message you post to the list. If you don't want to receive this copy, set this option to No. [19:50:11] No [19:50:11] Yes [19:50:19] I think that's yes by default (it's on as yes for me) [19:50:30] hm [19:50:41] I wonder if I even have a password in order to view my settings [19:51:56] I think the wikimedia-l mailing list sends it out once in a while for some reason (I don't remember if I set that to do so personally) [19:53:03] Thehelpfulone: yeah, I have 'Receive your own posts to the list?' marked as Yes [19:53:23] yeah I was just about to say - I looked it up on that list for you, is it on wikimedia-l that you're having the problems? 
[19:53:38] Ops [19:53:50] But, I'm going to try turning off 'Avoid duplicate copies' and see if that improves things [19:54:04] Probably google is messing with me, de-duping my inbox or something [19:54:43] andrewbogott: Internetz say "Gmail is just about the only mail server which follows the RFC for SMTP correctly and will discard duplicate emails based on the SMTP ID" [19:55:15] There was a thread about this on one of the many mailing lists not too long ago [19:55:17] bd808: ok, but it's not that I'm only getting 1 copy, I'm getting 0 [19:55:25] I have to look in my 'sent' box to see what I wrote [19:55:30] You have a copy in your outbox [19:55:51] They dedup very strictly [19:55:55] yeah :( [19:56:01] OK, so, anyway, clearly not a mailman issue. [19:56:19] Mailman sould technically be changing the SMTP id header [19:56:21] *should [19:58:49] http://puppetlabs.com/gif-contest/sysadmin-appreciation-day-2014 [19:59:18] https://puppetlabs.com/meme/making-changes-without-rfc [19:59:24] puppet memes :p [20:03:37] (03PS1) 10Jgreen: hack horribly to make it possible for coders to read ocg-related logs, ref. RT 7596 [operations/puppet] - 10https://gerrit.wikimedia.org/r/148446 [20:05:33] (03PS2) 10Jgreen: hack horribly to make it possible for coders to read ocg-related logs, ref. RT 7596 [operations/puppet] - 10https://gerrit.wikimedia.org/r/148446 [20:06:20] (03PS3) 10Jgreen: hack horribly to make it possible for coders to read ocg-related logs, ref. RT 7596 [operations/puppet] - 10https://gerrit.wikimedia.org/r/148446 [20:06:28] (03CR) 10Mwalker: [C: 031] hack horribly to make it possible for coders to read ocg-related logs, ref. RT 7596 [operations/puppet] - 10https://gerrit.wikimedia.org/r/148446 (owner: 10Jgreen) [20:12:55] (03CR) 10Jgreen: [C: 032 V: 031] hack horribly to make it possible for coders to read ocg-related logs, ref. 
RT 7596 [operations/puppet] - 10https://gerrit.wikimedia.org/r/148446 (owner: 10Jgreen) [20:17:19] when that labs project that does statistics on code submits gives score, it should give a bonus for .. where commit_message like "%hack%" or commit_message like "%horrib%".. [20:43:44] (03PS1) 10Jgreen: alternate horrible hack to make ocg-related logs readable to coders [operations/puppet] - 10https://gerrit.wikimedia.org/r/148457 [20:44:59] (03CR) 10Mwalker: [C: 031] alternate horrible hack to make ocg-related logs readable to coders [operations/puppet] - 10https://gerrit.wikimedia.org/r/148457 (owner: 10Jgreen) [20:47:42] (03PS4) 10Scottlee: RT 7858: datasets Apache and Puppet edits. [operations/puppet] - 10https://gerrit.wikimedia.org/r/147226 [20:48:45] PROBLEM - puppetmaster https on palladium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:49:22] (03CR) 10Jgreen: [C: 032 V: 031] alternate horrible hack to make ocg-related logs readable to coders [operations/puppet] - 10https://gerrit.wikimedia.org/r/148457 (owner: 10Jgreen) [20:49:35] RECOVERY - puppetmaster https on palladium is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.014 second response time [20:53:25] (03PS1) 10Jgreen: adjust logrotate for ocg to create 644 [operations/puppet] - 10https://gerrit.wikimedia.org/r/148461 [20:55:08] (03CR) 10Jgreen: [C: 032 V: 031] adjust logrotate for ocg to create 644 [operations/puppet] - 10https://gerrit.wikimedia.org/r/148461 (owner: 10Jgreen) [21:00:04] The time is nigh to deploy Flow (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140722T2100) [21:16:05] who maintains our redis servers? [21:19:10] actually; my question is a better one for the ops list; sending it there [21:19:55] mwalker: git log modules/redis/manifests/init.pp :) [21:20:03] qchris,yuvi,otto? [21:20:39] * qchris only fixed some things there, some wikimetrics can use puppet's module again. 
[21:21:08] yeah, I only touched some things there, to make it work with dynamicproxy [21:21:14] :-D [21:21:18] I think the origin of redis fever can be traced to ori and AaronSchulz? [21:22:41] we could make a script for this? [21:23:06] enter: service name , output = checked who wrote most of it in git [21:43:06] (03PS1) 10Ori.livneh: wmflib: add apt_version() [operations/puppet] - 10https://gerrit.wikimedia.org/r/148512 [21:49:03] can someone help me with what appears to be a DNS request fail problem? [21:49:26] ns0 won't respond to some requests coming from the office. [21:49:34] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [21:52:19] !log reedy Synchronized php-1.24wmf13/extensions/WikimediaMessages/: Fix fatal for dumps (duration: 00m 13s) [21:52:25] Logged the message, Master [21:57:03] !log reedy Synchronized php-1.24wmf14/extensions/WikimediaMessages/: Fix fatal for dumps (duration: 00m 15s) [21:57:08] Logged the message, Master [22:05:58] (03PS8) 10Dzahn: Add puppet module for a tor relay [operations/puppet] - 10https://gerrit.wikimedia.org/r/140948 [22:06:22] cmjohnson1: can you help me with https://rt.wikimedia.org/Ticket/Display.html?id=7974 ? 
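On the script idea floated at 21:22 ("enter: service name, output = checked who wrote most of it in git"), a minimal sketch might look like the following. It assumes that "wrote most of it" can be approximated by commit count per author for a given path, which is only a rough proxy. The function names `rank_authors` and `authors_for_path` are hypothetical and not part of any existing tool mentioned in the channel.

```python
import subprocess
from collections import Counter


def rank_authors(log_output):
    """Count author names (one per line) and return them most frequent first."""
    names = [line.strip() for line in log_output.splitlines() if line.strip()]
    return Counter(names).most_common()


def authors_for_path(repo_dir, path):
    """Rank commit authors for one path inside a git checkout.

    Runs `git log --format=%aN -- <path>`, which prints one author name
    per commit touching that path (respecting .mailmap if present).
    """
    result = subprocess.run(
        ["git", "log", "--format=%aN", "--", path],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return rank_authors(result.stdout)
```

Calling `authors_for_path(".", "modules/redis")` from a checkout of operations/puppet would return (name, commit count) pairs, highest count first. Counting commits is cheap but treats a one-line fix the same as the original implementation; a per-line view based on `git blame` would be the alternative if that distinction matters.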
[22:06:43] (03CR) 10Dzahn: "PS8: AvoidDiskWrites 1 because we're gonna be using an SSD" [operations/puppet] - 10https://gerrit.wikimedia.org/r/140948 (owner: 10Dzahn) [22:09:54] cajoel: yep..looking at it [22:10:25] if you need me to recreate requests ping me [22:10:53] I can fire them off easily, and/or give you a shell account inside the office (if you have vpn) [22:17:42] (03PS2) 10Dzahn: phab-login screen HTML-replace deprecated HTML [operations/puppet] - 10https://gerrit.wikimedia.org/r/147640 [22:18:14] (03PS3) 10Dzahn: phab-login screen, login message and old HTML [operations/puppet] - 10https://gerrit.wikimedia.org/r/147640 [22:25:48] Downloading refs/changes/99/122399/4 from gerrit [22:25:49] Traceback (most recent call last): File "/usr/local/bin/git-review", line 10 [22:25:58] ...git_review.cmd.CheckoutNewBranchFailed: Cannot checkout to new branch [22:26:01] grrr [22:34:52] cmjohnson1: anything amiss on ns0? [22:35:29] ns0 looks fine...can't seem to hit the office ip though [22:36:07] just it or all of eqiad? [22:36:36] just ns0 [22:37:01] bblack can you help out cajoel...i have to get going [22:37:12] yeah ok [22:37:28] thx [22:37:34] I can't make DNS queries against ns0 or ns1 from office IP space [22:37:41] can do it fine to ns2 [22:38:37] cajoel: any idea if this a truly-new issue or something ongoing forever that's just being noticed? 
[22:38:51] can't say [22:38:59] probably has been around [22:41:33] (03PS5) 10Dzahn: include 'bastionhost' on bastion hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/122399 [22:45:34] (03PS6) 10Reedy: include 'bastionhost' on bastion hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/122399 (owner: 10Dzahn) [22:47:43] (03PS7) 10Dzahn: include 'bastionhost' on bastion hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/122399 [22:47:56] lol, damn it [22:49:43] (03PS8) 10Reedy: include 'bastionhost' on bastion hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/122399 (owner: 10Dzahn) [22:51:46] thanks Reedy :) [22:52:38] (03PS9) 10Dzahn: include 'bastionhost' on bastion hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/122399 [22:53:34] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures [22:55:06] (03CR) 10Dzahn: [C: 032] include 'bastionhost' on bastion hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/122399 (owner: 10Dzahn) [22:57:41] (03CR) 10Dzahn: "yea, but that is unlikely to be changed soon" [operations/puppet] - 10https://gerrit.wikimedia.org/r/148293 (owner: 10Dzahn) [22:58:22] (03CR) 10Dzahn: [C: 031] "last file with tabs, come on :)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/148293 (owner: 10Dzahn) [22:59:43] (03PS2) 10Dzahn: the last tab char in any .pp file !? [operations/puppet] - 10https://gerrit.wikimedia.org/r/148295 [23:00:04] The time is nigh to deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140722T2300) [23:04:25] (03PS3) 10Dzahn: put contacts.wm.org behind misc. varnish [operations/puppet] - 10https://gerrit.wikimedia.org/r/146823 [23:05:05] (03PS4) 10Dzahn: put contacts.wm.org behind misc. varnish [operations/puppet] - 10https://gerrit.wikimedia.org/r/146823 [23:06:25] looks like i'm the only one in SWAT, can just deploy it myself? 
[23:06:35] PROBLEM - puppet last run on cp1055 is CRITICAL: CRITICAL: Puppet has 1 failures [23:07:35] jouncebot says yes [23:08:50] (03CR) 10Dzahn: [C: 032] put contacts.wm.org behind misc. varnish [operations/puppet] - 10https://gerrit.wikimedia.org/r/146823 (owner: 10Dzahn) [23:10:00] * YuviPanda pokes Coren with https://gerrit.wikimedia.org/r/#/c/148133/ [23:10:03] trivial merge? [23:10:20] curls yuvi [23:10:40] (03CR) 10coren: [C: 032] "Same as prod, okay by me." [operations/puppet] - 10https://gerrit.wikimedia.org/r/148133 (owner: 10Yuvipanda) [23:11:00] mutante: better specify a UA now :) [23:11:36] UserAgent YuviBot [23:17:05] (03PS1) 10Dzahn: contacts.wm - remove SSL vhost,now behind varnish [operations/puppet] - 10https://gerrit.wikimedia.org/r/148541 [23:19:58] (03CR) 10Dzahn: [C: 032] contacts.wm - remove SSL vhost,now behind varnish [operations/puppet] - 10https://gerrit.wikimedia.org/r/148541 (owner: 10Dzahn) [23:20:28] !log ebernhardson Started scap: Update flow in wmf/1.24wmf14 [23:20:33] Logged the message, Master [23:20:46] (03CR) 10Dzahn: [C: 032] switch contacts.wm to misc-web-lb.eqiad [operations/dns] - 10https://gerrit.wikimedia.org/r/148438 (owner: 10Dzahn) [23:21:35] PROBLEM - puppet last run on neon is CRITICAL: CRITICAL: Puppet has 1 failures [23:23:34] RECOVERY - puppet last run on cp1055 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [23:27:30] (03PS1) 10Ori.livneh: apache: add apache::mpm [operations/puppet] - 10https://gerrit.wikimedia.org/r/148542 [23:37:00] (03PS1) 10MaxSem: HHVM: log warnings and stacktraces [operations/puppet] - 10https://gerrit.wikimedia.org/r/148544 [23:37:37] !log ebernhardson Finished scap: Update flow in wmf/1.24wmf14 (duration: 17m 08s) [23:37:42] Logged the message, Master [23:39:35] RECOVERY - puppet last run on neon is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [23:55:56] (03PS1) 10MaxSem: Fix a couple warnings in beta [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/148552