[00:00:53] can we fix that it will revert to the version packaged? that sounds like we have a file that isnt puppetized
[00:01:09] (CR) Ori.livneh: svn - move Apache config from file to template (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/153989 (owner: Dzahn)
[00:01:32] (PS1) Dzahn: Revert "openstack Apache conf, also listen on port 80" [operations/puppet] - https://gerrit.wikimedia.org/r/153992
[00:01:37] mutante: yeah. let me look (i won't touch anything). is it virt1001?
[00:01:54] (PS1) Dzahn: Revert "openstack - use apache::conf for port" [operations/puppet] - https://gerrit.wikimedia.org/r/153993
[00:02:00] (CR) jenkins-bot: [V: -1] Revert "openstack - use apache::conf for port" [operations/puppet] - https://gerrit.wikimedia.org/r/153993 (owner: Dzahn)
[00:02:20] ori: no, should be virt1000
[00:02:24] and virt0
[00:02:25] ah, right
[00:02:47] heheh
[00:02:47] # This file is managed by Puppet!
[00:02:49] lies! :D
[00:02:56] s/is/was/
[00:03:25] :p
[00:03:30] ori: https://bugzilla.wikimedia.org/show_bug.cgi?id=68535
[00:05:21] (PS3) Dzahn: limn - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153961
[00:05:35] (CR) Dzahn: limn - use apache::site (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/153961 (owner: Dzahn)
[00:07:17] hm, i think it's modules/puppetmaster/templates/ports.conf.erb
[00:07:42] provisioned by modules/puppetmaster/manifests/passenger.pp
[00:09:34] (PS1) Dzahn: OTRS - use ssl_ciphersuite [operations/puppet] - https://gerrit.wikimedia.org/r/153998
[00:10:48] ori: ah, then same question as on prod. puppetmaster, how to replace that logic or keep the ports.conf.erb
[00:13:59] (CR) Ori.livneh: puppetmaster - use apache::site (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/153835 (owner: Dzahn)
[00:15:45] (PS2) Ori.livneh: puppetmaster - use apache::site & apache::conf [operations/puppet] - https://gerrit.wikimedia.org/r/153835 (owner: Dzahn)
[00:16:56] (PS3) Dzahn: puppetmaster - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153835
[00:17:16] gerrit conflict :)
[00:17:37] (CR) jenkins-bot: [V: -1] puppetmaster - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153835 (owner: Dzahn)
[00:17:52] it's going to be a tad tricky to merge
[00:18:15] because we also need to reset ports.conf to its original content
[00:18:35] which is https://dpaste.de/4yK9/raw btw
[00:18:53] (PS4) Dzahn: puppetmaster - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153835
[00:19:32] mutante: mind if i amend the patch?
[00:20:15] not at all
[00:23:44] cya later, i'm gonna run for now, amend all you like
[00:24:54] (PS5) Ori.livneh: puppetmaster - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153835 (owner: Dzahn)
[00:24:59] cya
[00:25:26] i'll merge/deploy
[00:25:29] (CR) Ori.livneh: [C: 2 V: 2] puppetmaster - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153835 (owner: Dzahn)
[00:26:35] (the orig thing is temp. btw)
[00:35:05] PROBLEM - puppet last run on mw1171 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:35:15] PROBLEM - puppet last run on wtp1007 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:35:15] PROBLEM - puppet last run on wtp1013 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:35:45] PROBLEM - puppet last run on mw1163 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:35:55] PROBLEM - puppet last run on mw1185 is CRITICAL: CRITICAL: Puppet has 2 failures
[00:35:55] PROBLEM - puppet last run on search1013 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:36:15] PROBLEM - puppet last run on elastic1017 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:36:16] PROBLEM - puppet last run on mw1148 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:36:25] PROBLEM - puppet last run on cp3009 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:36:25] PROBLEM - puppet last run on amssq38 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:36:38] (PS1) Ori.livneh: Unmanage ports.conf, following I0e4aa8800 [operations/puppet] - https://gerrit.wikimedia.org/r/154003
[00:36:45] PROBLEM - puppet last run on mw1167 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:38:42] (CR) Ori.livneh: [C: 2] Unmanage ports.conf, following I0e4aa8800 [operations/puppet] - https://gerrit.wikimedia.org/r/154003 (owner: Ori.livneh)
[00:39:05] RECOVERY - puppet last run on mw1171 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[00:39:29] (CR) Ori.livneh: "Follow-up change: https://gerrit.wikimedia.org/r/#/c/154003/" [operations/puppet] - https://gerrit.wikimedia.org/r/153835 (owner: Dzahn)
[00:50:45] RECOVERY - puppet last run on mw1163 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures
[00:52:15] RECOVERY - puppet last run on wtp1007 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[00:52:15] RECOVERY - puppet last run on wtp1013 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[00:52:16] RECOVERY - puppet last run on mw1148 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[00:52:25] RECOVERY - puppet last run on cp3009 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[00:52:25] RECOVERY - puppet last run on amssq38 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures
[00:52:45] RECOVERY - puppet last run on mw1167 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[00:52:56] RECOVERY - puppet last run on search1013 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[00:53:15] RECOVERY - puppet last run on elastic1017 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[00:53:55] RECOVERY - puppet last run on mw1185 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[01:00:05] PROBLEM - Puppet freshness on db1010 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 22:59:42 UTC
[01:17:54] (PS1) Yurik: Additional python packages for stats servers [operations/puppet] - https://gerrit.wikimedia.org/r/154004
[02:17:37] !log LocalisationUpdate completed (1.24wmf15) at 2014-08-14 02:16:34+00:00
[02:17:45] Logged the message, Master
[02:19:55] RECOVERY - Puppet freshness on db1010 is OK: puppet ran at Thu Aug 14 02:19:46 UTC 2014
[02:30:55] !log LocalisationUpdate completed (1.24wmf16) at 2014-08-14 02:29:52+00:00
[02:31:01] Logged the message, Master
[03:12:39] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Aug 14 03:11:33 UTC 2014 (duration 11m 32s)
[03:12:45] Logged the message, Master
[03:57:24] (PS1) Legoktm: mwgrep: Sort results before printing them [operations/puppet] - https://gerrit.wikimedia.org/r/154011
[03:57:30] helderwiki: ^
[04:01:14] PROBLEM - Puppet freshness on ocg1002 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 03:57:50 UTC
[04:03:14] PROBLEM - Puppet freshness on ocg1002 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 03:57:50 UTC
[04:05:14] PROBLEM - Puppet freshness on ocg1002 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 03:57:50 UTC
[04:07:14] PROBLEM - Puppet freshness on ocg1002 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 03:57:50 UTC
[04:07:57] (CR) Helder.wiki: [C: 1] mwgrep: Sort results before printing them [operations/puppet] - https://gerrit.wikimedia.org/r/154011 (owner: Legoktm)
[04:09:14] PROBLEM - Puppet freshness on ocg1002 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 03:57:50 UTC
[04:11:14] PROBLEM - Puppet freshness on ocg1002 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 03:57:50 UTC
[04:13:14] PROBLEM - Puppet freshness on ocg1002 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 03:57:50 UTC
[04:15:14] PROBLEM - Puppet freshness on ocg1002 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 03:57:50 UTC
[04:17:14] PROBLEM - Puppet freshness on ocg1002 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 03:57:50 UTC
[04:18:35] RECOVERY - Puppet freshness on ocg1002 is OK: puppet ran at Thu Aug 14 04:18:32 UTC 2014
[04:20:14] PROBLEM - Puppet freshness on ocg1002 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 04:18:32 UTC
[04:38:20] RECOVERY - Puppet freshness on ocg1002 is OK: puppet ran at Thu Aug 14 04:38:13 UTC 2014
[05:12:12] (PS1) Springle: mariadb 10 config for sanitarium [operations/puppet] - https://gerrit.wikimedia.org/r/154013
[05:13:40] (CR) Springle: [C: 2] mariadb 10 config for sanitarium [operations/puppet] - https://gerrit.wikimedia.org/r/154013 (owner: Springle)
[06:29:02] PROBLEM - puppet last run on holmium is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:10] PROBLEM - puppet last run on cp1056 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:20] PROBLEM - puppet last run on lvs1005 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:30] PROBLEM - puppet last run on mw1100 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:00] PROBLEM - puppet last run on analytics1030 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:01] PROBLEM - puppet last run on search1018 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:10] PROBLEM - puppet last run on mw1176 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:41] PROBLEM - puppet last run on cp3016 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:00] PROBLEM - puppet last run on mw1144 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:01] PROBLEM - puppet last run on mw1120 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:20] PROBLEM - puppet last run on mw1069 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:20] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:31] PROBLEM - puppet last run on mw1099 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:31] PROBLEM - puppet last run on mw1068 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:44:31] RECOVERY - puppet last run on mw1100 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[06:45:30] RECOVERY - puppet last run on mw1099 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[06:45:30] RECOVERY - puppet last run on mw1068 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[06:46:00] RECOVERY - puppet last run on search1018 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[06:46:01] RECOVERY - puppet last run on mw1120 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[06:46:01] RECOVERY - puppet last run on holmium is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[06:46:10] RECOVERY - puppet last run on cp1056 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[06:46:10] RECOVERY - puppet last run on mw1176 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[06:46:20] RECOVERY - puppet last run on mw1069 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[06:46:20] RECOVERY - puppet last run on lvs1005 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[06:46:41] RECOVERY - puppet last run on cp3016 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[06:47:00] RECOVERY - puppet last run on analytics1030 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[06:47:00] RECOVERY - puppet last run on mw1144 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[06:47:20] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[07:23:41] (PS21) Giuseppe Lavagetto: Separate HHVM app servers backend. [operations/puppet] - https://gerrit.wikimedia.org/r/152903 (owner: Mark Bergsma)
[07:41:12] (CR) QChris: "Since the change does not come with 'ensure => absent' for" (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/153849 (owner: Dzahn)
[07:46:30] (PS22) Giuseppe Lavagetto: Separate HHVM app servers backend. [operations/puppet] - https://gerrit.wikimedia.org/r/152903 (owner: Mark Bergsma)
[07:54:37] (CR) QChris: [C: -1] gerrit - use apache::site (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/153849 (owner: Dzahn)
[08:01:04] (CR) QChris: [C: -1] "Since the change does not come with 'ensure => absent' for the old conf files (and they differ in name with the new conf files) ... I gues" [operations/puppet] - https://gerrit.wikimedia.org/r/153832 (owner: Dzahn)
[08:02:02] PROBLEM - Puppet freshness on search1014 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 07:59:17 UTC
[08:04:02] PROBLEM - Puppet freshness on search1014 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 07:59:17 UTC
[08:06:05] PROBLEM - Puppet freshness on search1014 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 07:59:17 UTC
[08:08:02] PROBLEM - Puppet freshness on search1014 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 07:59:17 UTC
[08:09:37] <_joe_> !log reactivated the jobrunner on mw1053, with promising results. Puppettization pending (in ~ 1 hour)
[08:09:41] Logged the message, Master
[08:10:02] PROBLEM - Puppet freshness on search1014 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 07:59:17 UTC
[08:12:02] PROBLEM - Puppet freshness on search1014 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 07:59:17 UTC
[08:14:02] PROBLEM - Puppet freshness on db1011 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 06:13:14 UTC
[08:14:02] PROBLEM - Puppet freshness on search1014 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 07:59:17 UTC
[08:14:03] RECOVERY - Puppet freshness on db1011 is OK: puppet ran at Thu Aug 14 08:13:59 UTC 2014
[08:16:02] PROBLEM - Puppet freshness on search1014 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 07:59:17 UTC
[08:18:02] PROBLEM - Puppet freshness on search1014 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 07:59:17 UTC
[08:19:32] RECOVERY - Puppet freshness on search1014 is OK: puppet ran at Thu Aug 14 08:19:18 UTC 2014
[08:20:23] PROBLEM - puppet last run on cp4013 is CRITICAL: CRITICAL: Epic puppet fail
[08:34:46] (PS1) Nuria: Bumping up version of wikimetrics module [operations/puppet] - https://gerrit.wikimedia.org/r/154019
[08:39:29] RECOVERY - puppet last run on cp4013 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[09:15:23] (PS1) Giuseppe Lavagetto: jobrunner: puppetize HAT runner [operations/puppet] - https://gerrit.wikimedia.org/r/154021
[09:20:38] (CR) Giuseppe Lavagetto: [C: 2] jobrunner: puppetize HAT runner [operations/puppet] - https://gerrit.wikimedia.org/r/154021 (owner: Giuseppe Lavagetto)
[09:22:29] PROBLEM - puppet last run on mw1053 is CRITICAL: CRITICAL: Puppet last ran 663087 seconds ago, expected 14400
[09:23:30] RECOVERY - puppet last run on mw1053 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[09:25:19] RECOVERY - Puppet freshness on mw1053 is OK: puppet ran at Thu Aug 14 09:25:10 UTC 2014
[09:30:44] <_joe_> !log the hhvm jobrunner is back in production, seems healthy, see https://logstash.wikimedia.org/#/dashboard/elasticsearch/hhvm_jobrunner
[09:30:49] Logged the message, Master
[09:32:09] PROBLEM - Disk space on elastic1017 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 18746 MB (3% inode=99%):
[09:34:09] RECOVERY - Disk space on elastic1017 is OK: DISK OK
[10:02:03] (PS1) Ori.livneh: apache::conf: append trailing newline if missing [operations/puppet] - https://gerrit.wikimedia.org/r/154026
[10:02:05] (PS1) Ori.livneh: Fix-ups for I3d002968c [operations/puppet] - https://gerrit.wikimedia.org/r/154027
[10:04:42] (CR) Giuseppe Lavagetto: [C: -1] "* trailing whitespaces are being removed as I move it to a template (wait for it :))" [operations/puppet] - https://gerrit.wikimedia.org/r/154027 (owner: Ori.livneh)
[10:05:55] <_joe_> why remove serveradmin, btw?
[10:09:47] (PS1) Giuseppe Lavagetto: jobrunner: parametrize the port for the fcgi interface [operations/puppet] - https://gerrit.wikimedia.org/r/154028
[10:10:27] (CR) jenkins-bot: [V: -1] jobrunner: parametrize the port for the fcgi interface [operations/puppet] - https://gerrit.wikimedia.org/r/154028 (owner: Giuseppe Lavagetto)
[10:11:44] _joe_: set in default.conf
[10:12:12] you picked it, iirc :P
[10:12:17] <_joe_> ori: oh, lol
[10:12:42] <_joe_> got by my own OCD
[10:14:05] <_joe_> (it makes perfect sense btw)
[10:15:52] (PS2) Giuseppe Lavagetto: jobrunner: parametrize the port for the fcgi interface [operations/puppet] - https://gerrit.wikimedia.org/r/154028
[10:19:07] (CR) Giuseppe Lavagetto: [C: 1] "Makes sense." [operations/puppet] - https://gerrit.wikimedia.org/r/154026 (owner: Ori.livneh)
[10:19:49] (CR) Giuseppe Lavagetto: [C: 2] jobrunner: parametrize the port for the fcgi interface [operations/puppet] - https://gerrit.wikimedia.org/r/154028 (owner: Giuseppe Lavagetto)
[10:31:42] (PS1) Giuseppe Lavagetto: hhvm: add Provides: php5 [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/154030
[10:31:58] (PS2) Ori.livneh: apache::conf: append trailing newline if missing [operations/puppet] - https://gerrit.wikimedia.org/r/154026
[10:32:06] (CR) Ori.livneh: [C: 2 V: 2] apache::conf: append trailing newline if missing [operations/puppet] - https://gerrit.wikimedia.org/r/154026 (owner: Ori.livneh)
[10:33:23] (CR) Ori.livneh: jobrunner: parametrize the port for the fcgi interface (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/154028 (owner: Giuseppe Lavagetto)
[10:35:46] PROBLEM - puppet last run on lvs3003 is CRITICAL: CRITICAL: Epic puppet fail
[10:43:12] (CR) Giuseppe Lavagetto: [C: 1] rcstream: make lvs health check fetch /nginx_status [operations/puppet] - https://gerrit.wikimedia.org/r/145997 (https://bugzilla.wikimedia.org/67957) (owner: Ori.livneh)
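Note on the "apache::conf: append trailing newline if missing" change merged above: the log only gives the commit title, so what follows is a minimal Python sketch of the general idea, not the Puppet/Ruby code from that change. The function name mirrors the `ensure_final_newline()` helper named later in the log, but its body and its handling of empty input are assumptions made for illustration.

```python
def ensure_final_newline(text: str) -> str:
    """Return text with exactly what it had, plus a final newline if one was missing.

    Hypothetical illustration only; the real wmflib helper may behave
    differently (for example on empty input).
    """
    # Content that already ends in a newline is returned unchanged,
    # so applying the function twice gives the same result as once.
    if text.endswith("\n"):
        return text
    return text + "\n"
```

Config fragments that lack a final newline can run together when concatenated or can show up as noisy diffs, which is presumably why the change normalizes them.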
[10:47:27] (CR) Ori.livneh: Nutcracker: move declaration to role::mediawiki; parametrize (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/149800 (owner: Ori.livneh)
[10:49:06] (CR) Giuseppe Lavagetto: [C: 1] Nutcracker: move declaration to role::mediawiki; parametrize (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/149800 (owner: Ori.livneh)
[10:51:23] (PS1) Giuseppe Lavagetto: jobrunner: use mpm_worker instead of mpm_prefork [operations/puppet] - https://gerrit.wikimedia.org/r/154032
[10:52:05] (PS2) Giuseppe Lavagetto: jobrunner: use mpm_worker instead of mpm_prefork [operations/puppet] - https://gerrit.wikimedia.org/r/154032
[10:52:15] (CR) Giuseppe Lavagetto: [C: 2 V: 2] jobrunner: use mpm_worker instead of mpm_prefork [operations/puppet] - https://gerrit.wikimedia.org/r/154032 (owner: Giuseppe Lavagetto)
[10:55:51] RECOVERY - puppet last run on lvs3003 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[10:56:28] (PS1) Giuseppe Lavagetto: apache: change mpm dependency to work on trusty as well. [operations/puppet] - https://gerrit.wikimedia.org/r/154033
[10:56:31] PROBLEM - puppet last run on mw1053 is CRITICAL: CRITICAL: Epic puppet fail
[10:56:40] (PS2) Giuseppe Lavagetto: apache: change mpm dependency to work on trusty as well. [operations/puppet] - https://gerrit.wikimedia.org/r/154033
[10:57:28] (CR) Giuseppe Lavagetto: [C: 2] apache: change mpm dependency to work on trusty as well. [operations/puppet] - https://gerrit.wikimedia.org/r/154033 (owner: Giuseppe Lavagetto)
[11:00:30] RECOVERY - puppet last run on mw1053 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[11:15:44] (CR) Danny B.: "Is there anything else necessary but running the rebuild script to have this working?" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/140580 (owner: Danny B.)
[11:24:31] (PS1) Ori.livneh: wmflib: add ensure_final_newline() [operations/puppet] - https://gerrit.wikimedia.org/r/154035
[12:00:30] manybubbles|away ^d I noticed elastic1008 has registered itself with 127.0.1.1 instead of the actual ip, have you seen this before? (it matches what's in /etc/hosts so I suspect something didn't change it)
[12:04:53] godog: yeah - we noticed it. I don't think its actually hurting anything but it is funky
[12:10:13] (CR) Manybubbles: "Anomie and Reedy, I remember category collation was something we had to be careful with. There is a script we have to run and it doesn't" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/147922 (https://bugzilla.wikimedia.org/67287) (owner: Bartosz Dziewoński)
[12:12:46] !log cirrus index rebuilds are still proceeding without issue. Going to continue to let them run and keep half an eye on them. enwiki is nearly done. Commons and wikidata are done. Many of group1 are done - we're up to eswiktionary now - but there are many to go.
[12:12:52] Logged the message, Master
[12:17:56] (PS2) Ori.livneh: mwgrep: Sort results before printing them [operations/puppet] - https://gerrit.wikimedia.org/r/154011 (owner: Legoktm)
[12:18:33] (CR) Ori.livneh: [C: 2 V: 2] mwgrep: Sort results before printing them [operations/puppet] - https://gerrit.wikimedia.org/r/154011 (owner: Legoktm)
[12:19:57] (CR) Hashar: [V: 1] "Following the cherry pick on beta cluster puppetmaster, the jobrunner01 instance processed job again." [operations/puppet] - https://gerrit.wikimedia.org/r/152931 (https://bugzilla.wikimedia.org/69272) (owner: BryanDavis)
[12:20:26] (CR) Ori.livneh: [C: 2] beta: Set runners_* for role::beta::jobrunner [operations/puppet] - https://gerrit.wikimedia.org/r/152931 (https://bugzilla.wikimedia.org/69272) (owner: BryanDavis)
[12:23:29] (CR) Ori.livneh: [C: 1] svn - move Apache config from file to template [operations/puppet] - https://gerrit.wikimedia.org/r/153989 (owner: Dzahn)
[12:23:57] (CR) Ori.livneh: [C: 1] limn - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153961 (owner: Dzahn)
[12:24:54] (CR) Ori.livneh: [C: -1] download.wm.org - use apache::site method (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/153817 (owner: Dzahn)
[12:25:19] (CR) Hashar: "That sync the colors with scap.py :-)" [operations/puppet] - https://gerrit.wikimedia.org/r/152063 (owner: Hashar)
[12:34:40] (CR) Ori.livneh: [C: 1] contint-use apache::site,move config to templates [operations/puppet] - https://gerrit.wikimedia.org/r/153959 (owner: Dzahn)
[12:36:00] (CR) Ori.livneh: [C: -1] performance.wm.org - use apache::site (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/153953 (owner: Dzahn)
[12:36:17] (CR) Ori.livneh: [C: 1] graphite - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153952 (owner: Dzahn)
[12:36:57] (CR) Ori.livneh: [C: 1] tendril - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153951 (owner: Dzahn)
[12:37:32] (PS7) Giuseppe Lavagetto: Nutcracker: move declaration to role::mediawiki; parametrize [operations/puppet] - https://gerrit.wikimedia.org/r/149800 (owner: Ori.livneh)
[12:37:57] (CR) Ori.livneh: [C: 1] "Looks OK, but be careful with this one -- the setup is fragile." [operations/puppet] - https://gerrit.wikimedia.org/r/153955 (owner: Dzahn)
[12:38:19] <_joe_> !log stopping puppet on appservers while deploying a delicate change.
[12:38:25] Logged the message, Master
[12:38:44] (CR) Giuseppe Lavagetto: [C: 2] Nutcracker: move declaration to role::mediawiki; parametrize [operations/puppet] - https://gerrit.wikimedia.org/r/149800 (owner: Ori.livneh)
[12:42:01] (CR) Hashar: "Puppet compilation for gallium.wikimedia.org and lanthanum.eqiad.wmnet:" [operations/puppet] - https://gerrit.wikimedia.org/r/153959 (owner: Dzahn)
[12:42:08] (CR) Hashar: [C: -1] contint-use apache::site,move config to templates [operations/puppet] - https://gerrit.wikimedia.org/r/153959 (owner: Dzahn)
[12:43:58] _joe_: how does it look?
[12:44:11] <_joe_> ori: good
[12:44:19] <_joe_> lemme make a couple of checks
[12:46:25] (PS1) Mark Bergsma: Skip spam check if one of the recipients is postmaster@ or abuse@ [operations/puppet] - https://gerrit.wikimedia.org/r/154044
[12:48:38] (PS1) Ori.livneh: nutcracker: used ordered_yaml() [operations/puppet] - https://gerrit.wikimedia.org/r/154045
[12:52:51] (PS1) Ori.livneh: shell_exports(): sort keys to stabalize output [operations/puppet] - https://gerrit.wikimedia.org/r/154048
[12:54:27] (CR) Ori.livneh: [C: 2] "(trivial)" [operations/puppet] - https://gerrit.wikimedia.org/r/154048 (owner: Ori.livneh)
[12:54:30] (CR) Giuseppe Lavagetto: [C: 2] nutcracker: used ordered_yaml() [operations/puppet] - https://gerrit.wikimedia.org/r/154045 (owner: Ori.livneh)
[13:02:19] (CR) QChris: "Is this change still needed?" [operations/puppet] - https://gerrit.wikimedia.org/r/151095 (owner: Ottomata)
[13:28:30] (CR) Ori.livneh: [C: 2] beta: fix ansi escapes for wmf-beta-autoupdater [operations/puppet] - https://gerrit.wikimedia.org/r/152063 (owner: Hashar)
[13:32:32] (CR) Mark Bergsma: [C: 2] Skip spam check if one of the recipients is postmaster@ or abuse@ [operations/puppet] - https://gerrit.wikimedia.org/r/154044 (owner: Mark Bergsma)
[13:42:42] manybubbles: ack, will take a look because of wrong /etc/hosts
[13:43:08] godog: I don't think its hurting anything - like super low priority
[13:49:14] manybubbles: indeed, I'm curious what should have worked but didn't :)
[13:54:49] hi guys
[13:55:35] how would i go about getting some sort of aggregate access logs for bits.wm/skins/common/* ?
[13:55:57] this is related to https://bugzilla.wikimedia.org/show_bug.cgi?id=69277 , i'm wondering which things in that directory are actually used, and exactly how used they are
[14:01:04] <_joe_> !log puppet re-enabled on the appserver
[14:01:10] Logged the message, Master
[14:04:22] (CR) Ottomata: "Not needed, but I'm not sure what to do with it. Its a good script!" [operations/puppet] - https://gerrit.wikimedia.org/r/151095 (owner: Ottomata)
[14:06:34] (PS2) Ottomata: Bumping up version of wikimetrics module [operations/puppet] - https://gerrit.wikimedia.org/r/154019 (owner: Nuria)
[14:06:40] (CR) Ottomata: [C: 2 V: 2] Bumping up version of wikimetrics module [operations/puppet] - https://gerrit.wikimedia.org/r/154019 (owner: Nuria)
[14:11:32] (PS2) Giuseppe Lavagetto: hhvm: add Provides: php5 [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/154030
[14:12:46] (CR) Giuseppe Lavagetto: [C: 2 V: 2] hhvm: add Provides: php5 [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/154030 (owner: Giuseppe Lavagetto)
[14:22:41] btw found ~15 hosts with the same wrong entry in /etc/hosts, tracked in RT #8130
[14:22:52] so who do i poke about the logs?
[14:23:25] MatmaRex: uh, mh
[14:41:05] PROBLEM - Puppet freshness on cp4013 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 14:38:02 UTC
[14:43:05] PROBLEM - Puppet freshness on cp4013 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 14:38:02 UTC
[14:45:05] PROBLEM - Puppet freshness on cp4013 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 14:38:02 UTC
[14:45:21] (PS1) Ottomata: Override group permissions on /etc/send_nsca.cfg on Hadoop worker nodes [operations/puppet] - https://gerrit.wikimedia.org/r/154059
[14:45:23] oof, mutante ^
[14:47:05] PROBLEM - Puppet freshness on cp4013 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 14:38:02 UTC
[14:49:05] PROBLEM - Puppet freshness on cp4013 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 14:38:02 UTC
[14:51:05] PROBLEM - Puppet freshness on cp4013 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 14:38:02 UTC
[14:53:05] PROBLEM - Puppet freshness on cp4013 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 14:38:02 UTC
[14:55:07] PROBLEM - Puppet freshness on cp4013 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 14:38:02 UTC
[14:55:36] PROBLEM - puppet last run on mw1053 is CRITICAL: CRITICAL: Puppet has 2 failures
[14:57:05] PROBLEM - Puppet freshness on cp4013 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 14:38:02 UTC
[14:57:32] hey chasemp: https://gerrit.wikimedia.org/r/#/c/154059/
[14:57:35] thoughts?
[14:58:05] RECOVERY - Puppet freshness on cp4013 is OK: puppet ran at Thu Aug 14 14:57:58 UTC 2014
[14:58:20] ottomata: have a meeting in 2 minutes and then I'll look?
[14:58:27] sure
[14:58:40] (PS2) Ottomata: Additional python packages for stats servers [operations/puppet] - https://gerrit.wikimedia.org/r/154004 (owner: Yurik)
[14:58:45] (CR) Ottomata: [C: 2 V: 2] Additional python packages for stats servers [operations/puppet] - https://gerrit.wikimedia.org/r/154004 (owner: Yurik)
[14:59:22] (PS1) Reedy: Add symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/154062
[14:59:24] (PS1) Reedy: testwiki to 1.24wmf17 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/154063
[14:59:27] (PS1) Reedy: Wikipedias to 1.24wmf16 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/154064
[14:59:29] (PS1) Reedy: group0 to 1.24wmf17 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/154065
[14:59:54] (CR) Reedy: [C: 2] Add symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/154062 (owner: Reedy)
[14:59:58] (Merged) jenkins-bot: Add symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/154062 (owner: Reedy)
[15:00:04] manybubbles, anomie, Reedy: Respected human, time to deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140814T1500). Please do the needful.
[15:00:05] PROBLEM - Puppet freshness on cp4013 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 14:57:58 UTC
[15:00:20] (CR) Giuseppe Lavagetto: [C: 1] "Puppet compiler results for 4 nodes:" [operations/puppet] - https://gerrit.wikimedia.org/r/152903 (owner: Mark Bergsma)
[15:00:23] * anomie sees no patches for SWAT this morning
[15:13:07] PROBLEM - DPKG on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:13:17] PROBLEM - RAID on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:13:18] PROBLEM - check if dhclient is running on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:13:27] PROBLEM - SSH on mw1053 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:13:27] PROBLEM - nutcracker process on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:13:37] PROBLEM - check configured eth on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:13:37] PROBLEM - nutcracker port on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:13:57] PROBLEM - Disk space on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:14:01] _joe_: That you?
[15:14:36] Reedy: It's not in dsh, so not a blocker
[15:14:42] I know
[15:14:49] <_joe_> Reedy: no
[15:14:52] lol
[15:14:58] <_joe_> it seems the host just died
[15:15:04] :(
[15:15:16] (CR) Reedy: [C: 2] testwiki to 1.24wmf17 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/154063 (owner: Reedy)
[15:15:22] (Merged) jenkins-bot: testwiki to 1.24wmf17 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/154063 (owner: Reedy)
[15:15:41] !log reedy Started scap: testwiki to 1.24wmf17
[15:15:47] Logged the message, Master
[15:15:49] * Reedy twiddles his thumbs
[15:16:39] <_joe_> going in console, give me 10 mins
[15:17:57] <_joe_> the server is live
[15:17:59] <_joe_> ...
[15:18:07] RECOVERY - Puppet freshness on cp4013 is OK: puppet ran at Thu Aug 14 15:17:59 UTC 2014 [15:23:49] <_joe_> !log powercycling mw1053, which looks like the victim of hhvm-induced ooms [15:23:54] Logged the message, Master [15:25:17] PROBLEM - NTP on mw1053 is CRITICAL: NTP CRITICAL: No response from NTP server [15:26:17] RECOVERY - SSH on mw1053 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) [15:26:17] RECOVERY - nutcracker process on mw1053 is OK: PROCS OK: 1 process with UID = 110 (nutcracker), command name nutcracker [15:26:28] RECOVERY - nutcracker port on mw1053 is OK: TCP OK - 0.000 second response time on port 11212 [15:26:28] RECOVERY - check configured eth on mw1053 is OK: NRPE: Unable to read output [15:26:48] RECOVERY - Disk space on mw1053 is OK: DISK OK [15:26:58] RECOVERY - DPKG on mw1053 is OK: All packages OK [15:27:07] RECOVERY - RAID on mw1053 is OK: OK: no RAID installed [15:27:07] RECOVERY - check if dhclient is running on mw1053 is OK: PROCS OK: 0 processes with command name dhclient [15:27:56] <_joe_> Reedy: you don't need to stop deployment, btw [15:28:14] _joe_: He's aware [15:28:27] RECOVERY - puppet last run on mw1053 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [15:28:28] I've removed that host from dsh on Tuesday or so [15:28:41] <_joe_> mh why? 
[15:28:49] <_joe_> it's part of the jobrunners group [15:28:59] <_joe_> they do get software updates, don't they? [15:29:03] _joe_: it served no load and scap was hanging on it [15:29:08] yep, they should [15:29:21] <_joe_> ok so I guess I know why scap was hanging [15:30:07] RECOVERY - NTP on mw1053 is OK: NTP OK: Offset -0.0007432699203 secs [15:31:10] sync-common: 67% (ok: 151; fail: 0; left: 76) [15:35:18] <_joe_> hoo: put it back into scap [15:35:29] <_joe_> it should work [15:35:39] <_joe_> if it doesn't, we have to understand why [15:35:39] REVERT [15:35:45] _joe_: needs sync common first [15:35:47] will do [15:36:04] (03PS1) 10Reedy: Revert "Remove mw1053 from mediawiki-installation dsh" [operations/puppet] - 10https://gerrit.wikimedia.org/r/154069 [15:36:20] Reedy: Are you scapping now? [15:36:30] yup [15:36:32] it's still going [15:36:36] sync-common: 94% (ok: 214; fail: 0; left: 13) [15:36:43] we probably want to sync-common after scap has its stuff rebuilt [15:36:47] yeah [15:36:51] seems sensible [15:37:05] sync-common: 99% (ok: 225; fail: 0; left: 2) [15:37:35] building l10n cache now [15:42:06] scap-rebuild-cdbs: 78% (ok: 176; fail: 0; left: 51) [15:44:04] 1 to go [15:44:08] (03CR) 10Jgreen: [C: 032 V: 031] allow DKIM wiki-mail._domainkey for use with localpart wiki*@ instead of just wiki@ [operations/dns] - 10https://gerrit.wikimedia.org/r/153830 (owner: 10Jgreen) [15:47:17] !log adjust wiki-mail._domainkey DNS record to allow sending from 'wiki*@' addresses, instead of just wiki@ [15:47:23] Logged the message, Master [15:47:25] sooo. who do i poke about the aggregate logs? [15:47:59] MatmaRex: nobody, or qchris_meeting [15:48:09] (write RT with him in cc?) [15:48:40] Jeff_Green: are/will such addresses be in use anywhere?
[15:48:55] !log reedy Finished scap: testwiki to 1.24wmf17 (duration: 33m 13s) [15:49:00] Logged the message, Master [15:49:20] !log Running sync-common on mw1053 [15:49:25] Logged the message, Master [15:49:34] Nemo_bis: wikis currently send with envelope sender wiki@wikimedia.org, we're getting ready to switch to a VERP scheme [15:49:41] (03CR) 10BBlack: [C: 031] Separate HHVM app servers backend. [operations/puppet] - 10https://gerrit.wikimedia.org/r/152903 (owner: 10Mark Bergsma) [15:49:59] the new address scheme will be wiki-{blahblahblah}@wikimedia.org [15:50:12] right :) [15:50:18] Jeff_Green: have you considered en.wiki currently uses a unique config? [15:50:32] define unique? [15:51:32] 'wmgNotificationSender' => array( 'default' => 'wiki@wikimedia.org', 'enwiki' => 'no-reply-notifications@wikipedia.org', [15:51:35] ), [15:52:04] [which is horrible btw] [15:52:19] what's that for? [15:52:28] No idea if it matters, just the From: of Echo emails IIRC [15:52:50] Used to be worse: https://bugzilla.wikimedia.org/show_bug.cgi?id=58261 [15:53:22] tonythomas: ^^^ [15:53:42] unique config [15:53:46] that en.wiki line can just be removed from config if it causes problems, it never had any known rationale [15:54:04] * Jeff_Green looking at mail config to see what that does [15:55:11] we are not altering the From address, as far as I know [15:55:59] tonythomas: but in the old scheme From == envelope right? 
[15:56:03] just asking exim to copy our $wgPasswordSender altered to the retunr path I think [15:57:01] legoktm tells me that the $wgPassword sender is kept to wiki@wikimedia.org [15:57:07] in wmf wikis [15:57:46] if I'm reading the exim config correctly we blackhole bounces to no-reply-notifications@, essentially same as bounces to wiki@ [15:58:07] and in the UserMailer.php [15:58:09] $returnPath = $from->address; [15:58:27] the $from takes in as the $wgPasswordSender [15:58:41] and we alter this $returnPath [15:58:53] https://github.com/wikimedia/mediawiki-core/blob/master/includes/UserMailer.php#L243 [15:59:22] fwiw mail is actually going out with no-reply-notifications@ as the envelope sender [16:00:01] I thought everything gets rewritten to wiki@wikimedia.org [16:00:31] it used to be the case, AFAICS https://bugzilla.wikimedia.org/show_bug.cgi?id=58261#c0 [16:00:47] didn't check recently [16:00:49] tonythomas: looking further [16:00:59] _joe_: mw1053 should be good to repool in dsh [16:01:41] Reedy: https://test.wikipedia.org/wiki/Special:CentralLogin/complete?token=877de23819ebfcb74a791dd8b4b46c61 [16:01:46] Unexpected non-MediaWiki exception encountered, of type "BadMethodCallException" [16:01:46] [41185555] /wiki/Special:CentralLogin/complete?token=877de23819ebfcb74a791dd8b4b46c61 Exception from line 166 of /usr/local/apache/common-local/php-1.24wmf17/extensions/CentralAuth/specials/SpecialCentralLogin.php: Call to a member function getId() on a non-object (boolean) [16:01:48] (03PS2) 10Reedy: Revert "Remove mw1053 from mediawiki-installation dsh" [operations/puppet] - 10https://gerrit.wikimedia.org/r/154069 [16:02:01] (03CR) 10Reedy: [C: 031] "sync-common and l10n cache rebuilt on server" [operations/puppet] - 10https://gerrit.wikimedia.org/r/154069 (owner: 10Reedy) [16:02:10] I just clicked on every special page in the list [16:02:11] legoktm: wheeeeee [16:02:16] legoktm: wooot [16:02:17] the fact that every other word is 'wiki' in every config makes 
reading config maddening [16:02:20] Want a full stack trace? [16:02:27] I have the full trace [16:02:32] #0 /usr/local/apache/common-local/php-1.24wmf17/extensions/CentralAuth/specials/SpecialCentralLogin.php(33): SpecialCentralLogin->doLoginComplete(string) [16:02:36] Jeff_Green: I feel you [16:02:38] well, that's the only useful part [16:02:53] I don't think this is a new thing though [16:03:55] well [16:04:01] Jeff_Green: anyway, even if we are having this no-reply-notifications@wikipedia.org -- our extension can safely change this thing to wiki-{blahblah}@wikipedia.org right ? [16:04:20] just the return path ? the from address can still remain the same ? [16:04:27] tonythomas: i'm not sure why it's there in the first place [16:04:31] (03PS6) 10Ottomata: Add script and class to manage HDFS user directories [operations/puppet/cdh] - 10https://gerrit.wikimedia.org/r/153706 [16:04:35] Jeff_Green: ah. true that [16:04:36] let's just kill no-reply-notifications@wikipedia.org [16:04:56] I think so [16:05:07] afaict we sign all mail coming in from webservers with a DKIM record that's only valid for wiki@ anyway, which makes all this extra interesting [16:05:11] hoo: legoktm Are you two looking into the CA fail?
[16:05:22] does anyone have a sample message from enwiki somewhere? [16:05:41] I'm going through the rest of the special pages first [16:05:43] Jeff_Green: one sec [16:06:04] Jeff_Green: https://dpaste.de/tDKV/raw [16:06:43] and we just change the Return-Path: :D [16:07:10] dkim=neutral (no key) [16:07:30] i think if our DKIM policy was restrictive that would be a fail [16:07:36] Reedy: Not really, am preparing for travel tomorrow [16:09:43] my vote is to burn use of no-reply-notifications@wikimedia.org to the ground, but it would be nice to have some background on why it was ever used [16:10:40] (03CR) 10BBlack: [C: 032] Revert "Remove mw1053 from mediawiki-installation dsh" [operations/puppet] - 10https://gerrit.wikimedia.org/r/154069 (owner: 10Reedy) [16:10:58] Nemo_bis might know? Since it's only enabled on enwp, I think killing it is ok [16:11:03] I think that would also allow us to rip out some gratuitous exim config [16:11:21] * Reedy is git blaming IS [16:12:03] legoktm: as I said, I think it's safe to remove [16:12:34] it might have had some intended purpose one year ago, but with all the spam etc. problems we've had since then it's no longer justified [16:13:06] that really gives an intuition that someone is handling some bounce though :D [16:13:37] Jeff_Green: https://github.com/wikimedia/operations-mediawiki-config/commit/2451b60ac08e03c96a6984697646dbf95e812cea [16:13:49] Echo [16:13:58] https://gerrit.wikimedia.org/r/#/c/59717/ [16:14:09] https://bugzilla.wikimedia.org/show_bug.cgi?id=46670 [16:14:14] blargh [16:14:27] and of course we have two different approaches to blackholing the bounce mail [16:14:35] It would seem ops apparently had some input [16:14:58] commit message : Use agreed no-reply addresses that route to dev null [16:15:13] Reedy: where do you see ops input?
[16:15:23] "Based on the final settings that Operations gave us, let's start using these two new email addresses for our first release:" [16:15:27] No RT ticket link though [16:15:31] oic [16:16:01] Reedy: oh, could we get https://gerrit.wikimedia.org/r/#/c/153742/ backported to wmf16? It's more logging for the CA bug [16:16:11] I still don't see why any of this is necessary, what does it do differently from sending as wiki@ ? [16:16:25] I've no idea :) [16:16:35] in both cases we route it outbound, and /dev/null it inbound [16:16:42] Oh, https://rt.wikimedia.org/Ticket/Display.html?id=4785 [16:17:05] hiding in plain sight [16:17:14] Reedy: if that's related to https://gerrit.wikimedia.org/r/#/c/59717/ please add as comment :) [16:17:27] (03CR) 10Reedy: "https://rt.wikimedia.org/Ticket/Display.html?id=4785" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/59717 (https://bugzilla.wikimedia.org/46670) (owner: 10Lwelling) [16:17:33] thanks [16:17:46] Reedy: reading [16:17:49] Added as See also on the bug too [16:17:56] great, was doing that :) [16:18:21] Ops involvement seems . . . hmm. :-P [16:18:29] * tonythomas wished Send new password link in https://rt.wikimedia.org worked : [16:18:40] not the issue now btw [16:22:59] the only value I see in maintaining the separate address is that it allows us, in theory, to do message counts for echo [16:23:03] !log reedy Synchronized php-1.24wmf16/extensions/CentralAuth/: (no message) (duration: 00m 14s) [16:23:07] Logged the message, Master [16:23:30] !log reedy Synchronized php-1.24wmf17/extensions/CentralAuth/: (no message) (duration: 00m 13s) [16:23:34] Logged the message, Master [16:26:02] legoktm: Try again ;) [16:26:49] better [16:27:37] something is catching the MWException and displaying it nicely [16:28:25] oh wait, that's a different error since my token expired, hm [16:29:30] Reedy: if you go to https://test.wikipedia.org/wiki/Special:UserLogin, do you get redirected to the CentralAuth auto-login?
[16:29:47] Nope [16:29:47] [5b9fe9a5] /wiki/Special:CentralLogin/complete?token=ded44e816307821ab3d69df1750bd875 Exception from line 167 of /usr/local/apache/common-local/php-1.24wmf17/extensions/CentralAuth/specials/SpecialCentralLogin.php: The user account logged into does not exist. [16:29:57] well, I'm not sure how to fix that [16:30:53] legoktm: that one is an echo issue ? do we have any other alternatives to do get the message count, suppose this one gets killed ? [16:31:15] message count for? [16:31:31] I don't think anyone ever looked at the count of emails to no-reply-notifications@ [16:32:28] so, there was no other intentions behind putting up no-reply-notifications@ ? [16:33:26] on the original RT there's talk of message counts [16:34:04] imo we should talk to the Echo team and see what they think. maybe worst case we could support an alternate VERP address prefix [16:34:41] although it seems like we're already sending out badly DKIM signed messages for echo now :-P [16:34:45] Jeff_Green: but we are altering the return-path only right ? so does this custom from address cause an issue ? [16:35:29] I don't think we're altering the return path [16:36:04] exim is logging messages and I don't see any indication that it's changed the return path before final delivery [16:36:14] legoktm: What's up with CA? Anything utterly broken or just the usual legacy quirks? [16:36:23] * hoo didn't really follow [16:36:36] I'm not quite sure either [16:36:54] Jeff_Green: is this no-reply thing getting into exim as the 5th param in mail() ? [16:37:06] me neither, I think you're just not supposed to visit the centralautologin pages directly [16:37:17] legoktm: and that breaks? [16:37:41] it was doing User::newFromName( $request->something ) without checking a valid user object was created [16:37:54] tonythomas: I dunno [16:38:18] all this is worth a conversation with the echo devs [16:38:27] legoktm: ... 
that's like the first time ever we had such breakage [16:38:29] this week [16:39:41] werdna: Hi ! you around ? [16:39:46] <_joe_> Reedy: where is your patch to put mw1053 back in scap? [16:39:56] _joe_: bblack merged it [16:39:58] _joe_: bblac.k merged it [16:40:00] meh [16:49:37] (03PS1) 10KartikMistry: Add Chinese fonts for VE screenshots feature [operations/puppet] - 10https://gerrit.wikimedia.org/r/154086 (https://bugzilla.wikimedia.org/69535) [16:51:59] <_joe_> !log uploaded new hhvm package 3.3-dev+20140728+wmf4 [16:52:04] Logged the message, Master [16:52:24] <_joe_> (logging it because it will be updated on servers) [16:53:11] (03CR) 10Giuseppe Lavagetto: [C: 031] "Now the hhvm package provides php5" [operations/puppet] - 10https://gerrit.wikimedia.org/r/153772 (owner: 10Giuseppe Lavagetto) [17:22:35] (03CR) 10Gifti: "+1" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/153428 (owner: 10Steinsplitter) [17:37:16] mutante: https://gerrit.wikimedia.org/r/#/c/153639/ [17:38:30] just one more? that sounds ok [17:38:55] (03CR) 10Aaron Schulz: [C: 032] Made RunJobs use the MW exception handler [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/151129 (owner: 10Aaron Schulz) [17:39:03] (03Merged) 10jenkins-bot: Made RunJobs use the MW exception handler [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/151129 (owner: 10Aaron Schulz) [17:39:34] !log aaron Synchronized rpc: 6c0ece687bb6ff3fec0ca7e80a587525ebf18a70 (duration: 00m 08s) [17:39:39] Logged the message, Master [17:40:13] (03PS2) 10Dzahn: Increased the number of parsoid job runners to lower queue size [operations/puppet] - 10https://gerrit.wikimedia.org/r/153639 (owner: 10Aaron Schulz) [17:40:38] (03CR) 10Dzahn: [C: 032] Increased the number of parsoid job runners to lower queue size [operations/puppet] - 10https://gerrit.wikimedia.org/r/153639 (owner: 10Aaron Schulz) [17:42:15] mutante: yt? 
[17:42:21] https://gerrit.wikimedia.org/r/#/c/154059/ :) [17:42:30] need some thoughts on that, its a little weird [17:44:27] PROBLEM - puppetmaster https on palladium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:45:17] RECOVERY - puppetmaster https on palladium is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.046 second response time [17:45:56] !log /srv/deployment/jobrunner updated to 795baf3ca4ce8308597dd74e5242aa5bfbbe961d [17:46:02] Logged the message, Master [17:48:42] (03PS1) 10Anomie: Add API usage log [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/154096 [17:50:04] ottomata: yea, uhm.. i guess it's either what you do or alternative a sudo config, i'm not sure right now [17:50:55] the current group being "ganglia" seems wrong [17:51:09] ganglia? [17:51:14] so i guess it's ok, but get another review? [17:51:22] 4.0K -r-------- 1 root ganglia [17:51:25] ana1011 [17:51:39] wha [17:51:40] weird [17:55:30] i think the package must have installed it that way? [17:55:32] i haven't touched it [17:56:14] hm, mutante alternatively [17:56:54] i could make it group owned by root and group readable, and just add the yarn user to the root group? [17:57:01] hm, naw [17:57:35] there was that puppet issue with adding an existing user to an existing group [17:57:46] heheh, there's a way around it [17:58:28] ah i'm not using it anymore though [17:58:50] we even have puppet execute usermod [17:59:12] yeah, with an exec but ja [17:59:25] anyway, ja ok, i'll just stick with the file [18:00:04] Reedy, greg-g: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140814T1800). Please do the needful. [18:00:28] 4 Warning: Unknown: Input variables exceeded 1000. To increase the limit change max_input_vars in php.ini. 
in Unknown on line 0 [18:00:35] oooo [18:00:38] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures [18:00:38] i guess: add group "nsca" with puppet, add hadoop user to nsca group like done here [18:00:44] manifests/role/analytics/refinery.pp: command => 'usermod hdfs -a -G analytics-admins', [18:00:47] modules/clamav/manifests/init.pp: command => 'usermod -a -G Debian-exim clamav', [18:00:51] ottomata: shrug [18:01:48] think that is better? I'd still have to mess with the file to get it group owned by nsca [18:01:54] not sure i should create a whole new system group just for this... [18:02:23] (03PS2) 10Reedy: Wikipedias to 1.24wmf16 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/154064 [18:03:48] (03CR) 10Reedy: [C: 032] Wikipedias to 1.24wmf16 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/154064 (owner: 10Reedy) [18:03:49] ottomata: no, no strong opinion at all [18:04:00] ok, i'm going to stick with this then, as it is fewer impacting changes [18:04:10] we can change if someone doesn't like [18:04:37] (03PS2) 10Ottomata: Override group permissions on /etc/send_nsca.cfg on Hadoop worker nodes [operations/puppet] - 10https://gerrit.wikimedia.org/r/154059 [18:04:39] (03Merged) 10jenkins-bot: Wikipedias to 1.24wmf16 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/154064 (owner: 10Reedy) [18:04:58] ok [18:05:03] (03CR) 10Ottomata: [C: 032 V: 032] "Dzahn and I discussed this in IRC. I decided to stick with it for now. If someone wants me to do this a different way, I'm happy to." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/154059 (owner: 10Ottomata) [18:05:47] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.24wmf16 [18:05:53] Logged the message, Master [18:08:50] (03PS2) 10Reedy: group0 to 1.24wmf17 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/154065 [18:09:10] (03CR) 10Reedy: [C: 032] group0 to 1.24wmf17 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/154065 (owner: 10Reedy) [18:09:14] (03Merged) 10jenkins-bot: group0 to 1.24wmf17 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/154065 (owner: 10Reedy) [18:09:58] PROBLEM - puppet last run on nescio is CRITICAL: CRITICAL: Epic puppet fail [18:10:11] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.24wmf17 [18:10:16] Logged the message, Master [18:10:21] (03PS2) 10Reedy: Add API usage log [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/154096 (owner: 10Anomie) [18:10:35] (03CR) 10Reedy: [C: 032] Add API usage log [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/154096 (owner: 10Anomie) [18:10:43] (03Merged) 10jenkins-bot: Add API usage log [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/154096 (owner: 10Anomie) [18:11:04] (03PS2) 10Reedy: Remove $wgExtDistArchiveAPI and $wgExtDistProxy [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/153347 (owner: 10Withoutaname) [18:11:08] (03CR) 10Reedy: [C: 032] Remove $wgExtDistArchiveAPI and $wgExtDistProxy [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/153347 (owner: 10Withoutaname) [18:11:12] (03Merged) 10jenkins-bot: Remove $wgExtDistArchiveAPI and $wgExtDistProxy [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/153347 (owner: 10Withoutaname) [18:14:35] (03CR) 10Mxn: [C: 031] "Looks good." 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149637 (https://bugzilla.wikimedia.org/68612) (owner: 10Withoutaname) [18:16:47] (03PS2) 10Reedy: Set $wgPasswordDefault to old MD5 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/153850 (https://bugzilla.wikimedia.org/68766) (owner: 10Parent5446) [18:18:25] (03CR) 10Reedy: [C: 032] Set $wgPasswordDefault to old MD5 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/153850 (https://bugzilla.wikimedia.org/68766) (owner: 10Parent5446) [18:18:57] (03PS3) 10Dzahn: ishmael - use apache::site [operations/puppet] - 10https://gerrit.wikimedia.org/r/153846 [18:19:08] (03Merged) 10jenkins-bot: Set $wgPasswordDefault to old MD5 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/153850 (https://bugzilla.wikimedia.org/68766) (owner: 10Parent5446) [18:19:32] (03CR) 10Dzahn: [C: 032] tendril - use apache::site [operations/puppet] - 10https://gerrit.wikimedia.org/r/153951 (owner: 10Dzahn) [18:20:21] (03PS2) 10Reedy: Point wgSiteMatrixFile at full path (not /apache symlink) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/152758 [18:20:28] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There are 3 unmerged changes in mediawiki_config (dir /a/common/). [18:20:33] (03CR) 10Reedy: [C: 032] Point wgSiteMatrixFile at full path (not /apache symlink) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/152758 (owner: 10Reedy) [18:20:40] shush icinga-wm [18:20:56] (03Merged) 10jenkins-bot: Point wgSiteMatrixFile at full path (not /apache symlink) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/152758 (owner: 10Reedy) [18:20:58] (03PS23) 10BBlack: Separate HHVM app servers backend. [operations/puppet] - 10https://gerrit.wikimedia.org/r/152903 (owner: 10Mark Bergsma) [18:21:15] Reedy: we could make it address individual people on unmerged changes too! 
[18:21:25] haha [18:21:30] ACKNOWLEDGEMENT - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There are 4 unmerged changes in mediawiki_config (dir /a/common/). daniel_zahn OK, Reedy [18:21:41] lol [18:21:43] (03PS2) 10Ori.livneh: wmflib: add ensure_final_newline() [operations/puppet] - 10https://gerrit.wikimedia.org/r/154035 [18:21:48] (03CR) 10Ori.livneh: [C: 032 V: 032] wmflib: add ensure_final_newline() [operations/puppet] - 10https://gerrit.wikimedia.org/r/154035 (owner: 10Ori.livneh) [18:22:11] haha [18:23:23] (03PS3) 10Reedy: Change upload settings on fa.wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/152042 (https://bugzilla.wikimedia.org/69171) (owner: 10Calak) [18:23:26] (03CR) 10Reedy: [C: 032] Change upload settings on fa.wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/152042 (https://bugzilla.wikimedia.org/69171) (owner: 10Calak) [18:23:51] (03CR) 10Dzahn: "Notice: /Stage[main]/Tendril/Apache::Site[tendril.wikimedia.org]/Apache::Conf[tendril.wikimedia.org]/File[/etc/apache2/sites-enabled/50-te" [operations/puppet] - 10https://gerrit.wikimedia.org/r/153951 (owner: 10Dzahn) [18:23:56] (03Merged) 10jenkins-bot: Change upload settings on fa.wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/152042 (https://bugzilla.wikimedia.org/69171) (owner: 10Calak) [18:24:15] (03PS2) 10Reedy: Add subpages to main namespace on FDC wiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/151781 (owner: 10Matanya) [18:24:20] (03CR) 10Reedy: [C: 032] Add subpages to main namespace on FDC wiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/151781 (owner: 10Matanya) [18:24:57] PROBLEM - puppet last run on db1065 is CRITICAL: CRITICAL: Epic puppet fail [18:24:57] PROBLEM - puppet last run on mw1012 is CRITICAL: CRITICAL: Epic puppet fail [18:25:07] PROBLEM - puppet last run on snapshot1003 is CRITICAL: CRITICAL: Epic puppet fail [18:25:07] PROBLEM - 
puppet last run on wtp1020 is CRITICAL: CRITICAL: Epic puppet fail [18:25:07] PROBLEM - puppet last run on mw1160 is CRITICAL: CRITICAL: Epic puppet fail [18:25:08] PROBLEM - puppet last run on mw1026 is CRITICAL: CRITICAL: Epic puppet fail [18:25:08] PROBLEM - puppet last run on lvs1002 is CRITICAL: CRITICAL: Epic puppet fail [18:25:17] PROBLEM - puppet last run on ms-be1003 is CRITICAL: CRITICAL: Epic puppet fail [18:25:17] PROBLEM - puppet last run on platinum is CRITICAL: CRITICAL: Epic puppet fail [18:25:18] PROBLEM - puppet last run on db1073 is CRITICAL: CRITICAL: Epic puppet fail [18:25:18] PROBLEM - puppet last run on analytics1041 is CRITICAL: CRITICAL: Epic puppet fail [18:25:18] PROBLEM - puppet last run on db1031 is CRITICAL: CRITICAL: Epic puppet fail [18:25:18] PROBLEM - puppet last run on analytics1020 is CRITICAL: CRITICAL: Epic puppet fail [18:25:27] PROBLEM - puppet last run on ssl1002 is CRITICAL: CRITICAL: Epic puppet fail [18:25:27] PROBLEM - puppet last run on xenon is CRITICAL: CRITICAL: Epic puppet fail [18:25:27] PROBLEM - puppet last run on potassium is CRITICAL: CRITICAL: Epic puppet fail [18:25:27] PROBLEM - puppet last run on mw1174 is CRITICAL: CRITICAL: Epic puppet fail [18:25:27] PROBLEM - puppet last run on cp1047 is CRITICAL: CRITICAL: Epic puppet fail [18:25:28] PROBLEM - puppet last run on cp1055 is CRITICAL: CRITICAL: Epic puppet fail [18:25:28] PROBLEM - puppet last run on cp1039 is CRITICAL: CRITICAL: Epic puppet fail [18:25:29] PROBLEM - puppet last run on amssq53 is CRITICAL: CRITICAL: Epic puppet fail [18:25:29] PROBLEM - puppet last run on es1008 is CRITICAL: CRITICAL: Epic puppet fail [18:25:30] PROBLEM - puppet last run on rbf1002 is CRITICAL: CRITICAL: Epic puppet fail [18:25:31] gj puppet [18:25:37] PROBLEM - puppet last run on ms-be1006 is CRITICAL: CRITICAL: Epic puppet fail [18:25:46] ugh, what now [18:25:55] "epic puppet fail"? [18:25:59] epic? 
[18:26:14] (03Merged) 10jenkins-bot: Add subpages to main namespace on FDC wiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/151781 (owner: 10Matanya) [18:26:22] !log stopped ircecho on neon temporarily [18:26:28] Logged the message, Master [18:26:59] * hoo dislikes that he can no longer see puppet logs [18:27:20] ori: Validation of File[/etc/apache2/conf-available/00-defaults.conf] failed: [18:27:24] (03PS2) 10Reedy: Re-enable $wgRSSProxy since blog.wikimedia.org is on an external host [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/152917 (owner: 10Legoktm) [18:27:27] (03CR) 10Reedy: [C: 032] Re-enable $wgRSSProxy since blog.wikimedia.org is on an external host [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/152917 (owner: 10Legoktm) [18:27:50] mutante: on which host? [18:28:00] (03CR) 10Dzahn: "Error: Failed to apply catalog: Validation of File[/etc/apache2/conf-available/00-defaults.conf] failed: You cannot specify more than one " [operations/puppet] - 10https://gerrit.wikimedia.org/r/154035 (owner: 10Ori.livneh) [18:28:14] ori: i picked mw1026, but it's like on all appservers [18:28:19] see above [18:28:23] i pasted into gerrit [18:28:28] (03PS1) 10Ori.livneh: Revert "wmflib: add ensure_final_newline()" [operations/puppet] - 10https://gerrit.wikimedia.org/r/154105 [18:28:37] (03CR) 10Ori.livneh: [C: 032 V: 032] "blergh." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/154105 (owner: 10Ori.livneh) [18:28:55] (03Merged) 10jenkins-bot: Re-enable $wgRSSProxy since blog.wikimedia.org is on an external host [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/152917 (owner: 10Legoktm) [18:29:35] (03PS2) 10Reedy: Change user groups rights on ckb.wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/153427 (https://bugzilla.wikimedia.org/69394) (owner: 10Calak) [18:29:39] mutante: apparently if your function accepts undef and returns it unmodified it becomes magically not an undef [18:29:40] (03CR) 10Reedy: [C: 032] Change user groups rights on ckb.wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/153427 (https://bugzilla.wikimedia.org/69394) (owner: 10Calak) [18:29:43] "wrapped exception".. sounds like a present :p [18:29:44] because the function returns an rvalue [18:29:45] thanks puppet [18:30:14] hah, arg [18:30:49] (03Merged) 10jenkins-bot: Change user groups rights on ckb.wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/153427 (https://bugzilla.wikimedia.org/69394) (owner: 10Calak) [18:33:33] (03PS1) 10BBlack: Add entries for hhvm-appservers [operations/dns] - 10https://gerrit.wikimedia.org/r/154108 [18:34:12] waits a bit longer before letting icinga-wm come back, but it works again on a random appserver, mw1026 recovered [18:34:32] (03PS3) 10Reedy: Wikimedia wikis configuration for $wgExtraSignatureNamespaces [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149591 (owner: 10Nemo bis) [18:34:35] (03CR) 10BBlack: [C: 032] Add entries for hhvm-appservers [operations/dns] - 10https://gerrit.wikimedia.org/r/154108 (owner: 10BBlack) [18:34:42] (03CR) 10Reedy: [C: 032] Wikimedia wikis configuration for $wgExtraSignatureNamespaces [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149591 (owner: 10Nemo bis) [18:37:04] (03PS2) 10BBlack: Create internal LVS cluster 
'hhvm_appservers' [operations/puppet] - 10https://gerrit.wikimedia.org/r/152908 (owner: 10Mark Bergsma) [18:37:15] (03PS2) 10BBlack: Add monitoring for LVS service hhvm-appservers.svc.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/152909 (owner: 10Mark Bergsma) [18:37:59] (03Merged) 10jenkins-bot: Wikimedia wikis configuration for $wgExtraSignatureNamespaces [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149591 (owner: 10Nemo bis) [18:38:44] (03PS2) 10Reedy: Close body and html tags [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149644 [18:38:49] (03CR) 10Reedy: [C: 032] Close body and html tags [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149644 (owner: 10Reedy) [18:39:09] (03CR) 10BBlack: [C: 031] Create internal LVS cluster 'hhvm_appservers' [operations/puppet] - 10https://gerrit.wikimedia.org/r/152908 (owner: 10Mark Bergsma) [18:39:12] (03Merged) 10jenkins-bot: Close body and html tags [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149644 (owner: 10Reedy) [18:39:21] (03CR) 10BBlack: [C: 031] Add monitoring for LVS service hhvm-appservers.svc.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/152909 (owner: 10Mark Bergsma) [18:39:49] (03PS2) 10Reedy: Enable job queue to process notification on all wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/150742 (owner: 10Bsitu) [18:39:58] (03CR) 10Reedy: [C: 032] Enable job queue to process notification on all wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/150742 (owner: 10Bsitu) [18:40:02] (03Merged) 10jenkins-bot: Enable job queue to process notification on all wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/150742 (owner: 10Bsitu) [18:40:35] (03PS2) 10Reedy: Add WikimediaShopLink's SkinBuildSidebar hook directly [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149787 (https://bugzilla.wikimedia.org/55678) (owner: 10Legoktm) 
[18:41:06] (03CR) 10Reedy: [C: 032] Add WikimediaShopLink's SkinBuildSidebar hook directly [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149787 (https://bugzilla.wikimedia.org/55678) (owner: 10Legoktm) [18:41:10] (03Merged) 10jenkins-bot: Add WikimediaShopLink's SkinBuildSidebar hook directly [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149787 (https://bugzilla.wikimedia.org/55678) (owner: 10Legoktm) [18:41:38] (03CR) 10Dzahn: [C: 032] graphite - use apache::site [operations/puppet] - 10https://gerrit.wikimedia.org/r/153952 (owner: 10Dzahn) [18:42:36] (03PS2) 10Reedy: Enable GuidedTour extension on tewiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/151639 (https://bugzilla.wikimedia.org/69103) (owner: 10Phuedx) [18:42:40] (03CR) 10Reedy: [C: 032] Enable GuidedTour extension on tewiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/151639 (https://bugzilla.wikimedia.org/69103) (owner: 10Phuedx) [18:42:44] (03Merged) 10jenkins-bot: Enable GuidedTour extension on tewiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/151639 (https://bugzilla.wikimedia.org/69103) (owner: 10Phuedx) [18:43:23] (03PS2) 10Reedy: Grant 'block' to qa_automation group on test2wiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/153732 (https://bugzilla.wikimedia.org/61799) (owner: 10Spage) [18:43:28] (03CR) 10Reedy: [C: 032] Grant 'block' to qa_automation group on test2wiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/153732 (https://bugzilla.wikimedia.org/61799) (owner: 10Spage) [18:46:01] (03CR) 10Dzahn: "Notice: /Stage[main]/Role::Graphite/Apache::Site[graphite.wikimedia.org]/Apache::Conf[graphite.wikimedia.org]/File[/etc/apache2/sites-avai" [operations/puppet] - 10https://gerrit.wikimedia.org/r/153952 (owner: 10Dzahn) [18:47:16] (03PS2) 10QChris: Reschedule backups to not interfer with queue runs so easily [operations/puppet/wikimetrics] - 
10https://gerrit.wikimedia.org/r/153388 (https://bugzilla.wikimedia.org/68731) [18:49:03] (03CR) 10Legoktm: "Uhhhhhhhhhh can we do a more progressive rollout of this? We already have issues with emails not getting out on time, I'd rather not add t" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/150742 (owner: 10Bsitu) [18:49:28] (03PS2) 10Dzahn: performance.wm.org - use apache::site [operations/puppet] - 10https://gerrit.wikimedia.org/r/153953 [18:50:03] (03CR) 10Dzahn: [C: 032] ishmael - use apache::site [operations/puppet] - 10https://gerrit.wikimedia.org/r/153846 (owner: 10Dzahn) [18:50:28] (03PS1) 10Legoktm: Revert "Enable job queue to process notification on all wikis" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/154112 [18:50:31] Reedy: ^ [18:50:41] lol [18:51:20] I didn't see the original one otherwise I would have -1'd it [18:51:55] (03PS2) 10Reedy: Revert "Enable job queue to process notification on all wikis" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/154112 (owner: 10Legoktm) [18:51:59] (03CR) 10Reedy: [C: 032] Revert "Enable job queue to process notification on all wikis" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/154112 (owner: 10Legoktm) [18:52:12] (03Merged) 10jenkins-bot: Revert "Enable job queue to process notification on all wikis" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/154112 (owner: 10Legoktm) [18:52:28] RECOVERY - puppet last run on mw1042 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [18:53:03] (03PS2) 10Reedy: Change autoconfirmed settings on nowiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/153848 (https://bugzilla.wikimedia.org/69302) (owner: 10Calak) [18:53:07] RECOVERY - puppet last run on hooft is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [18:53:15] (03CR) 10Reedy: [C: 032] Change autoconfirmed settings on nowiki [operations/mediawiki-config] 
- 10https://gerrit.wikimedia.org/r/153848 (https://bugzilla.wikimedia.org/69302) (owner: 10Calak) [18:53:19] (03Merged) 10jenkins-bot: Change autoconfirmed settings on nowiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/153848 (https://bugzilla.wikimedia.org/69302) (owner: 10Calak) [18:54:07] RECOVERY - puppet last run on mw1180 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [18:54:27] RECOVERY - puppet last run on mw1053 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [18:54:27] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. [18:54:33] (03CR) 10Nuria: [C: 031] Reschedule backups to not interfer with queue runs so easily [operations/puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/153388 (https://bugzilla.wikimedia.org/68731) (owner: 10QChris) [18:55:10] !log reedy Synchronized database lists: (no message) (duration: 00m 14s) [18:55:16] Logged the message, Master [18:55:17] RECOVERY - puppet last run on mw1159 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [18:55:22] (03CR) 10Ottomata: [C: 032 V: 032] Reschedule backups to not interfer with queue runs so easily [operations/puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/153388 (https://bugzilla.wikimedia.org/68731) (owner: 10QChris) [18:55:30] (03CR) 10JanZerebecki: [C: 031] wikistats - use apache::site [operations/puppet] - 10https://gerrit.wikimedia.org/r/153828 (owner: 10Dzahn) [18:55:57] RECOVERY - puppet last run on mw1030 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [18:56:07] RECOVERY - puppet last run on stat1002 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [18:57:29] !log reedy Synchronized wmf-config/: (no message) (duration: 00m 14s) [18:57:35] Logged the message, Master [18:59:18] PROBLEM - puppet last run on ms-be3004 is CRITICAL: CRITICAL: Epic puppet fail 
[18:59:29] (03CR) 10Ori.livneh: [C: 031] performance.wm.org - use apache::site [operations/puppet] - 10https://gerrit.wikimedia.org/r/153953 (owner: 10Dzahn) [19:00:33] goodbye WikimediaShopLink! [19:02:13] lolol [19:02:56] is ori or bd808|BUFFER around? [19:02:58] * aude guess not [19:03:04] both are on vacation I think [19:03:09] bd808|BUFFER is really offline [19:03:12] ori is/was about [19:03:25] http://wikidata.beta.wmflabs.org/wiki/Q1571 or any other item does not load for me [19:03:32] * aude wants to find the logs for this [19:03:53] not urgent but not great that we can't run selenium tests there [19:03:59] (03PS3) 10BBlack: introduce labmon-roots admin group [operations/puppet] - 10https://gerrit.wikimedia.org/r/153053 (owner: 10Dzahn) [19:04:23] w00t [19:09:24] (03PS2) 10BBlack: add yuvipanda to labmon-roots admin group [operations/puppet] - 10https://gerrit.wikimedia.org/r/153054 (owner: 10Dzahn) [19:09:53] (03CR) 10BBlack: [C: 032] introduce labmon-roots admin group [operations/puppet] - 10https://gerrit.wikimedia.org/r/153053 (owner: 10Dzahn) [19:10:02] (03CR) 10BBlack: [C: 032] add yuvipanda to labmon-roots admin group [operations/puppet] - 10https://gerrit.wikimedia.org/r/153054 (owner: 10Dzahn) [19:11:24] (03PS1) 10Ottomata: Include role::analytics::refinery::data::check on analyics1027 [operations/puppet] - 10https://gerrit.wikimedia.org/r/154115 [19:11:33] (03PS2) 10Ottomata: Include role::analytics::refinery::data::check on analyics1027 [operations/puppet] - 10https://gerrit.wikimedia.org/r/154115 [19:11:40] (03PS3) 10Ottomata: Include role::analytics::refinery::data::check on analytics1027 [operations/puppet] - 10https://gerrit.wikimedia.org/r/154115 [19:12:25] (03CR) 10Ottomata: [C: 032 V: 032] Include role::analytics::refinery::data::check on analytics1027 [operations/puppet] - 10https://gerrit.wikimedia.org/r/154115 (owner: 10Ottomata) [19:13:13] (03PS3) 10Yurik: Added graph extension to labs [operations/mediawiki-config] - 
10https://gerrit.wikimedia.org/r/153867 [19:13:33] greg-g, i'll sync ^, its a labs only ext update [19:13:57] (03CR) 10Yurik: [C: 032] Added graph extension to labs [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/153867 (owner: 10Yurik) [19:14:56] yurikR: kk [19:15:03] oh... [19:15:12] sup? [19:15:22] greg-g: is there an open slot to update test.wikidata? [19:15:31] graph thing, so, I'm unsure about security review for it [19:15:39] greg-g, its on labs only [19:15:41] yurikR: beta cluster isn't private [19:15:45] doesn't matter [19:15:46] that's fine [19:15:56] graph is not for private stuff [19:16:19] its a very basic extension that simply includes a few extra javascript files if it detects tag [19:16:28] hence - labs for now [19:16:30] I thought the reason you opted to not do a security review was "it'll be on a private wiki, thus safe" [19:16:34] (03Merged) 10jenkins-bot: Added graph extension to labs [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/153867 (owner: 10Yurik) [19:17:14] greg-g, sorry, probably didn't explain clearly: i would like to use on privatewiki, but it will also be highly usable on general wikipedia [19:17:30] so, security review is needed, based on what you just said :) [19:17:52] of course - for the public deployment [19:18:01] beta is public [19:18:17] RECOVERY - puppet last run on ms-be3004 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [19:18:37] greg-g, but we don't have any secret code there [19:18:56] i mean - we don't have anything that holds passwords, accesses private DBs, etc [19:19:07] (03PS1) 10Dzahn: add labmon1001 to site.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/154117 [19:19:12] its a simple include javascript here command :) [19:19:27] so, I'm going off of how Erik wanted to treat beta cluster and security reviews, he has pushed back before in this kind of case [19:19:54] I kinda agree with it, because the promotion bit from beta -> prod is 
easy to miss [19:20:20] i'm a bit confused how this is different from us simply adding that exact code to one of the existing extensions and deploying it directly? [19:20:38] (03CR) 10Aude: [C: 031] "shall merge when ready to deploy" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149928 (https://bugzilla.wikimedia.org/40810) (owner: 10Bene) [19:20:38] but if you think we shouldn't, lets wait [19:20:55] greg-g, just let me know what the guidelines are [19:21:00] yurikR: there's always ways around processes, the point is to do what the spirit of the process is [19:21:11] right, hence - what is the process :) [19:21:16] (that was a question) [19:21:16] :) [19:21:33] YuviPanda: bblack [19:21:44] hmm? [19:22:06] YuviPanda: bblack: and that's how it would be used then https://gerrit.wikimedia.org/r/#/c/154117/1/manifests/site.pp [19:22:08] yurikR: it's tough, I want beta to be low friction, but I also don't want to miss things and have something go to prod without a review. [19:22:10] if here we are just adding an external javascript-only lib and handling a new tag, but we don't enable it UNLESS we change commonSettings, how is that a risk? 
[19:22:20] yurikR: so the safe thing is: reviews before beta cluster [19:22:25] oki [19:23:03] mutante: yeah, cool :) [19:23:04] csteipp said he will do it by tomorrow :) [19:23:07] yurikR: I can be convinced otherwise, but I'd need a really good reasoning to overturn erik's opinion [19:23:16] * greg-g nods [19:23:20] greg-g: yurikR +1 on security review :) reduces friction from labs -> prod [19:23:46] (03PS2) 10Dzahn: add labmon1001 to site.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/154117 [19:23:49] mutante: bblack I'll probably not get to writing it this week, but should do next week [19:23:58] yurikR: if chris can do it tomorrow, I give you a pass to deploy it to beta tomorrow ;) [19:24:06] greg-g, my only goal was to get it quicker into prod because milimetric wants to experiment with it as it would be in prod to style it better [19:24:21] throw up a wmflabs instance for him? [19:24:39] i tried - but could't get a pub ip :) [19:25:14] plus it wouldn't be the same - we are talking about experimenting with pulling data from commons, etc [19:25:53] * greg-g nods [19:25:53] oki, in any case, no rush here, reverting :) [19:26:19] thank you for understanding sir [19:26:26] yurikR: you don't need a public IP! [19:26:28] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /a/common/). [19:26:31] thank you for keeping me on course :) [19:26:38] yurikR: use 'Web proxies' on the left nav [19:26:51] yurikR: we don't really hand out public IPs if all you want is a web interface. [19:27:12] (03CR) 10Rush: [C: 031] "This seems good to me. Nice." 
[operations/puppet/cdh] - 10https://gerrit.wikimedia.org/r/153706 (owner: 10Ottomata) [19:27:22] (03PS1) 10Yurik: Revert "Added graph extension to labs" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/154118 [19:27:26] (03PS3) 10BBlack: add labmon1001 to site.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/154117 (owner: 10Dzahn) [19:28:11] (03CR) 10BBlack: [C: 032] add labmon1001 to site.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/154117 (owner: 10Dzahn) [19:29:30] !log puppeting labmon1001, etc [19:29:34] Logged the message, Master [19:30:28] (03CR) 10Hashar: [C: 04-1] "The fonts should be installed on the contint slave which only include mediawiki::package . The class mediawiki::multimedia::fonts is not " [operations/puppet] - 10https://gerrit.wikimedia.org/r/154086 (https://bugzilla.wikimedia.org/69535) (owner: 10KartikMistry) [19:31:55] greg-g: would it be ok if we deploy https://gerrit.wikimedia.org/r/#/c/154119/ and https://gerrit.wikimedia.org/r/149928 before end of the day sometime? [19:32:22] either now, before i go home, or in ~2 hourse from now after i eat / go home [19:32:37] RECOVERY - puppet last run on labmon1001 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [19:32:47] that would give the weekend for us and our users to test new stuff [19:32:51] YuviPanda: ^ :) [19:33:00] hours* [19:33:01] YuviPanda: you should be able to log into labmon1001 now, although it doesn't really have mucb beyond basics + your access [19:33:05] mutante: \o/ sweeet [19:33:39] bblack: mutante right. I'll replicate some of tungsten's setup there (txstatsd + graphite), with variations for storage locations [19:34:02] bblack: mutante however, python-txstatsd package isn't in trusty from apt.wm.o - we built the precise package as well. 
So I'll need to fix that first [19:34:27] *nod*, just make a patch that applies the roles you like, or combine it into a single role::labmon [19:37:16] mutante: yeah, will do :) [19:39:03] ottomata: i would like to touch limn instance proxy, is that stat1003 ? critical?/ [19:39:19] greg-g: Reedy https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=123445&oldid=123433 [19:39:36] so, i'll go home and be online in ~2 hours from now [19:39:44] hope that is ok! [19:39:54] "Whoops! The default skin for your wiki ($wgDefaultSkin), vector, is not available. " [19:39:58] mutante: i am not fully sure [19:40:02] limn doesn't actually run in production [19:40:04] it runs in labs [19:41:51] MatmaRex: I just responded on your RT about bits logs [19:42:33] mutante: growl, my passive monitor_service isn't showing up in icinga configs on neon! [19:42:38] everything seems fine in puppet [19:42:55] i've run puppet on the exporting node and on neon [19:43:05] do you know if there is a way to check a storedconfig value? [19:53:53] (03CR) 10Yurik: [C: 032] Revert "Added graph extension to labs" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/154118 (owner: 10Yurik) [19:55:08] (03Merged) 10jenkins-bot: Revert "Added graph extension to labs" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/154118 (owner: 10Yurik) [19:56:00] oof, chasemp, messed with icinga + puppet much? [19:56:07] i've done this before, and everything seems right [19:56:18] but a monitor_service is not making it into icinga configs [19:56:23] ottomata: re: limn.. ok, that explains, yea [19:56:30] ottomata: re: stored configs, in mysql backend directly [19:56:33] yeah [19:56:35] found that mutante [19:56:36] db1001 [19:56:39] the exported resource is there [19:56:43] but neon is not picking it up [19:56:46] ottomata: we use some custom dump for exported resources I think? [19:56:48] i looked at root@palladium:~# view /usr/local/sbin/puppetstoredconfigclean.rb [19:57:01] uhm.. 
[19:57:58] https://gist.github.com/ottomata/52461f493a8f2d31423a [19:58:27] with stored configs [19:58:37] you have to run it on the host, then the master, then dest host [19:58:51] so whatever host is generated the record has to run puppet, then the master I think has to populate, then neon [19:58:59] or something like that, did you run puppet all around? [19:59:12] exported resources I meant [19:59:48] the master does? [19:59:55] i ran puppet on analytics1027 and on neon [20:00:17] you're saying puppet masters too? [20:00:19] doing that... [20:00:27] that was a dumb thing to say i think on my part [20:00:42] well, can't hurt :) [20:00:48] I think just the generating host and neon, but it's been awhile since I was in exported resource hell [20:00:49] part one looks good [20:00:58] analytics1027 is exporting to the storedconfig db (mysql) just fine [20:01:03] https://gist.github.com/ottomata/52461f493a8f2d31423a [20:01:21] neon is just not realizing the exported resource [20:01:24] for some reason... [20:01:27] I remember someone saying normal exported resources were taking too long [20:01:31] so we have a custom dump script [20:01:34] that ships config over [20:01:42] I think _joe|away rewrote it at some point? [20:01:49] maybe it's on it's own schedule [20:01:52] cron or something idk [20:02:20] aude: https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=123452&oldid=123445 [20:02:21] :P [20:03:29] looking...don't really see anything like that [20:04:54] (03PS1) 10Calak: Add botadmin user group on fa.wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/154126 (https://bugzilla.wikimedia.org/69411) [20:05:15] me neither but I thought for sure that was the deal [20:07:26] bah :'( [20:07:49] Passive check result was received for service 'hive_partition_webrequest-upload' on host 'analytics1027', but the service could not be found! [20:07:59] cmon puppet! [20:08:02] just put my thing there! 
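Note on the ordering described above ("run it on the host ... then neon"): exported resources are written to the stored-config database during the exporting host's Puppet run, and only read during the collecting host's run. The following is a toy Python model of that ordering, not Puppet code. The class and function names (`StoredConfigs`, `exporter_run`, `collector_run`) are hypothetical, and it does not model the real storedconfigs schema or any custom dump script mentioned in the channel.

```python
# Toy model of Puppet exported resources (assumption: names are illustrative;
# this is not the real storedconfigs schema or any Puppet API).


class StoredConfigs:
    """Stands in for the stored-config database shared by all hosts."""

    def __init__(self):
        self._exported = {}  # (host, title) -> resource parameters

    def export(self, host, title, params):
        # Happens during the exporting host's Puppet run.
        self._exported[(host, title)] = dict(params)

    def collect(self):
        # Happens during the collecting host's Puppet run.
        return dict(self._exported)


def exporter_run(db, host):
    """Simulate a Puppet run on the host that declares the check."""
    db.export(host, "hive_partition_webrequest-upload", {"passive": True})


def collector_run(db):
    """Simulate a Puppet run on the monitoring host; return known titles."""
    return sorted(title for (_host, title) in db.collect())


if __name__ == "__main__":
    db = StoredConfigs()
    print(collector_run(db))   # collector ran first: nothing to realize yet
    exporter_run(db, "analytics1027")
    print(collector_run(db))   # collector ran after the export
```

The model only shows why the collector must run after the exporter; it does not explain a resource that is present in the database but still missing from the generated Icinga configuration.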
[20:09:37] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [20:10:11] ottomata: I haven't done this yet either man, mutante do you k now how to get neon to pick up new exported resources? [20:10:40] ottomata: thanks, replied back [20:13:57] maybe it will just magically show up while i'm not working and start alerting everybody [20:14:04] i hope not! [20:14:20] (03PS7) 10Ottomata: Add script and class to manage HDFS user directories [operations/puppet/cdh] - 10https://gerrit.wikimedia.org/r/153706 [20:14:25] (03CR) 10Ottomata: [C: 032 V: 032] Add script and class to manage HDFS user directories [operations/puppet/cdh] - 10https://gerrit.wikimedia.org/r/153706 (owner: 10Ottomata) [20:14:26] what is the check? if it does and I'm around I'll silence it [20:16:27] springle: I wonder why the 'SELECT tmi_value FROM `translate_messageindex` WHERE tmi_key = '0:aPI' LIMIT 1' query keeps randomly timing out [20:17:58] it's fine when I run it manually...maybe it has to do with the occasional DELETE of all rows + REPLACE [20:18:13] aude: sure :) [20:18:18] PROBLEM - puppet last run on cp4011 is CRITICAL: CRITICAL: Epic puppet fail [20:18:28] about ~6k rows [20:20:12] chasemp: these 4 https://github.com/wikimedia/operations-puppet/blob/production/manifests/role/analytics/refinery.pp#L113 [20:20:40] ok I'll be around for a bit I'll keep an eye out too [20:21:26] re.. are you sure the service name matches exactly? 
[20:21:35] (03PS1) 10Ottomata: Auto create HDFS user directories for users in a group [operations/puppet] - 10https://gerrit.wikimedia.org/r/154130 [20:21:44] in icinga config and in send_nsca [20:21:49] well, even if it didn't, mutante, the service should be created in icinga [20:21:59] its not being rendered in any of the icinga config files right now [20:22:14] and, here's an example from syslog [20:22:19] Warning: Passive check result was received for service 'hive_partition_webrequest-bits' on host 'analytics1027', but the service could not be found! [20:22:51] ottomata: i actually see that service [20:22:56] root@neon:/etc/icinga# grep -r "hive_part" * [20:23:03] puppet_services.cfg:# --PUPPET_NAME-- analytics1027 hive_partition_webrequest-bits [20:23:14] ! [20:23:44] IT WASN'T there 5 mins ago! [20:24:26] :p yea, needs more than one puppet run? hrmm [20:24:29] it got added with the Aug 14 20:19:08 [20:24:35] man i've been trying to run that thing for hte last hours [20:24:39] haha, it is magical! [20:25:15] :) check this [20:25:21] https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=hive [20:25:32] pending now [20:25:56] ja see it! [20:25:57] awesooome [20:26:17] in the beginning of the next hour it hsould get its first notification [20:26:26] i'm going to force one via send_nsca, just to make sure that part works [20:26:34] you could manually run send_nsca on one of them if you want to [20:26:36] yea, that [20:28:42] (03CR) 10Dzahn: [C: 032] performance.wm.org - use apache::site [operations/puppet] - 10https://gerrit.wikimedia.org/r/153953 (owner: 10Dzahn) [20:29:58] mutante: ok, it looks good in the logs [20:30:03] is that what passive checks look like though? [20:30:04] in icinga? [20:30:07] the status didn't change [20:30:27] i assume it will always be pending, unless it doesn't receive the passive check within the freshness_interval? [20:30:34] in which case it will fail? 
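Note on forcing a result via send_nsca, as discussed above: send_nsca reads one tab-separated line per service result on standard input, in the order host name, service description, return code, plugin output. A minimal Python sketch of building such a line follows. The helper name `nsca_line` is hypothetical, the host and service names are taken from the syslog warning quoted above, and the plugin output text is made up for illustration.

```python
# Sketch: build the stdin line that send_nsca expects for a passive
# service check result. Field order: host, service, return code, output.
# nsca_line is a hypothetical helper name, not part of any Icinga tooling.

NAGIOS_STATES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def nsca_line(host: str, service: str, state: str, output: str) -> str:
    """Return one tab-delimited, newline-terminated passive check result."""
    if state not in NAGIOS_STATES:
        raise ValueError(f"unknown state: {state!r}")
    for field in (host, service, output):
        # Tabs and newlines are the field and record delimiters.
        if "\t" in field or "\n" in field:
            raise ValueError("fields must not contain tabs or newlines")
    return f"{host}\t{service}\t{NAGIOS_STATES[state]}\t{output}\n"


if __name__ == "__main__":
    line = nsca_line(
        "analytics1027",
        "hive_partition_webrequest-bits",
        "OK",
        "forced test result",  # illustrative output text
    )
    print(line, end="")
```

The service field has to match the service description in the generated Icinga configuration exactly; otherwise Icinga logs the "service could not be found" warning seen earlier in the log.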
[20:30:53] (03PS2) 10Ottomata: Auto create HDFS user directories for users in a group [operations/puppet] - 10https://gerrit.wikimedia.org/r/154130 [20:30:59] (03CR) 10Ottomata: [C: 032 V: 032] Auto create HDFS user directories for users in a group [operations/puppet] - 10https://gerrit.wikimedia.org/r/154130 (owner: 10Ottomata) [20:31:33] ottomata: it should stay OK unless it doesnt receive the passive check ... [20:31:58] but since it just got added, unsure how long in PENDING is normarl [20:32:41] you can tell they are passive checks by that little icon with 4 green arrows [20:33:40] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures [20:33:40] PROBLEM - puppet last run on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [20:34:01] well, i just used send_nsca to send a passive check, thought that would have made it into OK [20:34:07] ottomata: they should turn from pending to OK when they receive the very first packet [20:34:26] syslog shows them receiving the check, hmm [20:34:28] yes, that should be what happens [20:34:31] RECOVERY - puppet last run on mw1053 is OK: OK: Puppet is currently enabled, last run 1315 seconds ago with 0 failures [20:34:38] and syslog isn't saying that the service is undefined anymore [20:34:43] so it looks like icinga is accepting it [20:34:52] it also doesnt say anything about mismatching hostname or so? [20:36:02] Aug 14 20:31:00 neon icinga: PASSIVE SERVICE CHECK: analytics1027;hive_partition_webrequest-upload;0;OK: A dataset has recently become ready. Location: hdfs://analytics-hadoop/wmf/data/raw/webrequest/webrequest_upload/hourly/2014/08/14/18 [20:36:12] that looks good [20:36:36] and it actually is OK in webui [20:36:38] works, ottomata [20:36:40] PROBLEM - check configured eth on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
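Note on the syslog entry quoted above: the part after `PASSIVE SERVICE CHECK: ` is a semicolon-separated record of host, service, return code and plugin output. A minimal Python sketch for pulling those fields out of such a line follows. The names `PassiveResult` and `parse_passive_check` are hypothetical, and the sample line is shortened from the one in the log.

```python
# Sketch: split an Icinga "PASSIVE SERVICE CHECK" syslog line into fields.
# PassiveResult and parse_passive_check are hypothetical names.
from typing import NamedTuple, Optional

MARKER = "PASSIVE SERVICE CHECK: "


class PassiveResult(NamedTuple):
    host: str
    service: str
    return_code: int
    output: str


def parse_passive_check(log_line: str) -> Optional[PassiveResult]:
    """Return the parsed result, or None if the line is not a passive check."""
    _, found, payload = log_line.partition(MARKER)
    if not found:
        return None
    # The plugin output may itself contain semicolons, so split at most 3 times.
    parts = payload.rstrip("\n").split(";", 3)
    if len(parts) != 4 or not parts[2].isdigit():
        return None
    host, service, code, output = parts
    return PassiveResult(host, service, int(code), output)


if __name__ == "__main__":
    sample = (
        "Aug 14 20:31:00 neon icinga: PASSIVE SERVICE CHECK: "
        "analytics1027;hive_partition_webrequest-upload;0;"
        "OK: A dataset has recently become ready."
    )
    print(parse_passive_check(sample))
```

A return code of 0 corresponds to OK, which matches the service turning from PENDING to OK in the web UI once the first result arrived.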
[20:36:49] https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=analytics1027&service=hive_partition_webrequest-upload [20:36:50] PROBLEM - DPKG on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [20:36:50] PROBLEM - RAID on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [20:36:58] ^ wait for the others to come up in a few i suppose [20:37:01] PROBLEM - Disk space on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [20:37:06] oh yay! [20:37:22] mw1053, again -.- [20:37:40] RECOVERY - check configured eth on mw1053 is OK: NRPE: Unable to read output [20:37:41] PROBLEM - puppet last run on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [20:37:41] RECOVERY - DPKG on mw1053 is OK: All packages OK [20:37:45] AWESOOOOOME [20:37:45] <_joe|away> hoo: yes [20:38:00] ottomata: :) [20:38:00] RECOVERY - Disk space on mw1053 is OK: DISK OK [20:38:21] RECOVERY - puppet last run on cp4011 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [20:40:50] RECOVERY - RAID on mw1053 is OK: OK: no RAID installed [20:44:08] (03CR) 10Reza: [C: 031] "Great!" 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/154126 (https://bugzilla.wikimedia.org/69411) (owner: 10Calak) [20:45:01] (03CR) 10Dzahn: "50-performance-wikimedia-org.conf: symbolic link to `/etc/apache2/sites-available/50-performance-wikimedia-org.conf'" [operations/puppet] - 10https://gerrit.wikimedia.org/r/153953 (owner: 10Dzahn) [20:47:21] (03CR) 10Dzahn: [C: 032] ganglia - use apache::site [operations/puppet] - 10https://gerrit.wikimedia.org/r/153955 (owner: 10Dzahn) [20:47:50] PROBLEM - puppet last run on analytics1010 is CRITICAL: CRITICAL: Puppet has 1 failures [20:47:58] <_joe|away> !log stopping puppet, jobrunner on mw1053; HHVM is eating memory like godzilla [20:48:03] Logged the message, Master [20:48:30] lol [20:49:22] _joe|away: so it's probably leaking memory and descriptors, yay [20:49:58] <_joe|away> AaronSchulz: I'm not really looking into it, godog is taking a look [20:50:49] <_joe|away> AaronSchulz: what strikes me as really strange is testwiki is completely healthy [20:50:58] yep, trying to figure out if we can at least get some idea of what's going on before killing it [20:51:23] <_joe|away> godog: open files and net sockets can be an interesting candidate [20:51:40] ~700 [20:52:18] <_joe|away> it has a huge list of running threads AFAICS [20:54:05] (03CR) 10Dzahn: "all good here too. removed old config. notice: /Stage[main]/Ganglia::Web/Apache::Site[ganglia.wikimedia.org]/Apache::Conf[ganglia.wikimedi" [operations/puppet] - 10https://gerrit.wikimedia.org/r/153955 (owner: 10Dzahn) [20:54:47] 28948 apache 20 0 14.436g 0.010t 9904 S 0.0 90.9 [20:55:34] haha [20:56:09] (03CR) 10Bsitu: "Yeah, that makes sense. 
Probably try a couple of medium wikis first then a large wiki, finally a full rollout" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/154112 (owner: 10Legoktm) [20:56:14] (03PS3) 10Dzahn: download.wm.org - use apache::site method [operations/puppet] - 10https://gerrit.wikimedia.org/r/153817 [20:56:15] bah can't find the script to dump hhvm stuff with gdb we wrote a while ago :( [20:56:41] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet). [20:57:08] (03PS1) 10Hashar: Phase out $wgRateLimitLog in favor of debug bucket [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/154142 [20:57:09] anyway I wanted to try and run that and get a dump at least [20:57:54] (03PS1) 10Ottomata: Fix missing slash in puppet file URL [operations/puppet/cdh] - 10https://gerrit.wikimedia.org/r/154143 [20:58:03] (03PS2) 10Ottomata: Fix missing slash in puppet file URL [operations/puppet/cdh] - 10https://gerrit.wikimedia.org/r/154143 [20:58:19] (03CR) 10Ottomata: [C: 032 V: 032] Fix missing slash in puppet file URL [operations/puppet/cdh] - 10https://gerrit.wikimedia.org/r/154143 (owner: 10Ottomata) [20:58:31] <_joe|away> godog: it's on deployment-mediawiki01 [20:58:54] (03PS1) 10Ottomata: Update cdh module with fix [operations/puppet] - 10https://gerrit.wikimedia.org/r/154145 [20:59:01] (03PS2) 10Ottomata: Update cdh module with fix [operations/puppet] - 10https://gerrit.wikimedia.org/r/154145 [20:59:07] (03CR) 10Ottomata: [C: 032 V: 032] Update cdh module with fix [operations/puppet] - 10https://gerrit.wikimedia.org/r/154145 (owner: 10Ottomata) [20:59:41] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. 
[20:59:49] yep found it, thanks [21:00:08] (03CR) 10Dzahn: svn - move Apache config from file to template (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/153989 (owner: 10Dzahn) [21:00:40] PROBLEM - puppet last run on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:00:45] (03CR) 10Dzahn: [C: 032] svn - move Apache config from file to template [operations/puppet] - 10https://gerrit.wikimedia.org/r/153989 (owner: 10Dzahn) [21:00:49] I think I broke it :( [21:02:41] PROBLEM - check configured eth on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:02:50] PROBLEM - RAID on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:03:02] a bit slow but it is running gdb [21:03:08] I noticed [21:03:26] (03PS1) 10Ottomata: Disable cdh::hadoop::users temporarily [operations/puppet] - 10https://gerrit.wikimedia.org/r/154146 [21:03:34] (03CR) 10jenkins-bot: [V: 04-1] Disable cdh::hadoop::users temporarily [operations/puppet] - 10https://gerrit.wikimedia.org/r/154146 (owner: 10Ottomata) [21:03:36] (03PS2) 10Ottomata: Disable cdh::hadoop::users temporarily [operations/puppet] - 10https://gerrit.wikimedia.org/r/154146 [21:03:45] (03CR) 10Ottomata: [C: 032 V: 032] Disable cdh::hadoop::users temporarily [operations/puppet] - 10https://gerrit.wikimedia.org/r/154146 (owner: 10Ottomata) [21:04:15] AaronSchulz: slowly but surely, if it doesn't finish we'll restart and that's it I think, not sure if there are any other debug means [21:04:41] (03CR) 10Dzahn: "pass:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/153989 (owner: 10Dzahn) [21:04:50] PROBLEM - DPKG on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:05:20] PROBLEM - check if dhclient is running on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:05:31] PROBLEM - nutcracker process on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[21:05:31] PROBLEM - SSH on mw1053 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:05:41] PROBLEM - nutcracker port on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:05:50] RECOVERY - puppet last run on analytics1010 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [21:07:01] PROBLEM - Disk space on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:08:00] RECOVERY - Disk space on mw1053 is OK: DISK OK [21:08:41] RECOVERY - nutcracker port on mw1053 is OK: TCP OK - 0.000 second response time on port 11212 [21:08:42] (03PS4) 10Dzahn: stats.wm.org - use apache::site [operations/puppet] - 10https://gerrit.wikimedia.org/r/153832 [21:09:21] RECOVERY - nutcracker process on mw1053 is OK: PROCS OK: 1 process with UID = 110 (nutcracker), command name nutcracker [21:09:28] (03CR) 10Dzahn: "PS4: also use apache::conf for ports config" [operations/puppet] - 10https://gerrit.wikimedia.org/r/153832 (owner: 10Dzahn) [21:09:30] RECOVERY - SSH on mw1053 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) [21:09:31] RECOVERY - check configured eth on mw1053 is OK: NRPE: Unable to read output [21:09:40] Saved HHVM debug data in /tmp/hhvm.28948.{bt,core,bin}. [21:09:40] RECOVERY - DPKG on mw1053 is OK: All packages OK [21:09:43] success! 
[21:09:50] RECOVERY - RAID on mw1053 is OK: OK: no RAID installed [21:10:02] -rw-r--r-- 1 root root 15197545536 Aug 14 21:09 hhvm.28948.core [21:10:10] RECOVERY - check if dhclient is running on mw1053 is OK: PROCS OK: 0 processes with command name dhclient [21:10:22] I'm going to restart it if there are no objections _joe|away AaronSchulz [21:10:22] (03CR) 10Dzahn: "QChris: yea, so far i expected we delete them manually and it's not worth to keep them in puppet for a one-time action" [operations/puppet] - 10https://gerrit.wikimedia.org/r/153832 (owner: 10Dzahn) [21:10:54] ok [21:10:56] !log restarted hhvm on mw1053 [21:11:03] Logged the message, Master [21:11:18] it is back, not the jobrunner tho [21:14:28] it is curious, the memory usage seems to be correlated to network traffic [21:14:52] like it was buffering the response or the request in memory [21:14:53] mutante: were you seeing the no-skins thing when logged in, logged out or both? [21:16:26] MatmaRex: logged out, failed to test logged in before it was gone [21:16:48] hmm. [21:20:48] (03CR) 10Dzahn: gerrit - use apache::site (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/153849 (owner: 10Dzahn) [21:22:44] AaronSchulz: does the job queue suffer if we leave mw1053 off? [21:23:19] (03CR) 10Dzahn: "this was here because of:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/153992 (owner: 10Dzahn) [21:23:30] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 525 bytes in 0.003 second response time [21:23:52] I admit it is quite amusing to see the other's cpu graphs, basically they are pegged at 100% all the time [21:23:58] godog: no [21:24:41] AaronSchulz: ok, I'll keep it off [21:24:50] (03CR) 10Dzahn: "this was here because of:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/153993 (owner: 10Dzahn) [21:25:12] mainly because likely we damage the job queue by having hhvm hung (or slow? 
not clear) [21:25:56] <_joe|away> godog: I turned off the jobrunner already [21:26:02] hmm. graphite? i'll take a look [21:26:46] godog: you are working on hhvm? [21:27:07] do you know where i can find logs for http://wikidata.beta.wmflabs.org/wiki/Q2558 (503 error) [21:27:10] ? [21:27:22] <_joe|away> aude: deployment-mediawiki02 [21:27:24] <_joe|away> I guess [21:27:29] i looked! [21:27:37] <_joe|away> or deployment-mediawiki01 [21:27:48] also looked there [21:28:02] <_joe|away> if hhvm is configured to log there, but you should really ask or.i or bryan [21:28:15] <_joe|away> as they have set up most details in beta [21:28:38] in /tmp ? [21:28:56] hhvm.something ? [21:29:13] <_joe|away> maybe [21:29:20] i don't see them [21:29:30] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.010 second response time [21:29:31] <_joe|away> or /var/log/hhvm [21:29:31] PROBLEM - Number of mediawiki jobs queued on tungsten is CRITICAL: CRITICAL: Anomaly detected: 32 data above and 0 below the confidence bounds [21:29:31] PROBLEM - Number of mediawiki jobs running on tungsten is CRITICAL: CRITICAL: Anomaly detected: 32 data above and 0 below the confidence bounds [21:29:33] shall ask or.i and bryan again [21:29:44] not too useful that beta wikidata is inaccessible [21:32:40] it might have been lost somewhere, -rw-r--r-- 1 apache apache 20017088 Jul 31 18:36 /var/log/hhvm/error.log [21:33:33] (03PS4) 10Dzahn: contint-use apache::site,move config to templates [operations/puppet] - 10https://gerrit.wikimedia.org/r/153959 [21:34:29] error.log was last updated on july 31 [21:34:41] on deployment 01 [21:34:53] and 02 [21:40:10] PROBLEM - Puppet freshness on db1010 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 19:39:46 UTC [21:41:41] (03CR) 10Dzahn: "amended, recompiled as #217" [operations/puppet] - 10https://gerrit.wikimedia.org/r/153959 (owner: 10Dzahn) [21:43:12] (03CR) 10Dzahn: "also see, for example:" [operations/puppet] - 
10https://gerrit.wikimedia.org/r/153959 (owner: 10Dzahn) [21:49:22] !log reedy Synchronized php-1.24wmf17/includes/context/RequestContext.php: (no message) (duration: 00m 15s) [21:49:28] Logged the message, Master [21:50:20] PROBLEM - Disk space on gallium is CRITICAL: DISK CRITICAL - free space: /var/lib/jenkins-slave/tmpfs 10 MB (2% inode=99%): [21:57:31] (03PS8) 10Dzahn: turn RT from misc/* into puppet module [operations/puppet] - 10https://gerrit.wikimedia.org/r/116064 [21:57:56] (03PS9) 10Dzahn: turn RT from misc/* into puppet module [operations/puppet] - 10https://gerrit.wikimedia.org/r/116064 [22:01:23] (03PS1) 10Aude: Bump cache epoch for test.wikidata [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/154204 [22:02:56] aude: Are we live there? [22:03:02] I mean the new version [22:03:20] hoo|busy: now is our deploy time https://wikitech.wikimedia.org/wiki/Deployments [22:03:24] (03PS2) 10Ori.livneh: Bumped SlowQueryThreshold since the log is spammy and untruncated [operations/puppet] - 10https://gerrit.wikimedia.org/r/151233 (owner: 10Aaron Schulz) [22:03:37] * aude do [22:03:46] (03PS3) 10Ori.livneh: Bumped SlowQueryThreshold since the log is spammy and untruncated [operations/puppet] - 10https://gerrit.wikimedia.org/r/151233 (owner: 10Aaron Schulz) [22:03:49] bump the cache epoch after that [22:03:53] yep [22:03:55] (03CR) 10Ori.livneh: [C: 032 V: 032] Bumped SlowQueryThreshold since the log is spammy and untruncated [operations/puppet] - 10https://gerrit.wikimedia.org/r/151233 (owner: 10Aaron Schulz) [22:03:59] I'm packing still [22:04:03] sure [22:04:05] well and preparing [22:05:26] (03PS10) 10Dzahn: turn RT from misc/* into puppet module [operations/puppet] - 10https://gerrit.wikimedia.org/r/116064 [22:05:51] (03PS11) 10Dzahn: turn RT from misc/* into puppet module [operations/puppet] - 10https://gerrit.wikimedia.org/r/116064 [22:06:20] (03CR) 10Ori.livneh: [C: 04-2] "I already fixed this yesterday" [operations/puppet] - 
10https://gerrit.wikimedia.org/r/153993 (owner: 10Dzahn) [22:06:20] RECOVERY - Disk space on gallium is OK: DISK OK [22:06:24] (03CR) 10Dzahn: "please remove the " D modules/nginx" argggg" [operations/puppet] - 10https://gerrit.wikimedia.org/r/116064 (owner: 10Dzahn) [22:08:06] (03PS1) 10Aaron Schulz: Increased wgParsoidCacheUpdateTitlesPerJob to 12 to lower the backlog [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/154205 [22:08:47] (03CR) 10Ori.livneh: [C: 031] Increased wgParsoidCacheUpdateTitlesPerJob to 12 to lower the backlog [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/154205 (owner: 10Aaron Schulz) [22:10:41] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [22:13:19] !log aude Started scap: Update branch for test.wikidata [22:13:25] Logged the message, Master [22:13:59] !log aude scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="test2wiki" --list-file="/a/common/wmf-config/extension-list" --output="/tmp/tmp.kFlVQdKnM2" ' returned non-zero exit status 255 (duration: 00m 40s) [22:14:05] Logged the message, Master [22:14:19] oh? [22:14:55] does /tmp/tmp.kFlVQdKnM2 contain anything useful? [22:15:02] looking [22:15:05] bblack: you on RT today? [22:15:11] yup [22:15:23] https://bugzilla.wikimedia.org/show_bug.cgi?id=69560 - https://lists.wikimedia.org/pipermail/wikitech-l/ redirects to the HTTP version [22:15:44] I don't know where you'd look to fix that - exim maybe? [22:16:55] ori: empty [22:17:13] aude: try running: /usr/local/bin/mwscript mergeMessageFileList.php --wiki="test2wiki" --list-file="/a/common/wmf-config/extension-list" [22:17:16] and see what the output is [22:18:02] Thehelpfulone: https://gerrit.wikimedia.org/r/#/c/145616/ [22:18:44] ok [22:18:56] ah [22:20:04] mutante: what's the "additional research"? 
[22:20:35] i see the problem [22:20:35] I think it just means that those reviewing it aren't too sure about the impact of the change since they didn't implement it originally. [22:21:07] Thehelpfulone: "why it's like that" afaict [22:21:09] need to update the wikidata branch again :/ [22:21:17] Thehelpfulone: saw inline comments too? [22:21:24] yeah just reading them now [22:21:26] yikes [22:21:35] yeah I end to think JanZerebecki comments are sound, but this is the first I've looked at the patch [22:21:56] (03CR) 10Aaron Schulz: [C: 032] Increased wgParsoidCacheUpdateTitlesPerJob to 12 to lower the backlog [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/154205 (owner: 10Aaron Schulz) [22:21:59] aude: OK. what would be a helpful message for scap to emit in this case? if you have a chance after your deployment, it'd be helpful to have a bug about making the log message more useful next time [22:22:02] basically "did somebody leave those on http for a reason" [22:22:06] (03Merged) 10jenkins-bot: Increased wgParsoidCacheUpdateTitlesPerJob to 12 to lower the backlog [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/154205 (owner: 10Aaron Schulz) [22:22:07] back in the days [22:22:24] yeah it seems like legacy rather than intentional [22:22:31] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. [22:22:34] ori: the message is ok [22:22:37] so can you override a -1 mutante? [22:22:46] i can tell what failed and can try myself the same command [22:23:02] !log aaron Synchronized wmf-config/CommonSettings.php: Increased wgParsoidCacheUpdateTitlesPerJob to 12 to lower the backlog (duration: 00m 07s) [22:23:03] nod [22:23:08] Logged the message, Master [22:23:39] well joe's out for a week or so, so I doubt we'll hold this up on waiting for his -1. 
I suspect he just hadn't had time to come back to that patch since the additional comments [22:25:03] i think i remember when scap proceeded anyway despite such error [22:25:28] If it was a fatal, no... if it was a warning, yep [22:26:37] !log aaron Synchronized php-1.24wmf16/includes/DefaultSettings.php: 67bf481ce1644ff194d7565107d9b8ffe11bf4b7 (duration: 00m 07s) [22:26:43] Logged the message, Master [22:28:31] mutante: Thehelpfulone: I did some git blame searching on that pipermail http-redirect, it looks like it predates everything, and you brought it in during initial puppetization [22:28:40] so there's no real reason in repo-history for why it's there [22:28:47] that goes back to 2011 or so [22:30:12] waiting for jenkins [22:32:35] greg-g: do you need echo deployed? [22:32:40] per gerrit [22:32:57] aude: legoktm'll do it [22:33:07] ok [22:33:10] during swat? [22:33:15] er, I won't actually deploy it, someone else will during SWAT [22:33:19] ok [22:33:36] * aude needs to run scap which hopefully won't take long [22:36:40] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures [22:38:10] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Thu 14 Aug 2014 20:37:55 UTC [22:39:23] waiting [22:39:29] bblack: ok, so what needs to be done to get the fix merged? [22:39:45] http://en.wikipedia.org/wiki/Wikipedia:IRC [22:39:47] oops [22:39:49] sorry [22:40:04] Thehelpfulone: one of us needs to take some initiate and do it :) [22:40:20] RECOVERY - Puppet freshness on db1010 is OK: puppet ran at Thu Aug 14 22:40:16 UTC 2014 [22:41:42] bblack: like someone on RT duty? ;) [22:42:47] yes, I'll get to it in a few :P [22:42:58] jenkins is extra slow today [22:43:15] feel free to ignore it [22:43:26] . 
[22:43:28] ok [22:43:28] the change isn't covered by it anyway [22:43:32] exactly [22:43:36] it passed already [22:43:36] and the rest is unchanged [22:44:08] there [22:45:13] (03Abandoned) 10Dzahn: Revert "openstack - use apache::conf for port" [operations/puppet] - 10https://gerrit.wikimedia.org/r/153993 (owner: 10Dzahn) [22:45:35] (03Abandoned) 10Dzahn: Revert "openstack Apache conf, also listen on port 80" [operations/puppet] - 10https://gerrit.wikimedia.org/r/153992 (owner: 10Dzahn) [22:48:03] (03CR) 10Dzahn: [C: 031] "just take it like this for now without making puppet install it? we might find a solution to automate it later and until then it could be " [operations/puppet] - 10https://gerrit.wikimedia.org/r/144839 (owner: 10Dzahn) [22:49:08] (03CR) 10Dzahn: [C: 04-1] "needs some investigation/clarification re: revocation process if ever needed" [operations/puppet] - 10https://gerrit.wikimedia.org/r/148289 (https://bugzilla.wikimedia.org/38516) (owner: 10Dzahn) [22:50:16] (03CR) 10Dzahn: "quote: "So the difficult problem to solve before merging this is that all bastion hosts have roles that have ferm::rules that allow all of" [operations/puppet] - 10https://gerrit.wikimedia.org/r/96424 (owner: 10Dzahn) [22:52:02] (03CR) 10Dzahn: "is this really not used anymore?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/153226 (owner: 10Dzahn) [22:52:40] (03CR) 10Dzahn: [C: 031] "i'm pretty sure we'll never want to use that USB device to send SMS again" [operations/puppet] - 10https://gerrit.wikimedia.org/r/153227 (owner: 10Dzahn) [22:53:49] (03PS12) 10BBlack: turn RT from misc/* into puppet module [operations/puppet] - 10https://gerrit.wikimedia.org/r/116064 (owner: 10Dzahn) [22:53:58] (03CR) 10Dzahn: "only after Change-Id: Ib6ffbf352dbfbf" [operations/puppet] - 10https://gerrit.wikimedia.org/r/153228 (owner: 10Dzahn) [22:54:04] wait... 
[22:54:40] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [22:57:08] !log aude Started scap: Update branch for test.wikidata [22:57:20] * aude hopes it's quick [23:00:04] RoanKattouw, mwalker, ori, MaxSem: Respected human, time to deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140814T2300). Please do the needful. [23:05:45] (03PS3) 10BBlack: Make lists.wikimedia.org HTTPS only [operations/puppet] - 10https://gerrit.wikimedia.org/r/145616 (https://bugzilla.wikimedia.org/68553) (owner: 10JanZerebecki) [23:06:58] (03PS4) 10BBlack: Make lists.wikimedia.org HTTPS only [operations/puppet] - 10https://gerrit.wikimedia.org/r/145616 (https://bugzilla.wikimedia.org/68553) (owner: 10JanZerebecki) [23:08:04] (03PS1) 10Bartosz Dziewoński: Set $wgCategoryCollation to 'xx-uca-et' on all Estonian-language wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/154213 (https://bugzilla.wikimedia.org/54168) [23:08:06] (03PS13) 10Dzahn: turn RT from misc/* into puppet module [operations/puppet] - 10https://gerrit.wikimedia.org/r/116064 [23:09:49] (03CR) 10BBlack: [C: 032] "I'm good with JanZerebecki's comment updates. I rebased this and also included the monitoring changem and it's merging now." [operations/puppet] - 10https://gerrit.wikimedia.org/r/145616 (https://bugzilla.wikimedia.org/68553) (owner: 10JanZerebecki) [23:10:14] :) [23:11:08] let's see what breaks! :) [23:11:41] oh of course, lighttpd doesn't like trailing commas on lists, unlike all sane languages :p [23:11:44] if it doesnt, next he will suggest to enable StrictTransportSecurity:) [23:11:50] is SWAT waiting on the scap to finish? [23:11:51] heh, arr [23:12:32] heh thanks bblack [23:12:36] jzerebec1i: [23:12:53] legoktm: yeah [23:13:03] bblack: the bottom of mail.pp says it's not finished being puppetised - should I create a bug / RT request for that? 
[23:13:09] * aude wonder why https://test.wikidata.org/wiki/Special:Version is blank (but other stuff is fine) [23:13:20] do see any log entries related to that [23:13:25] do not8 [23:13:26] (03PS1) 10BBlack: fixup for lighttp being stupid about commas [operations/puppet] - 10https://gerrit.wikimedia.org/r/154214 [23:13:29] * [23:13:42] Thehelpfulone: that's probably one of 10,000 random cleanup actions around here one could create an RT ticket for [23:13:49] that doesn't mean we'll fix everything any faster :) [23:13:56] (03PS14) 10Dzahn: turn RT from misc/* into puppet module [operations/puppet] - 10https://gerrit.wikimedia.org/r/116064 [23:13:57] !log aude Finished scap: Update branch for test.wikidata (duration: 16m 48s) [23:14:01] Logged the message, Master [23:14:12] (03CR) 10BBlack: [C: 032 V: 032] fixup for lighttp being stupid about commas [operations/puppet] - 10https://gerrit.wikimedia.org/r/154214 (owner: 10BBlack) [23:14:51] heh [23:15:17] well RT doesn't have a working password reset feature bblack, so if I forget my password then I won't be able to create any RT tickets :P [23:15:27] https://rt.wikimedia.org/Ticket/Display.html?id=5408 [23:15:33] that's why we didn't fix that yet! 
[23:15:41] :D [23:16:19] (03CR) 10Aude: [C: 032] Bump cache epoch for test.wikidata [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/154204 (owner: 10Aude) [23:16:24] (03Merged) 10jenkins-bot: Bump cache epoch for test.wikidata [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/154204 (owner: 10Aude) [23:16:34] Thehelpfulone: you can create new tickets without a password :) [23:16:53] (03CR) 10Aude: [C: 032] Add good and featured article badges to testwikidata [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149928 (https://bugzilla.wikimedia.org/40810) (owner: 10Bene) [23:16:54] (03CR) 10jenkins-bot: [V: 04-1] Add good and featured article badges to testwikidata [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149928 (https://bugzilla.wikimedia.org/40810) (owner: 10Bene) [23:17:02] bah [23:17:22] the whole puppet server-restart thing still bugs me. it's shades of that same issue with varnish config reload. puppet should maintain state on failed actions like service start and keep trying them again on future runs. 
[23:17:53] mutante: but you can't view one in the RT interface without it ;) [23:19:50] bblack: it's doable [23:20:03] yea, unfortunately upstream issue with the extension [23:20:11] (03PS2) 10Aude: Add good and featured article badges to testwikidata [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149928 (https://bugzilla.wikimedia.org/40810) (owner: 10Bene) [23:20:12] it depends if you are privileged user or not [23:20:16] http://requesttracker.wikia.com/wiki/PasswordReset [23:20:19] Thehelpfulone: https should work now [23:20:27] (03CR) 10Aude: [C: 032] Add good and featured article badges to testwikidata [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149928 (https://bugzilla.wikimedia.org/40810) (owner: 10Bene) [23:20:31] (03Merged) 10jenkins-bot: Add good and featured article badges to testwikidata [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149928 (https://bugzilla.wikimedia.org/40810) (owner: 10Bene) [23:20:41] aude: Looks like some caching issue [23:20:45] bd808|BUFFER was trying to see if we could get rid of the debian 'package' provider to stop installing recommended packages [23:20:51] probably [23:20:51] curl -H 'Host: test.wikidata.org' mw1120/wiki/Special:Version [23:20:53] looks fine [23:20:54] https://gerrit.wikimedia.org/r/#/c/71719/3/files/rt/AfterForm [23:21:48] great :) [23:21:55] (03PS1) 10BBlack: fix lighttpd reload command [operations/puppet] - 10https://gerrit.wikimedia.org/r/154216 [23:22:04] also, the lighttpd command for restarting on config change has been broken since December [23:22:14] so we just got the first application of basically any config changes since then, probably [23:22:20] aude: wtf [23:22:24] it is on HHVM [23:22:30] (03CR) 10BBlack: [C: 032 V: 032] fix lighttpd reload command [operations/puppet] - 10https://gerrit.wikimedia.org/r/154216 (owner: 10BBlack) [23:22:30] that's why it's broken, I guess [23:22:37] with a Zend server it works [23:22:42] with a HHVM one... 
booom [23:22:49] *boom* [23:23:07] nice, wfm, mailman archive links now enforce https [23:23:10] where is HHVM stuff logged? [23:23:47] oh, lovely [23:24:06] !log aude Synchronized wmf-config/Wikibase.php: Bump cache epoch and add badges setting on test.wikidata (duration: 00m 32s) [23:24:11] Logged the message, Master [23:24:17] can we get it to zend? [23:24:35] * aude cries [23:25:09] is there a setting to disable use of the epp redirect colum? [23:25:58] useRedirectTargetColumn [23:26:11] $wgWBRepoSettings['useRedirectTargetColumn'] = false; [23:27:14] alright, if peopel want to swat, go ahead [23:27:27] we'll have to look into our settings and such, but it's only test.wikidata [23:27:47] AaronSchulz: ori: Where can we find out about fatals/ exceptions happened on HHVM machines [23:28:00] for test(wikipedia|wikidata) [23:28:07] hoo|busy: which one is test wikidata on? [23:28:10] mw1120? [23:28:21] no, same es testwikipedia I think [23:28:24] mw1017 [23:28:46] ah [23:28:50] so, hhvm [23:29:05] they might be in /tmp/ [23:29:11] otherwise /var/log/hhvm [23:29:13] on the host itself? [23:29:14] that [23:29:15] that's if it's being logged [23:29:15] that sucks [23:29:16] yes [23:29:17] who is doing today's swat? [23:29:21] ori, bd808|BUFFER: http://git.spi-inc.org/gitweb/?p=puppet.git;a=blob;f=modules/spi-inc-org/files/apt/apt.conf.d/local-recommends;h=5442cf7dad83c6abad373b81561934f3a11dd565;hb=HEAD [23:29:25] RoanKattouw_away or ori? [23:29:27] that's waht i was told for beta, but found no logs there [23:29:32] MaxSem isn't on irc [23:29:55] what's on the menu? 
[23:30:02] i'm technically on vacation but could do it i guess [23:30:11] Can we please fix that before we go live with HHVM [23:30:14] TMH fix and revert of echo [23:30:15] because that sucks big time [23:30:22] +1 [23:30:24] i'll do it [23:30:29] hoo|busy: yes, we'll fix it, obviously [23:30:39] heh [23:30:53] thanks [23:31:44] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [23:32:34] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [23:33:04] aude: Can't find logs there [23:33:12] enither in /var/log/hhvm nor in /tmp/ [23:33:15] me neither [23:34:25] i'll help you look after swat [23:34:34] legoktm: roan is busy getting chay'd [23:34:42] lolol [23:35:37] heh [23:36:27] is that in wiktionary? [23:36:43] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures [23:36:52] i assume that means someone is telling him a long story? [23:36:58] jeremyb: terry chay [23:37:07] right [23:37:13] wmf's fidel castro, in terms of oratorial stamina [23:37:27] hah. although idk what fidel's stamina is [23:37:44] i think fidel was TFA on nlwiki a few days ago [23:37:59] he was famous for delivering sermons about the triumph of socialism in cuba that lasted for hours [23:39:14] https://nl.wikipedia.org/wiki/Sjabloon:Uitgelicht_13_augustus [23:40:11] what is the issue? [23:46:05] Romaine, very long, sustained, oratory [23:46:48] which is apparently a trait shared with Fidel [23:47:14] bblack: I don't know what time you're duty finishes, but could you take a look at https://bugzilla.wikimedia.org/show_bug.cgi?id=44731 and https://rt.wikimedia.org/Ticket/Display.html?id=6981? mail.wikiPedia.org is a legacy issue AFAIK from 2004 ish. 
[23:47:18] your* [23:48:49] * jeremyb can take that Thehelpfulone [23:48:52] i wish we could remove all service names in the wikipedia.org domain [23:49:02] don't know why i didn't do it already... [23:49:08] and only have actual encyclopedia language versions in it [23:49:16] hah [23:49:19] ten! [23:49:35] :p sep11 [23:51:06] (03CR) 10JanZerebecki: [C: 031] "Dependency is done." [operations/puppet] - 10https://gerrit.wikimedia.org/r/145500 (https://bugzilla.wikimedia.org/38516) (owner: 10Dzahn) [23:51:29] hah, 16:13 < mutante> if it doesnt, next he will suggest to enable StrictTransportSecurity:) [23:52:00] this is my first time using the new puppet apache conf [23:52:12] jeremyb: cool, thanks [23:52:26] jeremyb: :) it will also need a DNS change if you want to move mail.wikipedia [23:52:32] I'm going through some of the mailing list bugs in Bugzilla that often get left [23:52:38] you know it wants you to generate the .conf from the .dat ? [23:52:49] https://bugzilla.wikimedia.org/show_bug.cgi?id=66318 and https://bugzilla.wikimedia.org/show_bug.cgi?id=46021 are both about globally blacklisting spam domains - is that something you could do too jeremyb? [23:53:39] the list admin can do that themselves, i think if we touch global filters we end up with tons of tickets for that [23:54:01] we _do_ add spam scores though [23:54:07] so that the admins can filter based on it [23:54:28] Thehelpfulone, yeah, why haven't those admins added their own filters? [23:54:50] I think they want to do it globally to make it easier for people [23:55:04] and if they've got it on a number of mailing lists of their own, then they're thinking others must be getting it to [23:55:20] https://wikitech.wikimedia.org/wiki/Mailman#Fighting_spam_in_mailman [23:55:24] PROBLEM - ElasticSearch health check on elastic1010 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. 
status: red: timed_out: false: number_of_nodes: 17: number_of_data_nodes: 17: active_primary_shards: 2047: active_shards: 6140: relocating_shards: 2: initializing_shards: 1: unassigned_shards: 1 [23:55:24] PROBLEM - ElasticSearch health check on elastic1017 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 17: number_of_data_nodes: 17: active_primary_shards: 2047: active_shards: 6140: relocating_shards: 2: initializing_shards: 1: unassigned_shards: 1 [23:55:24] PROBLEM - ElasticSearch health check on elastic1009 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 17: number_of_data_nodes: 17: active_primary_shards: 2047: active_shards: 6140: relocating_shards: 2: initializing_shards: 1: unassigned_shards: 1 [23:55:24] PROBLEM - ElasticSearch health check on elastic1007 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 17: number_of_data_nodes: 17: active_primary_shards: 2047: active_shards: 6140: relocating_shards: 2: initializing_shards: 1: unassigned_shards: 1 [23:55:24] PROBLEM - ElasticSearch health check on elastic1002 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 17: number_of_data_nodes: 17: active_primary_shards: 2047: active_shards: 6140: relocating_shards: 2: initializing_shards: 1: unassigned_shards: 1 [23:55:24] PROBLEM - ElasticSearch health check on elastic1011 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. 
status: red: timed_out: false: number_of_nodes: 17: number_of_data_nodes: 17: active_primary_shards: 2047: active_shards: 6140: relocating_shards: 2: initializing_shards: 1: unassigned_shards: 1 [23:55:39] easier for people = other people do the same thing :) [23:55:47] we should avoid centralization [23:56:01] thing is most admins probably don't know how to do it - or may not even be aware that it exists [23:56:04] it will lead to discussion about the common "good" config that works for all lists [23:56:24] RECOVERY - ElasticSearch health check on elastic1010 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 17: number_of_data_nodes: 17: active_primary_shards: 2048: active_shards: 6143: relocating_shards: 0: initializing_shards: 0: unassigned_shards: 0 [23:56:24] RECOVERY - ElasticSearch health check on elastic1011 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 17: number_of_data_nodes: 17: active_primary_shards: 2048: active_shards: 6143: relocating_shards: 0: initializing_shards: 0: unassigned_shards: 0 [23:56:24] RECOVERY - ElasticSearch health check on elastic1017 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 17: number_of_data_nodes: 17: active_primary_shards: 2048: active_shards: 6143: relocating_shards: 0: initializing_shards: 0: unassigned_shards: 0 [23:56:24] RECOVERY - ElasticSearch health check on elastic1009 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 17: number_of_data_nodes: 17: active_primary_shards: 2048: active_shards: 6143: relocating_shards: 0: initializing_shards: 0: unassigned_shards: 0 [23:56:24] RECOVERY - ElasticSearch health check on elastic1007 is OK: OK - elasticsearch (production-search-eqiad) is running. 
status: green: timed_out: false: number_of_nodes: 17: number_of_data_nodes: 17: active_primary_shards: 2048: active_shards: 6143: relocating_shards: 0: initializing_shards: 0: unassigned_shards: 0 [23:56:24] RECOVERY - ElasticSearch health check on elastic1002 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 17: number_of_data_nodes: 17: active_primary_shards: 2048: active_shards: 6143: relocating_shards: 0: initializing_shards: 0: unassigned_shards: 0 [23:57:07] mutante: maybe that idea we had a while back about a mailing list for list admins that we (ops or mailman people) could email when we want to update them with things, i.e. letting them know about spam filtering and how to set it up? [23:58:02] Thehelpfulone: yes, i think a meta list of list-admins is still a good idea [23:58:18] you should be admin :) hehe [23:59:16] it's on wiki though how to use spam filter option