[00:00:18] it looks to be running properly this time, no big stream of errors [00:00:49] this [00:00:54] Sep 8 23:17:46 mw1070 rsyncd[25375]: bind() failed: Address already in use (address-family 2) [00:01:02] Sep 8 23:17:46 mw1070 rsyncd[25375]: unable to bind any inbound sockets on port 873 [00:01:05] Sep 8 23:17:46 mw1070 rsyncd[25375]: rsync error: error in socket IO (code 10) at socket.c(555) [Receiver=3.0.9] [00:01:44] no, red herring and that just happens on every start? [00:02:42] (03CR) 10Krinkle: "Applied locally on integration-puppetmaster and ran puppet on it and on integration-slave1006-trusty. Works as expected." [puppet] - 10https://gerrit.wikimedia.org/r/159226 (owner: 10Krinkle) [00:03:03] i see how puppet applied changes to mw-deployment-vars.sh before that [00:12:06] mutante: dunno if your still looking, if it helps at all i copied the output of the failed scap run to tin.eqiad.wmnet:/home/ebernhardson/scap-2014-09-08-16:15.log [00:12:15] * ebernhardson doesn't have rights to even look at log files on mw1070 :P [00:12:57] the current scap is running the final scap-rebuild-cdbs step now with no failures reported [00:15:01] RECOVERY - puppet last run on cp1040 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [00:19:34] !log ebernhardson Finished scap: Repeat SWAT scap deployment due to possible sync-common failure (duration: 38m 50s) [00:19:39] Logged the message, Master [00:19:52] legoktm: scap finally worked, please check thanks extension [00:20:04] * legoktm does [00:20:32] ebernhardson: yea, dont really have more than "it still worked at 23:02", then it worked again after it was restarted [00:20:36] > Quiddity was notified that you liked their edit. [00:20:39] wot [00:20:42] woot [00:20:44] ebernhardson: thanks! 
[00:21:05] mutante: ok, looks like its safe to re-enable mw1070 and call it good enough [00:21:09] full scap worked [00:22:03] !log re-enabled mw1070 in pybal [00:22:08] Logged the message, Master [00:22:12] ebernhardson: ok, done [00:22:36] mutante: thanks! [00:35:17] legoktm: http://people.wikimedia.org/~legoktm/ ?:) [00:35:24] :D [00:35:32] I need to move over my things from fenari still [00:35:40] legoktm: i wanted to ask you exactly that, [00:36:00] this is great, because I've been having to move files from terbium to fenari [00:36:06] :) [00:36:18] good, so i see you have terbium access [00:51:33] greg-g: SWAT patch https://gerrit.wikimedia.org/r/159089 caused an issue, mind if we send another patch to fix it? https://gerrit.wikimedia.org/r/#/c/159234/ [00:51:56] he says do it [00:52:03] he's sitting next to me so i pinged him irl [00:52:31] ok [00:55:20] yeppers [01:02:14] !log ebernhardson Synchronized php-1.24wmf20/extensions/Flow/includes/Content/BoardContentHandler.php: Sync BoardContentHandler.php for Flow in 1.24wmf20 (duration: 00m 04s) [01:02:18] Logged the message, Master [01:12:52] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Thu 04 Sep 2014 00:21:29 UTC [01:37:05] (03PS2) 10Dzahn: ishmael behind varnish, make neon a backend [puppet] - 10https://gerrit.wikimedia.org/r/154969 [01:39:19] (03CR) 10Dzahn: [C: 032] "since ishmael currently needs fixing anyways there isn't that much that can go wrong here" [puppet] - 10https://gerrit.wikimedia.org/r/154969 (owner: 10Dzahn) [02:01:12] (03PS2) 10Dzahn: switch ishmael over to misc-web-lb.eqiad [dns] - 10https://gerrit.wikimedia.org/r/154970 [02:01:36] (03CR) 10Dzahn: [C: 032] switch ishmael over to misc-web-lb.eqiad [dns] - 10https://gerrit.wikimedia.org/r/154970 (owner: 10Dzahn) [02:05:42] PROBLEM - puppet last run on db73 is CRITICAL: CRITICAL: Puppet has 1 failures [02:08:42] RECOVERY - puppet last run on db73 is OK: OK: Puppet is currently enabled, last run 21 seconds 
ago with 0 failures [02:10:41] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3261 MB (3% inode=99%): [02:18:21] PROBLEM - puppet last run on mw1155 is CRITICAL: CRITICAL: Puppet has 1 failures [02:18:52] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 4190 MB (3% inode=99%): [02:19:11] PROBLEM - puppet last run on ms-be1010 is CRITICAL: CRITICAL: Puppet has 1 failures [02:23:24] (03PS1) 10Dzahn: ishmael - dont require SSL behind varnish [puppet] - 10https://gerrit.wikimedia.org/r/159245 [02:24:05] (03CR) 10Dzahn: [C: 032] "ishmael doesnt have data currently, but make the login work :)" [puppet] - 10https://gerrit.wikimedia.org/r/159245 (owner: 10Dzahn) [02:32:11] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:36:04] (03PS1) 10MaxSem: jgonera is not working for us anymore [puppet] - 10https://gerrit.wikimedia.org/r/159250 [02:36:18] please merge ^^^ :) [02:36:31] RECOVERY - puppet last run on mw1155 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [02:37:21] RECOVERY - puppet last run on ms-be1010 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [02:38:11] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. [02:38:38] !log LocalisationUpdate completed (1.24wmf15) at 2014-09-09 02:38:38+00:00 [02:38:44] Logged the message, Master [02:41:45] mutante: [02:41:49] grr [02:42:05] getting used to new keyboard [02:42:36] why no data in ishmael? 
[02:43:03] (reading grrrit in scrollback) [03:00:12] RECOVERY - Disk space on virt0 is OK: DISK OK [03:09:35] PROBLEM - puppet last run on mw1206 is CRITICAL: CRITICAL: Puppet has 1 failures [03:11:27] !log LocalisationUpdate completed (1.24wmf19) at 2014-09-09 03:11:27+00:00 [03:11:32] Logged the message, Master [03:14:01] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Thu 04 Sep 2014 00:21:29 UTC [03:28:42] RECOVERY - puppet last run on mw1206 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [03:44:07] !log LocalisationUpdate completed (1.24wmf20) at 2014-09-09 03:44:07+00:00 [03:44:13] Logged the message, Master [04:56:28] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Sep 9 04:56:25 UTC 2014 (duration 56m 24s) [04:56:35] Logged the message, Master [05:06:24] (03PS6) 10KartikMistry: WIP: Update cxserver (beta) config [puppet] - 10https://gerrit.wikimedia.org/r/157787 [05:15:01] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Thu 04 Sep 2014 00:21:29 UTC [06:28:31] PROBLEM - puppet last run on db1018 is CRITICAL: CRITICAL: Puppet has 2 failures [06:28:32] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Puppet has 1 failures [06:28:41] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 1 failures [06:28:42] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 1 failures [06:28:51] PROBLEM - puppet last run on mw1144 is CRITICAL: CRITICAL: Puppet has 3 failures [06:29:01] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 1 failures [06:35:21] PROBLEM - puppet last run on db1047 is CRITICAL: CRITICAL: Puppet has 1 failures [06:41:21] RECOVERY - Disk space on ms1004 is OK: DISK OK [06:44:21] PROBLEM - Disk space on ms1004 is CRITICAL: DISK CRITICAL - free space: / 5 MB (0% inode=94%): /var/lib/ureadahead/debugfs 5 MB (0% inode=94%): [06:45:41] RECOVERY - puppet last run on db1018 is 
OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [06:45:51] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures [06:45:52] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [06:46:11] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [06:46:51] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [06:47:05] RECOVERY - puppet last run on mw1144 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [06:53:35] RECOVERY - puppet last run on db1047 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [06:57:24] PROBLEM - puppet last run on db60 is CRITICAL: CRITICAL: Puppet has 3 failures [07:14:34] RECOVERY - puppet last run on db60 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [07:15:54] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Thu 04 Sep 2014 00:21:29 UTC [07:29:15] (03CR) 10Filippo Giunchedi: Clean up salt::minion (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/153727 (owner: 10Ori.livneh) [07:31:19] (03PS2) 10Giuseppe Lavagetto: HAT: turn off mod_php [puppet] - 10https://gerrit.wikimedia.org/r/159037 [07:33:49] (03CR) 10Giuseppe Lavagetto: [C: 032] HAT: turn off mod_php [puppet] - 10https://gerrit.wikimedia.org/r/159037 (owner: 10Giuseppe Lavagetto) [07:36:34] <_joe_> !log disabling puppet, releasing a potentially harmful apache change [07:36:39] Logged the message, Master [07:36:56] <_joe_> !log that was on appservers [07:37:01] Logged the message, Master [07:46:38] (03PS1) 10Giuseppe Lavagetto: mediawiki: fix wwwportals [puppet] - 10https://gerrit.wikimedia.org/r/159258 [07:47:15] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] mediawiki: fix 
wwwportals [puppet] - 10https://gerrit.wikimedia.org/r/159258 (owner: 10Giuseppe Lavagetto) [07:54:45] * _joe_ kicks himself in the butt [07:54:51] (03PS1) 10Giuseppe Lavagetto: mediawiki: fix another syntax error [puppet] - 10https://gerrit.wikimedia.org/r/159260 [07:55:06] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] mediawiki: fix another syntax error [puppet] - 10https://gerrit.wikimedia.org/r/159260 (owner: 10Giuseppe Lavagetto) [08:06:56] <_joe_> !log stopping apache on mw1018 for inspection [08:07:01] Logged the message, Master [08:08:24] <_joe_> !log restarted apache2 on mw1018 [08:08:29] Logged the message, Master [08:10:44] <_joe_> !log re-enabling puppet on appservers and imagescalers, change is good [08:10:49] Logged the message, Master [08:11:06] (03PS1) 10Filippo Giunchedi: elasticsearch: better shard check output [puppet] - 10https://gerrit.wikimedia.org/r/159261 [08:43:56] random question, what's the failure model when allocating machines in racks/rows? e.g. for power and network [08:55:16] !log launched "iptables" on tin to check current rules and it loaded iptables modules, logging for future reference [08:55:21] Logged the message, Master [09:02:44] godog: rows, mostly [09:02:55] but individual racks can fail as well [09:04:08] (03PS1) 10Giuseppe Lavagetto: mediawiki: use pidfile from the env vars [puppet] - 10https://gerrit.wikimedia.org/r/159264 [09:05:21] <_joe_> paravoid: hi! [09:06:02] <_joe_> godog: ^^ can you take a look? [09:08:11] _joe_: sure [09:08:38] paravoid: ack, thanks! FWIW I was asking for RT #8295 [09:13:27] (03CR) 10Filippo Giunchedi: [C: 031] "good catch!" [puppet] - 10https://gerrit.wikimedia.org/r/159264 (owner: 10Giuseppe Lavagetto) [09:14:23] paravoid: time to take a quick look into https://gerrit.wikimedia.org/r/#/c/155753/ ? 
[09:15:46] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki: use pidfile from the env vars [puppet] - 10https://gerrit.wikimedia.org/r/159264 (owner: 10Giuseppe Lavagetto) [09:17:05] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Thu 04 Sep 2014 00:21:29 UTC [09:19:15] PROBLEM - puppet last run on mw1018 is CRITICAL: CRITICAL: Puppet has 1 failures [09:23:15] PROBLEM - puppet last run on mw1192 is CRITICAL: CRITICAL: Puppet has 1 failures [09:23:23] <_joe_> mmmh seems trusty appserver are not happy with our last change? [09:23:45] PROBLEM - puppet last run on mw1130 is CRITICAL: CRITICAL: Puppet has 1 failures [09:24:15] PROBLEM - puppet last run on mw1178 is CRITICAL: CRITICAL: Puppet has 1 failures [09:24:56] <_joe_> !log disabling puppet on appservers [09:25:00] Logged the message, Master [09:26:15] PROBLEM - puppet last run on mw1192 is CRITICAL: CRITICAL: Puppet has 1 failures [09:26:42] <_joe_> damn, this will give us some headaches I guess [09:27:36] <_joe_> this is a bogus puppet alarm [09:58:38] (03PS1) 10Filippo Giunchedi: releases: allow uploads from tin [puppet] - 10https://gerrit.wikimedia.org/r/159267 [10:02:58] <_joe_> !log restarting manually apache on mw1178,mw1192,mw1163,mw1130,mw1018 as they started with the wrong pidfile before my fix [10:03:03] Logged the message, Master [10:04:55] <_joe_> !log also re-enabling puppet [10:05:00] Logged the message, Master [10:05:14] PROBLEM - puppet last run on mw1130 is CRITICAL: CRITICAL: Puppet has 1 failures [10:05:45] PROBLEM - puppet last run on mw1178 is CRITICAL: CRITICAL: Puppet has 1 failures [10:05:45] PROBLEM - puppet last run on mw1018 is CRITICAL: CRITICAL: Puppet has 1 failures [10:10:44] RECOVERY - puppet last run on mw1018 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [10:12:44] RECOVERY - puppet last run on mw1192 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [10:15:44] RECOVERY - puppet last 
run on mw1178 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [10:23:35] RECOVERY - puppet last run on mw1130 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [10:46:57] (03PS1) 10Alexandros Kosiaris: Ensure amanda schedules absent [puppet] - 10https://gerrit.wikimedia.org/r/159277 [10:46:59] (03PS1) 10Alexandros Kosiaris: Ensure amanda configuration absent [puppet] - 10https://gerrit.wikimedia.org/r/159278 [10:47:01] (03PS1) 10Alexandros Kosiaris: Ensure backup::client configs absent [puppet] - 10https://gerrit.wikimedia.org/r/159279 [10:47:03] (03PS1) 10Alexandros Kosiaris: Purge all backup::client related packages/confs [puppet] - 10https://gerrit.wikimedia.org/r/159280 [10:47:05] (03PS1) 10Alexandros Kosiaris: Remove all already absent backup::server crons/schedules [puppet] - 10https://gerrit.wikimedia.org/r/159281 [10:47:07] (03PS1) 10Alexandros Kosiaris: Remove the backup::client class [puppet] - 10https://gerrit.wikimedia.org/r/159282 [10:47:09] (03PS1) 10Alexandros Kosiaris: Purge the amanda-server packages/configurations [puppet] - 10https://gerrit.wikimedia.org/r/159283 [10:47:11] (03PS1) 10Alexandros Kosiaris: Remove the now defunct backup::server class [puppet] - 10https://gerrit.wikimedia.org/r/159284 [10:48:28] (03CR) 10jenkins-bot: [V: 04-1] Purge the amanda-server packages/configurations [puppet] - 10https://gerrit.wikimedia.org/r/159283 (owner: 10Alexandros Kosiaris) [10:51:24] (03PS2) 10Alexandros Kosiaris: Purge the amanda-server packages/configurations [puppet] - 10https://gerrit.wikimedia.org/r/159283 [10:51:26] (03PS2) 10Alexandros Kosiaris: Remove the now defunct backup::server class [puppet] - 10https://gerrit.wikimedia.org/r/159284 [10:53:15] (03CR) 10Alexandros Kosiaris: [C: 032] Ensure amanda schedules absent [puppet] - 10https://gerrit.wikimedia.org/r/159277 (owner: 10Alexandros Kosiaris) [10:53:34] and let's start killing amanda :-) [10:53:56] <_joe_> eheh [11:00:57] poor 
amanda [11:03:20] at least a. is not Italian and we can avoid a UK news drama [11:14:00] (PS2) Alexandros Kosiaris: Remove the last resources of snmp on hosts [puppet] - https://gerrit.wikimedia.org/r/143306 [11:14:02] (PS2) Alexandros Kosiaris: Remove the snmptt user [puppet] - https://gerrit.wikimedia.org/r/143305 [11:14:04] (PS1) Alexandros Kosiaris: Remove the puppet freshness check [puppet] - https://gerrit.wikimedia.org/r/159285 [11:14:06] (PS1) Alexandros Kosiaris: Removal of all snmptrap functionality [puppet] - https://gerrit.wikimedia.org/r/159286 [11:16:50] anyone objecting to me removing the puppet freshness check today ? [11:17:49] is it just me, or is git.wikimedia.org down? [11:17:56] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Thu 04 Sep 2014 00:21:29 UTC [11:21:32] (CR) Alexandros Kosiaris: [C: 2] Remove the puppet freshness check [puppet] - https://gerrit.wikimedia.org/r/159285 (owner: Alexandros Kosiaris) [11:23:16] akosiaris: +1 [11:25:08] <_joe_> akosiaris: \o/ [11:26:10] !log git.wikimedia.org is down: Error: 503, Service Unavailable [11:26:15] Logged the message, Master [11:29:03] <_joe_> MatmaRex: it works now, the SAL is not used usually if no action is needed [11:29:39] <_joe_> !log git.wikimedia.org works now, no action needed [11:29:45] Logged the message, Master [11:30:07] it was broken for over half an hour already when i commented here. [11:30:37] i got no reply when i asked and it wasn't fixing itself, so i logged it at least [11:50:39] <_joe_> MatmaRex: eh sorry I'm on a pause [11:50:48] <_joe_> it's strange icinga didn't alarm [11:55:14] akosiaris: regarding https://gerrit.wikimedia.org/r/#/c/158086/ the comment on line 28 is valid ? [11:56:18] matanya: no. subserviceip is declared in the sort.each.do loop on line 21 [11:56:27] so a @ would be wrong [11:56:45] that is what i thought, so this can be merged ...
:) [11:56:56] (03CR) 10Alexandros Kosiaris: pybal: qualify vars (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/158086 (owner: 10Matanya) [12:02:50] (03PS1) 10Gage: New upstream version: 1.5.3 [debs/logstash-gelf] - 10https://gerrit.wikimedia.org/r/159288 [12:03:47] gbp-pq is pretty cool. [12:04:34] yes it is [12:08:10] (03CR) 10Alexandros Kosiaris: [C: 04-1] New upstream version: 1.5.3 (033 comments) [debs/logstash-gelf] - 10https://gerrit.wikimedia.org/r/159288 (owner: 10Gage) [12:09:31] oops, thanks. i struggled getting gbp-dch to do what i wanted. [12:11:00] (03PS2) 10Gage: New upstream version: 1.5.3 [debs/logstash-gelf] - 10https://gerrit.wikimedia.org/r/159288 [12:11:31] arr whitespace [12:12:14] (03PS3) 10Gage: New upstream version: 1.5.3 [debs/logstash-gelf] - 10https://gerrit.wikimedia.org/r/159288 [12:13:34] (03Abandoned) 10Manybubbles: Collect elasticsearch metrics less frequently [puppet] - 10https://gerrit.wikimedia.org/r/158639 (owner: 10Manybubbles) [12:25:04] (03CR) 10Gage: [C: 032 V: 032] New upstream version: 1.5.3 [debs/logstash-gelf] - 10https://gerrit.wikimedia.org/r/159288 (owner: 10Gage) [12:40:30] (03CR) 10Alexandros Kosiaris: [C: 032] Ensure amanda configuration absent [puppet] - 10https://gerrit.wikimedia.org/r/159278 (owner: 10Alexandros Kosiaris) [12:45:33] (03PS1) 10Alexandros Kosiaris: Directory removal needs force => true [puppet] - 10https://gerrit.wikimedia.org/r/159291 [12:46:25] PROBLEM - Host ps1-d2-pmtpa is DOWN: PING CRITICAL - Packet loss = 100% [12:46:25] PROBLEM - Host ps1-d1-pmtpa is DOWN: PING CRITICAL - Packet loss = 100% [12:46:25] PROBLEM - Host ps1-c2-pmtpa is DOWN: PING CRITICAL - Packet loss = 100% [12:46:33] no worries that's me [12:46:34] PROBLEM - Host ps1-d3-pmtpa is DOWN: PING CRITICAL - Packet loss = 100% [12:46:40] ah.. 
ok [12:46:44] PROBLEM - Host ps1-c3-pmtpa is DOWN: PING CRITICAL - Packet loss = 100% [12:46:44] PROBLEM - Host ps1-c1-pmtpa is DOWN: PING CRITICAL - Packet loss = 100% [12:47:00] (03CR) 10Alexandros Kosiaris: [C: 032] Directory removal needs force => true [puppet] - 10https://gerrit.wikimedia.org/r/159291 (owner: 10Alexandros Kosiaris) [12:47:25] RECOVERY - Host ps1-d2-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 35.89 ms [12:47:25] RECOVERY - Host ps1-c3-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 33.31 ms [12:47:25] RECOVERY - Host ps1-d3-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 37.56 ms [12:47:25] RECOVERY - Host ps1-c1-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 37.00 ms [12:47:25] RECOVERY - Host ps1-d1-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 34.99 ms [12:47:34] RECOVERY - Host ps1-c2-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 34.70 ms [12:49:51] (03PS2) 10Alexandros Kosiaris: Ensure backup::client configs absent [puppet] - 10https://gerrit.wikimedia.org/r/159279 [12:52:23] (03CR) 10Alexandros Kosiaris: [C: 032] Ensure backup::client configs absent [puppet] - 10https://gerrit.wikimedia.org/r/159279 (owner: 10Alexandros Kosiaris) [13:07:22] (03PS6) 10BBlack: Move all geoip-based resolution to DYNA [dns] - 10https://gerrit.wikimedia.org/r/158382 [13:18:13] (03PS1) 10Giuseppe Lavagetto: varnish: add comment to avoid future pitfalls [puppet] - 10https://gerrit.wikimedia.org/r/159294 [13:18:15] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Thu 04 Sep 2014 00:21:29 UTC [13:22:24] (03CR) 10Giuseppe Lavagetto: [C: 04-2] "While this patch could work with some additional work, I extensively tested the apache performance and using hard-coded rewrites with no i" [puppet] - 10https://gerrit.wikimedia.org/r/156303 (owner: 10Giuseppe Lavagetto) [13:24:19] (03CR) 10Alexandros Kosiaris: [C: 032] Introduce servermon.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/159041 (owner: 10Alexandros Kosiaris) [13:25:24] 
(03PS10) 10Alexandros Kosiaris: module/role class for servermon [puppet] - 10https://gerrit.wikimedia.org/r/153412 [13:28:12] (03CR) 10Ottomata: [C: 032] elasticsearch: better shard check output [puppet] - 10https://gerrit.wikimedia.org/r/159261 (owner: 10Filippo Giunchedi) [13:30:26] (03CR) 10Ottomata: Adding gzip compression for several file types (031 comment) [puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/159181 (owner: 10Nuria) [13:35:27] hi hoo ! time to take one more look at https://gerrit.wikimedia.org/r/#/c/155753/ [13:42:25] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [13:42:35] PROBLEM - HTTP 5xx req/min on labmon1001 is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [13:43:36] tonythomas: I can have a look [13:44:15] hoo: k. improved the docs. hope its worth going in [13:45:03] (03CR) 10Alexandros Kosiaris: [C: 032] module/role class for servermon [puppet] - 10https://gerrit.wikimedia.org/r/153412 (owner: 10Alexandros Kosiaris) [13:50:49] tonythomas: Ok, that looks good now... I still can't sign of on the functionality, though [13:50:58] you will need someone to actually test that [13:52:09] hoo: k. thanks. I would need root access to test with exim configs right [13:52:34] in labs, I might be able to -> but again, the URLs would be different [13:52:57] mh... you will need someone from ops to sign of on this anyway [13:53:02] so someone else has to test this [13:53:16] and most important part is to verify that it wont disrupt production [13:53:26] fiddling out beta then is less important [13:53:31] hoo: I will wait for mark or Jeff_Green then [13:53:36] true. [13:53:43] whut [13:54:36] Jeff_Green: great. youre around. 
we were talking of testing and signing https://gerrit.wikimedia.org/r/#/c/155753/ [13:56:07] ok [13:56:10] (PS2) Filippo Giunchedi: elasticsearch: better shard check output [puppet] - https://gerrit.wikimedia.org/r/159261 [13:56:17] (CR) Filippo Giunchedi: [C: 2 V: 2] elasticsearch: better shard check output [puppet] - https://gerrit.wikimedia.org/r/159261 (owner: Filippo Giunchedi) [13:57:09] (PS1) BBlack: Add ns1 (baham) IPs to DNS [dns] - https://gerrit.wikimedia.org/r/159297 [13:57:35] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [13:57:47] RECOVERY - HTTP 5xx req/min on labmon1001 is OK: OK: Less than 1.00% above the threshold [250.0] [13:57:57] hoo & tonythomas what testing have you done so far? [13:58:14] (PS1) BBlack: Add baham (future ns1) config [puppet] - https://gerrit.wikimedia.org/r/159298 [13:58:16] I? None, I'm just helping with curl and general puppet advice [13:58:51] (CR) BBlack: [C: 2] Add ns1 (baham) IPs to DNS [dns] - https://gerrit.wikimedia.org/r/159297 (owner: BBlack) [14:02:25] Jeff_Green: I have executed the same curl command in beta terminal, and I got the right output. let me try the same now [14:02:36] ok [14:03:02] have you been able to test the puppet code itself, in terms of generating the expected config files? [14:04:46] and does that config file really do what you want it to do? [14:05:03] (especially as it shouldn't affect production) [14:05:25] I couldn't test the puppet code by itself, but mark's frequent reviews have been helping [14:05:54] ok, I can look at that too [14:06:17] we need to also go through all the possible failure modes and make sure we don't create an exim deathspiral [14:06:24] PROBLEM - puppet last run on netmon1001 is CRITICAL: CRITICAL: Puppet has 3 failures [14:06:31] hoo. yeah.
currently, in production, we will not have any emails getting into the bouncehandler router anyway, as we have got a strict regex for VERP - VERP_BOUNCE_LOCALPART_REGEXP = ^wiki-\w+-\w+-\w+-\w+$ [14:06:39] i.e., what happens if the webserver doesn't respond, or responds with something other than a 200 [14:06:45] tonythomas: Ok, that's fine, then [14:07:01] Jeff_Green: curl will have a non 0 exit, I guess [14:07:09] and all current bounce emails, which come to wikimedia.org ( as return-path now is wiki@wikimedia.org ) gets eaten by the eat router ! [14:07:30] oh, not true [14:07:55] it doesn't queue and retry [14:08:11] (03CR) 10BBlack: [C: 032] "Checked on catalog compiler for the 3x current nameservers, only change is the new hostname in the NAMESERVERS list for authdns-update" [puppet] - 10https://gerrit.wikimedia.org/r/159298 (owner: 10BBlack) [14:08:20] Jeff_Green: oh. that too can happen ! [14:08:26] which does happen? [14:09:13] a failed POST. ie what you just said [14:10:12] we hadn't discussed that part yet! [14:10:29] the bounce get frozen somewhere ? [14:10:33] ok, that's a good thing to figure out [14:11:31] mh... use curl -f ? [14:11:48] * hoo has no idea how stuff there acts, but that might be starting point to figure stuff [14:12:02] we don't want to double-bounce [14:12:06] # No frozen messages please [14:12:06] 20 ignore_bounce_errors_after = 0h [14:12:08] queuing is most conservative [14:12:24] something in https://github.com/wikimedia/operationspuppet/blob/e7bdd66d3ec256206f73263a7793766b3a7ab4ba/templates/mail/exim4.minimal.erb#L19 [14:12:25] eating might be ok too, we just lose some bounce data [14:14:55] PROBLEM - DPKG on baham is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:14:55] PROBLEM - puppet last run on baham is CRITICAL: CRITICAL: Puppet has 1 failures [14:15:46] Jeff_Green: I can mimic a POST to wrong server in my local install though. 
will update the exim logs [14:15:56] ok [14:18:50] is the module already enabled on the API? [14:20:06] Jeff_Green: the receiver module ? [14:20:30] yeah, so we don't 404 on an incoming message? [14:21:29] Jeff_Green: we just have $wgBounceHandlerUnconfirmUsers = false; [14:21:43] and the API is all ready and registered in the beta wiki [14:22:09] I need to confirm my email-id in local install ( got unsubscribed due to testing :P ) [14:22:33] can you give me a curl command to test from the mailservers? [14:22:47] ok. in a min [14:23:29] how about curl -H 'Host:deployment.wikimedia.beta.wmflabs.org' http://deployment.wikimedia.beta.wmflabs.org/w/api.php?action="bouncehandler"&email@- [14:24:24] the email@- will show command not found, as it copies the incoming bounce, I think [14:26:33] just found something [14:26:33] (03PS1) 10BBlack: fix missing dot on new ns1 revdns [dns] - 10https://gerrit.wikimedia.org/r/159302 [14:26:35] (03PS1) 10BBlack: Fix GWT text records having explicit zone names [dns] - 10https://gerrit.wikimedia.org/r/159303 [14:26:40] and I got the error message too ! [14:26:41] https://gerrit.wikimedia.org/r/#/c/155753/31/manifests/role/mail.pp,unified [14:26:45] $verp_post_connect_server = 'appservers.svc."${::mw_primary}".wmnet' [14:26:46] it gets Frozen :\ [14:27:00] I think that should be $verp_post_connect_server = "appservers.svc.${::mw_primary}.wmnet" [14:27:26] Jeff_Green: https://dpaste.de/AUix [14:27:41] I have posted what happens when correct and incorrect POST URL [14:27:51] ok cool [14:27:53] Jeff_Green: I will correct that in ~30 mins, dinner time [14:27:56] will brb [14:27:57] k [14:28:32] (03CR) 10BBlack: [C: 032] fix missing dot on new ns1 revdns [dns] - 10https://gerrit.wikimedia.org/r/159302 (owner: 10BBlack) [14:29:29] (03CR) 10BBlack: [C: 032] Fix GWT text records having explicit zone names [dns] - 10https://gerrit.wikimedia.org/r/159303 (owner: 10BBlack) [14:36:31] ...where are the other SWATters. [14:36:38] wtf. traitors. 
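A note on the VERP regex quoted in the bounce-handler thread above (`VERP_BOUNCE_LOCALPART_REGEXP = ^wiki-\w+-\w+-\w+-\w+$`): the claim that bounces sent to the current return-path `wiki@wikimedia.org` never reach the bouncehandler router can be checked with a minimal sketch. It assumes Python's `re` treats this pattern the same way Exim's regex engine does for plain ASCII local parts, which seems likely but was not verified in the chat. The function name and every sample address other than `wiki@wikimedia.org` are made up for illustration.

```python
import re

# Pattern copied from the chat; everything else in this sketch is illustrative.
VERP_BOUNCE_LOCALPART_REGEXP = re.compile(r"^wiki-\w+-\w+-\w+-\w+$")


def reaches_bounce_router(address: str) -> bool:
    """Return True if the local part of `address` matches the VERP pattern.

    Hypothetical helper: it only mirrors the regex condition discussed in
    the chat, not the rest of the Exim router configuration.
    """
    local_part = address.split("@", 1)[0]
    return VERP_BOUNCE_LOCALPART_REGEXP.match(local_part) is not None


if __name__ == "__main__":
    samples = [
        "wiki@wikimedia.org",                          # current return-path mentioned in the chat
        "wiki-enwiki-2a-n3x9qk-abcdef@wikimedia.org",  # made-up address with four segments
        "wiki-enwiki-2a-n3x9qk@wikimedia.org",         # made-up address with only three segments
    ]
    for address in samples:
        print(address, reaches_bounce_router(address))
```

Because `\w` does not match a hyphen, the pattern requires exactly four hyphen-separated segments after `wiki`, so a plain `wiki` local part falls through to whatever router comes next (the "eat" router, per the chat).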
[14:37:23] bblack: You have a SWAT patch that says you'll deploy it, do you want to do aude's patch as well, or is that my one patch? [14:37:54] I do mean "do you want to do" and not "please god do" [14:37:57] did i list it wrong? [14:38:01] marktraceur: aude's is all yours [14:38:01] No no [14:38:03] Sweet [14:38:08] * marktraceur looks at it [14:38:08] ok [14:38:18] aude: bblack confused me, not you. :) [14:38:22] I'm just pushing a relatively-scary DNS change and wanted it to be in a known window with people aware of it :) [14:38:33] 'kay. [14:38:39] Should I delay until you're done, then? [14:38:41] I'll wait till after you do your part [14:40:42] KK [14:40:52] Ooh, a core patch, easy peasy [14:45:06] RECOVERY - DPKG on baham is OK: All packages OK [14:46:04] RECOVERY - puppet last run on baham is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [14:51:34] marktraceur: Because you removed everyone except you when you did Tuesday last week, when someone copied things for the new week they didn't think to re-insert everyone. [14:52:12] Oh. [14:52:23] Well, I suppose it's my own fault, I'll take it anyway [14:58:49] * James_F is ready. [14:59:05] James_F: Did you just add patches to my deploy *two minutes* in advance [14:59:19] …no? [14:59:24] Oh, kay. [14:59:35] James_F: Smile and wave... [14:59:41] Then...I don't see any patches from you [14:59:48] * James_F grins and waves instead. [14:59:53] hmm. [15:00:05] marktraceur, bblack: Respected human, time to deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140909T1500). Please do the needful. [15:00:13] Are you going to add patches to my deploy *zero minutes* in advance now [15:00:26] I'll do aude's at least [15:00:39] :) [15:00:42] ready to verify [15:00:52] aude: What are we deploying? :P [15:01:02] * aude broke old changes list [15:01:14] marktraceur: just one [15:01:15] +2ed, waiting for Jenkins [15:01:16]
<li> is missing a css class if there is a tag [15:02:00] it's probably important to some people [15:02:14] marktraceur: https://gerrit.wikimedia.org/r/#/c/159244/ [15:02:27] James_F: Iz on [[Deployments]]? [15:02:30] Not sure why it's no longer in the list. [15:02:36] Werd. [15:02:39] Did someone remove it? [15:02:49] Add it, I use that list for tracking the deploy [15:02:52] * aude did not! [15:02:59] Kk. [15:03:43] Huzzah, merged, going [15:04:29] (PS1) Chad: CirrusSearch: primary backend for eswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/159307 [15:05:05] marktraceur: added. [15:05:58] Ta [15:06:00] !log marktraceur Synchronized php-1.24wmf20/includes/changes/OldChangesList.php: [SWAT] Fix undefined argument (css classes) in OldChangesList. (duration: 00m 07s) [15:06:08] * aude verifies [15:06:11] aude: That should be enough to test, yeah [15:06:23] looks perfect [15:06:24] thanks! [15:07:17] Syncing test file too [15:07:18] !log marktraceur Synchronized php-1.24wmf20/tests/phpunit/includes/changes/OldChangesListTest.php: [SWAT] Fix undefined argument (css classes) in OldChangesList. (duration: 00m 07s) [15:07:22] James_F is next! [15:07:25] ok [15:07:30] Ta. [15:08:40] Logged the message, Master [15:09:02] Those are some iconic elements [15:09:25] Oh goody, I get to make an extension update patch too [15:09:29] It's my lucky day [15:09:43] Sadly it'll mean a delay, James_F - not sure how up-to-date my deploy checkout of mediawiki is [15:11:13] marktraceur: tin generally has an up-to-date checkout. ;-) [15:11:38] you want to create commits on tin [15:11:43] not sure that's the way to go [15:11:54] Yeaah, I thought I'd do it locally. [15:11:58] :-) [15:12:12] Yeah, marktraceur's too goody-goody to do it the traditional way. [15:12:15] James_F: Too easy to upload security stuff or something by accident that way [15:12:45] hoo: Indeed, though I know a number of people who do it that way to avoid over-writing security things.
[15:12:49] Did James_F just call me goody-goody [15:12:55] RECOVERY - puppet last run on netmon1001 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [15:12:55] Or am I having a stroke [15:13:15] marktraceur: False dichotomy. [15:14:34] Maybe [15:17:11] Why do we have so many bloody extensions [15:17:15] * marktraceur takes a nap [15:17:20] * bd808 got too comfortable with making patches on tin by doing new MW branch deploys [15:17:57] I'm not sure that it is impossible to prep a new wmf branch deploy off of tin but it would be tricky [15:18:01] bd808: But only for new branches nad config, I guess (hope) [15:18:11] hoo: Yeah [15:18:32] I have avoided the swat things by conscious choice [15:19:09] I don't even make config patches on tin, it just seems wrong [15:19:17] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Thu 04 Sep 2014 00:21:29 UTC [15:19:50] <^d> You can easily make the commits from tin and push to gerrit. [15:19:53] <^d> #doitallthetime [15:19:53] bd808: :-) [15:20:02] I'm not even sure how you would avoid uploading local commits in branches when doing stuff from tin [15:20:11] hoo: Carefully! [15:20:13] (03CR) 10GWicke: [C: 031] releases: allow uploads from tin [puppet] - 10https://gerrit.wikimedia.org/r/159267 (owner: 10Filippo Giunchedi) [15:20:16] (without uggly cherry picking and stuff) [15:20:41] James_F: I'm sure I'm supposed to be wagging my finger at you saying "Bad James_F, no, submit submodule update patches" [15:20:44] But whatever [15:20:49] Put your stuff on top. upload to gerrit. reset the checkout on tin. pull from gerrit. [15:21:02] marktraceur: Yeah, well, I was expecting to get to the office with time enough to do this for you. [15:21:16] marktraceur: Also, tsk, SWATers should have a fresh core checkout all the time. [15:21:20] :) [15:21:25] bd808: Sounds lovely... 
:P [15:21:31] I'll see about adding a cron job to keep mine updated [15:21:45] Every two hours or so [15:21:49] PROBLEM - Host baham.wikimedia.org is DOWN: CRITICAL - Time to live exceeded (208.80.153.239) [15:22:25] marktraceur: That's possibly too frequent – don't want it happening mid-deploy. [15:22:29] hoo: I even described it all in excruciating detail at https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment/Train_deploys [15:22:37] marktraceur: Every noon-time is probably fine? [15:22:54] James_F: Suppose so. But I'd miss any updates that happened during the 16:00 deploy [15:23:01] So maybe noon and 18:)0 [15:23:10] 18:00 even. [15:23:12] * James_F nods. [15:23:21] 18:00 and 06:00 would be good. [15:23:30] Could do [15:23:43] Well, no, because I miss updates during the 08:00 deploy [15:23:45] Keep up :) [15:25:12] marktraceur: you all clear? can I go do dangerous DNS things? [15:26:47] !log marktraceur Synchronized php-1.24wmf20/extensions/MobileFrontend/less/modules/editor/VisualEditorOverlay.less: (no message) (duration: 00m 07s) [15:26:48] Argh, no message [15:26:54] Logged the message, Master [15:27:14] !log [SCAP] Deployed fix for oojs class names at James_F's behest, sorry for lack of message. [15:27:18] James_F: Verify? [15:27:19] Logged the message, Master [15:27:24] marktraceur: Will do. [15:27:25] bblack: Soon [15:28:47] (03PS1) 10Alexandros Kosiaris: Fixups for servermon [puppet] - 10https://gerrit.wikimedia.org/r/159321 [15:28:59] marktraceur: Yup, working great. Clear! 
[15:29:05] bblack: All yours [15:30:37] (03CR) 10Alexandros Kosiaris: [C: 032] Fixups for servermon [puppet] - 10https://gerrit.wikimedia.org/r/159321 (owner: 10Alexandros Kosiaris) [15:31:16] (03PS7) 10BBlack: Move all geoip-based resolution to DYNA [dns] - 10https://gerrit.wikimedia.org/r/158382 [15:31:37] (03CR) 10BBlack: [C: 032] Move all geoip-based resolution to DYNA [dns] - 10https://gerrit.wikimedia.org/r/158382 (owner: 10BBlack) [15:32:04] !log deploying large DNS change https://gerrit.wikimedia.org/r/#/c/158382/ - be on the lookout for any related fallout from here... [15:32:11] Logged the message, Master [15:38:47] (03PS32) 1001tonythomas: Added the bouncehandler router to catch in all bounce emails [puppet] - 10https://gerrit.wikimedia.org/r/155753 [15:39:52] Jeff_Green: done the $verp_post_connect_server URL correction. :) [15:42:19] ok [15:43:32] tonythomas: ok--so I need a way to test that the API basically responds to the mx [15:44:43] Is there any other way other than doing the curl request from the deployment-mediawiki02 terminal ? [15:44:46] i tried the CURL string you suggest (correcting the hostname according to the production role) and I got a nasty response from the webserver [15:45:22] wait, are you trying to test for deploying to labs/beta or test re. deploying to prod [15:45:23] might be because, bouncehandler is not installed in that wiki you are trying to do the API call ? [15:45:36] currently its only in the deployment-mediawiki02 machine [15:45:41] oh [15:45:50] well then [15:46:01] so what exactly are you asking for re. this patchset? [15:46:26] we obviously can't put it in prod without the bouncehandler extension to answer it [15:47:03] Jeff_Green: we have made this manually silent on prod, from our configs, I think [15:48:00] because no known local-part pattern in use matches the router? [15:48:31] Jeff_Green: true. we made it pretty complex right ? 
[15:48:48] ^wiki-\w+-\w+-\w+-\w+$ [15:48:58] and also, the eat router is back in position [15:49:09] still, why deploy the mailserver code before the API code? [15:49:58] we have the API running in beta alright, and as part of the deploy plan, we need to ensure working of the exim-curl-api call from beta right ? [15:50:26] and beta/prod goes out through the same mailserver too right ? [15:50:44] hrm, I don't know the answer to that last question [15:51:29] so to test the API in beta, we should have this change in prod -- thats why mark was telling about the beta/prod realm switch we introduced [15:51:39] ok [15:52:18] the realm switch is again an added safety right ? [15:52:33] so that something that pass through ^wiki-\w+-\w+-\w+-\w+$ wont be effective in prod ! [15:53:54] if incoming mail to wiki-* hits a prod mailserver after this patch, it will hit the mwverpbounceprocessor router and get an error from the API [15:54:18] ty marktraceur [15:54:36] I guess that's ok, but I wonder if we shouldn't just have that router disabled for production until we're ready to deploy it there? [15:55:05] yw greg-g [15:55:13] Jeff_Green: sad that I have to quit laptop again :\ will brb [15:55:15] Long night in the city? :) [15:57:37] marktraceur: longer than planned, but was due to just chatting on paulproteus' couch [15:57:48] That'll do it [15:58:11] talking CI, Security, etc etc :) [15:58:22] All the things, I'm sure, yes :) [16:00:05] manybubbles, ^d: Respected human, time to deploy Search (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140909T1600). Please do the needful. [16:00:37] <^d> marktraceur: You guys all done with swatz? [16:01:18] I am, bblack was up to something sinister though [16:01:30] when isn't he? [16:01:47] (03PS1) 10Ottomata: Manage varnishkafka rsyslog conf file with puppet [puppet] - 10https://gerrit.wikimedia.org/r/159330 [16:01:55] <_joe_> he's black ops [16:01:57] <^d> Mmm, sinister things. 
[16:02:00] * _joe_ hides [16:02:13] (03CR) 10Chad: [C: 032] CirrusSearch: primary backend for eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/159307 (owner: 10Chad) [16:02:33] (03Merged) 10jenkins-bot: CirrusSearch: primary backend for eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/159307 (owner: 10Chad) [16:03:26] !log demon Synchronized wmf-config/InitialiseSettings.php: eswiki getting cirrus (duration: 00m 04s) [16:03:32] Logged the message, Master [16:05:33] ^d: that caused a little bump [16:05:54] looks like it was just cache warming though [16:05:59] because its pretty much gone now [16:06:00] <^d> Yeah, like yesterday. [16:06:04] <^d> Tiny bump, then back to normal [16:07:02] (03PS2) 10Ottomata: Manage varnishkafka rsyslog conf file with puppet [puppet] - 10https://gerrit.wikimedia.org/r/159330 [16:11:11] (03CR) 10Ottomata: [C: 032 V: 032] Manage varnishkafka rsyslog conf file with puppet [puppet] - 10https://gerrit.wikimedia.org/r/159330 (owner: 10Ottomata) [16:20:35] greg-g: as a heads up, i'd like a deploy window for OCG/pdf later today. anything available in (say) a few hours? [16:21:24] cscott: greg-g is out for today. chrismcmahon is acting greg today.. Have you checked wikitech? :) [16:23:59] chrismcmahon: /nick greg-g-2 :D [16:25:34] chrismcmahon, Reedy: maybe between the train and flow, 1300 pdt to 1400 pdt, assuming no fires on the train? [16:26:46] cscott: WFM, you can probably start earlier [16:26:50] The deploys never take too long [16:27:01] I might do a few mediawiki-config changes too (and maybe cawikimedia) [16:28:31] cscott: hi, sounds reasonable, I'm mostly just to solve problems if you have one [16:30:51] https://wikitech.wikimedia.org/wiki/Deployments [16:31:04] Does anyone else see 2 SWAT deploys at 23:00–00:00 UTC today? 
[16:31:17] The source looks normal [16:32:52] Reedy: yeah, i saw that too [16:33:21] I thought it was odd when I was editing in my line that it seemed like the blocks were not in date-order [16:33:39] (in the source) [16:33:58] Reedy: fancy doing a vewikimedia too if I make some patches before your deply? :p [16:34:10] JohnLewis: o_0 [16:34:33] Unless I misread your message actually.. [16:35:14] Wikimedia Canada are moving their wiki to us [16:35:37] yeah I saw the RT ticket. I thought you meant create it but you mean patches for it? [16:35:50] I guess they get cawikimedia? [16:35:55] Nope, mutante_ has done most of the patches for it already [16:35:57] oh, he is here [16:35:59] xD [16:36:13] * JohnLewis is confused to steps back and hides [16:36:33] well I'm doing some configs for a vewikimedia per BZ :p [16:38:35] omfg [16:48:54] (03PS2) 10Reedy: Move a lot of the miscellaneous wikis out of their own specific docroots [puppet] - 10https://gerrit.wikimedia.org/r/147486 [16:48:58] (03CR) 10jenkins-bot: [V: 04-1] Move a lot of the miscellaneous wikis out of their own specific docroots [puppet] - 10https://gerrit.wikimedia.org/r/147486 (owner: 10Reedy) [16:51:17] Reedy: I found and removed the dup placeholder swat row. [16:51:29] (03Abandoned) 10Reedy: Move a lot of the miscellaneous wikis out of their own specific docroots [puppet] - 10https://gerrit.wikimedia.org/r/147486 (owner: 10Reedy) [16:51:36] That table is full of magic [16:51:44] jouncebot: refresh [16:51:45] I refreshed my knowledge about deployments. [16:52:10] bd808: thanks :) [16:53:34] The order of things in source really has nothing to do with the order in the rendered table. It sorts everything on the dates in the template entries. So copy pasta can make dup rows like that. [16:54:04] Reedy: did you see the venezuela bug? 
[16:54:40] Oh, JohnLewis already asked [16:54:49] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [16:54:49] PROBLEM - HTTP 5xx req/min on labmon1001 is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [16:55:12] (03Restored) 10Reedy: Move a lot of the miscellaneous wikis out of their own specific docroots [puppet] - 10https://gerrit.wikimedia.org/r/147486 (owner: 10Reedy) [16:55:33] (03PS3) 10Reedy: WIP:Move a lot of the miscellaneous wikis out of their own specific docroots [puppet] - 10https://gerrit.wikimedia.org/r/147486 [16:55:52] JohnLewis: jeremyb If I'm gonna do one, might aswell do 2... But we do need apache and dns stuff doing by opsen [16:56:24] right [16:56:32] bbl :) [17:08:00] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [17:08:00] RECOVERY - HTTP 5xx req/min on labmon1001 is OK: OK: Less than 1.00% above the threshold [250.0] [17:09:41] (03PS4) 10Reedy: WIP: Move a lot of the miscellaneous wikis out of their own specific docroots [puppet] - 10https://gerrit.wikimedia.org/r/147486 [17:11:23] (03PS5) 10Reedy: WIP: Move a lot of the miscellaneous wikis out of their own specific docroots [puppet] - 10https://gerrit.wikimedia.org/r/147486 [17:11:32] _joe_: redone ^^ [17:11:51] <_joe_> Reedy: cool [17:11:59] (03PS6) 10Reedy: Move a lot of the miscellaneous wikis out of their own specific docroots [puppet] - 10https://gerrit.wikimedia.org/r/147486 [17:12:07] <_joe_> Reedy: I'll rebase my subsequent apache changes on that one [17:12:35] sweet :) [17:13:50] Reedy: should i put the cawikimedia patches on swat deploy via wiki? [17:14:30] (03PS1) 10Reedy: Non wikipedias to 1.24wmf20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/159346 [17:15:01] mutante_: You could.. 
I was going to do it in the window [17:19:39] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Thu 04 Sep 2014 00:21:29 UTC [17:23:27] Krinkle: if you have a minute, wanna move your fenari public_html to terbium -> people.wm.org ? [17:23:41] Reedy: ok, i will as soon as i got my 2factor :p [17:23:56] eh, the second factor to login to wikitech [17:23:59] brb [17:24:25] (03PS1) 10Filippo Giunchedi: swift: separate access log from general log [puppet] - 10https://gerrit.wikimedia.org/r/159348 [17:28:07] (03CR) 10Ori.livneh: "Replies to Filippo. (Thanks for reviewing!)" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/153727 (owner: 10Ori.livneh) [17:29:51] mutante: So.. In hopes to improve docs, I'm limiting myself to doc I can access and not mailing lists. What's the new bastion? I see no mention of it on https://wikitech.wikimedia.org/wiki/Fenari, and https://wikitech.wikimedia.org/wiki/Category:Server_type:Bastion only contains an old Kennisnet bastion, and Bastion is redirect to wmflabs bastion. [17:31:00] bast1001 [17:31:02] ? [17:31:18] Krinkle: you should see this as separate from bastion. it is bast1001, but we picked a different server for the public_html part of things [17:31:23] and created people.wikimedia.org for that [17:31:26] I know [17:31:45] it is being taken out of the noc. part of fenari, as opposed to the config part of it [17:32:03] so the place to put the public_html dir into your home would now be terbium [17:32:19] and give you http://people.wikimedia.org/~krinkle [17:32:34] i'll make a wiki page today, k? [17:32:45] I'll do it [17:32:53] so I know it contains what I need to know [17:34:03] (03CR) 10Mark Bergsma: [C: 04-1] "Still two issues..." (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/155753 (owner: 1001tonythomas) [17:34:10] thanks, will check it out and might add ..wiki [17:34:30] you are right to do this (not rely on lists) [17:35:56] ^d: ^godog searhc meeting time! 
[17:35:58] godog: meeting now? [17:37:29] manybubbles: yep sorry I'm late cc ottomata [17:38:40] mark: so the realm switch should be inside the class { 'exim::roled': to pass them as parameters ? [17:39:22] no, just pass the variables as parameters [17:39:43] so class { 'exim::roled': ... verp_domains => $verp_domains, ... } [17:40:21] ok. fixing that up [17:40:22] Hm.. interwiki 'w' and 'wikipedia' no longer work on labs wik [17:41:00] Can't see how many there are, https://wikitech.wikimedia.org/wiki/Special:WantedPages is disabled. [17:41:41] :en: works, that's odd, it thinks it's a wikipedia? [17:43:28] (03CR) 10Mark Bergsma: [C: 031] Allocate sandbox vlans for codfw and ulsfo [dns] - 10https://gerrit.wikimedia.org/r/158636 (owner: 10Mark Bergsma) [17:44:22] (03CR) 10Mark Bergsma: [C: 031] Allocate IPv4/IPv6 for RIPE Atlas codfw/ulsfo [dns] - 10https://gerrit.wikimedia.org/r/158939 (owner: 10Faidon Liambotis) [17:47:44] (03PS33) 1001tonythomas: Added the bouncehandler router to catch in all bounce emails [puppet] - 10https://gerrit.wikimedia.org/r/155753 [17:48:44] mark: ^ looks good now ? https://gerrit.wikimedia.org/r/#/c/155753 [17:50:28] (03CR) 10BryanDavis: "I simplified this even more when I recreated foreachwiki in MediaWiki-Vagrant Reedy: https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=126193&oldid=126170 [17:57:56] does that actually make jouncebot get it? [17:58:25] (03PS1) 10John F. Lewis: Reopen vewikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/159354 (https://bugzilla.wikimedia.org/70579) [17:58:29] (03CR) 10jenkins-bot: [V: 04-1] Reopen vewikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/159354 (https://bugzilla.wikimedia.org/70579) (owner: 10John F. Lewis) [17:58:39] Does test.wikipedia.org still run from nfs or in some way connected to tin, fenari, or some other host that isn't a regular apache? [17:58:49] From mw1017, right? 
[17:59:42] Krinkle: Nope, runs on local data [17:59:57] you can sync-common on it though to have it up to date with tin [18:00:04] Reedy, greg-g: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140909T1800). Please do the needful. [18:00:04] (and not update everything) [18:00:12] so testwiki is special in that its from a single apache, but no longer special treatment in being synced first or running un unsynced code? [18:00:34] right, but that sync-common would work on any apache [18:00:38] Not sure about order of sync, but essentially yes [18:00:40] I assume it's included in scap, right? [18:00:48] yup [18:00:57] scap does sync to it, like to every other host [18:01:11] it's just that oyu can use it to have test use different code, if needed [18:01:18] (temporary only... and it's hackish) [18:01:20] Reedy: greg-g ah, did i put it into the wrong one? [18:01:29] should config changes be in mediawiki train? [18:01:34] or in swat [18:02:00] (03CR) 10Mark Bergsma: "This is pretty much good to go, but we'll have to deploy it in production carefully as I assume this hasn't been tested much in Labs yet." (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/155753 (owner: 1001tonythomas) [18:02:48] (03PS2) 10John F. Lewis: Reopen vewikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/159354 (https://bugzilla.wikimedia.org/70579) [18:03:10] mutante: Swat... unless Reed.y wants to do them himself, then just poke him :P [18:03:41] hoo: ok, i did both :p [18:03:45] alright then [18:05:00] hoo, Krinkle: Order of sync during a scap is randomized. Code is pushed to the 4 rsync fanout servers and then the long list of mw hosts is shuffled and they start syncing. [18:05:30] Ok, thought so... syncing test out of line wouldn't have made much sense anywa [18:05:31] y [18:06:28] omfg [18:07:38] Reedy: ?:) [18:08:12] oh, mobile ? 
[18:08:28] not specifically [18:08:38] (03CR) 10Filippo Giunchedi: Clean up salt::minion (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/153727 (owner: 10Ori.livneh) [18:11:09] mutante: k, made a bunch of edits on wikitech. Have fun ) [18:11:37] (03CR) 10Reedy: [C: 032] Non wikipedias to 1.24wmf20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/159346 (owner: 10Reedy) [18:11:41] (03Merged) 10jenkins-bot: Non wikipedias to 1.24wmf20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/159346 (owner: 10Reedy) [18:12:53] ^d manybubbles ottomata FWIW I will be busy(er) with codfw and swift coming online, but will be generally available if I can help [18:13:07] (03PS1) 10John F. Lewis: Don't redirect vewikimedia [puppet] - 10https://gerrit.wikimedia.org/r/159356 (https://bugzilla.wikimedia.org/70579) [18:13:25] (03PS34) 1001tonythomas: Added the bouncehandler router to catch in all bounce emails [puppet] - 10https://gerrit.wikimedia.org/r/155753 [18:13:48] (03CR) 10Reedy: "Might aswell add the change to the wikimedia-chapters vhost in this commit too" [puppet] - 10https://gerrit.wikimedia.org/r/159356 (https://bugzilla.wikimedia.org/70579) (owner: 10John F. Lewis) [18:13:50] mark: done :) [18:14:20] (03CR) 10John F. Lewis: "Oh yeah; good idea :p" [puppet] - 10https://gerrit.wikimedia.org/r/159356 (https://bugzilla.wikimedia.org/70579) (owner: 10John F. Lewis) [18:14:25] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.24wmf20 [18:14:30] Logged the message, Master [18:15:25] (03PS2) 10John F. Lewis: Don't redirect vewikimedia [puppet] - 10https://gerrit.wikimedia.org/r/159356 (https://bugzilla.wikimedia.org/70579) [18:15:42] (03PS3) 10John F. Lewis: Reopen vewikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/159354 (https://bugzilla.wikimedia.org/70579) [18:17:06] Where the hell do Wikimedia Zero bugs go? [18:17:59] Reedy: Mediawiki extensions -> ZeroPortal in BZ? 
[18:18:22] https://bugzilla.wikimedia.org/buglist.cgi?component=ZeroPortal&list_id=342973&product=MediaWiki%20extensions&resolution=--- [18:18:38] Presumably :/ [18:22:46] (03PS3) 10Reedy: add cawikimedia to dblists [mediawiki-config] - 10https://gerrit.wikimedia.org/r/158284 (owner: 10Dzahn) [18:22:46] (03CR) 10Reedy: [C: 032] add cawikimedia to dblists [mediawiki-config] - 10https://gerrit.wikimedia.org/r/158284 (owner: 10Dzahn) [18:22:46] (03Merged) 10jenkins-bot: add cawikimedia to dblists [mediawiki-config] - 10https://gerrit.wikimedia.org/r/158284 (owner: 10Dzahn) [18:23:24] :) [18:24:00] Warning: fopen(/a/common/wikiversions.dat): failed to open stream: Permission denied in /a/common/php-1.24wmf19/extensions/WikimediaMaintenance/addWiki.php on line 196 [18:24:01] lol [18:25:02] conflicts in wikiversions.json are awful [18:26:57] (03PS4) 10Reedy: add cawikimedia to wikiversion, MWMultiVersion [mediawiki-config] - 10https://gerrit.wikimedia.org/r/158303 (owner: 10Dzahn) [18:27:09] (03CR) 10Reedy: [C: 032] add cawikimedia to wikiversion, MWMultiVersion [mediawiki-config] - 10https://gerrit.wikimedia.org/r/158303 (owner: 10Dzahn) [18:27:14] (03Merged) 10jenkins-bot: add cawikimedia to wikiversion, MWMultiVersion [mediawiki-config] - 10https://gerrit.wikimedia.org/r/158303 (owner: 10Dzahn) [18:27:29] (03PS3) 10Reedy: add cawikimedia to InitialiseSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/158312 (owner: 10Dzahn) [18:27:40] (03CR) 10Reedy: [C: 032] add cawikimedia to InitialiseSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/158312 (owner: 10Dzahn) [18:27:44] (03Merged) 10jenkins-bot: add cawikimedia to InitialiseSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/158312 (owner: 10Dzahn) [18:28:35] !log reedy Synchronized multiversion/: (no message) (duration: 00m 14s) [18:28:40] Logged the message, Master [18:29:56] hah, arr, does that depend on the order of merging them? or they should rather be one bigger change? 
[18:30:13] ding ding, an unauthorized person has access to our servers: https://gerrit.wikimedia.org/r/#/c/159250/ :P [18:30:22] (03PS1) 10Reedy: gj Reedy. Remove merge conflict [mediawiki-config] - 10https://gerrit.wikimedia.org/r/159359 [18:30:59] (03CR) 10Reedy: [C: 032] gj Reedy. Remove merge conflict [mediawiki-config] - 10https://gerrit.wikimedia.org/r/159359 (owner: 10Reedy) [18:31:03] (03Merged) 10jenkins-bot: gj Reedy. Remove merge conflict [mediawiki-config] - 10https://gerrit.wikimedia.org/r/159359 (owner: 10Reedy) [18:31:40] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Add cawikimedia [18:31:45] Logged the message, Master [18:32:02] !log reedy Synchronized wmf-config/: (no message) (duration: 00m 15s) [18:32:09] Logged the message, Master [18:32:13] hmm [18:32:21] mutante: Has the apache config change been made? [18:32:53] s/made/deployed/ [18:33:14] Nope [18:33:14] https://gerrit.wikimedia.org/r/#/c/158843/ [18:34:42] (03PS3) 10Krinkle: apache: Remove old comments referencing 'yaseo' [puppet] - 10https://gerrit.wikimedia.org/r/158996 [18:37:48] (03PS4) 10Reedy: Reopen vewikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/159354 (https://bugzilla.wikimedia.org/70579) (owner: 10John F. Lewis) [18:38:27] (03CR) 10Reedy: [C: 032] Reopen vewikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/159354 (https://bugzilla.wikimedia.org/70579) (owner: 10John F. Lewis) [18:38:31] (03Merged) 10jenkins-bot: Reopen vewikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/159354 (https://bugzilla.wikimedia.org/70579) (owner: 10John F. 
Lewis) [18:38:53] (03PS2) 10Reedy: Increase $wgSVGMaxSize to 4096 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/158951 (https://bugzilla.wikimedia.org/70529) (owner: 10Jackmcbarn) [18:38:57] (03CR) 10Reedy: [C: 032] Increase $wgSVGMaxSize to 4096 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/158951 (https://bugzilla.wikimedia.org/70529) (owner: 10Jackmcbarn) [18:39:03] (03Merged) 10jenkins-bot: Increase $wgSVGMaxSize to 4096 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/158951 (https://bugzilla.wikimedia.org/70529) (owner: 10Jackmcbarn) [18:40:22] (03Abandoned) 10Reedy: Delete ve.wikimedia.org and leave redirect [mediawiki-config] - 10https://gerrit.wikimedia.org/r/131907 (https://bugzilla.wikimedia.org/55737) (owner: 10Withoutaname) [18:41:05] (03PS2) 10Reedy: Scribunto: double the Lua CPU limit on the job runners [mediawiki-config] - 10https://gerrit.wikimedia.org/r/158948 (owner: 10Ori.livneh) [18:41:19] (03CR) 10Reedy: [C: 032] Scribunto: double the Lua CPU limit on the job runners [mediawiki-config] - 10https://gerrit.wikimedia.org/r/158948 (owner: 10Ori.livneh) [18:41:23] (03Merged) 10jenkins-bot: Scribunto: double the Lua CPU limit on the job runners [mediawiki-config] - 10https://gerrit.wikimedia.org/r/158948 (owner: 10Ori.livneh) [18:42:19] (03CR) 10Reedy: "Can this just be deployed anytime now?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/158024 (https://bugzilla.wikimedia.org/68766) (owner: 10Parent5446) [18:43:54] (03PS2) 10Reedy: Update Parsoid extension require path [mediawiki-config] - 10https://gerrit.wikimedia.org/r/157177 (owner: 10GWicke) [18:44:03] (03CR) 10Reedy: [C: 032] Update Parsoid extension require path [mediawiki-config] - 10https://gerrit.wikimedia.org/r/157177 (owner: 10GWicke) [18:44:08] (03Merged) 10jenkins-bot: Update Parsoid extension require path [mediawiki-config] - 10https://gerrit.wikimedia.org/r/157177 (owner: 10GWicke) [18:44:17] (03PS1) 10Dzahn: disable access for jgonera [puppet] - 10https://gerrit.wikimedia.org/r/159360 [18:44:48] (03PS2) 10Reedy: Put wikibase cache settings together [mediawiki-config] - 10https://gerrit.wikimedia.org/r/158112 (owner: 10Aude) [18:44:52] (03CR) 10Reedy: [C: 032] Put wikibase cache settings together [mediawiki-config] - 10https://gerrit.wikimedia.org/r/158112 (owner: 10Aude) [18:44:59] (03Merged) 10jenkins-bot: Put wikibase cache settings together [mediawiki-config] - 10https://gerrit.wikimedia.org/r/158112 (owner: 10Aude) [18:45:16] (03PS2) 10Reedy: Add Wikibase properties to suggester blacklist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/158113 (https://bugzilla.wikimedia.org/70346) (owner: 10Aude) [18:45:20] (03CR) 10Hoo man: [C: 04-1] "> add explicitely to the group for absented users" [puppet] - 10https://gerrit.wikimedia.org/r/159360 (owner: 10Dzahn) [18:45:23] (03CR) 10Reedy: [C: 032] Add Wikibase properties to suggester blacklist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/158113 (https://bugzilla.wikimedia.org/70346) (owner: 10Aude) [18:45:26] (03Merged) 10jenkins-bot: Add Wikibase properties to suggester blacklist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/158113 (https://bugzilla.wikimedia.org/70346) (owner: 10Aude) [18:46:07] cscott: I'm just about done I think... 
So if you want to start early, I think you can [18:46:09] (03PS2) 10Dzahn: disable access for jgonera [puppet] - 10https://gerrit.wikimedia.org/r/159360 [18:46:11] !log reedy Synchronized wmf-config/: (no message) (duration: 00m 14s) [18:46:18] Logged the message, Master [18:46:29] Reedy: ok, i'm preparing my deploy commit. [18:46:52] (03CR) 10Hoo man: [C: 031] disable access for jgonera [puppet] - 10https://gerrit.wikimedia.org/r/159360 (owner: 10Dzahn) [18:49:49] (03PS3) 10Dzahn: disable access for jgonera [puppet] - 10https://gerrit.wikimedia.org/r/159360 [18:50:22] (03CR) 10Dzahn: [C: 032] disable access for jgonera [puppet] - 10https://gerrit.wikimedia.org/r/159360 (owner: 10Dzahn) [18:51:35] manybubbles: ^d there's a huge number of pool-queuefull warnings for Cirrus again [18:51:55] ^d: did you do it? [18:52:31] Reedy: i need your help [18:52:39] matanya: oh noes [18:52:49] Reedy: no load spike - I'll look into it some more [18:52:51] looks like my git repo broke for some reason [18:53:11] which repo? [18:53:24] mediawiki-config [18:53:58] it may be easiest to just delete and re-clone it [18:54:00] depending on what's up [18:54:23] i did that [18:54:27] got the same issue [18:54:32] wmf-config/Wikibase.php [18:54:44] prevents me from pulling to master [18:54:57] what's wrong with it? [18:55:01] git pull just WFM locally [18:55:12] i cloned [18:55:18] and then pulled before i branch [18:55:21] (03CR) 10Dzahn: "Notice: /Stage[main]/Admin/Admin::Hashuser[jgonera]/Admin::User[jgonera]/User[jgonera]/ensure: removed" [puppet] - 10https://gerrit.wikimedia.org/r/159360 (owner: 10Dzahn) [18:55:41] but it refuses due to: You have not concluded your merge (MERGE_HEAD exists). [18:55:41] Please, commit your changes before you can merge [18:55:58] what merge would i want to do in a clean clone ??? 
[18:56:04] (03Abandoned) 10Reedy: Use old style memcached access on wikitech to stop cache pollution [mediawiki-config] - 10https://gerrit.wikimedia.org/r/158038 (owner: 10Reedy) [18:57:07] https://tools.wmflabs.org/paste/view/6d9b495c [18:57:10] Reedy: ^ [18:57:37] git rebase origin? [18:57:45] else git reset HEAD~10 --hard; git pull [19:01:19] (03PS1) 10Matanya: (bug 70616) Change rights for user groups in hewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/159364 [19:01:34] PROBLEM - puppet last run on stat1002 is CRITICAL: CRITICAL: Puppet has 1 failures [19:01:41] thank Reedy the former did the trick [19:02:56] now you can review the change ... :D [19:03:21] mutante: https://gerrit.wikimedia.org/r/158086 is waiting for you, thanks :) [19:04:25] Reedy: adding more logging to the pool errors - because I don't have a clue what caused it. any idea if it'd be ok to log usernames/ip addresses in there? like, do we already have that kind of information in the warning logs anyway? [19:05:05] manybubbles: We have a poolcounter log on fluorine [19:05:17] I've not looked at it, so not sure offhand what that may contain [19:05:34] !log killed jgonera's screen session on stat1002 - puppet failed to deactivate otherwise [19:05:36] RECOVERY - puppet last run on stat1002 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [19:05:41] Logged the message, Master [19:06:52] (03PS1) 10Ottomata: Increase queue_buffering_max_ms again to cover for 10+ second pauses on analytics1021 [puppet] - 10https://gerrit.wikimedia.org/r/159365 [19:08:35] Reedy: mostly just lucenesearch getting hammered. It looks like eswiki saw spikes in traffic from time to time. Cirrus took eswiki today so it might see the same spikes [19:09:20] (03CR) 10Ottomata: [C: 032 V: 032] Increase queue_buffering_max_ms again to cover for 10+ second pauses on analytics1021 [puppet] - 10https://gerrit.wikimedia.org/r/159365 (owner: 10Ottomata) [19:11:19] mutante: Hm.. 
bast1001 is not visible from terbium? [19:11:33] .wikimedia.org works but not plain bast1001, bast1001 work from fenari though [19:11:34] Probably not [19:11:53] fenari and bast100 are external facing hosts [19:12:12] yes [19:12:12] it's also visible from tin [19:12:40] from tin but not terbium? lol [19:12:56] I mean bast1001 can obviously see both [19:12:58] (03CR) 10Nuria: Adding gzip compression for several file types (031 comment) [puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/159181 (owner: 10Nuria) [19:13:23] (03CR) 10Alex Monk: "Done by Ic8b12770 instead?" [puppet] - 10https://gerrit.wikimedia.org/r/159250 (owner: 10MaxSem) [19:13:37] yeah, fenari and bast1001 resolve in tin, but neither resolves on terbium [19:15:27] (03CR) 10CSteipp: "Reedy, yep." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/158024 (https://bugzilla.wikimedia.org/68766) (owner: 10Parent5446) [19:18:44] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [19:20:35] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Thu 04 Sep 2014 00:21:29 UTC [19:24:06] (03CR) 10Jgreen: Added the bouncehandler router to catch in all bounce emails (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/155753 (owner: 1001tonythomas) [19:24:24] PROBLEM - check if dhclient is running on stat1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[19:25:21] RECOVERY - check if dhclient is running on stat1002 is OK: PROCS OK: 0 processes with command name dhclient [19:28:02] (03CR) 10Ottomata: Adding gzip compression for several file types (031 comment) [puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/159181 (owner: 10Nuria) [19:31:14] PROBLEM - Varnishkafka log producer on cp1062 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishkafka [19:33:59] oo [19:35:24] RECOVERY - Varnishkafka log producer on cp1062 is OK: PROCS OK: 1 process with command name varnishkafka [19:37:50] puppet just didn't finish restarting that, hm [19:39:53] mutante: I've moved my stuff over to terbium [19:53:06] (03Abandoned) 10MaxSem: jgonera is not working for us anymore [puppet] - 10https://gerrit.wikimedia.org/r/159250 (owner: 10MaxSem) [19:58:24] Nemo_bis: Do you know if there's a Bugzilla bug for the Phabricator premiere that could be used as a blocking bug? [19:59:00] Reedy, chrismcmalunch: i had a little trouble making my deploy commit, but i'm ready to go now. [19:59:42] scfc_de: no idea but probably not, they're handling everything on the phabricator instance [19:59:57] Nemo_bis: Thought so, thanks. [20:00:00] Reedy, chrismcmalunch: seems like train is done and no one else is deploying right now? [20:03:50] I think you're fine [20:03:54] !next [20:04:01] What's the syntax? [20:04:45] Reedy: dunno, just added myself to https://wikitech.wikimedia.org/wiki/Deployments#Near-term [20:08:07] (03PS1) 10Alexandros Kosiaris: Various fixups for servermon.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/159378 [20:09:15] (03CR) 10Alexandros Kosiaris: [C: 032] Various fixups for servermon.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/159378 (owner: 10Alexandros Kosiaris) [20:10:06] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. 
[20:10:20] jouncebot: next [20:10:20] In 0 hour(s) and 49 minute(s): Flow (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140909T2100) [20:10:41] jouncebot: refresh [20:10:43] I refreshed my knowledge about deployments. [20:10:45] jouncebot: next [20:10:45] In 0 hour(s) and 49 minute(s): Flow (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140909T2100) [20:10:46] (03PS1) 10Ottomata: Fix for mobile-100 kafkatee output filter [puppet] - 10https://gerrit.wikimedia.org/r/159381 [20:11:32] (03PS2) 10Ottomata: Fix for mobile-100 kafkatee output filter [puppet] - 10https://gerrit.wikimedia.org/r/159381 [20:12:31] (03CR) 10Ottomata: [C: 032 V: 032] Fix for mobile-100 kafkatee output filter [puppet] - 10https://gerrit.wikimedia.org/r/159381 (owner: 10Ottomata) [20:15:31] !log updated OCG to version c9a2b4cf2502479eeabed07ab2de728695d96e46 [20:15:37] Logged the message, Master [20:15:48] (03PS1) 10Alexandros Kosiaris: Fix syntax error in servermon apache config [puppet] - 10https://gerrit.wikimedia.org/r/159382 [20:16:53] (03PS1) 10Ottomata: Set up sampled-1000 output for kafkatee on analytics1003 [puppet] - 10https://gerrit.wikimedia.org/r/159383 [20:18:20] (03CR) 10Ottomata: [C: 032 V: 032] Set up sampled-1000 output for kafkatee on analytics1003 [puppet] - 10https://gerrit.wikimedia.org/r/159383 (owner: 10Ottomata) [20:19:23] cscott: Reedy all OK? [20:19:30] (03PS1) 10Dzahn: add domain_search wikimedia.org to terbium [puppet] - 10https://gerrit.wikimedia.org/r/159384 [20:19:32] Krinkle|detached: thanks for moving stuff [20:19:40] chrismcmahon: yup, deploy completed. looks good to me. 
[20:19:46] Krinkle|detached: and see that gerrit above ^ in reply to your comment [20:22:01] (03CR) 10Dzahn: "root@terbium:~# grep search /etc/resolv.conf" [puppet] - 10https://gerrit.wikimedia.org/r/159384 (owner: 10Dzahn) [20:32:54] (03PS1) 10Dzahn: remove 'virt cluster pmtpa' from ganglia [puppet] - 10https://gerrit.wikimedia.org/r/159390 [20:35:34] PROBLEM - puppet last run on lvs3003 is CRITICAL: CRITICAL: Epic puppet fail [20:37:05] ^ random puppet 502 error, not lvs issue [20:37:30] 'k, good [20:38:34] RECOVERY - puppet last run on lvs3003 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [20:38:45] mutante: is ganglia broken ? [20:38:57] matanya: i think it is, yes [20:39:16] oh, ok. not just me [20:41:02] (03PS1) 10Dzahn: add esams.wmnet to search in resolv.conf [puppet] - 10https://gerrit.wikimedia.org/r/159391 [20:41:38] mutante, you want to create a new wiki during SWAT? o_0 [20:42:34] !log service gmetad restart on nickel.wikimedia.org due to ganglia web not working [20:42:34] RECOVERY - Kafka Broker Messages In on analytics1021 is OK: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate OKAY: 37476.6510645 [20:42:40] Logged the message, Master [20:42:45] MaxSem: what's wrong? they are config changes? 
but Reedy already did them anyways [20:43:14] well, new wiki creation is outside of SWAT scope [20:43:16] akosiaris: you fixed it, thx [20:43:22] matanya: ^ it's back [20:43:31] and Reedy is the only person who knows how to do that anyway:) [20:43:44] akosiaris: servermon has ssl warning [20:43:58] MaxSem: what's the correct scope [20:44:32] matanya: yeah, know, already pointed out in the email :-) [20:44:44] * matanya is lagged in mail [20:46:12] mutante, https://wikitech.wikimedia.org/wiki/SWAT_deploys or as greg-g says:) [20:46:45] hah, ok, that definition that is kind of self-referencing [20:46:58] "swat deploys are deploys that happen during swat":) [20:47:23] but alright, i will stick to the Reedy system :) [20:48:06] mutante: https://wikitech.wikimedia.org/wiki/SWAT_deploys#Guidelines [20:48:58] (03PS2) 10Alexandros Kosiaris: Fix syntax error in servermon apache config [puppet] - 10https://gerrit.wikimedia.org/r/159382 [20:49:27] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Fix syntax error in servermon apache config [puppet] - 10https://gerrit.wikimedia.org/r/159382 (owner: 10Alexandros Kosiaris) [20:49:40] yep, that says "Allowed types of patches" are allowed [20:49:45] i thought that's what i did.. but shrug [20:49:53] eh, wrong paste [20:50:01] " Simple config changes (that don't turn on any new features)" [20:50:31] yeah, there's also border lines/ambiguity/exceptions, which as long as "people" are ok, then it's ok :) [20:50:55] "does the patch set a server on fire?" "no." "then it's not a swat patch" [20:53:04] i would hope adding a wiki doesnt have to be in the "sets on fire" category:) [20:53:36] (03CR) 10Matanya: [C: 031] StrictTransportSecurity for lists.wm.org [puppet] - 10https://gerrit.wikimedia.org/r/145500 (https://bugzilla.wikimedia.org/38516) (owner: 10Dzahn) [21:00:04] spagewmf: Respected human, time to deploy Flow (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140909T2100). Please do the needful. 
[21:01:39] (03CR) 10Matanya: [C: 031] Avoid referencing private contacts in icinga::monitor on labs. (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/158355 (owner: 10JanZerebecki) [21:02:21] (03CR) 10Matanya: [C: 031] puppetmaster Apache template - retab [puppet] - 10https://gerrit.wikimedia.org/r/153987 (owner: 10Dzahn) [21:02:54] (03CR) 10Matanya: [C: 031] "doesn't even show in ganglia web UI." [puppet] - 10https://gerrit.wikimedia.org/r/159390 (owner: 10Dzahn) [21:20:44] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Thu 04 Sep 2014 00:21:29 UTC [21:31:52] (03PS1) 10Spage: Unenable Wikipedia:Teahouse/Questions/Flow test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/159404 [21:45:16] (03CR) 10Dzahn: [C: 032] releases: allow uploads from tin [puppet] - 10https://gerrit.wikimedia.org/r/159267 (owner: 10Filippo Giunchedi) [21:47:21] (03PS3) 10BBlack: Turn off include_optional_ns for gdnsd [puppet] - 10https://gerrit.wikimedia.org/r/158637 [21:48:17] (03CR) 10Dzahn: "Notice: /Stage[main]/Role::Releases/Ferm::Service[tin_package_upload]/File[/etc/ferm/conf.d/10_tin_package_upload]/ensure: created" [puppet] - 10https://gerrit.wikimedia.org/r/159267 (owner: 10Filippo Giunchedi) [21:49:38] !log ebernhardson Started scap: Bump Echo and Flow versions in 1.24wmf19 [21:49:43] Logged the message, Master [21:50:51] (03CR) 10BBlack: [C: 032] Turn off include_optional_ns for gdnsd [puppet] - 10https://gerrit.wikimedia.org/r/158637 (owner: 10BBlack) [21:51:03] (03PS2) 10Krinkle: Remove Wikipedia:Teahouse/Questions/Flow_test from enwiki Flow pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/159404 (owner: 10Spage) [21:52:44] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 2 below the confidence bounds [21:52:45] PROBLEM - HTTP error ratio anomaly detection on labmon1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 2 below the confidence 
bounds [21:52:54] (03PS3) 10Krinkle: Remove Wikipedia:Teahouse/Questions/Flow_test from enwiki Flow pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/159404 (owner: 10Spage) [21:53:49] Ib0aaa60f09 links to https://www.mediawiki.org/wiki/Flow/Rollout#Releases but I don't see Teahouse/Flow_test listed there. [21:53:53] spagewmf: ^ [21:54:31] So it was just you adding that? Just checking whether there's any expectation from product or other people that that page should be there as of a certain date [22:02:34] (03CR) 10Dzahn: adding pending deployment ganglia group and setting it to default (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/159167 (owner: 10RobH) [22:03:28] (03CR) 10Dzahn: "while it seems ok to do it's somehow a mix between ganglia old and ganglia_new, and we should probably just bother adding it in ganglia_ne" [puppet] - 10https://gerrit.wikimedia.org/r/159167 (owner: 10RobH) [22:09:27] (03CR) 10Dzahn: adding pending deployment ganglia group and setting it to default (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/159167 (owner: 10RobH) [22:10:40] Krinkle: it wasn't just spage, he just usually does the more administrative config updates [22:11:25] ebernhardson: so where did it come from? Is it safe to remove based on community alone or is someone gonna be upset? [22:12:32] Krinkle: we've already contacted the relevant people, as for where it came from i'm not sure there is a specific page or anything, general 'find some newbie spaces' has been on the quarterly calendar for a while [22:14:29] Krinkle: i generally don't keep too up to date with community goings-on, it's much too complicated [22:16:22] k. Just from outside perspective all I see is a commit adding it with no rationale (other than a link that doesn't explain it), and then another commit that removes it with a link to community decision where Erik says it should be moved. [22:16:29] Ah, I see Quiddity says it should be removed.
[22:21:03] !log ebernhardson Finished scap: Bump Echo and Flow versions in 1.24wmf19 (duration: 31m 25s) [22:21:09] Logged the message, Master [22:21:33] (03PS4) 10Krinkle: Remove Wikipedia:Teahouse/Questions/Flow_test from enwiki Flow pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/159404 (owner: 10Spage) [22:23:33] !log Reloading Zuul to deploy I27024680c74ca0130 [22:23:39] Logged the message, Master [22:41:18] (03CR) 10EBernhardson: [C: 032] Remove Wikipedia:Teahouse/Questions/Flow_test from enwiki Flow pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/159404 (owner: 10Spage) [22:41:22] (03Merged) 10jenkins-bot: Remove Wikipedia:Teahouse/Questions/Flow_test from enwiki Flow pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/159404 (owner: 10Spage) [22:42:36] !log ebernhardson Synchronized wmf-config/InitialiseSettings.php: Deploy config change I158e7c6852 (duration: 00m 04s) [22:42:42] Logged the message, Master [22:47:21] (03PS1) 10Dzahn: let NDAed people login on servermon [puppet] - 10https://gerrit.wikimedia.org/r/159419 [22:47:47] (03PS2) 10Krinkle: add domain_search wikimedia.org to terbium [puppet] - 10https://gerrit.wikimedia.org/r/159384 (owner: 10Dzahn) [22:48:37] (03CR) 10Krinkle: [C: 031] add domain_search wikimedia.org to terbium [puppet] - 10https://gerrit.wikimedia.org/r/159384 (owner: 10Dzahn) [22:48:51] (03CR) 10Reedy: [C: 031] let NDAed people login on servermon [puppet] - 10https://gerrit.wikimedia.org/r/159419 (owner: 10Dzahn) [22:49:34] mutante: for what it's worth, the connection the odd way around was to move files around via scp [22:49:37] (03CR) 10Matanya: [C: 031] let NDAed people login on servermon [puppet] - 10https://gerrit.wikimedia.org/r/159419 (owner: 10Dzahn) [22:49:52] mutante: and also for syncing my dotfiles, I used to fetch those from my home dir on fenari [22:49:58] now I'll be doing that from bast1001 [22:50:13] e.g. 
PS1 and aliases [22:50:43] (03PS3) 10Dzahn: add domain_search wikimedia.org to terbium [puppet] - 10https://gerrit.wikimedia.org/r/159384 [22:51:32] (03CR) 10Dzahn: [C: 032] add domain_search wikimedia.org to terbium [puppet] - 10https://gerrit.wikimedia.org/r/159384 (owner: 10Dzahn) [22:56:02] Krinkle: ok.. hmm.. it did not do the expected thing yet .. hmm [22:59:28] it seems it just writes that resolv.conf once when a server is installed but not changing it later [23:00:01] Krinkle: root@terbium:~# ping bast1001 [23:00:02] PING bast1001.wikimedia.org [23:00:04] RoanKattouw, ^d, marktraceur, MaxSem, spagewmf: Respected human, time to deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140909T2300). Please do the needful. [23:00:18] !log added wikimedia.org to search in resolv.conf on terbium [23:00:18] hmm, nothing to deploy? [23:00:24] Logged the message, Master [23:00:24] <^d> I'll take it today. [23:00:29] <^d> I've got some stuff putting together. [23:00:37] mutante: Yep, thx. [23:00:45] <^d> MaxSem: spagewmf has something I think. [23:01:11] <^d> Actually, that looks done. [23:02:13] (03CR) 10Dzahn: "it seems this only writes the file once when it's installed, but puppet won't change it after the fact, so i added wikimedia.org manually" [puppet] - 10https://gerrit.wikimedia.org/r/159384 (owner: 10Dzahn) [23:04:17] Krinkle: actually, no, puppet is reverting me [23:04:57] ^d yup Flow updates are done per our deploy window. (Thanks Krinkle for improving a commit msg) [23:05:27] https://github.com/wikimedia/operations-puppet/search?utf8=%E2%9C%93&q=domain_search&type=Code [23:05:28] mutante: strange [23:10:11] (03PS1) 10Ori.livneh: Add ca.wikimedia.org to wikimedia-chapter apache site [puppet] - 10https://gerrit.wikimedia.org/r/159422 [23:10:28] ^ mutante [23:10:34] (right repo; the other patch can be abandoned) [23:10:49] I'd already made a patch for that too... [23:11:01] oh, sorry [23:11:03] which one? 
[23:11:07] In the wrong repo [23:11:11] Whaa [23:11:17] heh [23:11:26] (03Abandoned) 10Reedy: Add ca.wikimedia.org to wikimedia-chapter apache site [apache-config] - 10https://gerrit.wikimedia.org/r/158808 (owner: 10Reedy) [23:13:43] (03PS4) 10Ori.livneh: apache: Remove old comments referencing 'yaseo' [puppet] - 10https://gerrit.wikimedia.org/r/158996 (owner: 10Krinkle) [23:13:51] (03CR) 10Ori.livneh: [C: 032 V: 032] apache: Remove old comments referencing 'yaseo' [puppet] - 10https://gerrit.wikimedia.org/r/158996 (owner: 10Krinkle) [23:15:25] !log demon Synchronized php-1.24wmf20/extensions/CirrusSearch: Various fixes for things (duration: 00m 05s) [23:15:30] Logged the message, Master [23:15:52] !log Reloading Zuul to deploy I26bc21ed2938e97e7ed6f6b [23:15:58] Logged the message, Master [23:21:36] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Thu 04 Sep 2014 00:21:29 UTC [23:24:37] (03CR) 10Dzahn: [C: 031] "yes please, thank you. all the mw-config is already merged meanwhile, we just needed this" [puppet] - 10https://gerrit.wikimedia.org/r/159422 (owner: 10Ori.livneh) [23:25:48] (03PS2) 10Dzahn: Add ca.wikimedia.org to wikimedia-chapter apache site [puppet] - 10https://gerrit.wikimedia.org/r/159422 (owner: 10Ori.livneh) [23:26:03] (03CR) 10Dzahn: "RT: 8206 - just linked ticket" [puppet] - 10https://gerrit.wikimedia.org/r/159422 (owner: 10Ori.livneh) [23:27:54] (03CR) 10Dzahn: [C: 04-2] "wrong repo due to confusion with submodules, replaced by Change-Id: I09a6e25a7c29" [apache-config] - 10https://gerrit.wikimedia.org/r/158843 (owner: 10Dzahn) [23:28:33] (03Abandoned) 10Dzahn: add ca.wikimedia.org ServerAlias [apache-config] - 10https://gerrit.wikimedia.org/r/158843 (owner: 10Dzahn) [23:28:45] (03CR) 10Ori.livneh: [C: 032] Add ca.wikimedia.org to wikimedia-chapter apache site [puppet] - 10https://gerrit.wikimedia.org/r/159422 (owner: 10Ori.livneh) [23:31:37] (03CR) 10Dzahn: "this is weird, why does puppet not add to 
/etc/resolv.conf what is expected here? it reverts to just "eqiad"" [puppet] - 10https://gerrit.wikimedia.org/r/159384 (owner: 10Dzahn) [23:33:56] (03PS3) 10Nuria: Adding gzip compression for several file types [puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/159181 [23:34:14] RECOVERY - Host baham.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 52.81 ms [23:34:29] :) moar star servers [23:34:40] ori: and thanks [23:39:12] (03PS1) 10BBlack: fix baham ns IP [puppet] - 10https://gerrit.wikimedia.org/r/159427 [23:39:42] (03CR) 10BBlack: [C: 032 V: 032] fix baham ns IP [puppet] - 10https://gerrit.wikimedia.org/r/159427 (owner: 10BBlack) [23:40:34] PROBLEM - Host baham.wikimedia.org is DOWN: CRITICAL - Time to live exceeded (208.80.153.239) [23:44:16] (03PS1) 10Ori.livneh: beta: switch to /srv/mediawiki [puppet] - 10https://gerrit.wikimedia.org/r/159431 [23:48:14] who broke beta wiki ? [23:48:41] how is it broken? [23:48:43] matanya: -> #wikimedia-qa [23:48:54] back now [23:49:04] thanks jeremyb :) [23:49:46] guessing that was bd808 fixing Jenkins. I could be wrong. [23:49:58] no, it's me. fixing. [23:50:04] *fixed [23:51:29] (03CR) 10Dzahn: "https://github.com/wikimedia/operations-puppet/search?utf8=%E2%9C%93&q=domain_search&type=Code why ?" [puppet] - 10https://gerrit.wikimedia.org/r/159384 (owner: 10Dzahn) [23:56:06] * bd808 was only slapping jenkins around and should not have harmed beta in any way [23:58:35] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected [23:58:35] RECOVERY - HTTP error ratio anomaly detection on labmon1001 is OK: OK: No anomaly detected [23:58:51] bd808: regarding that bug report - i know, and even mentioned that in the comment, and yet, interesting results.