[00:01:40] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:02:40] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 4.044 second response time
[00:32:40] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:33:30] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time
[01:16:28] PROBLEM - SSH on pdf2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:25:28] RECOVERY - SSH on pdf2 is OK: SSH OK - OpenSSH_4.7p1 Debian-8ubuntu3 (protocol 2.0)
[01:27:48] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:28:39] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 7.893 second response time
[01:33:42] (PS1) Reedy: Super secret Wikidata logo for Wikimania HK 2013 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78329
[01:33:58] (PS2) Reedy: Super secret Wikidata logo for Wikimania HK 2013 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78329
[01:34:06] (CR) Reedy: [C: 2] Super secret Wikidata logo for Wikimania HK 2013 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78329 (owner: Reedy)
[01:34:15] (Merged) jenkins-bot: Super secret Wikidata logo for Wikimania HK 2013 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78329 (owner: Reedy)
[01:34:48] :)
[01:34:57] nobody look
[01:35:24] !log reedy synchronized wmf-config 'Sync-ing for consistency (no changes)'
[01:35:38] Logged the message, Master
[01:35:54] I see what you did, there...
[01:36:36] shhhhhhhhhhhhhhhhhush!
[02:05:28] !log LocalisationUpdate completed (1.22wmf12) at Fri Aug 9 02:05:28 UTC 2013
[02:05:41] Logged the message, Master
[02:11:55] That's weird to see in real time
[02:12:26] well, ignoring usually being awake at that time..
[02:15:21] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri Aug 9 02:15:21 UTC 2013
[02:15:32] Logged the message, Master
[02:22:52] PROBLEM - Puppet freshness on mchenry is CRITICAL: No successful Puppet run in the last 10 hours
[02:26:27] mutante: Just an FYI, might want to hold on for creating the new wikis
[02:26:50] Need to get a couple of wikidata related things documented to make sure we set it up right
[03:15:54] (PS1) Reedy: Losslessly compress WikidataHK.png [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78331
[03:16:12] (CR) Reedy: [C: 2] Losslessly compress WikidataHK.png [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78331 (owner: Reedy)
[03:16:21] (Merged) jenkins-bot: Losslessly compress WikidataHK.png [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78331 (owner: Reedy)
[03:18:01] !log reedy synchronized docroot/wikidata/WikidataHK.png 'Compress'
[03:18:12] Logged the message, Master
[03:33:41] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours
[03:33:41] PROBLEM - Puppet freshness on holmium is CRITICAL: No successful Puppet run in the last 10 hours
[03:33:41] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours
[03:33:41] PROBLEM - Puppet freshness on pdf3 is CRITICAL: No successful Puppet run in the last 10 hours
[03:33:41] PROBLEM - Puppet freshness on sq41 is CRITICAL: No successful Puppet run in the last 10 hours
[03:33:42] PROBLEM - Puppet freshness on ssl1004 is CRITICAL: No successful Puppet run in the last 10 hours
[03:52:01] RECOVERY - Puppet freshness on mchenry is OK: puppet ran at Fri Aug 9 03:51:51 UTC 2013
[03:57:10] (PS1) Reedy: Add tyv to langlist [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78332
[03:57:38] mutante: ^ For new langcodes, does something need doing on the DNS side too?
[03:57:47] (CR) Reedy: [C: 2] Add tyv to langlist [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78332 (owner: Reedy)
[03:57:54] (Merged) jenkins-bot: Add tyv to langlist [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78332 (owner: Reedy)
[04:01:04] !log reedy synchronized langlist
[04:01:14] Logged the message, Master
[04:01:56] !log reedy synchronized langlist 'attempt 2'
[04:02:07] Logged the message, Master
[04:03:20] Reedy: yeah
[04:05:25] LANGLISTSOURCE=fenari.wikimedia.org:/home/wikipedia/common/langlist
[04:05:28] broken I guess :)
[04:05:41] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours
[04:05:41] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours
[04:05:41] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours
[04:05:41] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours
[04:05:41] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours
[04:05:42] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours
[04:05:44] I'll fix up manually
[04:08:01] !log authdns-update: added "tyv" to langlist
[04:08:06] Reedy: done
[04:08:11] Logged the message, Master
[04:18:37] (PS3) Reedy: (bug 49328) Create Wikipedia in Tyvan language [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/68188 (owner: Odder)
[04:26:09] Haha
[04:26:13] what?
[04:26:26] That file should still exist, but in /h/w/c it would be out of date by default
[04:26:39] it doesn't
[04:26:46] it says common-moved-to-tin
[04:26:51] /home/wikipedia/common-before-tin/langlist
[04:26:52] yeah..
[04:27:04] it doesn't matter anyway
[04:27:09] I'm changing all that
[04:27:11] Thanks
[04:27:15] (for doing it manually)
[04:27:28] we won't have automatic sync from mediawiki but we'll have it in gerrit
[04:27:46] I think it's kinda scary to change one file on fenari and automatically have DNS change anyway :)
[04:28:08] heh
[04:28:16] At least it's not a common action
[04:39:39] paravoid: Can you add group write to /home/wikipedia/common-before-tin/docroot/foundation/presentations/ please?
[04:39:50] (recursively)
[04:40:34] chmod g+w -R /home/wikipedia/common-before-tin/docroot/foundation/presentations/
[04:40:36] I guess..
[04:42:53] doesn't the "before tin" bit imply this isn't used anymore?
[04:43:29] (done)
[04:57:50] LeslieCarr: ping?
[05:02:07] hexmode: hey
[05:02:09] what's up
[05:02:45] LeslieCarr: I'm looking for you 'cause I have someone who could use your help
[05:03:01] oh, i totally jetlagged and just woke up, 3 minutes ago
[05:03:06] 13 hours of sleep :(
[05:03:13] LeslieCarr: gotcha
[05:03:29] i'll probably be over there by 2:30 ?
[05:03:38] LeslieCarr: when you get around, could you let me know :) C u then
[05:16:11] Reedy: nope, DNS already has it. vi.wikivoyage.org is an alias for wikivoyage-lb.wikimedia.org. it's based on langlist, so the code just needed to exist for any other project before
[05:16:35] only needs interaction if it's a language never used before in any project
[05:16:48] and i think WP has always been first
[05:49:37] (PS1) Dzahn: add timezone for vi.wikivoyage (bug 52034) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78347
[05:50:32] (CR) Dzahn: "so the current name of Saigon is Ho Chi Minh City, but : /usr/share/zoneinfo/Asia/Ho_Chi_Minh: symbolic link to `Saigon` ..." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78347 (owner: Dzahn)
[06:18:07] mutante: paravoid added it ;)
[06:57:57] (CR) Cheers!: "Patch Set 1:" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78347 (owner: Dzahn)
[07:10:47] (CR) Reedy: "http://www.php.net/manual/en/timezones.asia.php" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78347 (owner: Dzahn)
[07:23:32] Reedy: ah, i didn't get you needed tyv, i thought it was about vi
[07:39:09] (PS1) Dzahn: wgSitename for vi.wikivoyage (bug 52034) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78355
[07:39:49] (CR) Dzahn: "..or is this supposed to be in Vietnamese, then please add a patch set" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78355 (owner: Dzahn)
[07:40:28] mutante: Need to poke wikidata to confirm anything extra we might need to do before creating those wikis
[07:54:43] Reedy: ack, trying to poke them via gerrit c patch sets :)
[07:55:03] for namespace translations might poke Siebrand
[08:11:56] !log apt-get upgrading zirconium
[08:12:07] Logged the message, Master
[08:20:02] PROBLEM - DPKG on mw1130 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[08:20:52] PROBLEM - Apache HTTP on mw1130 is CRITICAL: Connection refused
[08:28:46] importing Etherpads to Etherpad lite using convert script.. fail
[08:29:17] TypeError: undefined is not a function
[08:37:42] PROBLEM - Host mw1085 is DOWN: PING CRITICAL - Packet loss = 100%
[08:38:42] RECOVERY - Host mw1085 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms
[08:40:19] hehe
[08:40:23] i don't know if ti will actually be possible
[08:40:28] are we seriously against just making a clean sweep
[08:40:35] warning folks that it's all going away in 30 days
[08:40:52] RECOVERY - Apache HTTP on mw1130 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.070 second response time
[08:41:02] PROBLEM - Apache HTTP on mw1085 is CRITICAL: Connection refused
[08:41:03] LeslieCarr: i hope that's an option, yea
[08:41:15] still trying to ask #etherpad-lite-dev though
[08:41:25] or well.. at least report the bug
[08:41:56] RECOVERY - Apache HTTP on mw1085 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.071 second response time
[08:41:56] RECOVERY - DPKG on mw1130 is OK: All packages OK
[08:42:24] dislikes the usage of npm to install software
[08:42:47] "Note: If you get a message "Error: Cannot find module 'mysql'". npm install mysql You may need to do this for a number of different packages, including ueberDB and async." :p
[08:42:52] "may".. "a number of"
[08:43:02] ueberDB ?
[08:43:59] and yeah, not getting any "cannot find module" error, but the one at Object. :p
[08:44:02] at Object. (/usr/share/etherpad-lite/bin/convert.js:37:17)
[08:51:16] PROBLEM - Puppet freshness on db9 is CRITICAL: No successful Puppet run in the last 10 hours
[08:55:36] PROBLEM - HTTPS on sodium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:55:36] PROBLEM - HTTP on sodium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:56:26] RECOVERY - HTTPS on sodium is OK: OK - Certificate will expire on 08/22/2015 22:23.
[08:56:26] RECOVERY - HTTP on sodium is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 190 bytes in 0.001 second response time
[08:58:48] (CR) Nemo bis: "Why did this change re-enable AFT on en.wiki without notice?" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/72496 (owner: Matthias Mullie)
[09:24:39] (CR) Matthias Mullie: "TL;DR: enwiki wanted to get rid of large-scale AFTv5 deployment, but could still have AFT on a small minority. dewiki wanted to get rid of" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/72496 (owner: Matthias Mullie)
[09:30:19] !log Assigned IPv4 mapped IPv6 address to mchenry
[09:30:30] Logged the message, Master
[09:33:20] (PS1) Mark Bergsma: Add IPv4 mapped IPv6 address to sodium [operations/puppet] - https://gerrit.wikimedia.org/r/78370
[09:33:33] (CR) Nemo bis: "Thanks, copied your answer to https://bugzilla.wikimedia.org/show_bug.cgi?id=45538#c17 where it's better to continue." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/72496 (owner: Matthias Mullie)
[09:38:01] (CR) Mark Bergsma: [C: 2] Add IPv4 mapped IPv6 address to sodium [operations/puppet] - https://gerrit.wikimedia.org/r/78370 (owner: Mark Bergsma)
[10:19:30] (CR) Dzahn: [C: 2] Add remaining skins (Cologne Blue and Modern) to sync script. [operations/puppet] - https://gerrit.wikimedia.org/r/69444 (owner: Mattflaschen)
[10:24:16] (CR) Dzahn: "yea, and because that link didn't have Hanoi i didn't use Hanoi" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78347 (owner: Dzahn)
[11:02:27] 01:29 < mutante> importing Etherpads to Etherpad lite using convert script.. fail
[11:02:30] 01:29 < mutante> TypeError: undefined is not a function
[11:12:12] mutante: TypeError: Object # has no method 'Client'
[11:12:22] maybe an old mysql node.js library ?
[11:12:29] or a new one ...
[11:13:35] more a new one then, i had to install it with npm
[11:14:37] npm ls says now
[11:14:38] ├─┬ mysql@2.0.0-alpha8 extraneous
[11:14:41] ????
[11:16:03] eh..
[11:16:05] └── (empty)
[11:16:13] on zirconium ?
[11:16:54] depends on the cwd
[11:17:01] ah
[11:17:02] for some reason i don't understand...
[11:17:31] │ └─┬ mysql@0.9.2
[11:17:43] which cwd ?
[11:17:46] /usr/share/etherpad-lite
[11:18:25] weird... for me it has the extraneous line and 2.0.0-alpha8...
[11:18:38] oh, that is under that "ueberDB" stuf
[11:18:50] a yeah..
[11:18:52] wtf?
[11:18:57] ├─┬ ueberDB@0.1.1
[11:18:57] │ ├── channels@0.0.2
[11:18:57] │ ├─┬ dirty@0.9.2
[11:18:57] │ │ └── gently@0.9.1
[11:18:57] │ └─┬ mysql@0.9.2
[11:19:28] "dirty", "hashish" and "uglify" ,, nice names haha
[11:19:59] yea, don't we dislike the whole npm thing :p
[11:20:35] lol
[11:22:05] here's the good thing:
[11:22:08] 03:48 < JohnMcLear2> mutante: so i might be able to free up a few hours on Sunday, I will keep Mark posted
[11:22:27] and that is somebody who is a #etherpad-lite-dev and will be in HKG
[11:23:01] maybe we'll also get a newer version than http://apt.wikimedia.org/wikimedia/pool/main/e/etherpad-lite/
[11:23:25] and i still need to get this to build: https://gerrit.wikimedia.org/r/#/c/76654/ sigh
[11:24:30] 2011? that is an old deb...
[11:25:12] the deb was in apt, but the /debian/ files were not
[11:25:23] so i just took them from existing
[11:25:33] eh, i mean, they were not in git
[11:43:31] PROBLEM - SSH on pdf2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:49:21] RECOVERY - SSH on pdf2 is OK: SSH OK - OpenSSH_4.7p1 Debian-8ubuntu3 (protocol 2.0)
[12:00:46] (PS1) TTO: Set up import sources and user groups for ckbwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78377
[12:36:04] apergos: I haven't seen the size of the whole config file. I hear it's messy. :-)
[13:21:26] PROBLEM - SSH on pdf2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:31:26] RECOVERY - SSH on pdf2 is OK: SSH OK - OpenSSH_4.7p1 Debian-8ubuntu3 (protocol 2.0)
[13:33:56] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours
[13:33:56] PROBLEM - Puppet freshness on holmium is CRITICAL: No successful Puppet run in the last 10 hours
[13:33:56] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours
[13:33:56] PROBLEM - Puppet freshness on pdf3 is CRITICAL: No successful Puppet run in the last 10 hours
[13:33:56] PROBLEM - Puppet freshness on sq41 is CRITICAL: No successful Puppet run in the last 10 hours
[13:33:57] PROBLEM - Puppet freshness on ssl1004 is CRITICAL: No successful Puppet run in the last 10 hours
[13:34:12] (PS1) Mark Bergsma: Bind GitBlit http service to all interfaces, shield off with iptables [operations/puppet] - https://gerrit.wikimedia.org/r/78381
[13:34:26] PROBLEM - SSH on pdf2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:35:20] RECOVERY - SSH on pdf2 is OK: SSH OK - OpenSSH_4.7p1 Debian-8ubuntu3 (protocol 2.0)
[13:38:30] PROBLEM - SSH on pdf2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:39:02] (CR) Mark Bergsma: [C: 2] Bind GitBlit http service to all interfaces, shield off with iptables [operations/puppet] - https://gerrit.wikimedia.org/r/78381 (owner: Mark Bergsma)
[13:39:20] RECOVERY - SSH on pdf2 is OK: SSH OK - OpenSSH_4.7p1 Debian-8ubuntu3 (protocol 2.0)
[13:42:19] (PS1) Mark Bergsma: Remove protocol parameter, doesn't exist [operations/puppet] - https://gerrit.wikimedia.org/r/78382
[13:43:13] (CR) Mark Bergsma: [C: 2] Remove protocol parameter, doesn't exist [operations/puppet] - https://gerrit.wikimedia.org/r/78382 (owner: Mark Bergsma)
[13:58:57] (PS1) Mark Bergsma: Remove iptables stuff [operations/puppet] - https://gerrit.wikimedia.org/r/78384
[13:59:53] (CR) Mark Bergsma: [C: 2] Remove iptables stuff [operations/puppet] - https://gerrit.wikimedia.org/r/78384 (owner: Mark Bergsma)
[14:04:30] PROBLEM - SSH on pdf2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:06:10] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours
[14:06:10] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours
[14:06:10] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours
[14:06:10] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours
[14:06:10] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours
[14:06:11] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours
[14:10:22] (PS2) Mark Bergsma: Add an initial ferm module & base::firewall class [operations/puppet] - https://gerrit.wikimedia.org/r/61744 (owner: Faidon)
[14:13:53] is there anyone around who can review some puppet commits for me?
[14:14:16] yes
[14:19:40] PROBLEM - NTP on pdf2 is CRITICAL: NTP CRITICAL: No response from NTP server
[14:22:10] PROBLEM - Puppet freshness on mchenry is CRITICAL: No successful Puppet run in the last 10 hours
[14:22:36] (CR) Mark Bergsma: [C: 2] Turn down elasticsearch heap usage in production. [operations/puppet] - https://gerrit.wikimedia.org/r/78244 (owner: Manybubbles)
[14:22:55] thanks @mark!
[14:23:07] the other one is https://gerrit.wikimedia.org/r/#/c/76196/ is you have time.
[14:23:10] anything else?
[14:23:10] ok
[14:25:21] RECOVERY - SSH on pdf2 is OK: SSH OK - OpenSSH_4.7p1 Debian-8ubuntu3 (protocol 2.0)
[14:25:30] RECOVERY - NTP on pdf2 is OK: NTP OK: Offset -0.006481051445 secs
[14:26:31] (CR) Mark Bergsma: [C: 2] Fix in process runjobs in singlenode mediawiki. [operations/puppet] - https://gerrit.wikimedia.org/r/76196 (owner: Manybubbles)
[14:26:51] thanks so much mark!
[14:49:23] (PS3) Mark Bergsma: Add an initial ferm module & base::firewall class [operations/puppet] - https://gerrit.wikimedia.org/r/61744 (owner: Faidon)
[14:49:54] (CR) jenkins-bot: [V: -1] Add an initial ferm module & base::firewall class [operations/puppet] - https://gerrit.wikimedia.org/r/61744 (owner: Faidon)
[14:50:20] or perhaps we should just have puppet install that minimal ruleset in conf.d if desired
[14:55:29] (PS4) Mark Bergsma: Add an initial ferm module & base::firewall class [operations/puppet] - https://gerrit.wikimedia.org/r/61744 (owner: Faidon)
[14:57:52] (PS5) Mark Bergsma: Add an initial ferm module & base::firewall class [operations/puppet] - https://gerrit.wikimedia.org/r/61744 (owner: Faidon)
[14:58:03] (PS1) Petr Onderka: Deleting deleted pages [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/78393
[14:59:13] paravoid: what do you think, can I merge patchset 5?
[15:01:36] looking
[15:02:15] (CR) Hashar: "Thanks :-]" [operations/puppet] - https://gerrit.wikimedia.org/r/75632 (owner: Hashar)
[15:02:41] lemme do a few ferm-related changes first
[15:06:09] (PS6) Faidon: Add an initial ferm module & base::firewall class [operations/puppet] - https://gerrit.wikimedia.org/r/61744
[15:12:17] (CR) Faidon: [C: 2] Add an initial ferm module & base::firewall class [operations/puppet] - https://gerrit.wikimedia.org/r/61744 (owner: Faidon)
[15:12:20] ack for merge?
[15:13:29] ack
[15:14:12] it's completely untested but wth :)
[15:14:23] who cares, it doesn't do anything
[15:14:31] yeah I merged already :)
[15:14:41] i'm gonna test it now ;)
[15:14:52] I know ;)
[15:15:03] btw, if you need to test rules
[15:15:37] ferm --slow --noexec --lines
[15:15:44] outputs the iptables commands
[15:16:17] oh and ferm --interactive is basically commit and-confirm
[15:20:17] (PS1) Mark Bergsma: Add internal subnet definitions [operations/puppet] - https://gerrit.wikimedia.org/r/78394
[15:20:44] no need for _V4 _V6 I think
[15:20:57] I had those since < 2.2 wasn't so smart about "saddr"
[15:21:14] can you think of anywhere where we'd use one and not the other?
[15:21:25] (PS1) Mark Bergsma: Firewall GitBlit HTTP port 8080 from the outside world [operations/puppet] - https://gerrit.wikimedia.org/r/78395
[15:21:31] does it hurt to have them though?
[15:21:35] no
[15:21:50] so that^ should work?
[15:22:30] I don't need the ( ) there eh
[15:22:34] as it's not an array
[15:22:48] but...
[15:22:51] I do need localhost I guess
[15:22:59] should have a definition for that also
[15:23:10] my default policy had interface lo ACCEPT; :-)
[15:23:56] yeah
[15:23:59] but yeah, I think it should work
[15:24:03] -1 for whitespace though :P
[15:24:08] tab vs. spaces
[15:24:10] ah right
[15:24:12] annoying
[15:24:58] So I'm going to try looking into hooking Elasticsearch to Ganglia today. It looks like there is a module to do it some big repository called "gmond_python_modules
[15:24:58] (PS2) Mark Bergsma: Firewall GitBlit HTTP port 8080 from the outside world [operations/puppet] - https://gerrit.wikimedia.org/r/78395
[15:25:05] do we import that anywhere?
[15:25:23] (PS3) Mark Bergsma: Firewall GitBlit HTTP port 8080 from the outside world [operations/puppet] - https://gerrit.wikimedia.org/r/78395
[15:25:53] manybubbles: you mean there's a puppet module out there that does it?
[15:26:33] mark: wait a sec
[15:26:39] mark: nah - I mean module in the generic sense.
[15:26:54] I think you're generically vague ;-)
[15:27:20] what do you mean?
[15:27:27] okay, won't work but that's because of a bug of mine :)
[15:27:28] mark: well the description of the repository is "Repository of user-contributed Gmond Python DSO metric modules"
[15:27:44] lemme push something else first
[15:28:00] manybubbles: ah, you've found some public repository of gmond python modules somewhere
[15:28:08] no, we don't import that entire repository
[15:28:24] of course you can import that gmond python module and install it with puppet
[15:29:00] @mark: cool. it is "the official repository". I'll have a look at importing what we need with puppet then.
[15:29:29] er
[15:29:32] "importing with puppet"?
[15:29:37] no, import manually, install with puppet
[15:30:14] mark: I think I'm not clear on the difference between those two terms. both sound to me like "shove the files in the right place to make ganglia happy"
[15:30:30] but I assume I'm just knowledgable enough about something
[15:30:36] "importing with puppet" sounds to me like "make puppet download from that third party repository and install automatically" :)
[15:30:41] which we don't allow
[15:30:52] what you can do is manually grab those files, submit in a patchset through gerrit
[15:31:00] then we will review, and every change to them will be reviewed as well
[15:31:13] (PS1) Faidon: ferm: don't automatically add semicolons [operations/puppet] - https://gerrit.wikimedia.org/r/78396
[15:31:57] this is going to bite us
[15:32:03] what is?
[15:32:11] we'll forget trailing semicolons
[15:32:36] so, the { } block assumes end of statement, and empty statements aren't allowed
[15:32:49] so "proto tcp dport 8080 { saddr $INTERNAL ACCEPT; DROP; };" is invalid
[15:33:12] I had puppet add the semicolon at the end, so this failed
[15:33:29] but now we have to write all rules with the semicolon in the call site :)
[15:33:35] mark: I know we don't use pip or any of its brothers. All the files will sit within puppet statically.
[15:33:50] (CR) Faidon: [C: 2 V: 2] ferm: don't automatically add semicolons [operations/puppet] - https://gerrit.wikimedia.org/r/78396 (owner: Faidon)
[15:33:54] paravoid: that's fine, but perhaps include a check to see if the rule ends with ; ?
[15:34:03] manybubbles: yes that's fine
[15:34:05] yours isn't and shouldn't be
[15:34:10] that's the point :)
[15:34:14] ok
[15:34:22] I'll work with upstream to convince him to allow empty statements...
[15:34:23] i don't know ferm syntax yet, so dunno :)
[15:35:05] http://p.defau.lt/?GdZpTo_MWXP4aJRbhkv3Wg
[15:35:08] that's the output of your rule
[15:35:34] where are localhost?
[15:36:00] ok, of PS1 of your rule
[15:36:02] :P
[15:36:05] alright
[15:36:09] so I can merge?
[15:37:03] yep
[15:37:45] (CR) Mark Bergsma: [C: 2] Add internal subnet definitions [operations/puppet] - https://gerrit.wikimedia.org/r/78394 (owner: Mark Bergsma)
[15:38:00] (CR) Mark Bergsma: [C: 2] Firewall GitBlit HTTP port 8080 from the outside world [operations/puppet] - https://gerrit.wikimedia.org/r/78395 (owner: Mark Bergsma)
[15:38:13] grrrr
[15:38:24] i hate etherpad... lite or not ...
[15:38:45] (CR) Cheers!: "If Asia/Hanoi isn't in your list, you can use Asia/Ho_Chi_Minh. Best Regards," [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78347 (owner: Dzahn)
[15:39:22] that convert.js script... 2 hours later and still it parses stuff wrongly (and i think it is not -lite's fault(
[15:39:24] )
[15:41:49] mark: thanks for the push
[15:42:27] i need to push puppet
[15:42:32] as in finishing this, not as in "git push"
[15:42:40] :-)
[15:42:41] it's also slow in finishing this
[15:45:42] * paravoid bites nails
[15:48:40] it added a different bunch of default rules
[15:48:44] perhaps defaults from the ferm package?
[15:48:50] and I think it tried to restart ferm but failed
[15:48:52] so those rules are still there
[15:49:08] i'll debug in a bit
[15:49:13] the default rules don't seem to do harm
[15:50:48] hm, maybe I don't do refresh?
[15:51:35] oh, no, it's syntactically invalid
[15:51:38] I know why :)
[15:51:42] god i think i fixed it
[15:51:45] you added $INTERNAL to defs
[15:51:49] I must be wrong......
[15:51:53] but you didn't include base::firewall which installs defs
[15:54:36] ah that's right
[15:54:42] i should call base::firewall instead of ferm directly
[15:54:54] sorry, i'm creating a travel profile in the mean time
[15:56:08] (PS1) Mark Bergsma: Include base::firewall to get default definitions [operations/puppet] - https://gerrit.wikimedia.org/r/78397
[15:56:58] (CR) Mark Bergsma: [C: 2] Include base::firewall to get default definitions [operations/puppet] - https://gerrit.wikimedia.org/r/78397 (owner: Mark Bergsma)
[15:59:29] PROBLEM - NTP on antimony is CRITICAL: NTP CRITICAL: No response from NTP server
[16:00:20] RECOVERY - NTP on antimony is OK: NTP OK: Offset -0.0008763074875 secs
[16:00:23] heh
[16:00:44] looks great to me now
[16:04:34] thanks for ferm, it looks great :)
[16:11:07] it's very sensible in general
[16:11:44] I don't expect us to do much with it
[16:11:48] mostly open/close ports
[16:18:29] RECOVERY - check_job_queue on hume is OK: JOBQUEUE OK - all job queues below 10,000
[16:21:14] 10% converted and only one pad skipped :-)
[16:21:39] PROBLEM - check_job_queue on hume is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:22:19] woo, nice
[16:22:37] are you merging with etherpad.wmflabs too?
[16:23:01] supposedly...
[16:23:55] so epl = etherpad.wikimedia + etherpad.wmflabs + epl ?
[16:24:31] (PS1) Mark Bergsma: Setup SSL terminators on the misc Varnish cluster [operations/puppet] - https://gerrit.wikimedia.org/r/78400
[16:24:40] http://etherpad.wikimedia.org/4J8fHuL0yB that's the one pad convert.js does not like
[16:24:54] i am trying hard to understand what it does there ....
[16:25:23] (CR) Mark Bergsma: [C: 2] Setup SSL terminators on the misc Varnish cluster [operations/puppet] - https://gerrit.wikimedia.org/r/78400 (owner: Mark Bergsma)
[16:25:47] cyrillic?
[16:26:12] wtf is that anyway?!?
[16:26:32] something about a roleplaying game ???
[16:27:09] wtf
[16:30:12] (PS1) Mark Bergsma: Fix certificate dependency [operations/puppet] - https://gerrit.wikimedia.org/r/78402
[16:31:08] (CR) Mark Bergsma: [C: 2] Fix certificate dependency [operations/puppet] - https://gerrit.wikimedia.org/r/78402 (owner: Mark Bergsma)
[16:33:57] (PS1) Faidon: ferm: move policy decisions to base::firewall [operations/puppet] - https://gerrit.wikimedia.org/r/78403
[16:34:00] mark: ^
[16:34:50] the motivation and side-benefit of that is that you get "interface lo ACCEPT" in your setup as well
[16:35:01] so no need to accept 127.0.0.1
[16:35:19] (CR) Mark Bergsma: [C: 1] ferm: move policy decisions to base::firewall [operations/puppet] - https://gerrit.wikimedia.org/r/78403 (owner: Faidon)
[16:35:27] yep i was thinking along the same lines
[16:35:38] (PS2) Faidon: ferm: move policy decisions to base::firewall [operations/puppet] - https://gerrit.wikimedia.org/r/78403
[16:36:37] (CR) Faidon: [C: 2 V: 2] ferm: move policy decisions to base::firewall [operations/puppet] - https://gerrit.wikimedia.org/r/78403 (owner: Faidon)
[16:36:40] cool
[16:36:44] I like this
[16:36:51] me too
[16:37:05] of course I'll like it better when we start moving things to default drop :P
[16:37:16] of course
[16:37:45] it's funny, your misc varnish cluster is going to help make this less relevant
[16:37:57] as we'll be able to move servers like antimony to internal
[16:38:03] but we merged ferm as part of this
[16:38:04] antimony has things like git replication
[16:38:06] so not entirely ;)
[16:38:07] how ironic :)
[16:38:09] github etc
[16:38:16] but sure, it helps
[16:38:18] oh that's not gerrit anymore?
[16:38:29] maybe it can be internal
[16:38:33] i'm not sure, didn't look into it
[16:38:51] I meant in general though
[16:38:56] yeah I know
[16:39:19] that one tcp port blocking was just a long term itch that I wanted to fix now
[16:39:24] nod
[16:39:37] the added benefit is that ferm gets a bit of traction again ;)
[16:39:46] yeah
[16:40:00] I was thinking of starting with my baby ceph, but it didn't have much point there
[16:40:15] it's also having problems peering enough without me putting firewalls in the middle :P
[16:40:23] hehe
[16:40:28] so my target was to pick it up for the DNS servers
[16:40:41] those hardly need ferm
[16:41:12] it's all these damn, badly managed misc servers that need it the most ;)
[16:41:15] well yeah, I just wanted to start with a case study, so that others would use it too
[16:41:20] sure
[16:42:01] feel free to make antimony default drop now if demon is involved in identifiying traffic flows
[16:42:22] netstat helps :)
[16:54:13] <^d> paravoid: Did we ever get an answer on testsearch1001?
[16:54:28] no
[16:54:39] I replied to an RT we have about similar issues
[16:54:43] cmjohnson1: ping?
[16:55:10] <^d> mark: Replication is all from manganese -> antimony, gallium, github.
[16:55:12] 46 pads not converted...
[16:56:03] <^d> mark: So we can most likely move antimony internal :)
[16:56:23] he left, but good to know :)
[18:35:42] ^d: manybubbles now that I'm no longer sickly, need any halp on search stuffz?
[18:36:16] notpeter: hmmmmz - I'm working on monitoring now but it is ez. what about that machine in tampa?
[18:36:31] ah, the mw box
[18:36:35] yes, will get that happy
[18:37:42] notpeter: glad you're feeling better
[18:38:10] yeah - I'm glad too. sickness sucks
[18:38:34] greg-g: manybubbles tanks!
[18:38:53] where?!
[18:38:55] * greg-g ducks
[18:39:32] !log running originals swift->ceph sync (syncFileBackend.php on terbium)
[18:39:41] <^d> notpeter: Just to catch you up too...testsearch1001 is very sick.
[18:39:44] Logged the message, Master
[18:40:09] <^d> paravoid thinks it's either hardware or power settings or something. But I don't know much beyond that :)
[18:40:48] I think It's RT #5555
[18:40:49] I replied there
[18:40:54] okie dokie
[18:40:57] <^d> Ah yes, I meant to read that.
[18:40:57] will take a peak
[18:44:59] <^d> Also, manybubbles has a change in to puppet to lower ES's heap to 7G instead of 30G :)
[18:45:28] I believe mark merged that for me this morning.
[18:45:38] <^d> Ah, I hadn't looked
[18:46:10] <^d> Ah yes, I see that now.
[18:47:14] <^d> Ah, clearly shows where ES started running: http://ganglia.wikimedia.org/latest/?r=day&cs=&ce=&c=Miscellaneous+eqiad&h=testsearch1002.eqiad.wmnet&tab=m&vn=&mc=2&z=small&metric_group=ALLGROUPS
[18:48:12] !log running ceph->swift thumb sync on ms-fe1002
[18:48:23] Logged the message, Master
[18:52:11] PROBLEM - Puppet freshness on db9 is CRITICAL: No successful Puppet run in the last 10 hours
[18:56:34] <^d> notpeter, manybubbles: So, I think we're still on target to hit the last week of August. You guys have a preference on day/time for the rollout for mw.org? I'd like to go ahead and commit to a deployment window.
[18:57:04] I'll be at the burneyman, so it's all the same to me ;)
[18:57:13] :)
[18:57:38] ^d: your morning is better for me but otherwise I'm good
[19:03:06] (PS1) Manybubbles: Setup metrics collection for elasticserch [operations/puppet] - https://gerrit.wikimedia.org/r/78414
[19:07:01] PROBLEM - DPKG on mw131 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[19:07:58] (PS2) Manybubbles: Setup metrics collection for elasticserch [operations/puppet] - https://gerrit.wikimedia.org/r/78414
[19:08:01] RECOVERY - DPKG on mw131 is OK: All packages OK
[19:10:18] so the elasticsearch monitoring thingy I've found comes with a json file for a new graph type. I can't find any infrastructure around installing such files in puppet. Am I not seeing it or do we not do it?
[19:10:41] RECOVERY - Apache HTTP on mw131 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.400 second response time
[19:11:31] <^d> manybubbles: I'm gonna go for the 27th.
[19:11:51] ^d: sounds good!
[19:12:31] RECOVERY - twemproxy process on mw131 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker
[19:18:00] manybubbles: d^ ok, mw131 is up and can render pages
[19:18:07] go forth and livehack
[19:18:13] thanks!
[19:18:41] yep!
[19:30:58] notpeter: I just finished up elasticsearch->ganglia metric collection and added you as a reviewer. I'm less comfortable with our nagios setup. is that something you can take a look at?
[19:31:13] sure
[19:33:45] thanks!
[19:48:52] (PS7) Ottomata: Adding role/analytics/kafka.pp Also adding modules/kafka [operations/puppet] - https://gerrit.wikimedia.org/r/77971
[20:04:28] (PS8) Ottomata: Adding role/analytics/kafka.pp Also adding modules/kafka [operations/puppet] - https://gerrit.wikimedia.org/r/77971
[21:41:45] PROBLEM - SSH on amslvs1 is CRITICAL: Server answer:
[21:42:45] RECOVERY - SSH on amslvs1 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[22:10:19] (PS4) Dr0ptp4kt: Adding Wikipedia Zero automation testing servers to XFF whitelist. [operations/puppet] - https://gerrit.wikimedia.org/r/74509
[22:41:46] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:43:36] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 2.360 second response time
[23:34:56] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours
[23:34:56] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours
[23:34:56] PROBLEM - Puppet freshness on holmium is CRITICAL: No successful Puppet run in the last 10 hours
[23:34:56] PROBLEM - Puppet freshness on pdf3 is CRITICAL: No successful Puppet run in the last 10 hours
[23:34:56] PROBLEM - Puppet freshness on ssl1004 is CRITICAL: No successful Puppet run in the last 10 hours
[23:34:57] PROBLEM - Puppet freshness on sq41 is CRITICAL: No successful Puppet run in the last 10 hours