[00:02:51] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 00:02:39 UTC 2013
[00:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[00:04:00] RECOVERY - DPKG on db1048 is OK: All packages OK
[00:15:21] bd808: There's a bot that's capable of relaying changes.
[00:33:30] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 00:33:26 UTC 2013
[00:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[01:06:51] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 01:06:45 UTC 2013
[01:07:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[01:25:28] (as I mentioned on #wikimedia-dev) git.wikimedia.org seems down, proxy error reading a blob URL and timeout accessing https://git.wikimedia.org
[01:30:46] spagewmf: yeah, chad knows :/
[01:30:53] spagewmf: apparently bots aren't being nice
[01:31:00] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours
[01:31:00] PROBLEM - Puppet freshness on holmium is CRITICAL: No successful Puppet run in the last 10 hours
[01:31:00] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours
[01:31:00] PROBLEM - Puppet freshness on pdf3 is CRITICAL: No successful Puppet run in the last 10 hours
[01:31:00] PROBLEM - Puppet freshness on sq41 is CRITICAL: No successful Puppet run in the last 10 hours
[01:31:01] PROBLEM - Puppet freshness on ssl1004 is CRITICAL: No successful Puppet run in the last 10 hours
[01:31:12] greg-g thx for the update, no worries
[01:32:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 01:32:45 UTC 2013
[01:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[02:03:00] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours
[02:03:00] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours
[02:03:00] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours
[02:03:00] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours
[02:03:00] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours
[02:03:01] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours
[02:03:20] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 02:03:19 UTC 2013
[02:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[02:08:47] !log LocalisationUpdate completed (1.22wmf12) at Wed Aug 7 02:08:47 UTC 2013
[02:08:59] Logged the message, Master
[02:22:30] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Aug 7 02:22:30 UTC 2013
[02:22:41] Logged the message, Master
[02:33:30] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 02:33:21 UTC 2013
[02:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[03:02:40] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 03:02:38 UTC 2013
[03:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[03:33:30] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 03:33:19 UTC 2013
[03:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[03:35:03] (CR) Dzahn: "(1 comment)" [operations/puppet] - https://gerrit.wikimedia.org/r/71968 (owner: Hashar)
[03:56:00] PROBLEM - Puppet freshness on mchenry is CRITICAL: No successful Puppet run in the last 10 hours
[04:05:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 04:05:45 UTC 2013
[04:06:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[04:33:00] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 04:32:51 UTC 2013
[04:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[04:35:56] (cross-posted from #wikimedia-dev) Are production wfDebug logs archived, and for how long?
[04:36:53] I'd like to help a Wikisym researcher answer questions about edit conflicts, and the debug logs seem to be the only place I can find this information.
[04:37:27] Obviously, they are sensitive and we would have to review any data before publishing.
[05:02:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 05:02:39 UTC 2013
[05:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[05:32:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 05:32:46 UTC 2013
[05:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[05:56:10] PROBLEM - RAID on snapshot3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:02:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 06:02:41 UTC 2013
[06:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[06:03:43] (PS1) Yuvipanda: Add redis lua library to labsproxy [operations/puppet] - https://gerrit.wikimedia.org/r/78002
[06:04:19] (CR) jenkins-bot: [V: -1] Add redis lua library to labsproxy [operations/puppet] - https://gerrit.wikimedia.org/r/78002 (owner: Yuvipanda)
[06:21:40] PROBLEM - Disk space on analytics1023 is CRITICAL: DISK CRITICAL - free space: / 1069 MB (3% inode=90%):
[06:28:54] (PS1) Rangilo Gujarati: Request filed at https://meta.wikimedia.org/w/index.php?title=Planet_Wikimedia&oldid=5668583 [operations/puppet] - https://gerrit.wikimedia.org/r/78009
[06:32:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 06:32:42 UTC 2013
[06:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[06:37:00] PROBLEM - Puppet freshness on neon is CRITICAL: No successful Puppet run in the last 10 hours
[06:48:00] PROBLEM - Puppet freshness on db9 is CRITICAL: No successful Puppet run in the last 10 hours
[07:01:16] (PS2) Yuvipanda: Add redis lua library to labsproxy [operations/puppet] - https://gerrit.wikimedia.org/r/78002
[07:02:40] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 07:02:35 UTC 2013
[07:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[07:07:22] (PS1) ArielGlenn: switch primary and secondary in rsync between dump servers [operations/puppet] - https://gerrit.wikimedia.org/r/78011
[07:08:06] (PS3) Yuvipanda: Add redis lua library to labsproxy [operations/puppet] - https://gerrit.wikimedia.org/r/78002
[07:08:14] git.wikimedia.org appears to be down. is that a known known?
[07:08:26] it is?
[07:08:39] 503 with guru meditation from varnish
[07:09:05] https://dpaste.de/XWJEA/raw/
[07:10:57] I take it retrying gets you the same thing?
[07:11:23] * apergos tries looking at a random commit
[07:13:04] not giving me an error, nor doing anything else useful
[07:15:05] yeah, and actually i heard someone grumbling about that earlier
[07:15:27] at the time i assumed s/he reported it, but then i thought i had better check
[07:15:44] now I got HTTP/1.1 503 Service Temporarily Unavailable
[07:16:07] !log restarted gitblit, (btw the init script did not stop it properly)
[07:16:13] try again.
[07:16:18] you were too quick on the trigger
[07:16:19] Logged the message, Master
[07:16:29] worked.
[07:16:37] thanks apergos
[07:16:48] thanks for reporting it. ganglia showed nothing unusual
[07:16:48] i'll file a bug for the init script
[07:16:54] which means no load spike or any of that
[07:17:01] it might not be the init script
[07:17:13] i didn't think it was, but that's a bug regardless
[07:17:34] it could be that
[07:17:43] Didn't Chad fix that recently?
[07:17:48] Or at least, made a changeset to do so
[07:17:53] well, in that case I'll leave it to you to file the bug
[07:18:14] simply because it sounds like the console log might be handy
[07:18:19] Reedy: dunno
[07:18:24] ori-l: apergos: This might be related https://gerrit.wikimedia.org/r/77909
[07:18:42] yeah, they were talking about this in the channel yesterday
[07:18:53] now that you bring it up
[07:19:04] we were kind of hesitant though to block google....
[07:19:12] understandable
[07:19:16] Reedy: google is hammering it
[07:19:21] I wonder if we can get 'em to just slow down their crawler
[07:19:23] I saw that earlier
[07:19:23] that would be enough
[07:19:41] * apergos looks around at their google contacts... wrong tz but I'll ping someone
[07:19:49] unless your "did d fix that" was talking about the init script
[07:20:28] apergos: well gitblit does support lucene indexing (it's just broken on our install afaik) so google doesn't really need to index it at all
[07:20:43] and there is also github for full text search as well
[07:22:08] I think we might as well have the results show up there, anyways I'll see what people say over there
[07:22:20] it will take a few days likely to get feedback
[07:23:46] (PS1) Lcarr: updated to text-varnish [operations/puppet] - https://gerrit.wikimedia.org/r/78015
[07:23:49] hmmm maybe crawling that zip|gz|bzip2 link for every dir in every commit (or wherever else) is what is killing it
[07:29:52] (PS3) Dzahn: the existing etherpad-lite_1.0-wm2 package in a operations/debs repo for completeness [operations/debs/etherpad-lite] - https://gerrit.wikimedia.org/r/76654
[07:31:24] (PS1) TTO: Give testwiki some custom namespaces [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78016
[07:32:51] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 07:32:49 UTC 2013
[07:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[07:38:48] LeslieCarr: what were you saying about esams/europe issues? I have an odd one, images not showing up for me on the projects (either originals or scaled) but works for retrieval from the u.s. (failure to retrieve anything from upload.wm)
[07:40:54] and I see a new core dump on cr2 from 10am but who knows what that actually does
[07:42:44] (CR) ArielGlenn: [C: 2] switch primary and secondary in rsync between dump servers [operations/puppet] - https://gerrit.wikimedia.org/r/78011 (owner: ArielGlenn)
[07:43:47] (CR) TTO: [C: 1] "All languages for which VE is enabled have these messages (except ru lacking 'visualeditor-beta-appendix' - but I don't think that is used" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/77269 (owner: Catrope)
[07:46:10] PROBLEM - SSH on pdf2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:51:10] RECOVERY - SSH on pdf2 is OK: SSH OK - OpenSSH_4.7p1 Debian-8ubuntu3 (protocol 2.0)
[07:55:56] (PS1) ArielGlenn: wikivoyage-lb using text-varnish so updated checks accordingly (rt 5581) [operations/puppet] - https://gerrit.wikimedia.org/r/78017
[08:02:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 08:02:40 UTC 2013
[08:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[08:07:39] (CR) Hashar: [C: 1 V: 2] "This will essentially produce the same console output to scap operators, so as far as I am concerned this is not going to disrupt my workf" [operations/puppet] - https://gerrit.wikimedia.org/r/77838 (owner: Pyoungmeister)
[08:08:08] (PS1) TTO: Set up flood flag for zhwikinews [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78018
[08:09:09] (PS4) Yuvipanda: Add redis lua library to labsproxy [operations/puppet] - https://gerrit.wikimedia.org/r/78002
[08:11:48] (PS5) Yuvipanda: Add redis lua library to labsproxy [operations/puppet] - https://gerrit.wikimedia.org/r/78002
[08:12:18] (CR) ArielGlenn: [C: 2] wikivoyage-lb using text-varnish so updated checks accordingly (rt 5581) [operations/puppet] - https://gerrit.wikimedia.org/r/78017 (owner: ArielGlenn)
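The gitblit outage above was diagnosed by retrying the URL and seeing whether the 503 persisted before restarting the service. That retry-and-check pattern can be sketched in Python; the URL, attempt count, and delay below are illustrative, not what anyone in the channel actually ran:

```python
import time
import urllib.request
import urllib.error

def check_with_retries(url, attempts=3, delay=2.0):
    """Return the final HTTP status after up to `attempts` tries.

    A 503 that persists across retries suggests the backend (e.g. gitblit
    behind varnish) is actually down, not just momentarily overloaded.
    Returns None on connection-level failures (timeout, refused).
    """
    status = None
    for i in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.status  # 2xx/3xx: service is answering again
        except urllib.error.HTTPError as e:
            status = e.code  # e.g. 503 "guru meditation" from varnish
        except urllib.error.URLError:
            status = None  # no HTTP response at all
        if i < attempts - 1:
            time.sleep(delay)
    return status
```

For example, `check_with_retries("https://git.wikimedia.org/")` would have kept returning 503 until the gitblit restart, then 200.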
[08:26:00] RECOVERY - Puppet freshness on neon is OK: puppet ran at Wed Aug 7 08:25:50 UTC 2013
[08:32:41] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 08:32:36 UTC 2013
[08:33:41] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[08:49:13] (CR) coren: [C: 1] "The tool labs stuff is okay by me." [operations/puppet] - https://gerrit.wikimedia.org/r/75087 (owner: Ori.livneh)
[08:51:05] (PS1) Yuvipanda: Read routing tables from Redis [operations/puppet] - https://gerrit.wikimedia.org/r/78025
[08:54:41] PROBLEM - Disk space on analytics1023 is CRITICAL: DISK CRITICAL - free space: / 1069 MB (3% inode=90%):
[08:55:38] (CR) Tim Landscheidt: "What's the licence of modules/labsproxy/files/redis.lua?" [operations/puppet] - https://gerrit.wikimedia.org/r/78002 (owner: Yuvipanda)
[09:02:51] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 09:02:43 UTC 2013
[09:03:41] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[09:19:54] (PS4) Dzahn: the existing etherpad-lite_1.0-wm2 package in a operations/debs repo for completeness [operations/debs/etherpad-lite] - https://gerrit.wikimedia.org/r/76654
[09:20:25] (Abandoned) Dzahn: RT #5464 - apply etherpad-lite live hack fix by apergos [operations/debs/etherpad-lite] - https://gerrit.wikimedia.org/r/76661 (owner: Dzahn)
[09:21:38] (PS5) Dzahn: the existing etherpad-lite_1.0-wm2 package in a operations/debs repo for completeness [operations/debs/etherpad-lite] - https://gerrit.wikimedia.org/r/76654
[09:30:54] (PS1) ArielGlenn: re-enable rsync between dump servers [operations/puppet] - https://gerrit.wikimedia.org/r/78036
[09:32:47] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 09:32:37 UTC 2013
[09:33:13] (CR) ArielGlenn: [C: 2] re-enable rsync between dump servers [operations/puppet] - https://gerrit.wikimedia.org/r/78036 (owner: ArielGlenn)
[09:33:37] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[09:38:07] PROBLEM - SSH on pdf2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:39:08] RECOVERY - SSH on pdf2 is OK: SSH OK - OpenSSH_4.7p1 Debian-8ubuntu3 (protocol 2.0)
[09:42:07] PROBLEM - SSH on pdf2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:43:07] RECOVERY - SSH on pdf2 is OK: SSH OK - OpenSSH_4.7p1 Debian-8ubuntu3 (protocol 2.0)
[09:47:10] (PS2) Lcarr: updated to text-varnish [operations/puppet] - https://gerrit.wikimedia.org/r/78015
[09:49:48] (CR) Lcarr: [C: 2] updated to text-varnish [operations/puppet] - https://gerrit.wikimedia.org/r/78015 (owner: Lcarr)
[09:50:16] LeslieCarr: what is in there besides a commit message?
[09:50:27] oh
[09:50:36] haha someone else fixed that in the meantime
[09:50:38] me
[09:50:39] :-D
[09:50:42] yay
[09:50:45] you rock
[09:50:48] thanks!
[10:06:27] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[10:22:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:23:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time
[10:32:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:32:57] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 10:32:52 UTC 2013
[10:33:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.122 second response time
[10:33:27] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[10:40:52] (PS1) ArielGlenn: puppetize and enable production xml dumps rsync to gluster public labs share [operations/puppet] - https://gerrit.wikimedia.org/r/78043
[10:41:32] (CR) jenkins-bot: [V: -1] puppetize and enable production xml dumps rsync to gluster public labs share [operations/puppet] - https://gerrit.wikimedia.org/r/78043 (owner: ArielGlenn)
[11:02:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:03:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time
[11:04:14] (PS2) ArielGlenn: puppetize and enable production xml dumps rsync to gluster public labs share [operations/puppet] - https://gerrit.wikimedia.org/r/78043
[11:04:57] (CR) jenkins-bot: [V: -1] puppetize and enable production xml dumps rsync to gluster public labs share [operations/puppet] - https://gerrit.wikimedia.org/r/78043 (owner: ArielGlenn)
[11:08:22] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[11:12:52] RECOVERY - search indices - check lucene status page on search19 is OK: HTTP OK: HTTP/1.1 200 OK - 60075 bytes in 0.110 second response time
[11:15:15] (PS3) ArielGlenn: puppetize and enable production xml dumps rsync to gluster public labs share [operations/puppet] - https://gerrit.wikimedia.org/r/78043
[11:15:52] (CR) jenkins-bot: [V: -1] puppetize and enable production xml dumps rsync to gluster public labs share [operations/puppet] - https://gerrit.wikimedia.org/r/78043 (owner: ArielGlenn)
[11:18:03] (PS4) ArielGlenn: puppetize and enable production xml dumps rsync to gluster public labs share [operations/puppet] - https://gerrit.wikimedia.org/r/78043
[11:21:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:24:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time
[11:25:37] @info dbkey
[11:25:38] RoanKattouw_away: Unknown identifier (dbkey)
[11:25:45] @info wikimania2013wiki
[11:25:45] RoanKattouw_away: [wikimania2013wiki: s3 (DEFAULT)] db1019: 10.64.16.8, db1003: 10.64.0.7, db1010: 10.64.0.14, db1035: 10.64.16.24
[11:25:54] @info centralauth
[11:25:54] RoanKattouw_away: [centralauth: s7] db1041: 10.64.16.30, db1007: 10.64.0.11, db1024: 10.64.16.13, db1028: 10.64.16.17
[11:26:04] @info db1024
[11:26:04] RoanKattouw_away: [db1024: s7] 10.64.16.13
[11:26:12] @replag s7
[11:26:12] RoanKattouw_away: [s7] db1041: 0s, db1007: 0s, db1024: 0s, db1028: 0s
[11:31:32] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours
[11:31:32] PROBLEM - Puppet freshness on holmium is CRITICAL: No successful Puppet run in the last 10 hours
[11:31:32] PROBLEM - Puppet freshness on pdf3 is CRITICAL: No successful Puppet run in the last 10 hours
[11:31:32] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours
[11:31:32] PROBLEM - Puppet freshness on sq41 is CRITICAL: No successful Puppet run in the last 10 hours
[11:31:33] PROBLEM - Puppet freshness on ssl1004 is CRITICAL: No successful Puppet run in the last 10 hours
[11:33:52] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 11:33:45 UTC 2013
[11:34:22] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[11:52:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:53:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time
[12:02:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:03:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time
[12:03:28] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours
[12:03:28] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours
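The @replag reply above follows a simple "[shard] host: lag, host: lag" shape. A minimal parser for that output might look like this (the format is inferred from the bot replies above; this is an illustration, not the bot's actual code):

```python
import re

def parse_replag(line):
    """Parse a replag bot line like '[s7] db1041: 0s, db1007: 0s'.

    Returns (shard, {host: lag_seconds}). Assumes lag is always given
    as an integer number of seconds with an 's' suffix, as seen above.
    """
    m = re.match(r"\[(?P<shard>\w+)\]\s*(?P<hosts>.+)", line)
    if not m:
        raise ValueError("unrecognized replag line: %r" % line)
    lags = {}
    for part in m.group("hosts").split(","):
        host, lag = part.strip().split(":")
        lags[host.strip()] = int(lag.strip().rstrip("s"))
    return m.group("shard"), lags
```

Feeding it the s7 line from the log yields `("s7", {"db1041": 0, "db1007": 0, "db1024": 0, "db1028": 0})`.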
[12:03:28] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours
[12:03:28] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours
[12:03:28] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours
[12:03:29] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours
[12:08:48] !log removing 91.198.174.7/32 from ssl3001 & maerlant, old/deprecated ipv6 testing IP
[12:09:00] Logged the message, Master
[12:10:13] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[12:22:53] PROBLEM - Host mw31 is DOWN: PING CRITICAL - Packet loss = 100%
[12:23:43] RECOVERY - Host mw31 is UP: PING OK - Packet loss = 0%, RTA = 26.61 ms
[12:26:55] (CR) ArielGlenn: [C: 2] puppetize and enable production xml dumps rsync to gluster public labs share [operations/puppet] - https://gerrit.wikimedia.org/r/78043 (owner: ArielGlenn)
[12:33:43] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 12:33:42 UTC 2013
[12:34:13] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[12:38:52] (PS1) ArielGlenn: fix rsync dup package declaration in mirror.pp/download.pp [operations/puppet] - https://gerrit.wikimedia.org/r/78051
[12:42:08] (CR) ArielGlenn: [C: 2] fix rsync dup package declaration in mirror.pp/download.pp [operations/puppet] - https://gerrit.wikimedia.org/r/78051 (owner: ArielGlenn)
[12:46:12] (PS1) ArielGlenn: fix ensure typo in dump gluster rsync [operations/puppet] - https://gerrit.wikimedia.org/r/78053
[12:46:29] (CR) jenkins-bot: [V: -1] fix ensure typo in dump gluster rsync [operations/puppet] - https://gerrit.wikimedia.org/r/78053 (owner: ArielGlenn)
[12:48:22] (PS2) ArielGlenn: fix ensure typo in dump gluster rsync [operations/puppet] - https://gerrit.wikimedia.org/r/78053
[12:51:41] (CR) ArielGlenn: [C: 2] fix ensure typo in dump gluster rsync [operations/puppet] - https://gerrit.wikimedia.org/r/78053 (owner: ArielGlenn)
[13:07:36] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[13:32:56] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 13:32:50 UTC 2013
[13:33:36] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[13:38:57] (PS1) Dereckson: (bug 52578) New user group 'botadmin' on ckb.wikipedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78056
[13:56:56] PROBLEM - Puppet freshness on mchenry is CRITICAL: No successful Puppet run in the last 10 hours
[14:10:42] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[14:29:15] (PS1) Jgreen: change shell for user otrs [operations/puppet] - https://gerrit.wikimedia.org/r/78058
[14:32:02] (CR) Jgreen: [C: 2 V: 1] change shell for user otrs [operations/puppet] - https://gerrit.wikimedia.org/r/78058 (owner: Jgreen)
[14:33:12] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 14:33:02 UTC 2013
[14:33:42] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[14:50:19] <^d> manybubbles: I've almost finished reviewing your index-splitting change.
[14:50:31] Thanks!
[14:50:35] I'm sorry there is so much of it
[14:51:42] <^d> No worries.
[14:52:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:53:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time
[15:03:47] <^d> manybubbles: Review posted. tl;dr: "Mostly good, need some minor fixes & clarification"
[15:04:08] will fix!
[15:09:20] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[15:21:39] (PS2) Petr Onderka: Reading from MediaWiki [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/77906
[15:22:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:25:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time
[15:26:07] <^d> manybubbles: And I did that all before breakfast...I should eat now :)
[15:26:23] ^d: I was thinking it is early there
[15:27:19] * ^d is an early riser
[15:27:22] <^d> I'm usually up and about by 6:30 at the latest.
[15:30:05] (PS3) Petr Onderka: Reading from MediaWiki [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/77906
[15:32:24] (PS1) Ori.livneh: EventLogging: Specify 'raw=1' in raw logger input stream URIs [operations/puppet] - https://gerrit.wikimedia.org/r/78070
[15:32:35] yo manybubbles
[15:32:47] any chance you could merge that? ^
[15:32:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 15:32:42 UTC 2013
[15:33:00] it's a config change for a software / server i administer
[15:33:06] really trivial.
[15:33:20] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[15:33:21] manybubbles merging on ops/puppet
[15:33:33] he doesn't have permissions to do that :)
[15:33:45] hey paravoid
[15:33:47] * ^d will give everyone permissions just for fun :)
[15:33:49] yeah - I was gonna say: "0 chance. I can't click +2"
[15:33:51] (CR) Faidon: [C: 2] EventLogging: Specify 'raw=1' in raw logger input stream URIs [operations/puppet] - https://gerrit.wikimedia.org/r/78070 (owner: Ori.livneh)
[15:33:58] thanks :)
[15:34:18] (CR) Faidon: [C: 2] Rename 'redis.py' to 'redis_monitoring.py' to avoid conflict [operations/puppet] - https://gerrit.wikimedia.org/r/77657 (owner: Ori.livneh)
[15:34:31] bonus round!
[15:34:43] ^d: fun one: https://bugzilla.wikimedia.org/show_bug.cgi?id=52612
[15:34:52] for after breakfast
[15:35:33] <^d> Ouch.
[15:35:34] <^d> Yeah
[15:35:58] <^d> paravoid: Another easy one :) https://gerrit.wikimedia.org/r/#/c/76884/
[15:40:23] (PS2) Faidon: svn::server: remove some unused packages [operations/puppet] - https://gerrit.wikimedia.org/r/76884 (owner: Demon)
[15:40:57] there's nothing inherent with svn not working when e.g. graphviz is installed
[15:41:20] puppet question: any way/examples of a class that adds content to an apache vhost's config file?
[15:41:20] so I don't think an ensure => absent is appropriate; it might be installed due to other packages in the system, for example
[15:41:48] (CR) Faidon: [C: 2 V: 2] svn::server: remove some unused packages [operations/puppet] - https://gerrit.wikimedia.org/r/76884 (owner: Demon)
[15:44:11] (PS1) Jgreen: fixed exim group for otrs delivery pipe [operations/puppet] - https://gerrit.wikimedia.org/r/78072
[15:45:21] (CR) Jgreen: [C: 2 V: 1] fixed exim group for otrs delivery pipe [operations/puppet] - https://gerrit.wikimedia.org/r/78072 (owner: Jgreen)
[15:45:26] <^d> paravoid: So basically I want to remove the bits from the config entirely (they're unused now that svn is r/o).
[15:45:46] <^d> Eventually I need to move the read-only service off formey since it's EOL and in Tampa.
[15:46:00] <^d> And I don't want a bunch of old junk on whatever eqiad box it ends up on :)
[15:46:51] ^d: saw my PS2 above? :)
[15:46:52] I merged that
[15:46:56] and removed the packages manually on formey
[15:47:50] <^d> Hehe, no I didn't :)
[15:47:52] <^d> Thanks then
[15:48:52] (PS2) Demon: Remove obsolete backup stuff [operations/puppet] - https://gerrit.wikimedia.org/r/76885
[15:52:20] ^d: ensure => absent for /svnroot/bak won't work
[15:52:27] it won't be recursive
[15:52:35] you need recurse => true
[15:52:36] <^d> Ah, I can just nuke it by hand.
[15:52:38] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:52:52] but it's the same as before, I'm not sure if I see the point of having puppet clean up this one box
[15:53:07] let's just remove the dump class altogether and clean up manually, it looks very simple
[15:53:37] if it was something on the appservers I'd probably have a different opinion
[15:54:00] <^d> amending.
[15:54:05] (PS4) Petr Onderka: Reading from MediaWiki [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/77906
[15:54:28] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.134 second response time
[15:54:29] (CR) Petr Onderka: [C: 2 V: 2] Reading from MediaWiki [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/77906 (owner: Petr Onderka)
[15:54:49] (PS3) Demon: Remove obsolete backup stuff [operations/puppet] - https://gerrit.wikimedia.org/r/76885
[15:55:17] (CR) Faidon: [C: 2] Remove obsolete backup stuff [operations/puppet] - https://gerrit.wikimedia.org/r/76885 (owner: Demon)
[15:56:18] svndump.php & cron cleaned up
[15:56:27] shall I do /svnroot/bak too?
[15:56:33] <^d> Yeah that's not needed anymore.
[15:56:40] seems to be backups, so I thought to confirm first :)
[15:56:53] done
[15:57:03] <^d> Old backups. I made final backups and gzip'd them.
[15:57:10] perfect
[15:57:26] <^d> I figured the old rotated backups on tridge can be nuked to save some space, and the one-off backups get copied $somewhereSafe.
[15:57:57] ori-l: time to chat about a vagrant+puppet+apache design question? https://www.mediawiki.org/wiki/User:BDavis_(WMF)/Notes/Thumb.php_with_Vagrant#Puppet
[15:58:20] <^d> paravoid: The final backups for wherever we want to stash them are currently at formey:/svnroot/final-backup/
[15:58:22] bd808: oh, are you working on thumb.php?
[15:58:32] paravoid: yes
[15:58:40] ^d: I have zero clue about our backup system(s)
[15:58:48] ^d: akosiaris is starting to ramp up on this, afaik
[15:58:54] I got it working with manual changes. Now I want to turn it into a role for vagrant
[15:58:58] bd808: cool
[15:59:07] bd808: I do media storage among other things
[15:59:07] <^d> paravoid: Cool, I'll sync up with him then :)
[15:59:36] bd808: so I'm available if there's something I can help you with
[15:59:57] and don't worry about asking silly newbie questions, I won't judge at all :)
[16:00:07] I've been there too and it wasn't too far in the past
[16:00:13] At the moment I just need some puppet & apache style guidance I think
[16:00:36] I need to add a block of config to the apache vhost when the role is enabled
[16:00:39] the vagrant puppet stuff is a bit different than production
[16:00:44] and I don't know much about it
[16:01:00] so for example, we don't do LocalSettings.php via puppet on prod
[16:01:01] and I have a couple ideas of how that might be done, but don't know which is "better" or "right"
[16:01:14] or maybe even "possible"
[16:01:44] bd808: I have to finish syncing a change and then run off. I took a very cursory glance and it looks like good progress. I'll take a look in a few hours, if that's cool, unless paravoid figures it out first. (paravoid, thanks for helping.)
[16:02:03] I don't think I can be of much help here...
[16:02:26] ori-l: cool beans. I'll take a stab at something and then you can tell me what I did so very wrong. :)
[16:08:51] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[16:11:12] bd808: you probably want something like
[16:11:13] https://dpaste.de/6OGHH/raw/
[16:11:23] (poorly formatted because i did it in a hurry)
[16:13:22] ori-l: my brain was making it much harder than that, but that might work
[16:14:24] if you're feeling up to it, you could factor out of the mediawiki::extension resource type a more generic mediawiki::settings resource type that gives the same nice mapping of puppet hashes to PHP code (paravoid is probably shuddering)
[16:14:42] i've been meaning to do that, and modify mediawiki::extension so that it uses it
[16:14:55] i'm still very jetlagged, so if that makes no sense at all, it's probably my fault
[16:15:11] ori-l: are you in HK?
[16:15:17] yep
[16:15:25] GO TO BED!
[16:15:43] was just about to go out for a while actually :P
[16:16:19] * bd808 shakes head at kids these days
[16:22:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:23:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time
[16:30:16] (PS1) Jgreen: move otrs homedir from /opt/otrs-home to /var/lib/otrs [operations/puppet] - https://gerrit.wikimedia.org/r/78082
[16:31:16] (CR) Jgreen: [C: 2 V: 1] move otrs homedir from /opt/otrs-home to /var/lib/otrs [operations/puppet] - https://gerrit.wikimedia.org/r/78082 (owner: Jgreen)
[16:33:11] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 16:33:07 UTC 2013
[16:33:51] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[16:37:19] (PS1) Demon: Redo search configuration [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78083
[16:39:20] (PS2) Demon: Redo search configuration [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78083
[16:44:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:46:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 4.728 second response time
[16:48:45] PROBLEM - Puppet freshness on db9 is CRITICAL: No successful Puppet run in the last 10 hours
[16:49:24] haaai paravoid, have you created a librdkafka .deb yet?
[16:49:30] no
[16:49:40] well, I did months ago but it's outdated by now
[16:50:02] ok, but the debian/ stuff is there?
[16:50:04] i could build another one?
[16:57:49] /usr/bin/ld.bfd.real: unrecognized option '-Wl,-z,relro'
[17:00:50] ottomata: (working on updating it)
[17:00:55] what do you need it for btw?
[17:02:30] want to play with varnishkafka in labs
[17:05:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:05:48] I haven't started to work on varnishkafka packaging yet
[17:07:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time
[17:08:50] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[17:11:11] blergh, makefile is broken
[17:22:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:23:42] !log authdnsiupdate for lists.w.o spf records
[17:23:44] greg-g: ^
[17:23:54] Logged the message, RobH
[17:24:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time
[17:24:56] why is google on that SPF record?
[17:25:00] it shouldn't be
[17:26:00] RobH: wee
[17:26:01] I think 'v=spf1 a mx ~all' should be enough
[17:26:08] the rest are really unneeded for lists
[17:26:21] RobH: ^
[17:26:24] =/
[17:26:37] so dont include ipv4/ipv6 ranges or google?
[17:27:01] (is that since they are included in top level?) [17:27:28] google won't ever send from a @lists.wikimedia.org address [17:27:52] this doesn't belong to google (until we switch to google groups anyway *ducks*) [17:28:19] and the rest aren't needed because all the MTAs that would ever send from @lists.wikimedia.org are on the MX record [17:28:21] * greg-g kicks paravoid  [17:29:03] :D [17:33:00] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 17:32:55 UTC 2013 [17:33:50] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [17:36:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:37:53] RobH: are you fixing? [17:39:48] sorry, was in PMs with folks [17:40:02] paravoid: ok, so lemme fix [17:40:11] this is why the 5 minute ttl rocks [17:40:13] heh [17:41:00] paravoid: mx all? [17:41:14] lists.wikimedia.org. 600 IN TXT "v=spf1 ?all" [17:41:17] yes? [17:41:41] no [17:41:45] lists.wikimedia.org. 600 IN TXT "v=spf1 ?all" [17:41:45] lists.wikimedia.org. 600 IN SPF "v=spf1 ?all" [17:41:53] ok, then i misunderstand. [17:42:07] 20:26 < paravoid> I think 'v=spf1 a mx ~all' should be enough [17:42:14] "a mx" [17:42:38] lists has the mx of lists and mchenry [17:42:46] yes, that's fine [17:43:03] im confused as shit. [17:43:09] what's the issue? [17:43:18] ottomata: https://github.com/paravoid/librdkafka/tree/debian [17:43:36] lists 1H IN A 208.80.154.4 ; service IP on sodium [17:43:36] 1H IN MX 10 lists [17:43:36] 1H IN MX 60 mchenry [17:43:37] ; lists.wikimedia.org SPF txt and rr records [17:43:39] lists.wikimedia.org. 600 IN TXT "v=spf1" [17:43:41] woot, danke paravoid [17:43:41] lists.wikimedia.org. 600 IN SPF "v=spf1" [17:43:43] ? 
[17:43:47] that seems fucked up to me [17:44:06] IN TXT "v=spf1 a mx ~all" [17:44:14] ok [17:45:00] "mx" inside the SPF record means "the mx (mail receivers) of this domain are trusted to send mails on behalf of this domain" [17:45:10] ahh, ok [17:45:28] and i still need both txt and spf with that in it [17:45:33] just confirming before i push ;] [17:45:44] yeah, if powerdns accepts SPF RRs, yes [17:45:45] or you can just svn diff on sockpuppet [17:46:03] I just did, +2 [17:46:03] we will find out i suppose...? ;P [17:46:06] cool [17:46:41] paravoid: thank you for explaining it, its appreciated =] [17:47:03] !log authdns-update (just incase i break it all) [17:47:15] Logged the message, RobH [17:48:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [17:54:26] paravoid: So we have the ip subnets in the wikimedia.org domain since we have individual servers sending mail on behalf of wikimedia.org right? [17:54:41] I guess? :) [17:54:42] where as lists is only from list server, so we restrict to the MX hosts which include that [17:54:59] thats the only reason i can think we would have the IPs in there instead of the pointer to mx records [17:55:03] plus google domain stuff... [17:55:09] but yes, that's the effect, that any server within our networks can be "trusted" to send mail with from: @wikimedia.org [17:55:38] well holy shit i think i get how the spf shit works now.. [17:55:51] or i really dont and have fooled myself [17:55:56] either way i feel accomplished. [17:57:07] ideally we would have two designated smarthosts that would handle all outgoing traffic for @wikimedia.org (plus google, of course) [17:57:32] when will I be able to test it via looking at email headers? 
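[Editor's note: putting paravoid's explanation together with the zone snippets pasted above, the record that was being converged on looks like this (both TXT and SPF RR forms, with the 600s TTL used in the log):]

```
; lists.wikimedia.org SPF: "a" trusts the domain's own A record,
; "mx" trusts its MX hosts (lists and mchenry), "~all" softfails
; everything else. No google/ip4/ip6 mechanisms needed, since only
; the list server ever sends from @lists.wikimedia.org.
lists.wikimedia.org. 600 IN TXT "v=spf1 a mx ~all"
lists.wikimedia.org. 600 IN SPF "v=spf1 a mx ~all"
```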
[17:57:32] maybe we are already, no idea really [17:57:56] greg-g: uhh, its live on our servers now, but who knows [17:57:57] and we'd also do dkim signing [17:58:17] RobH: yeah, just looking at my latest mailing list email and still see "neutral" in the SPF header [17:58:26] 10mins after RobH's change, 17:47 UTC + 10 = 17:57 UTC [17:58:34] i.e. right about now [17:58:43] you still will see neutral [17:58:43] * greg-g nods [17:58:46] oh [17:58:48] that hasn't changed [17:59:24] I don't know why google suggested adding SPF with neutral, not sure how they factor this in their spam scoring [18:02:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:03:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [18:10:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:11:39] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [18:12:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time [18:15:24] I now get: [18:15:25] Received-SPF: softfail (google.com: domain of transitioning wiki-research-l-bounces@lists.wikimedia.org does not designate 129.13.185.202 as permitted sender) client-ip=129.13.185.202; [18:21:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:22:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time [18:25:43] grrr [18:26:10] (03PS1) 10Jgreen: cron jobs for otrs [operations/puppet] - 10https://gerrit.wikimedia.org/r/78089 [18:26:29] mailout.scc.kit.edu how is that sending for us? [18:26:33] greg-g: [18:28:14] greg-g: 129.13.185.202 isnt us.... [18:28:30] so not sure what that exactly means [18:29:23] I have a guess... 
[18:30:01] do tell [18:30:27] well, from the mails I've seen google seems to just ignore our hops for SPF and check the hop before us [18:30:41] and I have a theory on where the bug lies [18:30:47] it ignores the ipv6 hops :) [18:32:14] suck [18:32:22] maybe, who knows [18:32:25] it shouldn't do that [18:33:19] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 18:33:15 UTC 2013 [18:33:39] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [18:34:34] paravoid: mind if I cc you on ths thread with the google rep? [18:34:44] or should I cc RobH ? [18:34:46] :) [18:35:57] Authentication-Results: mx.google.com; spf=pass (google.com: domain of root@lists.wikimedia.org designates 2620:0:860:2:219:b9ff:fedd:c027 as permitted sender) smtp.mail=root@lists.wikimedia.org [18:36:01] this worked [18:36:20] greg-g: feel free but I don't have high hopes there [18:36:35] hmmm, that's good [18:36:48] and as Tim said, something changed recently, so us hunting SPF issues is kind of orthogonal [18:37:06] (03PS1) 10Mark Bergsma: Setup a misc services varnish cluster [operations/puppet] - 10https://gerrit.wikimedia.org/r/78090 [18:37:27] argh, and just as I check wikitech to find an email [18:37:31] bgp level3 issue [18:37:34] yeah [18:37:49] oh? [18:38:07] (and that message has a softfail and an ipv4 address) [18:38:31] mark: level3 doesn't get our ipv6 announcement, somehow [18:38:45] ok [18:38:49] why is that a problem though? [18:39:04] presumably because level3 is a tier 1? 
:) [18:39:19] doesn't matter [18:39:54] so is tinet [18:41:26] traceroute from level3's lg fails apparently [18:41:35] (03CR) 10Jgreen: [C: 032 V: 031] cron jobs for otrs [operations/puppet] - 10https://gerrit.wikimedia.org/r/78089 (owner: 10Jgreen) [18:41:56] so it doesn't only get our announcement directly, but also not from its peers [18:42:05] yeah, I didn't mean directly befoer [18:42:08] I meant at all [18:42:12] ok [18:42:16] yeah that is a problem ;) [18:42:49] for a DFZ network it is, yeah and esp. when it's a tier 1 :) [18:43:06] interesting [18:43:11] neither US nor european prefix [18:43:50] yeah [18:44:16] although [18:44:22] if I ask for 2620:0:862::/48 I do get a result [18:45:14] also 2620:0:862::/46 [18:45:18] if you don't select 'longer prefixes' [18:45:35] HONESTY-AS, lovely [18:45:41] No routes found for 2620:0:862::/48. [18:45:44] that's what level3 calls our AS [18:45:50] it's the previous owner of our ASN [18:45:52] MANY years ago [18:46:02] that was [18:46:04] Route results for 2620:0:862::/48 from Amsterdam, Netherlands [18:46:21] paravoid, i think librdkafka needs a build depends on zlib1g-dev [18:46:22] oh that was with longer prefixes [18:46:31] Route results for 2620:0:862::/48 from Amsterdam, Netherlands [18:46:31] BGP routing table entry for 2620:0:862::/48 [18:46:32] Paths: (4 available, best #2) [18:46:33] yep [18:46:35] aside from that, it works great [18:46:39] danke [18:46:45] ottomata: thanks, will fix [18:47:02] this was done really quickly obviously :) [18:47:11] aye ja [18:47:50] I don't see HONESTY-AS [18:47:54] wheredid you see tha? 
[18:48:01] AS-path translation: { SWIPNET WIKIMEDIA-EU } [18:48:04] in the bgp output for 2620:0:860 [18:48:11] ah, 860 [18:48:12] HONESTY-AS used to be AS14907 [18:48:25] but we've had that ASN since 2006 or 2007 or so ;) [18:49:03] hm, so they *do* get the route [18:49:08] yup [18:49:12] that longer prefixes checkbox is so broken [18:52:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:53:05] but ping/traceroute don't work, neither to eqiad nor esams [18:54:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.137 second response time [18:57:32] but given that there's a route, it's kind of weird that it's just stars in traceroute [19:02:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:40] aaaaaand my theory was right [19:03:10] if there received headers ipv4 (-> ipv4)* -> ipv6, it just ignores all the ipv6 ones altogether for SPF checks [19:03:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [19:03:23] and consider the sender as the first ipv4 it will find [19:03:34] even if it's 4 hops below [19:04:05] perfect, just perfect [19:06:23] just remove that SPF [19:07:00] well, the SPF is also wrong, since we don't have AAAA on the MX [19:07:43] but google is definitely buggy [19:09:45] !log authdns-update: removing lists' SPF [19:09:56] Logged the message, Master [19:12:13] watchmouse shows only one monitoring station that can't reach us over ipv6 [19:12:18] and it's in cologne, germany [19:12:39] its ipv6 may well be broken altogether [19:13:24] because another address outside our networks doesn't work either [19:13:38] nlnog to the rescue? [19:14:13] i don't have a login to that [19:14:16] I do [19:14:34] what should I trace? start with eqiad? 
[19:14:40] yes [19:14:41] paravoid: I'll update the BZ ticket about them being removed again, but if they ask why I may have to ask you to comment ;] [19:14:45] much more likely to be behind l3 [19:14:53] ie: spf entries, but its not priority. [19:16:25] haha [19:16:55] or maybe that's because he's in hongkong? ;) [19:17:15] a wikimania with spotty internet connectivity? never. [19:18:07] huh, no output, I wonder what I do wrong [19:18:21] i didn't get output from the irc bot tool either [19:18:31] oh there's a ring-trace though [19:18:35] yes [19:18:46] I was trying ring-all traceroute6 :)_ [19:19:12] hehe [19:19:14] there's ring-ping too [19:19:22] to be fair, the public Wifi at the conference isn't 100%, tfinc's not the only user who's getting issues [19:19:35] you know awfully lot about it for someone without access ;) [19:19:45] Jasper_Deng: i'm not blaming him, just trying to remove some noise atm ;) [19:20:17] paravoid: i just never bothered to get a node or request access, it's not like i haven't seen the project grow ;) [19:20:27] I know, kidding :) [19:21:49] Traceback (most recent call last): File "/usr/local/bin/ring-trace", line 1297, in [19:22:35] nice [19:22:43] I'll fallback to ring-all [19:23:09] (03PS2) 10Yuvipanda: Read routing tables from Redis [operations/puppet] - 10https://gerrit.wikimedia.org/r/78025 [19:23:13] no level3 anywhere [19:24:48] i'm not surprised [19:24:57] out of 151 traceroutes [19:25:03] ring node participants tend to be peered well ;) [19:26:15] well [19:26:28] that may point to level3 not announcing the route and blackholing though [19:26:39] which is good :) [19:26:42] yes [19:27:02] except for direct l3 customers [19:27:19] singlehomed ones [19:27:24] right [19:27:29] greg-g: The above Received-SPF: line was from your Google Mail account? [19:29:59] so besides the looking glass [19:30:04] does anyone actually see the issue? 
:) [19:31:08] someone reported it on wikitech [19:31:16] which was forwarded from outages@ [19:31:27] http://tty.gr/s/google-ipv6-spf.txt spot the error [19:33:08] i know, but that's the only report i've seen [19:33:12] and it seems based on the LG [19:33:18] although one probably wouldn't ask the LG if not experiencing the problem [19:36:34] now it works again, in the L3 LG [19:36:48] i'll send a response to wikitech [19:40:17] ok [19:41:51] import urlparse; print urlparse.urlparse('tcp://foo.bar/?q=123').query [19:42:04] python 2.7.5: 'q=123' [19:42:08] python 2.7.3: '' [19:42:22] that just ate an hour of my life. [19:42:34] Python 2.7.3 (default, Jan 2 2013, 13:56:14) [19:42:36] IPython 0.13.1 -- An enhanced Interactive Python. [19:42:39] In [1]: import urlparse; print urlparse.urlparse('tcp://foo.bar/?q=123').query [19:42:43] q=123 [19:42:49] * ori-l headdesks. [19:43:11] argh. try on a production host. [19:43:48] huh [19:44:12] empty string [19:44:18] Ubuntu's 2.7.3 is empty, Debian's 2.7.3 is okay [19:44:36] fedora also gives empty string (f18) [19:44:49] 2.7.3 of course [19:44:54] http://bugs.python.org/issue9374 [19:44:59] python2.7 (2.7.3-4) unstable; urgency=low [19:44:59] * Follwup for issue #9374. Restore the removed attributes in the [19:44:59] urlparse module. [19:45:01] -- Matthias Klose Sun, 26 Aug 2012 12:24:31 +0200 [19:45:24] python2.7 (2.7.3-0ubuntu3.2) precise-proposed; urgency=low [19:45:27] fun! [19:45:57] gahhh. 
[19:46:21] (this was backported in Debian patch in 2.7.3-4; Ubuntu precise forked off before 2.7.3 entered Debian) [19:46:38] -- Matthias Klose Thu, 28 Mar 2013 12:40:39 +0100 [19:48:49] ori-l: so the issue is that tcp:// is a custom scheme and python interpreted the RFCs at some point that query strings are protocol-specific, not a URI scheme generic thing [19:49:41] yeah, i came across that, but then tried it in a local python shell to verify that was indeed the case [19:51:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:51:53] and was a touch mystified by the different behavior [19:52:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [19:52:37] paravoid, did you ever set up varnishkafka? [19:52:42] no [19:52:45] hmm, ok [19:52:50] what's up? [19:52:55] i got it working, but just not sending to kafka broker [19:56:43] (03PS3) 10Yuvipanda: Read routing tables from Redis [operations/puppet] - 10https://gerrit.wikimedia.org/r/78025 [19:56:44] (03PS6) 10Yuvipanda: Add redis lua library to labsproxy [operations/puppet] - 10https://gerrit.wikimedia.org/r/78002 [19:56:45] (03CR) 10Manybubbles: [C: 04-1] "(1 comment)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/78083 (owner: 10Demon) [19:58:13] (03CR) 10Manybubbles: "Oh, I did forget to say that this is way nicer then when I tried to setup CirrusSearch. I'm just getting nervous as we get closer to prod" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/78083 (owner: 10Demon) [20:01:16] (03PS4) 10Yuvipanda: Read routing tables from Redis [operations/puppet] - 10https://gerrit.wikimedia.org/r/78025 [20:01:17] (03PS7) 10Yuvipanda: Add redis lua library to labsproxy [operations/puppet] - 10https://gerrit.wikimedia.org/r/78002 [20:01:26] Ooooooo its working! 
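[Editor's note: the urlparse discrepancy ori-l hit above is easy to reproduce. Upstream Python originally only split query strings for a whitelist of known schemes, so a custom scheme like `tcp://` got an empty `query`; the fix for issue 9374 (backported into Debian's 2.7.3-4 but not Ubuntu precise's 2.7.3) made it scheme-agnostic, as is every Python 3. A quick check on a modern interpreter:]

```python
# On Python 3 (and patched 2.7.3+), the query string is split off for
# any URI scheme, not just http/ftp/etc. -- unpatched 2.7.3 as shipped
# in Ubuntu precise returned '' here instead.
from urllib.parse import urlparse

parts = urlparse('tcp://foo.bar/?q=123')
print(parts.query)  # 'q=123'
```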
[20:01:36] yay [20:01:37] it looks like varnishkafka will only work if the topic already exits in kafka [20:01:42] it won't create its own topics [20:01:46] scfc_de: yeah (sorry, was getting lunch) [20:01:47] or, at least that's what just happened to me :0 [20:01:54] scfc_de: my @wikimedia.org account [20:03:01] greg-g: Okay, thanks. [20:03:45] (03PS5) 10Yuvipanda: Read routing tables from Redis [operations/puppet] - 10https://gerrit.wikimedia.org/r/78025 [20:03:46] (03PS8) 10Yuvipanda: Add redis lua library to labsproxy [operations/puppet] - 10https://gerrit.wikimedia.org/r/78002 [20:06:39] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [20:07:53] greg-g: I removed the SPF record again [20:08:11] greg-g: google's spf validator is buggy :) [20:08:14] (03PS6) 10Yuvipanda: Read routing tables from Redis [operations/puppet] - 10https://gerrit.wikimedia.org/r/78025 [20:11:01] (03PS1) 10Danny B.: cswikiquote: Add custom namespace for works [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/78142 [20:13:22] paravoid: ugh, and I just emailed OIT saying 'the SPF thing is done, mostly" [20:13:25] :) [20:13:33] * greg-g hates being go between [20:13:45] one issue tracked in 3 different trackers [20:13:47] well, SPF in this case actually made matters worse [20:13:49] OIT, RT, and BZ [20:13:54] oh? huh, awesome [20:14:10] well, yeah, absence of an SPF is "neutral", i.e. shouldn't affect spam scoring [20:14:33] (which is why I've said from the get go that this sudden spam classification has a different reason) [20:14:40] yeah [20:14:46] a correct SPF, may give minus spam points [20:14:57] but in this case due to google's bug, it turned it into a softfail [20:15:11] which actually (presumably) increases the spam score :) [20:15:47] awesome [20:15:50] softfail for lists or google apps mail? 
[20:16:26] lists [20:16:37] greg-g: http://tty.gr/s/google-ipv6-spf.txt :) [20:16:56] (that's with @wikimedia.org, but the same with @lists.wikimedia.org) [20:17:05] *it's [20:18:48] the messages I received passed [20:21:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:23:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.857 second response time [20:25:11] paravoid: alright, I'm passing this off to you, ken, and Michael now. I'm revoking my in-betweener status now :) [20:25:42] do you have sufficient perms to revoke? :) [20:26:07] WONTFIX grants them [20:26:34] jeremyb: luckily, it is my self-hosted, internal issue tracker, so yes. :) [20:28:18] okay, found proper google contacts [20:28:21] I'll try pinging them [20:29:16] the spf issues are orthogonal to the recent problems. [20:29:22] they'd help, sure, but please ignore them for now [20:29:41] paravoid: will do [20:35:49] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 20:35:48 UTC 2013 [20:36:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:36:39] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [20:39:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.177 second response time [20:42:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:43:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.140 second response time [20:52:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:53:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time [20:58:52] (03PS1) 10Ottomata: Updating 
debian/bin/kafka with new bin scripts and removed obsolete ones. [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/78153 [21:01:33] (03PS2) 10Ottomata: Updating debian/bin/kafka with new bin scripts and removed obsolete ones. [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/78153 [21:03:17] (03CR) 10Ottomata: [C: 032 V: 032] Updating debian/bin/kafka with new bin scripts and removed obsolete ones. [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/78153 (owner: 10Ottomata) [21:07:25] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [21:22:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:23:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [21:25:30] !log authdns-update: adding lists AAAA & (neutral) SPF [21:25:43] Logged the message, Master [21:27:11] :) [21:27:54] just cleaning up house [21:32:15] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [21:32:15] PROBLEM - Puppet freshness on holmium is CRITICAL: No successful Puppet run in the last 10 hours [21:32:15] PROBLEM - Puppet freshness on pdf3 is CRITICAL: No successful Puppet run in the last 10 hours [21:32:15] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours [21:32:15] PROBLEM - Puppet freshness on sq41 is CRITICAL: No successful Puppet run in the last 10 hours [21:32:16] PROBLEM - Puppet freshness on ssl1004 is CRITICAL: No successful Puppet run in the last 10 hours [21:33:15] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 21:33:14 UTC 2013 [21:33:25] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [21:46:38] !log stopping and restarting gitblit for google testing [21:46:49] Logged the message, Master 
[21:52:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:53:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [21:57:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:03:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [22:04:08] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [22:04:08] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [22:04:08] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [22:04:08] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [22:04:08] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [22:04:09] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [22:14:08] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [22:15:35] hey ^d, I have two questions if you're available [22:28:02] <^d> paravoid: Shoot [22:28:20] first is, how do we delete github projects when the corresponding gerrit project gets deleted? [22:28:31] <^d> By hand at the moment :\ [22:30:12] okay [22:32:12] <^d> The delete action doesn't have a hook (yet), so I can't tie in and clean up replicated copies. [22:32:25] <^d> On my list for things that only bother me really so I've not gotten to it :) [22:33:15] (03PS1) 10Edenhill: Added support for escaping troublesome characters in tag content. 
[operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/78157 [22:33:28] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 22:33:20 UTC 2013 [22:33:42] okay [22:34:04] could you delete https://github.com/wikimedia/operations-varnishkafka when you get the chance then? [22:34:08] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [22:34:09] it has been replaced by [22:34:16] https://github.com/wikimedia/operations-software-varnish-varnishkafka [22:34:23] which brings me to my second question [22:34:29] do the mappings have to be 1:1? [22:34:40] or could we make this e.g. github.com/wikimedia/varnishkafka ? [22:34:49] <^d> No, they don't. Christian added some functionality where we can change that 1:1 mapping. [22:34:59] <^d> It just defaults to the 1:1. [22:35:01] cool! [22:35:28] so, what's the process? asking you/chris over irc? mail? bug report? [22:35:52] <^d> It should be puppetized, gimme a sec and I'll look. [22:35:58] <^d> I think we already enabled it once. [22:40:46] (03PS1) 10Demon: Replicate varnishkafka to a custom destination [operations/puppet] - 10https://gerrit.wikimedia.org/r/78158 [22:40:48] <^d> paravoid: ^ [22:41:04] oh, great [22:41:05] thanks! [22:41:24] so I guess after this gets in you need to delete both of the old ones, right? [22:41:50] <^d> Yeah, keeps people from getting confused :) [22:41:53] (03CR) 10Faidon: [C: 032] Replicate varnishkafka to a custom destination [operations/puppet] - 10https://gerrit.wikimedia.org/r/78158 (owner: 10Demon) [22:46:03] <^d> !log reloaded gerrit replication plugin to pick up new config [22:46:14] Logged the message, Master [22:46:14] <^d> I should puppetize that. 
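[Editor's note: the actual override lives in Gerrit change 78158 (not reproduced here). In general, the Gerrit replication plugin's default is the 1:1 name mapping ^d mentions; a custom destination is expressed as an extra remote in `replication.config` scoped to specific projects. The remote name and URL below are illustrative assumptions:]

```
# Hypothetical replication.config fragment -- illustrative only.
# A dedicated remote can match one project and push it to a
# differently-named destination, overriding the default 1:1 mapping.
[remote "github-varnishkafka"]
    url = git@github.com:wikimedia/varnishkafka.git
    projects = operations/software/varnish/varnishkafka
    mirror = true
```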
[22:56:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:57:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [23:00:21] (03PS1) 10Edenhill: Added JSON formatter, field name identifers and type casting option. [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/78160 [23:00:22] (03PS1) 10Edenhill: Link with -lrt [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/78161 [23:07:54] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [23:22:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:23:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.157 second response time [23:32:08] (03PS1) 10Faidon: mailman: add IPv6 support to lighttpd [operations/puppet] - 10https://gerrit.wikimedia.org/r/78164 [23:32:54] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 23:32:49 UTC 2013 [23:33:14] (03PS2) 10Faidon: mailman: add IPv6 support to lighttpd [operations/puppet] - 10https://gerrit.wikimedia.org/r/78164 [23:33:47] (03CR) 10Faidon: [C: 032 V: 032] "Tested." [operations/puppet] - 10https://gerrit.wikimedia.org/r/78164 (owner: 10Faidon) [23:33:54] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [23:34:29] warning -- i'll be rebooting cr2-knams in ~10 minutes [23:43:24] !log deactivating external bgp neighbors on cr2-knams [23:43:35] Logged the message, Mistress of the network gear. 
[23:45:34] PROBLEM - Host mobile-lb.esams.wikimedia.org_ipv6_https is DOWN: PING CRITICAL - Packet loss = 100% [23:45:37] PROBLEM - Host mobile-lb.esams.wikimedia.org_ipv6 is DOWN: PING CRITICAL - Packet loss = 100% [23:46:24] RECOVERY - Host mobile-lb.esams.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 85.80 ms [23:46:49] !log rebooting cr2-knams [23:46:59] Logged the message, Mistress of the network gear. [23:50:44] RECOVERY - Host mobile-lb.esams.wikimedia.org_ipv6_https is UP: PING OK - Packet loss = 0%, RTA = 85.57 ms [23:51:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:53:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [23:57:14] PROBLEM - Puppet freshness on mchenry is CRITICAL: No successful Puppet run in the last 10 hours