<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: FeedWordPress 0.91</title>
	<atom:link href="http://projects.radgeek.com/2005/04/09/feedwordpress-091/feed/" rel="self" type="application/rss+xml" />
	<link>http://projects.radgeek.com/2005/04/09/feedwordpress-091/</link>
	<description>the software industry of a secessionist republic of one</description>
	<pubDate>Wed, 27 Aug 2008 23:32:41 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
		<item>
		<title>By: bdblogs</title>
		<link>http://projects.radgeek.com/2005/04/09/feedwordpress-091/#comment-2481</link>
		<dc:creator>bdblogs</dc:creator>
		<pubDate>Tue, 13 Jun 2006 11:50:39 +0000</pubDate>
		<guid isPermaLink="false">http://projects.radgeek.com/2005/04/09/feedwordpress-091/#comment-2481</guid>
		<description>&lt;p&gt;I have installed FeedWordPress, which takes several RSS feeds and displays their posts ordered by date. It works fine. FeedWordPress only brings me the parsed data from the posts. How can I get the TITLE/URL of the weblog as well?&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>I have installed FeedWordPress, which takes several RSS feeds and displays their posts ordered by date. It works fine. FeedWordPress only brings me the parsed data from the posts. How can I get the TITLE/URL of the weblog as well?</p>]]></content:encoded>
	</item>
	<item>
		<title>By: andromeda strain</title>
		<link>http://projects.radgeek.com/2005/04/09/feedwordpress-091/#comment-68</link>
		<dc:creator>andromeda strain</dc:creator>
		<pubDate>Mon, 09 May 2005 18:18:18 +0000</pubDate>
		<guid isPermaLink="false">http://projects.radgeek.com/2005/04/09/feedwordpress-091/#comment-68</guid>
		<description>&lt;p&gt;The Feed Filters plugin doesn't work. I know this may be annoying for you, but using Yahoo! News as a source of feeds, this duplicate problem appears continually. I.e., if you define some post categories, each tied to a keyword search in Yahoo, when an article turns up in two queries it will be posted two times, one in each category, instead of, say, one time only with two category assignments...&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>The Feed Filters plugin doesn&#8217;t work. I know this may be annoying for you, but using Yahoo! News as a source of feeds, this duplicate problem appears continually. I.e., if you define some post categories, each tied to a keyword search in Yahoo, when an article turns up in two queries it will be posted two times, one in each category, instead of, say, one time only with two category assignments&#8230;</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Rad Geek</title>
		<link>http://projects.radgeek.com/2005/04/09/feedwordpress-091/#comment-36</link>
		<dc:creator>Rad Geek</dc:creator>
		<pubDate>Thu, 21 Apr 2005 00:41:40 +0000</pubDate>
		<guid isPermaLink="false">http://projects.radgeek.com/2005/04/09/feedwordpress-091/#comment-36</guid>
		<description>&lt;blockquote&gt;&lt;p&gt;I'm using yahoo news and I have the problem that some articles keep getting added every day. &lt;/p&gt;
&lt;p&gt;It would be great if there were a check for duplicate or similar entries.&lt;/p&gt;
&lt;p&gt;I've looked at the code, but haven't been able to figure out how to do this.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I could be mistaken, but my suspicion is that the problem with Yahoo! is that &lt;a href="http://www.intertwingly.net/blog/2005/04/09/Clone-Wars"&gt;their feeds are broken&lt;/a&gt;. Some big sites remain notorious offenders on incorrect or nonexistent use of the Atom &lt;code&gt;&amp;lt;id&amp;gt;&lt;/code&gt; element or the RSS 2 &lt;code&gt;&amp;lt;guid&amp;gt;&lt;/code&gt; element, which FeedWordPress uses to determine which posts have been syndicated and which have not.&lt;/p&gt;

&lt;p&gt;Different aggregator authors have dealt with this problem in different ways. I prefer not to deal with it (directly) because most attempts to solve the duplication problems that broken feeds cause end up complicating the code without producing reliably good results. However, if the use you want to put FeedWordPress to demands that you use broken feeds and try to strip out duplicates, then this seems like a natural place to use a &lt;code&gt;syndicated_item&lt;/code&gt; filter (where you could check the current item against the database according to whatever criteria of similarity you want to use, and then return &lt;code&gt;NULL&lt;/code&gt; if it seems to be a duplicate or pass the item through if it does not).&lt;/p&gt;

&lt;p&gt;Here's a rather simple-minded example of how you might do this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&#60;?php
/*
Plugin Name: Feed Filters
Description: Filters incoming posts
Author: Charles Johnson
Version: 2005.04.20
Author URI: http://www.radgeek.com/
*/

add_filter('syndicated_item', 'filter_out_duplicates');

// Don't add posts with duplicate content.
function filter_out_duplicates ($item) {
    // The WordPress database object has to be pulled into function scope.
    global $wpdb;

    if (isset($item['content']['encoded']) and $item['content']['encoded']):
        $content = $item['content']['encoded'];
    else:
        $content = $item['description'];
    endif;

    // If capitalization or spacing are all that's different, it's probably
    // a duplicate.
    $content = $wpdb-&#62;escape(trim(strtolower($content)));

    $id = $wpdb-&#62;get_var("
    SELECT ID FROM $wpdb-&#62;posts
    WHERE
        TRIM(LOWER(post_content)) = '$content'
    ");

    // A matching ID means this content is already in the database: drop the item.
    if (!is_null($id)) : $item = NULL; endif;
    return $item;
}
?&#62;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You'd put this code in its own PHP module, then install that module as a WordPress plugin like any other (copy it to &lt;code&gt;wp-content/plugins&lt;/code&gt; and activate it from the Dashboard). Note that this is a preliminary attempt which I HAVE NOT TESTED, so I can't guarantee that I haven't made some kind of boneheaded error. Depending on the sources you are pulling the feeds from, you will very probably want somewhat more fine-grained logic than this for detecting duplicates, but it seems like this is the sort of start you would want to get off to.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<blockquote><p>I&#8217;m using yahoo news and I have the problem that some articles keep getting added every day. </p>
<p>It would be great if there were a check for duplicate or similar entries.</p>
<p>I&#8217;ve looked at the code, but haven&#8217;t been able to figure out how to do this.</p>
</blockquote>

<p>I could be mistaken, but my suspicion is that the problem with Yahoo! is that <a href="http://www.intertwingly.net/blog/2005/04/09/Clone-Wars">their feeds are broken</a>. Some big sites remain notorious offenders on incorrect or nonexistent use of the Atom <code>&lt;id&gt;</code> element or the RSS 2 <code>&lt;guid&gt;</code> element, which FeedWordPress uses to determine which posts have been syndicated and which have not.</p>

<p>Different aggregator authors have dealt with this problem in different ways. I prefer not to deal with it (directly) because most attempts to solve the duplication problems that broken feeds cause end up complicating the code without producing reliably good results. However, if the use you want to put FeedWordPress to demands that you use broken feeds and try to strip out duplicates, then this seems like a natural place to use a <code>syndicated_item</code> filter (where you could check the current item against the database according to whatever criteria of similarity you want to use, and then return <code>NULL</code> if it seems to be a duplicate or pass the item through if it does not).</p>

<p>Here&#8217;s a rather simple-minded example of how you might do this:</p>

<pre><code>&lt;?php
/*
Plugin Name: Feed Filters
Description: Filters incoming posts
Author: Charles Johnson
Version: 2005.04.20
Author URI: http://www.radgeek.com/
*/

add_filter('syndicated_item', 'filter_out_duplicates');

// Don't add posts with duplicate content.
function filter_out_duplicates ($item) {
    // The WordPress database object has to be pulled into function scope.
    global $wpdb;

    if (isset($item['content']['encoded']) and $item['content']['encoded']):
        $content = $item['content']['encoded'];
    else:
        $content = $item['description'];
    endif;

    // If capitalization or spacing are all that's different, it's probably
    // a duplicate.
    $content = $wpdb-&gt;escape(trim(strtolower($content)));

    $id = $wpdb-&gt;get_var("
    SELECT ID FROM $wpdb-&gt;posts
    WHERE
        TRIM(LOWER(post_content)) = '$content'
    ");

    // A matching ID means this content is already in the database: drop the item.
    if (!is_null($id)) : $item = NULL; endif;
    return $item;
}
?&gt;
</code></pre>

<p>You&#8217;d put this code in its own PHP module, then install that module as a WordPress plugin like any other (copy it to <code>wp-content/plugins</code> and activate it from the Dashboard). Note that this is a preliminary attempt which I HAVE NOT TESTED, so I can&#8217;t guarantee that I haven&#8217;t made some kind of boneheaded error. Depending on the sources you are pulling the feeds from, you will very probably want somewhat more fine-grained logic than this for detecting duplicates, but it seems like this is the sort of start you would want to get off to.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://projects.radgeek.com/2005/04/09/feedwordpress-091/#comment-35</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Wed, 20 Apr 2005 21:49:41 +0000</pubDate>
		<guid isPermaLink="false">http://projects.radgeek.com/2005/04/09/feedwordpress-091/#comment-35</guid>
		<description>&lt;p&gt;I think I've found something useful in the following php functions:&lt;/p&gt;

&lt;p&gt;http://www.php.net/manual/en/function.similar-text.php&lt;/p&gt;

&lt;p&gt;http://www.php.net/manual/en/function.levenshtein.php&lt;/p&gt;

&lt;p&gt;I'll try those in combination with the code jeremy posted above and see if that solves the problem of multiple articles on the same event.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>I think I&#8217;ve found something useful in the following php functions:</p>

<p><a href="http://www.php.net/manual/en/function.similar-text.php" rel="nofollow">http://www.php.net/manual/en/function.similar-text.php</a></p>

<p><a href="http://www.php.net/manual/en/function.levenshtein.php" rel="nofollow">http://www.php.net/manual/en/function.levenshtein.php</a></p>

<p>I&#8217;ll try those in combination with the code jeremy posted above and see if that solves the problem of multiple articles on the same event.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://projects.radgeek.com/2005/04/09/feedwordpress-091/#comment-34</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Wed, 20 Apr 2005 20:37:36 +0000</pubDate>
		<guid isPermaLink="false">http://projects.radgeek.com/2005/04/09/feedwordpress-091/#comment-34</guid>
		<description>&lt;p&gt;Great script! almost perfect :)&lt;/p&gt;

&lt;p&gt;I'm using yahoo news and I have the problem that some articles keep getting added every day. &lt;/p&gt;

&lt;p&gt;It would be great if there were a check for duplicate or similar entries.&lt;/p&gt;

&lt;p&gt;I've looked at the code, but haven't been able to figure out how to do this.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Great script! almost perfect :)</p>

<p>I&#8217;m using yahoo news and I have the problem that some articles keep getting added every day. </p>

<p>It would be great if there were a check for duplicate or similar entries.</p>

<p>I&#8217;ve looked at the code, but haven&#8217;t been able to figure out how to do this.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Jeremy@theppn.org</title>
		<link>http://projects.radgeek.com/2005/04/09/feedwordpress-091/#comment-30</link>
		<dc:creator>Jeremy@theppn.org</dc:creator>
		<pubDate>Fri, 15 Apr 2005 03:34:09 +0000</pubDate>
		<guid isPermaLink="false">http://projects.radgeek.com/2005/04/09/feedwordpress-091/#comment-30</guid>
		<description>&lt;p&gt;I was in the process of doing exactly what this plugin does, you have saved me hours of work and I really like some of your design decisions.&lt;/p&gt;

&lt;p&gt;One thing I didn't like, though, was that the feed's modification date changed the real post modification date in WP, so I fiddled around a bit and added these to the list of post variables:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$post['post_date'] = date('Y-m-d H:i:s', $post['epoch']['issued']);
$post['post_modified'] = date('Y-m-d H:i:s', time() );
$post['syndication_modified'] = $post['epoch']['modified'] ;
$post['post_date_gmt'] = gmdate('Y-m-d H:i:s', $post['epoch']['issued']);
$post['post_modified_gmt'] = gmdate('Y-m-d H:i:s', time() );
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;which obviously I then checked freshness with, &lt;code&gt;get_post_meta ( $result-&#62;id, 'syndication_modified', true );&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;One of my feeds, though, doesn't include a date in its RSS, so it would constantly be updating in the DB. This morning I added this to the freshness check, after checking the date. (The stripslashes is there because the site's feed is also badly encoded, so I check both with and without slashes stripped.) I should clean up the code, but I thought I would pass it back upstream sooner rather than later.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;if (!$result):
    $freshness = 2; // New content
elseif ($post['epoch']['modified'] &#62; $old_mod_time ): // was $result-&#62;modified
    // some feeds do not provide a date at all, so we need to compare content to see if there really was an update or not.
    if ( strlen ( $result-&#62;post_title ) != strlen ( stripslashes ( $post['post_title'] ) ) ) {
        if ( strlen ( $result-&#62;post_title ) != strlen ( $post['post_title'] ) ) {
            $freshness = 1;
        } else {
            $freshness = 0;
        }
    } elseif ( strlen ( $result-&#62;post_content ) != strlen ( stripslashes ( $post['post_content'] ) ) ) {
        if ( strlen ( $result-&#62;post_content ) != strlen ( $post['post_content'] ) ) {
            $freshness = 1;
        } else {
            $freshness = 0;
        }
    } elseif ( strlen ( $result-&#62;post_excerpt ) != strlen ( stripslashes ( $post['post_excerpt'] ) ) ) {
        if ( strlen ( $result-&#62;post_excerpt ) != strlen ( $post['post_excerpt'] ) ) {
            $freshness = 1;
        } else {
            $freshness = 0;
        }
    } else {
        $freshness = 0;
    }
else:
&lt;/code&gt;&lt;/pre&gt;
</description>
		<content:encoded><![CDATA[<p>I was in the process of doing exactly what this plugin does, you have saved me hours of work and I really like some of your design decisions.</p>

<p>One thing I didn&#8217;t like, though, was that the feed&#8217;s modification date changed the real post modification date in WP, so I fiddled around a bit and added these to the list of post variables:</p>

<pre><code>$post['post_date'] = date(&#8217;Y-m-d H:i:s&#8217;, $post['epoch']['issued']);
$post['post_modified'] = date(&#8217;Y-m-d H:i:s&#8217;, time() );
$post['syndication_modified'] = $post['epoch']['modified'] ;
$post['post_date_gmt'] = gmdate(&#8217;Y-m-d H:i:s&#8217;, $post['epoch']['issued']);
$post['post_modified_gmt'] = gmdate(&#8217;Y-m-d H:i:s&#8217;, time() );
</code></pre>

<p>which obviously I then checked freshness with, <code>get_post_meta ( $result-&gt;id, 'syndication_modified', true );</code></p>

<p>One of my feeds, though, doesn&#8217;t include a date in its RSS, so it would constantly be updating in the DB. This morning I added this to the freshness check, after checking the date. (The stripslashes is there because the site&#8217;s feed is also badly encoded, so I check both with and without slashes stripped.) I should clean up the code, but I thought I would pass it back upstream sooner rather than later.</p>

<pre><code>if (!$result):
    $freshness = 2; // New content
elseif ($post['epoch']['modified'] &gt; $old_mod_time ): // was $result-&gt;modified
    // some feeds do not provide a date at all, so we need to compare content to see if there really was an update or not.
    if ( strlen ( $result-&gt;post_title ) != strlen ( stripslashes ( $post['post_title'] ) ) ) {
        if ( strlen ( $result-&gt;post_title ) != strlen ( $post['post_title'] ) ) {
            $freshness = 1;
        } else {
            $freshness = 0;
        }
    } elseif ( strlen ( $result-&gt;post_content ) != strlen ( stripslashes ( $post['post_content'] ) ) ) {
        if ( strlen ( $result-&gt;post_content ) != strlen ( $post['post_content'] ) ) {
            $freshness = 1;
        } else {
            $freshness = 0;
        }
    } elseif ( strlen ( $result-&gt;post_excerpt ) != strlen ( stripslashes ( $post['post_excerpt'] ) ) ) {
        if ( strlen ( $result-&gt;post_excerpt ) != strlen ( $post['post_excerpt'] ) ) {
            $freshness = 1;
        } else {
            $freshness = 0;
        }
    } else {
        $freshness = 0;
    }
else:
</code></pre>]]></content:encoded>
	</item>
	<item>
		<title>By: Jon ("the Jester")</title>
		<link>http://projects.radgeek.com/2005/04/09/feedwordpress-091/#comment-28</link>
		<dc:creator>Jon ("the Jester")</dc:creator>
		<pubDate>Tue, 12 Apr 2005 20:18:59 +0000</pubDate>
		<guid isPermaLink="false">http://projects.radgeek.com/2005/04/09/feedwordpress-091/#comment-28</guid>
		<description>&lt;p&gt;Hey! Thank you so much for providing a plugin to replicate/replace PlanetPlanet. You have completely saved my life!!! :)&lt;/p&gt;

&lt;p&gt;The only thing I'm trying to figure out is if there's a way I can put for "the_title" some sort of [No Title] thing if a person posted without a title. This happens to be a case for a friend using LiveJournal, and while it's a minor problem cosmetically, it'd be nice to fill it with something like a [No Title].&lt;/p&gt;

&lt;p&gt;Otherwise thank you again and again for the plugin. You've made life easier for a lot of us. :)&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Hey! Thank you so much for providing a plugin to replicate/replace PlanetPlanet. You have completely saved my life!!! :)</p>

<p>The only thing I&#8217;m trying to figure out is if there&#8217;s a way I can put for &#8220;the_title&#8221; some sort of [No Title] thing if a person posted without a title. This happens to be a case for a friend using LiveJournal, and while it&#8217;s a minor problem cosmetically, it&#8217;d be nice to fill it with something like a [No Title].</p>

<p>Otherwise thank you again and again for the plugin. You&#8217;ve made life easier for a lot of us. :)</p>]]></content:encoded>
	</item>
</channel>
</rss>
