[personal profile] da
I've given up on using LJ for my RSS feeds. I've got 88 of them, which means I sometimes don't see posts from real people for pages and pages. I'm jumping to Google Reader. Google Reader will import "OPML" data, so that's how I wanted to do the transfer. This is a how-to.

(If you are hoping to read your LJ friends list directly from an RSS reader, you might instead try this exporter, which grabs the necessary data for friends. It doesn't include feeds, and it's beyond the scope of this how-to.)

The short version:

1) If you're handy with Perl, grab the code at the bottom of this entry, change the user name, and run the code.
1b) If you're not handy with Perl, give me a shout and I'll run it for you on my server. :)

2) Output is a list of RSS URLs. To translate these to OPML, feed them to this page. Copy and paste them into the big text box, then hit the "Create OPML" link. Seconds later, you will have an output file, which you should save to disk (the file name doesn't matter).

3) Optionally, open the file in a text editor and change the "title" parts from the URL into a sensible title for each feed. Yeah, I was too lazy to write my own OPML and fix that.

4) In Google Reader, in the lower left-hand corner, choose "Manage Subscriptions". Choose "Import/Export". Browse to your OPML file and upload it.

Done!

So far, I like the Google Reader interface, and now I can actually pay attention to the real people on my list who do still post. (I appreciate y'all! I did this for yoooou!)

---
Don't bother reading the rest unless you want technical details; it's mostly here for Google searches. Let me know if this helped anybody!

The code:

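#!/usr/bin/perl

use strict;
use warnings;
use WWW::Mechanize;

# Change this to your own LJ user name.
my $user     = "da-lj";
my $base_url = "http://$user.livejournal.com/profile";

# autocheck => 1 makes any failed request die with an error message.
my $m = WWW::Mechanize->new( autocheck => 1 );

$m->get( $base_url );

# Grab the line of the profile page that mentions "watchingfeeds_body",
# then pull every single-quoted href out of it.
my $profile_html = $m->content;
my ($feed_line) = ($profile_html =~ /(watchingfeeds_body.*)/)
    or die "No watched feeds found on $base_url\n";
my @feed_urls = ($feed_line =~ /href='(.*?)'/g);

# Visit each linked page and print the URL behind its "XML" link.
foreach my $lj_url (@feed_urls) {
    $m->get( $lj_url );
    my $link = $m->find_link( text => 'XML' );
    if ($link) {
        print $link->url() . "\n";
    } else {
        warn "No XML link found on $lj_url\n";
    }
}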


And that's it. I'm using WWW::Mechanize, which is the bee's knees if you have to do screen-scraping in Perl.
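If you want to poke at the two regexes without hitting LJ at all, here's a minimal offline sketch. The HTML snippet is made up for illustration: I'm assuming the profile page keeps all the feed links, single-quoted, on the one line that mentions watchingfeeds_body, which is what the script relies on. The sub name extract_feed_urls is just something I picked.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pull the single-quoted href values out of the line that mentions
# "watchingfeeds_body". Returns an empty list if that line is missing.
sub extract_feed_urls {
    my ($html) = @_;
    # Without the /s flag, "." stops at a newline, so this captures
    # only the rest of the line that contains the marker.
    my ($feed_line) = ($html =~ /(watchingfeeds_body.*)/) or return;
    # The non-greedy (.*?) stops at the first closing quote of each href.
    return ($feed_line =~ /href='(.*?)'/g);
}

# Made-up sample markup; the real profile page may look different.
my $sample_html = <<'HTML';
<div id='friends_body'><a href='http://example.livejournal.com/profile'>example</a></div>
<div id='watchingfeeds_body'><a href='http://feed-one.livejournal.com/profile'>feed_one</a>, <a href='http://feed-two.livejournal.com/profile'>feed_two</a></div>
<div id='footer'><a href='http://www.livejournal.com/'>home</a></div>
HTML

print "$_\n" for extract_feed_urls($sample_html);
```

Only the two links on the watchingfeeds_body line come out; the links on the other lines are ignored because the first regex never sees them.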

I started off with a manual grab of my "watching" page and a word-processor search-and-replace, and was about to run a batch of wget commands to grab the lj-feed pages when I realized it would be quicker in Perl.

The biggest drawback to this method is the cost of installing WWW::Mechanize in the first place. CPAN makes it easy(ish) but it has a tonne of dependencies. ...I guess it's just one step if you're on a reasonably recent Debian/Ubuntu.
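Back in steps 2 and 3, the OPML came from a web page and the titles got fixed by hand. If you'd rather do that part in Perl too, here's a rough sketch using only core Perl. The input format (one feed per line, URL optionally followed by a tab and a title), the sub names, and the example.com URLs are all my own assumptions, and I haven't tested the output against Google Reader's importer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape the five characters that are special in XML attribute values.
sub xml_escape {
    my ($text) = @_;
    my %entity = ( '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
                   '"' => '&quot;', "'" => '&apos;' );
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}

# Turn "URL" or "URL<TAB>Title" lines into [ title, url ] pairs.
# If no title is given, the URL doubles as the title.
sub parse_feed_lines {
    my (@lines) = @_;
    my @feeds;
    foreach my $line (@lines) {
        chomp $line;
        next unless $line =~ /\S/;    # skip blank lines
        my ( $url, $title ) = split /\t/, $line, 2;
        $title = $url unless defined $title && length $title;
        push @feeds, [ $title, $url ];
    }
    return @feeds;
}

# Build an OPML document from a list of [ title, url ] pairs.
sub make_opml {
    my (@feeds) = @_;
    my $opml = qq{<?xml version="1.0" encoding="UTF-8"?>\n}
             . qq{<opml version="1.0">\n}
             . qq{  <head><title>LJ feed subscriptions</title></head>\n}
             . qq{  <body>\n};
    foreach my $feed (@feeds) {
        my ( $title, $url ) = map { xml_escape($_) } @$feed;
        $opml .= qq{    <outline type="rss" text="$title" title="$title" xmlUrl="$url"/>\n};
    }
    return $opml . qq{  </body>\n</opml>\n};
}

# Hypothetical sample input; swap in the real list of RSS URLs.
my @sample_lines = (
    "http://example.com/feed-one.xml\tFeed One",
    "http://example.com/feed-two.xml?a=1&b=2",
);
print make_opml( parse_feed_lines(@sample_lines) );
```

To use it for real, replace @sample_lines with the lines the scraper script printed, add a tab and a sensible title to any line you care about, and save the output as a file to upload.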

Anyhow.

Date: Friday, 27 May 2011 04:58 am (UTC)
From: [identity profile] http://users.livejournal.com/merle_/
You da(_lj) man!

I have the same question as [livejournal.com profile] thingo. I'd been looking into Mechanize for work purposes but did not see a trivial way to add in authentication. My best guess was to start the script by logging in with a hardcoded password and then start making requests using the cookie it returned, but.. eeeew.

Date: Sunday, 29 May 2011 07:08 pm (UTC)
From: [identity profile] da-lj.livejournal.com
See my comment to [livejournal.com profile] thingo above...

WWW::Mechanize *could* do that authentication as you suggest. Yes, sort of ew.

http://www.livejournal.com/support/faqbrowse.bml?faqid=306

gave me another idea, which is to feed your browser's LJ cookie directly to WWW::Mechanize. Let me know if you get that to work!
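A rough, untested sketch of what that might look like, using HTTP::Cookies (the cookie jar class that LWP, and so WWW::Mechanize, works with). The cookie name 'ljsession', the '.livejournal.com' domain, the one-hour lifetime, and the helper name jar_from_browser_cookie are all my assumptions; check what your browser actually stores before trusting any of them.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Cookies;

# Build a cookie jar holding one session cookie copied out of the browser.
# The name and domain passed in are assumptions, not confirmed LJ details.
sub jar_from_browser_cookie {
    my ( $name, $value, $domain ) = @_;
    my $jar = HTTP::Cookies->new;
    $jar->set_cookie(
        0,          # cookie version
        $name,      # cookie name
        $value,     # cookie value, pasted from the browser
        '/',        # path the cookie applies to
        $domain,    # domain the cookie applies to
        undef,      # port (no restriction)
        1,          # the path was given explicitly
        0,          # not restricted to https
        3600,       # max-age in seconds (one hour here)
        0,          # not flagged as a discard-at-exit cookie
    );
    return $jar;
}

my $jar = jar_from_browser_cookie( 'ljsession', 'PASTE-COOKIE-VALUE-HERE', '.livejournal.com' );

# Show what is in the jar, without making any network request.
print $jar->as_string;
```

The jar can then be handed to Mechanize when it's created, e.g. WWW::Mechanize->new( autocheck => 1, cookie_jar => $jar ), so every request carries the cookie. Whether LJ accepts a cookie reused this way is exactly the part I haven't verified.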

Date: Sunday, 29 May 2011 09:59 pm (UTC)
From: [identity profile] http://users.livejournal.com/merle_/
I think you could feed a cookie in (and that was how I was thinking of doing it), but the problems are:
- the cookies are tied to an IP and change in a serial manner
- every now and then LJ just logs you out
- if you have a slow connection you can see in the status bar that it goes through a bunch of redirects

None of those is insurmountable but it makes for a fragile system that will need constant maintenance of that cookie.
