Using Delphi and TWebBrowser to process WordPress’s HTML Source code

I don’t have a lot of time since I am heading off to my brother-in-law’s high school graduation in upstate NY tomorrow. I might be able to postdate a few articles for the Verizon Phone tips category, but this will be the last meaty post for a little while. So, today you get a program that I wrote at the beginning of the month when I wanted to let people know I’d written an article about the release of WordPress 2.0.3: TrackbackGrabber.

This little routine will go to a WordPress post, grab the source code and isolate all of the trackbacks for you. It turns them into separate lines of fewer than 255 characters each that you can paste into the trackback field on your post. Then, when you publish an article, you will let everyone on that list know about it. This is something I used to do manually and it took forever! IMPORTANT: Don’t abuse this or it will backfire and you’ll quickly be labeled a spammer and reported to Akismet, Spam Karma and other spam filtering tools. If that happens, none of your comments, trackbacks or manually typed entries will get through anywhere!

Concepts Demonstrated:


  • Surfing the web using Delphi

  • Using a TWebBrowser ActiveX control to navigate the DOM structure of a website

  • Retrieving the source code of a webpage

  • The fundamentals of creating a web bot in Delphi

  • Filtering contents of a TMemo

  • How to tell when a page has loaded in the TWebBrowser

That last one was the tricky part. If you access WebBrowser1.Document too soon, you will get an access violation. Sleeps and Application.ProcessMessages calls, no matter how many you use or how long you wait, do NOT consistently work. Pressing the button twice (calling WebBrowser1.Navigate twice) always worked, but I found WebBrowser1.ReadyState in the documentation, and checking that seems to be the right way to do it.
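Here is a minimal sketch of that ReadyState check. It is not the code from TrackbackGrabber: the unit name BrowserWaitSketch, the function LoadPageSource and the TimeoutMs parameter are made up for illustration. The loop still calls Application.ProcessMessages, because the browser control needs the message pump to make progress, but it is ReadyState that decides when it is safe to touch Document.

```delphi
unit BrowserWaitSketch;

interface

uses
  Windows, SysUtils, Forms, SHDocVw, mshtml;

// Hypothetical helper: navigates Browser to URL, waits until the control
// reports the page as complete and returns the HTML of the document body.
// Raises an exception if the page has not finished loading after TimeoutMs.
function LoadPageSource(Browser: TWebBrowser; const URL: string;
  TimeoutMs: Cardinal): string;

implementation

function LoadPageSource(Browser: TWebBrowser; const URL: string;
  TimeoutMs: Cardinal): string;
var
  Started: Cardinal;
  Doc: IHTMLDocument2;
  Body: IHTMLElement;
begin
  Result := '';
  Browser.Navigate(URL);
  Started := GetTickCount;

  // Poll ReadyState instead of guessing how long to sleep.
  while Browser.ReadyState <> READYSTATE_COMPLETE do
  begin
    Application.ProcessMessages; // let the control do its work
    Sleep(10);                   // avoid pegging the CPU while waiting
    if GetTickCount - Started > TimeoutMs then
      raise Exception.CreateFmt('Timed out loading %s', [URL]);
  end;

  // Only now is Document safe to access.
  if Supports(Browser.Document, IHTMLDocument2, Doc) then
  begin
    Body := Doc.body;
    if Body <> nil then
      Result := Body.outerHTML;
  end;
end;

end.
```

From a button handler it could be called as memoResults.Lines.Text := LoadPageSource(WebBrowser1, editAddress.Text, 30000). TWebBrowser also has an OnDocumentComplete event, which avoids the polling loop at the cost of splitting the logic across two handlers.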

Exe: http://www.thecodecave.com/downloads/delphi/TrackbackGrabber.exe
Source Code: http://www.thecodecave.com/downloads/delphi/TrackbackGrabber.zip

Main unit:
[delphi]
// ****************************************************************************
// TrackBackGrabber_Main 06/Jun/2006
// Written by Brian Layman (AKA Capt. Queeg AKA SilverPaladin)
// Visit him at http://www.TheCodeCave.com
//
// WordPress’s Trackback system is a way to connect articles about the same
// topic. If you post something related to a popular article, this tool allows
// you to tell other bloggers that have posted related articles that you have
// additional information on the subject.
//
// Warning: I can’t think of any way that this routine could cause harm to
// your computer, but it could spell the end of your blog if you abuse it.
// Abuse this tool and it WILL backfire. You’ll quickly be labeled a
// spammer and will be reported to Akismet, Spam Karma and other spam
// filtering tools. If that happens, none of your comments, trackbacks or
// manually typed entries will get through anywhere! It would really stink
// to have every comment you leave on any WordPress blog automatically
// deleted.
//
// As always, I’ll say it is a good best practice to understand every line
// of new code before you run it. Who knows what could be lurking? Better
// yet, do not run this example at all. You should stop right now and erase
// the files. For if it causes blue smoke to be emitted from your network
// card, if it erases all users from your computer, or if it makes your
// sister break up with her lawyer boyfriend and start dating a caver, it
// is not my fault. (Actually that last one might be an improvement, but
// it is still not my fault.) But the fact of the matter is, computers
// have a mind of their own and we programmers live on the wild side.
//
// Usage: TrackBackGrabber.exe
// Supply a URL to a WordPress post and it will grab all of the track
// back links and then consolidate those to lines 255 characters long.
// Then all you need to do is paste each line into your WordPress post’s
// track back field and hit save. Each site will only be updated once.
//
// Licensing – You can use this source as you will. It’s free for
// commercial, shareware and gpl use as you like. I hope it helps.
// If this program & source really helps you out, please visit
// http://www.thecodecave.com/did-that-help/ and read more.
//
// History:
// 06/Jun/2006 – BL – Created
//
// ****************************************************************************
unit TrackbackGrabber_Main;

interface

uses
  Windows, Messages, SysUtils, Classes, Graphics, Controls, Forms, Dialogs,
  StdCtrls, OleCtrls, SHDocVw, mshtml;

type
  TfrmTrackback = class(TForm)
    memoResults: TMemo;
    btnGather: TButton;
    editAddress: TEdit;
    WebBrowser1: TWebBrowser;
    procedure btnGatherClick(Sender: TObject);
  private
    { Private declarations }
    procedure ProcessPage;
  public
    { Public declarations }
  end;

var
  frmTrackback: TfrmTrackback;

const
  // This defines what to look for.
  // In this case it is TrackBacks.
  // Deliberate references to this article for.
  SEARCH_STR = 'Trackback from
  // [...] The listing is cut off here; the lines below are the end of ProcessPage.
  Memo1.Lines.Count)
  do begin
    S := Memo1.Lines[0] + ' ' + Memo1.Lines[Loop];
    if (Length(S) < 255) then
      Memo1.Lines[0] := S
    else begin
      Memo1.Lines.Insert(0, Memo1.Lines[Loop]);
      inc(Loop);
    end;
    Memo1.Lines.Delete(Loop);
  end;
end; // ProcessPage

end.
[/delphi]
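The consolidation step, packing the trackback URLs into lines shorter than 255 characters, can also be sketched as a standalone function that does not touch the memo. This is an illustration and not the code from the download: the unit name PackLinesSketch and the function PackLines are hypothetical, and it assumes the entries are already one URL per line.

```delphi
unit PackLinesSketch;

interface

uses
  Classes, SysUtils;

// Hypothetical helper: joins the entries of Items with single spaces into as
// few lines as possible, keeping each output line shorter than MaxLen
// characters. Empty entries are skipped. An entry that is itself MaxLen or
// longer is kept on a line of its own. The caller owns the returned list.
function PackLines(Items: TStrings; MaxLen: Integer): TStringList;

implementation

function PackLines(Items: TStrings; MaxLen: Integer): TStringList;
var
  I: Integer;
  Current, Candidate: string;
begin
  Result := TStringList.Create;
  Current := '';
  for I := 0 to Items.Count - 1 do
  begin
    if Items[I] = '' then
      Continue;

    if Current = '' then
      Candidate := Items[I]
    else
      Candidate := Current + ' ' + Items[I];

    if (Current = '') or (Length(Candidate) < MaxLen) then
      Current := Candidate
    else
    begin
      // Adding this entry would reach the limit, so start a new line.
      Result.Add(Current);
      Current := Items[I];
    end;
  end;

  if Current <> '' then
    Result.Add(Current);
end;

end.
```

With the form from the listing, something like PackLines(memoResults.Lines, 255) would return the lines ready to paste into the trackback field; remember to free the returned list when you are done with it.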