Cleaning up comment spam

The past couple of days I’ve received a few hundred spam comments from “Kelly Ronald”, “John Reed”, “Nicholas Truman”, “Peter Back”, and “Alexander Kolt”, from IP addresses in Mexico, Taiwan, France, Australia, and California, among others. Most of them are tagged by the stopword list, but it’s a reminder that I should revisit the antispam implementation while I’m reworking the site. For now, I’m making good use of the bulk comment edit feature in WordPress.

Jeff Clavier appears to have gotten the same treatment:

If you are like me, you got blasted by “friendly” comments from Alexander Kolt, Nicolas Trumen, John Reed, Peter Back, and Kelly Ronald – all praising your blog, your posts and yourself.
This new generation of comment spam is more clever than previous but for one thing – the fact that spammers are picking old posts that are not commented upon anymore. Otherwise they use legit blogs/blog posts and in a few cases, it is not even clear which web site they are “pimping”

Jeff also turned up a security blog with additional info:

We have experienced a “massive attack” of SPAM on our blogging system from various hosts all pointing to two websites:
http://www.cosmicbuddha.com/blog/archives/ 001169.html (I have broken the URL intentionally)
And
http://anthony.ianniciello.net/blog/archives/ 000079.html (I have again broken the URL intentionally)
The comments contained very brief sentences and links to the above web sites.
From what it looks like it was an act of an attack against automatic blacklisting and un-moderated comments, probably not conducted by authors’ of the above blogs.

The author of at least one of the sites linked in this spam run doesn’t seem to be responsible. He has left a comment on the post linked above, and one of his own posts has effectively been taken over by discussion about how he ended up as one of the two target links in the spam comments.

This batch of spam seems a bit random. The typical spam postings I see here try to link to spamblogs and commercial sites. None of the linked sites in this set appear to benefit from the spam. So perhaps this is a test run for something in development. Wonderful thought.
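
Since the stopword list caught most of this batch, here is a minimal sketch in Python of how that kind of filter can work. It is not WordPress’s actual code: it assumes a plain, case-insensitive substring match of each list entry against every comment field, and the `Comment` type and function names are made up for illustration.

```python
from dataclasses import dataclass


# Hypothetical comment record; not WordPress's internal data structure.
@dataclass
class Comment:
    author: str
    email: str
    url: str
    ip: str
    body: str


def matched_stopwords(comment: Comment, stopwords: list[str]) -> list[str]:
    """Return each stopword that appears, case-insensitively, in any comment field."""
    fields = [comment.author, comment.email, comment.url, comment.ip, comment.body]
    lowered = [field.lower() for field in fields]
    hits = []
    for entry in stopwords:
        word = entry.strip().lower()
        # Skip blank list lines; otherwise do a plain substring test per field.
        if word and any(word in field for field in lowered):
            hits.append(entry.strip())
    return hits


def should_hold_for_moderation(comment: Comment, stopwords: list[str]) -> bool:
    # A single match is enough to keep the comment out of the public queue.
    return bool(matched_stopwords(comment, stopwords))


if __name__ == "__main__":
    stopwords = ["Kelly Ronald", "Alexander Kolt", "Peter Back"]
    comment = Comment("Alexander Kolt", "ak@example.net", "", "192.0.2.10", "Great blog!")
    print(matched_stopwords(comment, stopwords))  # ['Alexander Kolt']
```

Plain substring matching is crude: an entry like “Peter Back” would also catch a legitimate “Peter Backus”, which is one reason to hold matches for moderation rather than delete them outright.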

Separately, I’ve also seen a number of attempts to send spam e-mail through a hard-coded PHP mail form. Bill Lazar mentions seeing some similar traffic on his site:

In the last few days, though, somebody or someone’s script has found the form and is filling it out repeatedly. I guess the idea is that a useful percentage of web forms will trigger an automated response that’s of interest to the programmer though just what isn’t clear to me. The script fills in the form fields with the same data, an email address of a four or five character random group of letters (such as xtpku) at this domain.

The bad formmail posts are originating from 213.114.195.37 and 66.166.127.226, among others. I don’t think it’s actually succeeding in getting mail sent anywhere, but it’s clogging up the administrative mailbox with failure messages.
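
Bill’s description suggests a cheap heuristic for spotting these scripted submissions. The sketch below is in Python rather than PHP and is not the code behind WP-Contact Form or any formmail script. The field names, `SITE_DOMAIN`, and the CR/LF check are my assumptions; smuggling newlines into fields that end up in mail headers is a well-known way of abusing mail forms, but nothing here confirms that is what these particular scripts attempt.

```python
import re

# Hypothetical: replace with the site's real domain.
SITE_DOMAIN = "example.com"

# Assumed names of form fields that a typical mail form copies into headers.
HEADER_FIELDS = ("name", "email", "subject")

# Four or five letters at the site's own domain, e.g. "xtpku@example.com".
RANDOM_SELF_ADDRESS = re.compile(r"[a-z]{4,5}@" + re.escape(SITE_DOMAIN), re.IGNORECASE)


def looks_like_formmail_probe(fields: dict[str, str]) -> bool:
    """Heuristic check for the scripted form submissions described above."""
    # A CR or LF in a header-bound field suggests a header injection attempt.
    for key in HEADER_FIELDS:
        value = fields.get(key, "")
        if "\r" in value or "\n" in value:
            return True
    values = {value.strip() for value in fields.values()}
    # The script fills every field with the same value...
    if len(values) != 1:
        return False
    # ...and that value is a short random address at this domain.
    (only,) = values
    return RANDOM_SELF_ADDRESS.fullmatch(only) is not None
```

Rejecting these before the form calls the mail function would also keep the failure messages out of the administrative mailbox.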

Update 09-14-2005 16:20 PDT: Updating to WP-Contact Form 1.3 seems to help. Still seeing attempted spam from new IP addresses, including 62.93.34.155, 67.169.28.125, 146.83.216.207, 206.206.126.44, and 210.0.200.2. Hopefully they’ll figure out that it’s not working and move on.

Small steps versus theorizing, Reboot7

Lots of interesting posts and presentations are coming out of last week’s Reboot7 conference in Copenhagen. The attendees are predominantly involved with new internet applications such as blogging, tagging, peer-to-peer, voice over IP, social software, and collaborative development, all of which are new, fluid, evolving, and somewhat incompatible with existing business and social models. Progress in new and evolving fields can sometimes get bogged down in “Vision” or “Strategy”, so I’m happy to see this observation from Johnnie Moore about the need for, and value of, small steps:

A theme that seemed to run through Reboot7 was the advocacy of taking small steps over theorising. David Heinemeier Hansson, who built web application Ruby on Rails, stressed the advantage of getting something basic up and running fast. In a presentation on The Skype Brand, Malthe Sigurdsson talked about getting out frequent, small revisions.

Along similar lines, Scoble writes:

I’m stuck with some images coming out of the Reboot conference last week: the power of being small.

Lots of people were talking about the shipping power of small teams. Mostly due to Jason Fried’s talk.

He’s turning out to be one influential developer. Why? Cause he, and two other coworkers, are churning out new features at a torrid pace. Here’s an example of his thinking about development teams: don’t write a functional spec. Whoa. I love his idea for what to do instead: write a one-page story.

In an emerging, largely undefined area, taking small, concrete steps (sometimes at a rapid pace) in a general direction can often uncover more “ground truth” more quickly, and with fewer resources, than a fully investigated, heavily staffed program. Unfortunately, a comprehensive program is often easier to explain, even though its size and overhead may place a fundamental handicap on it, making it less likely to succeed. There’s also a tendency to want to systematize everything at the outset, to try for the “grand unified theory of everything”, which can become crippling (the early days of XML and CORBA come to mind). In a new or emerging market, the “Great” can easily become the enemy of the “Good”, or the “Useful”. Bear in mind that if it really is new, there’s a good chance it won’t be right on the first few tries, so it’s best to spend your resources wisely rather than make a wild bet that you’ve found the One True Answer.

Within various corporate R&D and business planning settings, I’ve repeatedly seen that small, motivated teams (1-10 people) can often make substantial headway in new business areas by finding equally motivated customers and solving their needs quickly, frequently without official support (or oversight) from their management. These efforts are often crippled when they do gain “official” status, which adds the requirement that the team’s decisions be explainable to outsiders, and sometimes also a demand for a roadmap to world domination. If they survive this stage, most of these small, fast teams are then crushed by the addition of dozens or hundreds of new people and the associated management overhead, organizational empire building, and huge burn rate, all added in an effort to staff up and implement the premature plan for world domination. The team is no longer fast, and it burns through huge resources committed to an inflexible, obsolete plan in an emerging market space. Oops.

See also: Seth Godin’s Small is the New Big

Caveat: Established markets really do need scale and structure. Sometimes Big is the New Big, too.

Update 2005-06-16 10:16: Great Enough! (more from Seth Godin)

If you don’t ship, it’s not really worth doing. More important, we’ve only got a finite amount of time and resources to invest in anything (thanks, Chris Morris). The real issue is this: when do we stop working on something (because it’s good enough) and work on some other element of the offering.

Are you a nerd?


I am nerdier than 94% of all people. Are you nerdier?

Came across a post linking to this nerd test while looking into the Semiologic Static Front Page plugin for WordPress.

Overall, you scored as follows:

6% scored higher (more nerdy), and
94% scored lower (less nerdy).

What does this mean? Your nerdiness is:

Supreme Nerd. Apply for a professorship at MIT now!!!.

Hmm. Only 94%?

Adobe Buys Macromedia – initial thoughts

This makes sense. Although Adobe and Macromedia have competed on the content creation front over the past years, starting out from the print world in Adobe’s case and the CD-ROM world in Macromedia’s case, the merger should allow the combined organization to focus on making the existing tools work better together, and then move on to the broader problem of document and information management.

Life will be just fine for the existing customer base of print and interactive developers, who will probably end up with a toolbox of Photoshop, Illustrator, Dreamweaver, Flash, InDesign, and Acrobat, each of which is great, even dominant, in its category, has a loyal user community, and will become more useful as the tools become better integrated.

The more interesting question is the one the merger is predicated on, which is how to address the broader space of document workflow and information management.

Adobe has been training everyone to think of PDF as “electronic paper”: it behaves like a printed document, with a well-defined, mostly fixed presentation of text and graphics. This is mostly how it is used today, literally replacing paper documents in print-on-demand applications such as product literature, distribution of paper forms for health, government, and corporate applications, or formatted output of books and publications.

Macromedia Flash, on the other hand, is geared toward dynamic graphic presentation. Almost nothing in it is static: it can, and typically does, retrieve new underlying content to be presented through the Flash software client. Complex multimedia presentations are routinely implemented in Flash, including entire web sites. Flash is more a programmable content presentation system than anything else.

PDF excels at migrating a paper-based workflow model to an online environment, because it turns paper documents into something digital that can be moved around electronically. While paper isn’t going away for a long, long time, if ever, the problem in the corporate / enterprise space is that we may be moving to an environment where we rarely start out with static data, and the document is assembled from an array of content sources. This is an area where Flash might do well, if other approaches such as Ajax don’t solve enough of the problem.

Look at this blog, or any news site, as a very simple example. None of the posted articles is actually a fixed document; they’re all facets of an underlying database of content. Now think about the information handled in various corporate workflows. A lot of it already lives in an assortment of databases, some of them actual “databases”, but also many document files scattered across the company’s hard drives. Many of the documents floating around are really snapshots of a particular view of the underlying data at a point in time. Taken further, we get XML-based interactive documents such as Google Maps, or similar applications such as the demos at Laszlo Systems. It’s not a stretch to imagine existing blogging software wrapped around existing databases and data sources within an enterprise, publishing RSS feeds that are automatically aggregated into “workflow” documents. This is already starting to appear in bits and pieces.
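
The feed-aggregation idea is easy to prototype. Here is a minimal sketch in Python using only the standard library, assuming RSS 2.0 feeds that have already been fetched as strings and whose items all carry a `pubDate` with a timezone offset. The feed sources, function names, and URLs are hypothetical; fetching, authentication, and access control are left out.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def read_items(rss_xml: str, source: str) -> list[dict]:
    """Pull title, link, and publication time from each <item> of an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "source": source,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # Assumes every item has an RFC 822 pubDate with a timezone offset.
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        })
    return items


def aggregate(feeds: dict[str, str]) -> list[dict]:
    """Merge several feeds into one newest-first list: a point-in-time view."""
    merged = []
    for source, rss_xml in feeds.items():
        merged.extend(read_items(rss_xml, source))
    merged.sort(key=lambda entry: entry["published"], reverse=True)
    return merged


def render(entries: list[dict]) -> str:
    """Render the merged view as a plain-text snapshot document."""
    return "\n".join(
        f"{e['published']:%Y-%m-%d %H:%M %z}  [{e['source']}] {e['title']}  {e['link']}"
        for e in entries
    )
```

Each run of `render(aggregate(...))` produces a snapshot of the underlying data at that moment, which is the “particular view at a point in time” behavior described above, and also why such a document wouldn’t behave like paper.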

Hardcopy output has a huge advantage in being a persistent technology: we can be reasonably sure that the paper document can be read in 50 years, while the same cannot be said of a PDF document on CD-ROM, DVD, or any other current storage media. But it also seems that “documents” will move toward presenting faceted views of the data content available to the publisher. PDF has the “paper” part covered. Flash is useless for “paper” but has a great installed base of dynamic presentation clients.

I would find it disconcerting if the “paper” PDF documents started updating themselves with much more than customer contact data or similar, and I don’t think I’d trust a Flash web site to give me the same content from one week to another. That might be just an age thing, since I’m used to “paper” behaving a particular way, which might change in the future.

Adobe has needed to do something on the enterprise side for years. After bringing in Macromedia, they’ll still need to find a way to address the content / information management side, but Flash seems like a better fit for interactive documents than retrofitting dynamic presentations into PDF.

Next, they need to link up with some content management / database / XML solutions that are both human-friendly and auditor-compliant.

Updated 04-27-2005:
comments at kottke.org
discussion at Slashdot, follow-up discussion at Slashdot

Using WordPress for non-blog sites

Here are some links related to using WordPress for sites that aren’t blogs:

http://www.radicalcongruency.com/20040531-tutorial-using-wordpress-to-power-your-non-blog-website

http://www.grahamazon.com/wordpress-as-cms/, which describes the Stanford Community Health Resource Center site at http://chrc.stanford.edu/

Nvu – open source FrontPage-like tool

This looks interesting, if it works.
http://www.nvu.com/
From the web site:
Nvu (pronounced N-view, for a “new view”) is an Open Source project started by Linspire, Inc. Linspire is committed exclusively to bringing Desktop Linux to the masses, and realized that an easy-to-use web authoring system was needed for Linux to continue its expansion to the Desktop. Linspire contributes significant capital, expertise, servers, bandwidth, marketing, and other resources to guarantee the continuation and success of the Nvu project.
