RIA, SEO and deep linking

This post analyses the current state of these intertwined topics, which are debated far more heatedly than they deserve, I dare say. I’ll focus on Flash/Flex, but the same applies to Silverlight and other platforms.

In this article, when I talk about different “views”, I mean different technologies used to display some content, be it Flash, Silverlight, JavaFX, etc.

Con­tent crawling

Earlier practices

The idea of presenting multiple views (or aspects) of the same content isn’t new at all. It has been common practice for some web agencies to develop two websites for their clients, one in HTML and one in Flash. The purpose is to be (supposedly) accessible to 100% of web users and also to be search engine optimized.

The Flash view would provide more interactive content to visitors on broadband, whilst the HTML website would focus on clarity, quick information lookup and accessibility. Building a Flash and an HTML website for (more or less) the same content duplicates similar work BUT requires different skills. Such a strategy consumes more time, more people and more resources. In return, you can optimize your HTML view for search engines and page ranking, to the benefit of the Flash view.

The huge advantage is that your dynamic (and static) HTML content is crawled by search engines, and you can offer a redirection to visitors who would rather browse the Flash view.

Recent evo­lu­tion

Ever since Google and other search engines became able to look into .swf files, some people have thought “Fine! All the content within my Flash website will be indexed!”. They need to know the difference between static and dynamic content. They also need to know the most basic thing: how an RIA works.

Static content is embedded content: you find it in the compiled file, the .swf for Flash or Flex. Dynamic content is loaded from external sources such as databases, so the data is NOT in the application itself.

Most RIAs work dynamically. Search engines CANNOT index dynamic content, that is, content pulled straight from the databases at run time. So what do search engines see when they look into my Flash website? Almost nothing. They can see the title, the description and a few static things in the Flash library. That’s all.

Nowa­days

Currently, Flash websites often remain in Flash alone, because a full HTML view is considered a waste of time given its development cost. With the rise of RIAs on the web, more people are looking for SEO solutions.

A recent trend among Flash and Flex developers is to talk about some ultimate SEO weapon called XSLT. It’s actually not a big deal. However, we can use it to build what used to be a huge bother: a second view of the same data model.

First of all, XSLT works on XML documents and is quite powerful at transforming them into whatever output you need, such as HTML. In most cases, however, the website’s data is stored in a relational SQL database, so retrieving the data and converting it into XML before feeding it to XSLT is an extra step. Logically, you’d use the same XML for your RIA, but that isn’t mandatory.

Even though XSLT transforms the data for you, you still have to write and test the whole stylesheet, and XSLT is quite touchy: there’s no room for errors. Depending on the complexity of your output, it may be quite a pain. The point is that it still consumes less time than the former HTML view development I discussed above BECAUSE the raison d’être of the HTML version for RIAs is not the same anymore. The XSLT output is meant for search engine bots and accessibility.

If you look at the HTML view produced by current XSLT implementations, it’s very simple, sometimes even without any layout. Ordinary users without the Flash Player wouldn’t want to surf such plain and boring HTML. So why isn’t it aimed at those users? Because that would be the same as the former solution. The purpose of the XSLT implementation here is to provide a middle-ground solution, a first step towards SEO for RIAs on the web.
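To make the pipeline concrete (database rows, then XML, then plain HTML for bots), here is a minimal sketch in Python. Be aware that it is NOT XSLT: Python’s standard library has no XSLT processor, so the transformation step is written by hand with ElementTree as a stand-in. The row fields (slug, title, body) and the function names are assumptions of mine, chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(rows):
    """Turn database-like rows (dicts) into the XML the RIA could also consume."""
    root = ET.Element("articles")
    for row in rows:
        article = ET.SubElement(root, "article", slug=row["slug"])
        ET.SubElement(article, "title").text = row["title"]
        ET.SubElement(article, "body").text = row["body"]
    return root


def xml_to_html(articles):
    """Hand-written stand-in for the XSLT step: XML in, bare HTML out."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    for article in articles.findall("article"):
        # One plain div per article, no layout: the output targets bots.
        div = ET.SubElement(body, "div", id=article.get("slug"))
        ET.SubElement(div, "h2").text = article.findtext("title")
        ET.SubElement(div, "p").text = article.findtext("body")
    return ET.tostring(html, encoding="unicode", method="html")


if __name__ == "__main__":
    rows = [{"slug": "roundup-is-lethal",
             "title": "Roundup is lethal",
             "body": "Plain text that a crawler can index."}]
    print(xml_to_html(rows_to_xml(rows)))
```

The intermediate XML is the interesting part: the RIA and the bot-facing HTML are both fed from it, so the content is written once.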

To my knowledge, there aren’t many websites with such an implementation yet. A “reference” in the matter is the Flex directory. If you view the HTML source, you’ll see numerous div layers containing the directory data.

Deep link­ing

A few years ago, talking about deep linking was only a means to emphasize that Flash websites just couldn’t do it.

The solution came with ActionScript’s ExternalInterface: anchors handled by JavaScript for deep linking.

This method is used more and more in Flash and Flex web applications because it’s quite easy to implement. As stated above, deep linking is achieved through HTML anchors. Why is that? Because changing only the anchor doesn’t make the browser reload the window, which would restart your whole web application. However, anchors aren’t the perfect answer to the problem.

Consider this deep linking example:

HTML website: www.yoursite.com/article/roundup-is-lethal

RIA website: www.yoursite.com/#article/roundup-is-lethal

Obviously, the first URL is better. There’s no workaround at the moment, but it’s a small sacrifice considering that you can deep link an RIA. How does it work? JavaScript listens for changes in the URL and notifies the RIA when something happens. The RIA can also call a JavaScript method to update, or rather rewrite, the URL when the RIA state has changed.

Speaking of Flash, there is a nice library called SWFAddress that provides this functionality (for Flex, there is also a built-in library). Be aware that deep linking in an RIA must be planned. It is not a lot of work, but you have to decide what will be deep linked and what won’t, and architect your application accordingly.
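The round trip between the URL anchor and the application state can be sketched in a few lines. This is only an illustration in Python of the mapping itself; in a real RIA the same logic would live in JavaScript and in the application, and the function names below are hypothetical, not taken from SWFAddress or any other library.

```python
from typing import List
from urllib.parse import quote, unquote, urlsplit


def fragment_to_state(url: str) -> List[str]:
    """Read the part after '#' and split it into path-like segments."""
    fragment = urlsplit(url).fragment
    return [unquote(part) for part in fragment.split("/") if part]


def state_to_url(base_url: str, segments: List[str]) -> str:
    """Rewrite only the anchor, so the page itself is never reloaded."""
    fragment = "/".join(quote(segment, safe="") for segment in segments)
    return urlsplit(base_url)._replace(fragment=fragment).geturl()


if __name__ == "__main__":
    link = "http://www.yoursite.com/#article/roundup-is-lethal"
    state = fragment_to_state(link)  # what the RIA is told on a URL change
    print(state)
    print(state_to_url("http://www.yoursite.com/", state))  # what the RIA writes back
```

Deciding which segments exist (here, a section name followed by an article slug) is exactly the planning work mentioned above.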

Search engine bots

Detec­tion

The strategy above requires detecting search engine bots. You need to know who’s coming to your web application in order to serve either the HTML content or the rich application itself.

There is no absolute means of detecting every single search engine bot because there are thousands of them. So you will probably use a shorter list of the most important search engine bots on the web.
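A short list like that can be as simple as a few substrings matched against the User-Agent header. The sketch below is a minimal illustration in Python; the token list is deliberately tiny and only an example (you would maintain your own), and the function names are hypothetical.

```python
from typing import Optional

# Illustrative tokens only; a real list would be longer and kept up to date.
KNOWN_BOT_TOKENS = ("googlebot", "slurp", "msnbot")


def looks_like_bot(user_agent: Optional[str]) -> bool:
    """Case-insensitive substring match against the short bot list."""
    if not user_agent:
        return False
    lowered = user_agent.lower()
    return any(token in lowered for token in KNOWN_BOT_TOKENS)


def choose_view(user_agent: Optional[str]) -> str:
    """Serve the plain HTML view to known bots, the RIA to everyone else."""
    return "html" if looks_like_bot(user_agent) else "ria"


if __name__ == "__main__":
    print(choose_view("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
    print(choose_view("Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/3.0"))
```

Keep in mind that this trusts whatever the visitor claims to be, which is the weakness discussed next.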

Fake user agent

Inevitably, the ways to detect the real user agent of a given visitor aren’t perfect. The motto for security-related matters is to minimize the most critical weaknesses, not to build the perfect defense. It is possible to verify the identity of a user agent to some extent, and that will be sufficient against most of the evil and violent crawlers. You can do it with DNS lookups and MySQL caching, for instance.
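One common way to do the DNS part is a reverse lookup on the visitor’s IP followed by a forward lookup on the resulting host name, accepting the visitor only if both agree. Here is a minimal sketch in Python under a few assumptions of mine: the cache is a plain in-memory dict standing in for the MySQL table, the forward lookup covers IPv4 only, and the function names and the list of allowed host suffixes are hypothetical.

```python
import socket
from typing import Callable, Dict, List, Sequence


def reverse_lookup(ip: str) -> str:
    """IP address -> host name (PTR record)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """Host name -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(hostname)[2]


def make_verifier(allowed_suffixes: Sequence[str],
                  reverse: Callable[[str], str] = reverse_lookup,
                  forward: Callable[[str], List[str]] = forward_lookup,
                  ) -> Callable[[str], bool]:
    """Build a checker that caches its verdict per IP address."""
    cache: Dict[str, bool] = {}  # stand-in for a persistent cache table

    def is_verified_bot(ip: str) -> bool:
        if ip in cache:
            return cache[ip]
        try:
            host = reverse(ip).lower().rstrip(".")
            # The host must belong to the crawler's domain AND resolve
            # back to the very same IP, otherwise the PTR may be forged.
            verified = host.endswith(tuple(allowed_suffixes)) and ip in forward(host)
        except OSError:  # covers socket.herror and socket.gaierror
            verified = False
        cache[ip] = verified
        return verified

    return is_verified_bot
```

Usage would look like `verify = make_verifier((".googlebot.com",))` followed by `verify(visitor_ip)`. The resolvers are passed in as parameters so the logic can be tested without touching the network, and the cache matters because DNS lookups are far too slow to run on every request.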

Search engine optimization

Cloak­ing

First of all, I have seen a lot of talk about cloaking when it comes to providing both Flash and HTML views of a website. It is often used to introduce the killer line “it could get you banned from Google!!”.

I even found a post repeated across several SEO sites that happily emphasizes that SWFObject is considered “dangerous”. If that statement is true, many Flash websites have a problem, as SWFObject is popular right now. However, those alarming posts all date from 2007, and many things change in one year, so it might not be dangerous anymore.

Googlebot (I don’t know about the other bots) dislikes hidden content, whether it is hidden with CSS or, as SWFObject does, by using DIV layers to hide the HTML under the Flash layer.

Nevertheless, if your HTML content matches what can be found in your Flash view, there shouldn’t be any problem. But you’re still at risk.

Rumors and gossip

As I was searching the web for insights about Flash and SEO, I read on Google’s forum some comments whose authors sometimes don’t know what they’re talking about, which hinders the discussion about better designs that would let RIAs be search engine optimized.

It is a fact that human beings dislike change. It is also a fact that people often do not look further than what is asserted. They like an idea or they don’t, and if you confront them, they will fight for it. In the case of the SEO community, talking about Flash and SEO will never fail to generate a few worthless comments like “consider using gifs instead of Flash”.

To the people who still do not understand and keep seeing the current situation as a war between HTML and Flash, I’ll write this: in a given situation and context, a full-Flash website or RIA is a relevant answer to a problem, and in another situation it might not be. That’s all there is to it.

There’s no point in saying “Try to use Flash only where it is needed”. You can find this very “advice” on Google’s website, but it’s quite stupid to write that under the “Best use of flash” title. The same statement about “using something only when needed” is valid for everything anyway.

RIA and search engine

With the spread of RIAs, search engines are likely to collaborate with Adobe and Microsoft, as Google did with the Adobe Search Engine SDK. It’s merely the beginning of a new era of visual content, though. Some day, I believe, search engines will support and crawl dynamic content, with the collaboration of both the data holders and the indexers. On top of this, video content is spreading on the web. Indexing that content (not only the titles, as Dailymotion, YouTube and the like do) is going to be one of the next challenges for search engines too.

Acces­si­bil­ity

Upset­ting matter

Why talk about accessibility now, of all times? It’s not totally off-topic. SEO is a kind of accessibility for search engines, after all.

In discussions about ergonomic design in Flash, accessibility is something everyone knows about, but only vaguely. You can easily find people who argue that screen readers can’t read Flash content, the bottom line being “Flash is bad, make your website in HTML because it’s accessible”.

First of all, Flash does have accessibility features. However, they’re often ignored, because designers are even less likely to care about blind people when their application is more visually oriented than an HTML website.

Blind peo­ple

One month ago, we had a meeting with the president of the Swiss Federation of the Blind and Visually Impaired. He showed us how he, as a blind person, “sees” the web. He had a screen reader, and it was my first time listening to such a piece of equipment. I was stunned. The voice was fast, so insanely fast that no one but a trained ear like his could understand it.

He went on to tell us that many HTML websites aren’t properly accessible to blind people.

Take for instance those ugly HTML websites that pile text and news into every possible corner of the layout. Even at an alien-like voice speed, the screen reader seems to talk without end. Better yet: JavaScript menus. Let’s assume we have a complex menu with a hundred buttons nested in categories. They are hidden by JavaScript and show up only on rollover or click on a category in the main bar. The screen reader, however, reads the HTML, so it reads out the labels of all hundred buttons. Unbearable.

Then he went to some Flash websites and, although the screen reader had to wait for some animations to end, it enumerated the buttons and the content correctly. It is not idyllic, though. HTML is still better supported by screen readers in general.

Con­clu­sion

You can optimize your RIA for search engines with the solution discussed above. It increases accessibility as well, because you expose HTML to the crawlers and to people who don’t have the RIA plugin. Nevertheless, bear in mind that its primary objective is not to replace the RIA itself.

Update 1 on 24 June 2008: If you want to try an example with code, Ahmet wrote a nice article about this concept, based on what he found in the Flex directory. There’s even a diagram. :)

Update 2 on 6 July 2008: The situation has evolved quite a bit, with Googlebot attempting to crawl dynamic data. You might want to read my article about that.

