Google Search reaches out to Flash content

In my previous article about RIAs and SEO, I talked about a solution for exposing HTML pages through XSL transformation of dynamic content. I also mentioned Google’s earlier attempt to crawl Flash content, which was inefficient because it could only read static data. Well, the situation has now evolved and looks very promising.

Adobe, with its recent “open mania”, has opened up the SWF (and FLV) format specifications and created the Open Screen Project along the way, to help get Flash Player embedded into all kinds of devices (basically anything with a screen). Following this, Adobe has been working with Google to make Flash more indexable by search engines.

Pros

Google was given a special version of the Flash Player so that its indexing robots could retrieve data directly from a SWF. The player behaves just like a human user: it accesses the data, writes it in a language the robot can understand, and hands it back to the robot. So Googlebot is now able to crawl dynamic data!

Yahoo will soon follow suit, and probably other vendors as well.

Here are a few articles you’ll want to read:

Google sums it up in this short announcement:

Now that we’ve launched our Flash index­ing algo­rithm, web design­ers can expect improved vis­i­bil­ity of their pub­lished Flash con­tent, and you can expect to see bet­ter search results and snippets.

Cons

Ron Adler and Janis Stipins from Google reassure designers:

Basi­cally, you don’t need to do any­thing. The improve­ments that we have made do not require any spe­cial action on the part of web design­ers or web­mas­ters. If you have Flash con­tent on your web­site, we will auto­mat­i­cally begin to index it, up to the lim­its of our cur­rent tech­ni­cal ability.

So they say. Oh wait, perhaps it’s not that simple. Ron and Janis list three remaining technical limitations, and two of them are real problems:

1. Google­bot does not exe­cute some types of JavaScript. So if your web page loads a Flash file via JavaScript, Google may not be aware of that Flash file, in which case it will not be indexed.

If you’re a Flash designer, you’ll immediately think of SWFObject, which is based on JavaScript. Lots of Flash websites use it, and we don’t know whether it is one of the types of JavaScript that Googlebot won’t execute.

2. We cur­rently do not attach con­tent from exter­nal resources that are loaded by your Flash files. If your Flash file loads an HTML file, an XML file, another SWF file, etc., Google will sep­a­rately index that resource, but it will not yet be con­sid­ered to be part of the con­tent in your Flash file.

This is a major problem because a lot of ActionScript developers use a lightweight SWF file that loads the rest of the application on demand. Sections are often broken down into several modules, sometimes for the sake of reusability (an object-oriented convenience).
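For the first limitation, one rough way to see what a crawler that skips JavaScript might find is to scan the raw HTML for SWF references. The sketch below is only an approximation under that assumption, not a description of how Googlebot works; the helper name `static_swf_urls` and the sample pages are made up for illustration.

```python
from html.parser import HTMLParser


class StaticSwfFinder(HTMLParser):
    """Collects .swf URLs that appear directly in the markup.

    Script bodies are never executed, so a SWF that is only embedded
    through JavaScript will not show up here.
    """

    def __init__(self):
        super().__init__()
        self.swf_urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        candidates = []
        if tag == "embed":
            candidates.append(attrs.get("src"))
        elif tag == "object":
            candidates.append(attrs.get("data"))
        elif tag == "param" and (attrs.get("name") or "").lower() == "movie":
            candidates.append(attrs.get("value"))
        for url in candidates:
            # Ignore any query string when checking the file extension.
            if url and url.lower().split("?")[0].endswith(".swf"):
                if url not in self.swf_urls:
                    self.swf_urls.append(url)


def static_swf_urls(html):
    """Return the SWF URLs visible without running any JavaScript."""
    finder = StaticSwfFinder()
    finder.feed(html)
    return finder.swf_urls


# Hypothetical page that embeds the SWF directly in the markup.
static_page = (
    '<object type="application/x-shockwave-flash" data="app.swf">'
    '<param name="movie" value="app.swf"></object>'
)

# Hypothetical page that relies on a script to inject the SWF.
script_page = (
    '<div id="flash"></div>'
    '<script>swfobject.embedSWF("app.swf", "flash", "800", "600", "9.0.0");</script>'
)

print(static_swf_urls(static_page))
print(static_swf_urls(script_page))
```

If the second call finds nothing for one of your own pages, that page depends entirely on JavaScript to expose its SWF, which is exactly the case the first limitation warns about.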

Search­a­bil­ity

Andrea Hill wrote some inter­est­ing thoughts on this topic.

Another major chal­lenge in open­ing appli­ca­tions up to search is being able to direct the searcher to the rel­e­vant sec­tion within the experience.

I couldn’t agree more. There is a huge difference between RIAs and HTML: RIA content isn’t accessible automatically, so you have to decide what should be accessible. Adobe’s answer is in the SWF Searchability FAQ (listed above):

To gen­er­ate URLs at run­time that reflect the spe­cific state of SWF con­tent or RIA, devel­op­ers can use Adobe Flex com­po­nents that will update the loca­tion bar of a browser win­dow with the infor­ma­tion that is needed to recon­struct the state of the application.

For com­plex sites that have a finite num­ber of entry points, you can high­light the spe­cific URLs to a search spi­der using tech­niques such as site map XML files. Even for sites that use a sin­gle SWF, you can cre­ate mul­ti­ple HTML files that pro­vide dif­fer­ent vari­ables to the SWF and start your appli­ca­tion at the cor­rect sub­sec­tion. By cre­at­ing mul­ti­ple entry points, you can get the ben­e­fits of a site that is indexed as a suite of pages but still only need to man­age one copy of your appli­ca­tion. For more infor­ma­tion on deep-linking best prac­tices, visit www.sitemaps.org/faq.php.
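The multiple-entry-point idea from the FAQ can be sketched in a few lines of Python: one HTML wrapper per section, each passing a different variable to the same SWF, plus a sitemap that lists the wrappers. This is a minimal illustration, not Adobe’s or Google’s tooling. The `section` variable name, the `app.swf` file, the `example.com` URLs and the page naming scheme are all assumptions, and section names are assumed to be URL-safe slugs.

```python
import xml.etree.ElementTree as ET
from html import escape
from urllib.parse import urlencode

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def entry_page(swf_url, section):
    """Build an HTML wrapper that starts the SWF at a given section.

    The section is passed through FlashVars; the variable name
    "section" is hypothetical and must match what the SWF reads.
    """
    flashvars = escape(urlencode({"section": section}), quote=True)
    swf = escape(swf_url, quote=True)
    return (
        "<html><head><title>{title}</title></head><body>\n"
        '<object type="application/x-shockwave-flash" data="{swf}">\n'
        '  <param name="movie" value="{swf}">\n'
        '  <param name="FlashVars" value="{flashvars}">\n'
        "</object>\n"
        "</body></html>\n"
    ).format(title=escape(section), swf=swf, flashvars=flashvars)


def sitemap_xml(base_url, sections):
    """Build a sitemap with one <url> entry per wrapper page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for section in sections:
        url = ET.SubElement(urlset, "url")
        loc = ET.SubElement(url, "loc")
        loc.text = "{}/{}.html".format(base_url.rstrip("/"), section)
    return ET.tostring(urlset, encoding="unicode")


sections = ["home", "gallery", "contact"]  # hypothetical section slugs
pages = {name + ".html": entry_page("app.swf", name) for name in sections}
print(sitemap_xml("http://example.com", sections))
```

Each wrapper still points at the same SWF, so there is only one copy of the application to maintain, while the crawler sees a set of distinct URLs.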

With all this in mind, it is not yet safe to assume that Flash websites will be indexed easily. Google is already working on Googlebot’s limitations, and I’m looking forward to hearing news of their progress.

