RIA, SEO and deep linking
This post analyses the current state of these entwined matters, which are, I dare say, quite needlessly controversial. I’ll focus on Flash/Flex, but the same applies to Silverlight and other platforms.
In this article, when I talk about different “views”, I mean different technologies used to display the same content, be it Flash, Silverlight, JavaFX, etc.
Content crawling
Earlier practices
The idea of presenting multiple views (or aspects) of the same content isn’t new at all. It has been common practice for some web agencies to develop two websites for their clients, one in HTML and one in Flash. The whole purpose is to be (supposedly) accessible to 100% of web users and also to be search engine optimized.
The Flash view would provide more interactive content to visitors on broadband, whilst the HTML website would focus on clarity, quick information lookup and accessibility. Building a Flash and an HTML website for (more or less) the same content involves duplicating similar work BUT with different skills. Such a strategy consumes more time, more people and more resources. In return, you can optimize your HTML view for search engines and page ranking, to the benefit of the Flash view.
The huge advantage is that your dynamic (and static) HTML content is crawled by search engines, and you can offer a redirect to visitors who would rather browse the Flash view.
Recent evolution
Ever since Google and other search engines became able to look into .swf files, some people have thought “Fine! All the content within my Flash website will be indexed!”. They need to know the difference between static and dynamic content. They also need to know the most basic thing: how an RIA works.
Static content is embedded content: you find it in the compiled file, the .swf for Flash or Flex. Dynamic content is loaded at runtime from external sources such as databases, so the data is NOT in the application itself.
Most RIAs work dynamically. Search engines CANNOT index dynamic content, that is, content that comes straight from a database at runtime. So what do search engines see when they look into my Flash website? Almost nothing. They can see the title, the description and a few static things in the Flash library. That’s all.
Nowadays
Nowadays Flash websites often remain Flash-only, because the HTML view isn’t considered worth its development cost. With the rise of RIAs on the web, more people are looking for SEO solutions.
A recent trend among Flash and Flex developers is to talk about some ultimate SEO weapon called XSLT. It’s actually not a big deal. However, we can use it to build what used to be a huge bother: a second view of the same data model.
First of all, XSLT works on XML documents and is quite powerful at transforming them into whatever output you need, such as HTML. In most cases, however, the website’s data is stored in a relational SQL database, so retrieving the data and converting it to XML before feeding it to XSLT is an extra step. Logically, you’d use the same XML for your RIA, but that isn’t mandatory.
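To give an idea of that extra step, here is a minimal sketch in Python of turning SQL rows into the XML document a stylesheet would consume. Everything in it is an assumption made for illustration: the `articles` table, its columns, the element names and the in-memory SQLite database standing in for the real one. It only covers the SQL-to-XML part, not the XSLT transformation itself.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(connection: sqlite3.Connection) -> str:
    """Serialize every row of the (hypothetical) articles table as XML."""
    root = ET.Element("articles")
    cursor = connection.execute(
        "SELECT slug, title, body FROM articles ORDER BY slug"
    )
    for slug, title, body in cursor:
        article = ET.SubElement(root, "article", {"slug": slug})
        ET.SubElement(article, "title").text = title
        ET.SubElement(article, "body").text = body
    # encoding="unicode" returns a str; special characters are escaped for us.
    return ET.tostring(root, encoding="unicode")


def demo_database() -> sqlite3.Connection:
    """Build a throwaway in-memory database with one sample article."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE articles (slug TEXT, title TEXT, body TEXT)"
    )
    connection.execute(
        "INSERT INTO articles VALUES (?, ?, ?)",
        ("roundup-is-lethal", "Roundup is lethal", "Text & more text"),
    )
    return connection


if __name__ == "__main__":
    print(rows_to_xml(demo_database()))
```

The same XML string could be served to the RIA and handed to the XSLT processor, which is what makes it a single data model with two views.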
Even though XSLT transforms the data for you, you still have to write and test the whole thing, and XSLT is quite touchy: there’s no room for error. Depending on the complexity of your output, it may be quite a pain. The point is that it still takes less time than developing the HTML view I discussed earlier, BECAUSE the raison d’être of the HTML version of an RIA is no longer the same. The XSLT output is meant for search engine bots and accessibility.
If you look at the HTML view produced by current XSLT implementations, it’s very simple, sometimes without any layout at all. Ordinary users without the Flash Player wouldn’t want to surf such plain, boring HTML. So why isn’t it aimed at those users? Because that would be the same as the former solution. The purpose of the XSLT implementation here is to provide a middle-ground solution, a first step towards SEO for RIAs on the web.
To my knowledge, there aren’t many websites with such an implementation yet. A “reference” in the matter is the Flex directory. If you view its HTML source, you’ll see numerous div layers containing the directory data.
Deep linking
A few years ago, talking about deep linking was only a means of emphasizing that Flash websites just couldn’t do it.
The solution came with ActionScript’s ExternalInterface: anchors handled by JavaScript for deep linking.
This method is now used more and more in Flash and Flex web applications, for it’s quite easy to implement. As stated above, deep linking is achieved through HTML anchors. Why is that? Because changing the anchor doesn’t make the browser reload the page, which would restart your whole web application. However, anchors aren’t the perfect answer to the problem.
Consider this deep linking example :
HTML web site : www.yoursite.com/article/roundup-is-lethal
RIA web site : www.yoursite.com/#article/roundup-is-lethal
Obviously, the first URL is better. There’s no workaround at the moment, but it’s a small sacrifice considering that you can deep link an RIA. How does it work? JavaScript listens for changes in the URL and notifies the RIA when something happens. The RIA can also call a JavaScript method to update, or rather rewrite, the URL when its state changes.
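The two directions of that exchange can be modelled in a few lines. The sketch below is in Python rather than JavaScript or ActionScript, purely to show the logic; in a real page the first function would run when JavaScript notices the anchor changed, and the second when the RIA asks JavaScript to rewrite the URL. The function names and the `view`/`slug` state layout are assumptions of mine, not part of any library.

```python
from typing import Optional
from urllib.parse import urlsplit


def state_from_url(url: str) -> dict:
    """Turn the anchor part of a URL into a view state for the RIA."""
    fragment = urlsplit(url).fragment.strip("/")
    if not fragment:
        return {"view": "home", "slug": None}
    # "article/roundup-is-lethal" -> view "article", slug "roundup-is-lethal"
    view, _, slug = fragment.partition("/")
    return {"view": view, "slug": slug or None}


def url_from_state(current_url: str, view: str, slug: Optional[str]) -> str:
    """Rewrite the URL after a state change; only the anchor is touched."""
    base = current_url.split("#", 1)[0]
    if view == "home":
        return base
    path = view if slug is None else f"{view}/{slug}"
    return f"{base}#{path}"


if __name__ == "__main__":
    print(state_from_url("http://www.yoursite.com/#article/roundup-is-lethal"))
    print(url_from_state("http://www.yoursite.com/", "article", "roundup-is-lethal"))
```

Because only the part after `#` ever changes, the browser never requests a new page, which is exactly why anchors are used here.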
For Flash, there is a nice library called SWFAddress that provides these features (for Flex, there is also a built-in library). Be aware that deep linking in an RIA must be planned. It is not a lot of work, but you have to decide what will and won’t be deep linked and architect your application accordingly.
Search engine bots
Detection
Serving an HTML view to crawlers and the RIA to everyone else requires detecting search engine bots. You need to know who is visiting your web application in order to serve either the HTML content or the rich application itself.
There is no foolproof way to detect every single search engine bot, because there are thousands of them. So you will probably use a shorter list of the most important search engine bots on the web.
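A short list like that can be as simple as a handful of substrings looked up in the User-Agent header. The sketch below is only an illustration: the markers are a small sample I picked, not an authoritative list, and the function names are hypothetical.

```python
# Illustrative sample of user-agent markers, not an exhaustive list.
KNOWN_BOT_MARKERS = ("googlebot", "slurp", "msnbot", "bingbot", "baiduspider")


def is_search_engine_bot(user_agent: str) -> bool:
    """Return True when the User-Agent contains one of the known markers."""
    lowered = user_agent.lower()
    return any(marker in lowered for marker in KNOWN_BOT_MARKERS)


def choose_view(user_agent: str) -> str:
    """Pick the HTML view for crawlers and the RIA for everyone else."""
    return "html" if is_search_engine_bot(user_agent) else "ria"


if __name__ == "__main__":
    crawler = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    browser = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Gecko/2008 Firefox/3.0"
    print(choose_view(crawler), choose_view(browser))
```

Keep in mind that the User-Agent header is whatever the client claims to be, so this check alone proves nothing about who is really visiting.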
Fake user agent
Inevitably, the ways to detect the real user agent of a given visitor aren’t perfect. The motto in security-related matters is to reduce the most critical weaknesses, not to build the perfect defense. It is possible to verify the identity of a user agent to some extent, and that will be enough against most malicious and aggressive crawlers. You can do it with a DNS lookup and MySQL caching, for instance.
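The DNS check can be sketched as follows: resolve the visitor’s IP to a host name, make sure that name belongs to the search engine’s domain, then resolve the name back and compare it with the IP. This is only a sketch under assumptions: the trusted suffixes cover Googlebot alone, a plain dictionary stands in for the MySQL cache, and the resolver functions are passed in as parameters so the logic can be tried without network access.

```python
import socket
from typing import Callable, Dict

# Assumption: we only trust Google's crawler domains in this sketch.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """IP address -> host name (reverse DNS)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host: str) -> str:
    """Host name -> IP address (forward DNS)."""
    return socket.gethostbyname(host)


def is_verified_bot(
    ip: str,
    cache: Dict[str, bool],
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], str] = forward_lookup,
) -> bool:
    """Check that an IP really belongs to a trusted crawler, caching the answer."""
    if ip in cache:
        return cache[ip]  # DNS lookups are slow, so each IP is checked once
    try:
        host = reverse(ip)
        verified = host.endswith(TRUSTED_SUFFIXES) and forward(host) == ip
    except OSError:
        # socket.herror and socket.gaierror are OSError subclasses:
        # an IP that cannot be resolved is treated as unverified.
        verified = False
    cache[ip] = verified
    return verified
```

In production the `cache` dictionary would be replaced by a table keyed on the IP address, ideally with an expiry date, since IP assignments change over time.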
Search engine optimization
Cloaking
First of all, I have seen a lot of talk about cloaking when it comes to providing both Flash and HTML views of a website. It is often used to introduce the killer line “it could get you banned from Google!!”.
I even found the same post repeated across several SEO sites, happily stressing that SWFObject is considered “dangerous”. If that statement is true, many Flash websites have a problem, as SWFObject is very widely used right now. However, the alarming posts you can read all date from 2007, and many things change in a year, so it might not be dangerous anymore.
Googlebot (I don’t know about the other bots) dislikes hidden content, whether it is hidden with CSS or, as SWFObject does, by using DIV layers to hide the HTML under the Flash layer.
Nevertheless, if your content in HTML matches what you can find in your Flash view, there shouldn’t be any problem. But you’re still at risk.
Rumors and gossip
As I was searching the web for insights about Flash and SEO, I read comments on Google’s forum whose authors sometimes don’t know what they’re talking about, which hinders the discussion about how to design RIAs better so that they can be search engine optimized.
It is a fact that human beings dislike change. It is also a fact that people often do not look beyond what is asserted. They either like an idea or they don’t, and if you confront them, they will fight for it. In the case of the SEO community, talking about Flash and SEO never fails to generate a few worthless comments like “consider using gifs instead of Flash”.
To the people who still do not understand and keep seeing the current situation as a war between HTML and Flash, I’ll write this: in a given situation and context, a full-Flash website or RIA is a relevant answer to a problem, while in another situation it might not be. That’s all there is to it.
There’s no use in saying “Try to use Flash only where it is needed”. You can find this very “advice” on Google’s website, but it’s quite silly to write it under the title “Best use of flash”. The same statement about “using something only when needed” is valid for everything anyway.
RIAs and search engines
With the spread of RIAs, search engines are likely to collaborate with Adobe and Microsoft, as Google did with the Adobe Search Engine SDK. It’s merely the beginning of a new era of visual content, though. Some day, I believe, search engines will support and crawl dynamic content, with the collaboration of both data holders and indexers. On top of this, video content is spreading on the web. Indexing the content of videos (not only their titles, as Dailymotion, YouTube and the like do) is going to be one of the next challenges for search engines too.
Accessibility
Upsetting matter
Why talk about accessibility now, of all times? It’s not totally off-topic. SEO is a kind of accessibility for search engines, after all.
In discussions about ergonomic design in Flash, accessibility is something everyone knows about, but only vaguely. You can easily find people who argue that screen readers can’t read Flash content, the bottom line being “Flash is bad, make your website in HTML because it’s accessible”.
First of all, Flash does have accessibility features. However, they’re often ignored: designers are even less likely to think of blind people when their application is more visually oriented than an HTML website.
Blind people
A month ago, we had a meeting with the president of the Swiss Federation of the Blind and Visually Impaired. He showed us how he, as a blind person, “sees” the web. He had a screen reader, and it was the first time I had listened to one. I was stunned. The voice was fast, so insanely fast that no one but a trained ear like his could understand it.
He went on to tell us that many HTML websites aren’t properly accessible to blind people.
Take, for instance, those ugly HTML websites that pile text and news into every possible corner of the layout. Even at an alien-like voice speed, the screen reader seems to talk without pause. Better yet: JavaScript menus. Let’s assume we have a complex menu with a hundred buttons nested in categories. They are hidden by JavaScript and show up only on rollover or click on a category in the main bar. The screen reader, however, reads the HTML, so it reads out all hundred button labels. Unbearable.
Then he moved on to some Flash websites, and although the screen reader had to wait for some animations to end, it enumerated the buttons and the content correctly. It’s not all rosy, though: HTML is still better supported by screen readers in general.
Conclusion
You can optimize your RIA for search engines with the solution discussed above. It improves accessibility as well, because you expose HTML to crawlers and to people who don’t have the RIA plugin. Nevertheless, bear in mind that its primary objective is not to replace the RIA itself.
Update 1 on 24 June 2008: If you want to test an example with code, Ahmet wrote a nice article about this concept, based on what he found in the Flex directory. There’s even a diagram.
Update 2 on 6 July 2008 : The situation has evolved quite a bit, with Googlebot attempting to crawl dynamic data. You might want to read my article about that.
About this entry
You’re currently reading “RIA, SEO and deep linking,” an entry on Théâtre magique
- Published:
- 06.05.08 / 10
- Category:
- Flash/Flex