Maybe Scoble was right: human-annotated search and curated data

Deepak at bbgm and Simon Brocklehurst pointed me to the uproar that Robert Scoble's post caused a few months back. In that post, Scoble said that search engines like Mahalo, and not Google, are the future of the internet. I am beginning to agree with some of what he said.

Let me explain with an example. I have been reading up a lot on two subjects: one is ffmpeg, the open source video and audio library, and the other is “mouse kappa light chain” sequences. In my search for information on these specialized topics I have probably used Google only about 20% of the time. The remaining 80% I spent reading the ffmpeg archives offered up by Gmane, the open source mailing list search project, and searching my own archived Gmail label “ffmpeg”, compiled from mailings to the ffmpeg-user newsgroup. Similarly, for the mouse sequence search I spent almost no time on Google, and instead trawled the PubMed and EBI databases for all my relevant information.

The 20% of my time on Google went to aggregating my information in Google Notebook, plus quick searches on people or paper titles to find related content on the web once I had found a relevant source.

All of this got me thinking: suppose the ffmpeg lead developers or community were to start a specialized, manually annotated search repository. I would almost never turn to Google for information in that domain. The same holds true for the mouse sequence search!

So the bottom line: I can start to see why manually annotated and curated search is becoming a big deal, because we all know that we rarely go beyond the first page of a Google search result. The human expert just makes sure that the first page of any search is the one most likely to be relevant, no matter what PageRank it has.
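To make the idea concrete, here is a minimal sketch of what "the expert decides the first page" could look like. Everything in it is made up for illustration: the `Result` class, the `curated` flag, the scores and the example URLs are assumptions, not how Mahalo or Google actually work.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    pagerank: float        # stand-in for whatever score a general engine assigns
    curated: bool = False  # True if a human expert has vetted this page


def rank(results):
    """Expert-vetted pages come first; the engine score only breaks ties within each group."""
    return sorted(results, key=lambda r: (not r.curated, -r.pagerank))


# Hypothetical results for an ffmpeg question
results = [
    Result("popular-but-generic.example/video-tips", pagerank=0.9),
    Result("ffmpeg-list.example/thread-on-codec-flags", pagerank=0.2, curated=True),
    Result("random-forum.example/ffmpeg-question", pagerank=0.5),
]

first_page = [r.url for r in rank(results)]
```

The curated mailing list thread lands at the top even though its score is the lowest of the three, which is exactly the point.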


Ignite Boston – Tweaking, mixing competition and collaboration

As you probably saw from my post on Processing, Ignite Boston II was a blast. I was there talking about opening up science and bioscreencast.com. The talks at this Ignite were really interesting.

I especially liked the one by Ned Gulley from MathWorks, who spoke about “Tweaking”, a wiki-like, MATLAB-based programming contest. Tweaking is a great concept that effectively mixes collaboration with competition. In traditional code jams and programming contests, groups usually work in relative isolation and submit their best code for a given task; the entries are ranked at the very end and a winning group is declared. In Tweaking, all user entries are publicly displayed throughout the week that the contest runs. All entrants can modify and adapt each other's code to move up the ladder. As each tweaked entry is submitted, a secret test routine scores it and it takes its place in the ranking. Even a single tweak that results in a leap in functionality can move an entrant up the rankings, and this opens up a whole trajectory of code improvement for all participants. Mixing the wiki-like collaborative element with code competitions seems to be a great way to enrich the coding experience for everyone.
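Here is a toy sketch of that loop, in Python rather than MATLAB. The `hidden_test` scorer, the `Leaderboard` class and the entries are all invented for illustration; the real contest machinery is certainly more involved.

```python
def hidden_test(entry):
    """Hypothetical secret scorer: total error on hidden cases, lower is better."""
    cases = [(1, 2), (3, 6), (10, 20)]  # the hidden task here is "double the input"
    return sum(abs(entry(x) - y) for x, y in cases)


class Leaderboard:
    def __init__(self):
        # (score, author, function); every entry is visible to every entrant
        self.entries = []

    def submit(self, author, func):
        # Each submission is scored immediately and slotted into the ranking
        self.entries.append((hidden_test(func), author, func))
        self.entries.sort(key=lambda e: e[0])

    def ranking(self):
        return [(author, score) for score, author, _ in self.entries]


board = Leaderboard()
board.submit("alice", lambda x: x + 1)

# bob does not start from scratch: he tweaks alice's public entry
alice_entry = board.entries[0][2]
board.submit("bob", lambda x: alice_entry(x) + x - 1)

leaders = board.ranking()
```

One small tweak on top of someone else's code is enough to take the lead, and the improved entry is immediately there for the next person to build on.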

Ned spoke of a series of projects, including one on SARS phylogeny and a lattice model protein folding simulation, which tingled my structural biology neurons. Ned's paper on Tweaking talks about how it is not only a fun learning experience for all participants, but also an interesting study of the nature of collaboration and the interplay of motivation, reward, collaboration and competition. If only all science projects worked like Tweaking.

Image Credit: Mathworks Tweaking protein folding competition

Ned blogs at the Starchamber, a blog whose resident buzzwords are: synthetic biology, ambient displays, swarm robotics, wise crowds.

Processing and Visualizing Data

I first heard of Processing from a post on Nature's Nascent blog in which Euan Adie talked about how terrible the visualization tools in the bio space are.

I heard Ben Fry's talk at the Ignite Boston event, where he introduced Processing to the audience and gave a breathtaking demo of the app (despite the contrast issues at the venue). Especially amazing were his visualizations of the HapMap data, including one that was featured on the cover of Nature.

Ben is writing a book for O'Reilly called Visualizing Data (available through O'Reilly Rough Cuts). I plan to start playing with Processing, and I definitely recommend checking this space for the video link to his talk, which should be up soon.

Image: The Processing book written by Casey Reas and Ben Fry

Processing website

This is the stuff of code cracking legend: cracking the San Jose semaphore

As a boy scout I was proud of my ability to signal Morse code faster than I could type. But nothing prepared me for the excitement I felt reading about this example of code-cracking geekery.

This stuff so excites the code itch in me.

It's an article that I first saw on the Wired blog, called “Sleuths Break Adobe’s San Jose Puzzle, Find Pynchon Inside”.
The associated PDF description runs to 18 pages and makes a great read. Check it out and learn how two brilliant individuals cracked what the spinning orange disks, conceived by artist Ben Rubin, were transmitting from atop the Adobe building in San Jose.

An eye for an IE – bioscreencast.com, Internet Explorer and the JavaScript jungle

Well, bioscreencast.com now supports Internet Explorer. You can read all about this and other beta 0.2 enhancements on our blog.

Since one of the stated aims of this blog is my desire to learn JavaScript, this post talks about JavaScript standards and the Google Web Toolkit.

On bioscreencast, Suresh, our lone web ranger, used the amazing YUI library, his web design skills and his tech wizardry to build the first beta of the site to play well with Firefox and Safari. Both of these browsers are closer to the ECMAScript standard than something like Internet Explorer (IE). Consequently, our site initially worked only on Firefox and Safari.

Like most JavaScript-centric UIs, the site had to face up to the real problem with JavaScript, namely browser personalities. JavaScript is famous for behaving somewhat differently depending on whether you are using a browser like Internet Explorer or something like Firefox or Safari. The ECMA standard was a move to get people to agree on what “JavaScript” is. Despite the existence of this standard, it is interesting that even two standards-compliant browsers don't necessarily treat your website code in exactly the same way, which makes navigating the JavaScript jungle a crazy proposition.

This diversity provided the justification for things like the Google Web Toolkit (GWT). How GWT works is simple: you write your code in Java, and the GWT compiler translates it into JavaScript, producing a separate version tuned for each major browser. When a browser requests the page, a small loader script picks the matching version, so IE gets IE-centric JavaScript, Firefox gets JavaScript that suits its palate, and so on. So people who want the “AJAXy” sexiness that goes with JavaScript can just stick to Java and still harness the dynamic characteristics of JavaScript.
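The "each browser gets its own flavour of JavaScript" idea boils down to a lookup keyed on the browser. Below is a rough sketch of that selection step, written in Python purely for illustration. It is not GWT's API: the file names, the `detect_engine` helper and the fallback choice are all assumptions of mine.

```python
# Hypothetical per-browser builds; a real toolkit would name and produce its own files
PERMUTATIONS = {
    "ie": "app.ie.js",
    "gecko": "app.gecko.js",
    "webkit": "app.webkit.js",
}


def detect_engine(user_agent):
    """Very rough browser sniffing based on the User-Agent string."""
    ua = user_agent.lower()
    if "msie" in ua:
        return "ie"
    # Safari also says "like Gecko", so WebKit has to be checked before Gecko
    if "applewebkit" in ua:
        return "webkit"
    return "gecko"  # arbitrary fallback for everything else


def script_for(user_agent):
    """Pick the build of the app that matches the requesting browser."""
    return PERMUTATIONS[detect_engine(user_agent)]


ie6 = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
firefox = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6"
safari = "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en) AppleWebKit/522.11 (KHTML, like Gecko) Safari/419.3"
```

The developer writes the application once, and the browser-specific differences live only in the generated files behind a lookup like this.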

So, to summarize: coding in JavaScript means either being able to deal with its many dialects, or throwing all that out and adopting the Java-based Google Web Toolkit.

In any case, it turns out that with a few very minor tweaks to the JavaScript code, bioscreencast.com now plays very well with all browsers, especially IE 6 and IE 7.

Based on the feedback we have had from many people and on our analytics, it is amazing how many people are still using Internet Explorer 5 and 6, and it's good to know that our website now welcomes most browsers.

Put your brain to use: Galaxy Zoo

I caught this on Nature's Nascent blog. Like reCAPTCHA, which I blogged about before, this project puts the human brain to work, in this case to classify galaxies. You know the types: spiral, elliptical, merging and so on.

The way it works is fun: you sign up, go through the tutorial and take the test. If you get 8 out of 15 correct, you can start classifying galaxies. No worries if you don't pass the test; you can just take it again until you earn your stars.

Once you do, you can start classifying galaxies.
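For the code itch, here is a small sketch of the two pieces involved. The 8-out-of-15 pass mark is from the project's test as described above; the majority-vote `consensus` function is purely my assumption about how many volunteers' answers might be combined, not a description of what Galaxy Zoo actually does.

```python
from collections import Counter

PASS_MARK = 8   # correct answers needed...
TEST_SIZE = 15  # ...out of this many test galaxies


def passes_test(answers, answer_key):
    """True if the volunteer labelled enough test galaxies correctly."""
    if len(answers) != TEST_SIZE or len(answer_key) != TEST_SIZE:
        raise ValueError("the test has exactly %d galaxies" % TEST_SIZE)
    correct = sum(a == k for a, k in zip(answers, answer_key))
    return correct >= PASS_MARK


def consensus(votes):
    """Most common label among volunteer votes (an assumed aggregation rule)."""
    label, _count = Counter(votes).most_common(1)[0]
    return label


# Made-up test data: the key is all spirals, the volunteer gets 8 of them right
answer_key = ["spiral"] * 15
volunteer = ["spiral"] * 8 + ["elliptical"] * 7

qualified = passes_test(volunteer, answer_key)
label = consensus(["spiral", "spiral", "elliptical"])
```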

A few things about the project capture my interest.

As far as the code itch goes, it's amazing how much better the human brain is at recognizing patterns like the ones in the spiral galaxy above and telling it apart from an elliptical one. I know image processing algorithms are getting better by the day (I had my beginnings in structural biology with 3D reconstruction of viruses from projection images and some single particle reconstruction), but the human brain, it seems, still takes the prize.

The second amazing thing is that, with community participation, Galaxy Zoo classified a million images in just two days, and their servers are struggling to meet the load.
A big hurrah for public participation and open science indeed.


Image link from the Sloan Digital Sky Survey

Out of the cradle and into a beta – Bioscreencast.com

I am very excited to announce the coming to fruition of a project that five of us have been working on for the last few months. It's a site based on screencasts called Bioscreencast.com. You can read more about the site in our Bioscreencast blog post and on my Omics World blog.

The entire site was coded into life by one person, our head web geek and javascript junkie, Suresh.

As a wannabe coder, I came away amazed at the sheer power of the many open source libraries out there, the robustness of MySQL databases, the sheer elegance of CSS, the Swiss-army-knife-like ffmpeg, the clumsiness of PHP; the list goes on. What made the whole process doubly enjoyable was that all five of us are relative web newbies, and learning how these things work along the way was a lot of fun.

Watching Suresh work his web magic has made me want to learn more about the six technologies I want to master, and I have also added a few more to the list (more on this in future posts).

Just thought I'd give my plug for Bioscreencast.com. I hope the life scientists out there like the site, and that all of you will keep your feedback coming.

Links:

The Bioscreencast website

The Bioscreencast Blog

The intro post by Deepak, one of our co-conspirators
