Friday, November 6, 2009

Week 10 Readings

Web Search Engines - This was an interesting article, and I learned some new things about web search engines, such as the fact that crawlers have to deal with all sorts of issues: speed, politeness, excluded content, duplicate content, and spam rejection. The main thing I took from the second part was that the web's vocabulary numbers in the hundreds of millions of terms. This is due to the many different languages, the new acronyms, and the fact that people make up words all the time.
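
To make a few of those crawler issues concrete for myself, here is a minimal sketch in Python of how a crawler might handle politeness, excluded content, and duplicate content. It is not taken from the article. The robots.txt text, the `demo-crawler` agent name, and the helper functions are all made up for illustration, and hashing the normalized page text is only one simple way to catch exact duplicates.

```python
import hashlib
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download this
# from the site it is about to visit.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def make_robots(text):
    """Parse robots.txt text into a RobotFileParser (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def content_fingerprint(html):
    """Hash the page after collapsing whitespace and lowercasing it."""
    normalized = " ".join(html.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def should_index(url, html, robots, seen, agent="demo-crawler"):
    """Return True only for allowed pages whose content is new."""
    # Excluded content: respect the site's robots.txt rules.
    if not robots.can_fetch(agent, url):
        return False
    # Duplicate content: skip pages whose fingerprint was already seen.
    fingerprint = content_fingerprint(html)
    if fingerprint in seen:
        return False
    seen.add(fingerprint)
    return True
```

Politeness would come from waiting `robots.crawl_delay(agent)` seconds between requests to the same site. Spam rejection is a much harder problem and is left out of this sketch.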

Current developments and future trends for the OAI protocol for metadata harvesting - I had never heard of the OAI (Open Archives Initiative) Protocol before reading this article, but it seems to be a great idea. As the article puts it, the initiative was started as a way to federate access to different online archives. The article focused on three archives in particular that are still in development:
1. Open Language Archives Community
2. Sheet Music Consortium
3. National Science Digital Library
Each of these archives is unique in its scope and mission, and each has its own problems to deal with when it comes to open access. The two main issues that affect the OAI itself are completeness and discoverability. The goal is to make it easy for users to navigate between the different repositories and to make as much information available to them as possible.
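
Out of curiosity I looked at what a harvesting request actually looks like. Below is a rough Python sketch that builds an OAI-PMH `ListRecords` request URL. The `ListRecords` verb, the `oai_dc` metadata prefix, and the `resumptionToken` argument come from the protocol itself, but the base URL and the function name are my own placeholders, not anything from the article.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     set_spec=None, resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH repository."""
    if resumption_token:
        # When continuing a partial list, the token is sent on its own
        # alongside the verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            # Optional: restrict the harvest to one set in the repository.
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical repository address, for illustration only.
print(list_records_url("http://archive.example.org/oai"))
```

A harvester would fetch that URL, read the XML records that come back, and keep requesting with each new resumption token until the repository stops returning one.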

The Deep Web: Surfacing Hidden Value - I admit I found this to be the most interesting of the three readings. The article stated that most search engines, such as Google and Yahoo, only search the surface of the information available on the web. Apparently there are 7,500 terabytes of information in the Deep Web, and there is one service, BrightPlanet, that will search through all of it. I still don't know what to think of this assertion. Would searching the deep web make the results more relevant to the user, or would it simply dilute the information already found on the surface?

1 comment:

  1. I agree that the last article was the most interesting. I had the same thought about pulling up so much information from the deep web. It seems to me that this would fill results with information that is not relevant to anyone but the creator.
