Inside Google’s Secret Search Algorithm
Source: http://feeds.gawker.com/~r/gizmodo/full/~3/zzkIcilnJp4/inside-googles-secret-search-algorithm
Wired’s Steven Levy takes us inside the “algorithm that rules the web” (Google’s search algorithm, of course), and if you use Google, it’s kind of a must-read. PageRank? That’s so 1997.
It’s known that Google constantly updates the algorithm (550 improvements this year) to deliver smarter results and weed out the crap, but a few major updates in its history have significantly altered Google’s search; they are distilled in a helpful chart in the Wired piece. For instance, in 2001, they completely rewrote the algorithm; in 2003, they added local connectivity analysis; in 2005, results got personal; and most recently, they’ve added real-time search for Twitter and blog posts.
The sum of everything Google’s worked on—the quest to understand what you mean, not what you say—can be boiled down to this:
This is the hard-won realization from inside the Google search engine, culled from the data generated by billions of searches: a rock is a rock. It’s also a stone, and it could be a boulder. Spell it “rokc” and it’s still a rock. But put “little” in front of it and it’s the capital of Arkansas. Which is not an ark. Unless Noah is around. “The holy grail of search is to understand what the user wants,” Singhal says. “Then you are not matching words; you are actually trying to match meaning.”
Oh, and by the way, you’re a guinea pig every time you search for something, if you hadn’t guessed as much already. Google engineer Patrick Riley tells Levy, “On most Google queries, you’re actually in multiple control or experimental groups simultaneously.” It lets them constantly experiment on a smaller scale—even if they’re only conducting a particular experiment on .001 percent of queries, that’s a lot of data.
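One common way to put a single query into several experiments at once is to hash it separately for each experiment. The sketch below only illustrates that general idea and is not Google’s actual system: the experiment names, the `bucket` and `assign_groups` functions, and the traffic fractions are all hypothetical.

```python
import hashlib

# Hypothetical experiments: name -> fraction of queries that get the
# experimental treatment (everything else is the control group).
EXPERIMENTS = {
    "ranking_tweak": 0.05,
    "snippet_length": 0.01,
    "tiny_test": 0.00001,  # .001 percent of queries
}

def bucket(experiment: str, query_id: str) -> float:
    """Map (experiment, query_id) to a stable number in [0, 1)."""
    digest = hashlib.sha256(f"{experiment}:{query_id}".encode("utf-8")).digest()
    # Keep the top 53 bits so the division is exact and never reaches 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53

def assign_groups(query_id: str) -> dict:
    """Decide 'experimental' or 'control' for every experiment independently."""
    return {
        name: "experimental" if bucket(name, query_id) < fraction else "control"
        for name, fraction in EXPERIMENTS.items()
    }

if __name__ == "__main__":
    print(assign_groups("query-12345"))
```

Because each experiment mixes its own name into the hash, landing in one group says nothing about the others, so a query really can sit in several groups at once. And the scale works out: if a billion queries came in (an illustrative figure), .001 percent of them would still be about 10,000 queries.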
Be sure to check out the whole piece. It’s ridiculously fascinating, and borders on self-knowledge, given how much we all use Google (sorry, Bing). [Wired, Sweet graphic by Wired's Mauricio Alejo]
"We Are Not Prepared"

The Policy Center’s vice-president reports, “The general consensus of the panel today was that we are not prepared to deal with these kinds of attacks.”
The nightmarish scenario that unfolded represented a worst-case example. As former secretary of Homeland Security Michael Chertoff noted, many cyberattacks can be stopped if individual cell phone or Internet users simply follow the best practices and use the right tools. Similarly, another participant pointed out that private Internet companies would not sit idly by as a virus ran amok.
A collapse of power across the U.S. also took place only once the simulation brought in factors such as high demand during the summer, a hurricane that had damaged power supply lines, and coordinated bombings that accompanied the cyberattack and subsequent failure of the Internet.
Still, the war game highlighted crucial issues about the government’s own reliance upon communications that might go down during a real-life scenario. One of the biggest problems was how the President ought to respond to a situation that caused damage like warfare but lacked an immediately identifiable foreign adversary. Smaller-scale cyberattacks have already complicated real-world diplomacy, such as the alleged Chinese cyberattacks on Google and other U.S. companies.
Ares Defense Blog questioned a curious missing element from the simulation, in that there was no mention of what happened to phone or Internet service in the rest of the world. Surely a nation that decided to launch cyberattacks against the U.S. would take safeguards to protect its own crucial communication services, which would possibly help U.S. officials narrow down the list of suspects.
Another question seemed more mundane but equally important — how would the government activate the National Guard with cell phone service down?
The Pentagon’s DARPA science lab recently pushed for a “Cyber Genome Program” that could trace digital fingerprints to cyberattack culprits. But identifying whether a cyberattack came from individual civilians, shadowy hacker associations or government cyber-warriors has proven tricky in the meantime.
[via Ares Defense Blog]
Aardvark Publishes A Research Paper Offering Unprecedented Insights Into Social Search
Source: http://feedproxy.google.com/~r/Techcrunch/~3/IMDRrISRf-8/
In 1998, Larry Page and Sergey Brin published a paper [PDF] titled Anatomy of a Large-Scale Hypertextual Search Engine, in which they outlined the core technology behind Google and the theory behind PageRank. Now, twelve years after that paper was published, the team behind social search engine Aardvark has drafted its own research paper that looks at the social side of search. Dubbed Anatomy of a Large-Scale Social Search Engine, the paper has just been accepted to WWW2010, the same conference where the classic Google paper was published.
Aardvark will be posting the paper in its entirety on its official blog at 9 AM PST, and they gave us the chance to take a sneak peek at it. It’s an interesting read to say the least, outlining some of the fundamental principles that could turn Aardvark and other social search engines into powerful complements to Google and its ilk. The paper likens Aardvark to a ‘Village’ search model, where answers come from the people in your social network; Google is part of ‘Library’ search, where the answers lie in already-written texts. The paper is well worth reading in its entirety (and most of it is pretty accessible), but here are some key points:
- On traditional search engines like Google, the ‘long-tail’ of information can be acquired with the use of very thorough crawlers. With Aardvark, a breadth of knowledge is totally reliant on how many knowledgeable users are on the service. This leads Aardvark to conclude that “the strategy for increasing the knowledge base of Aardvark crucially involves creating a good experience for users so that they remain active and are inclined to invite their friends”. This will likely be one of Aardvark’s greatest challenges.
- Beyond asking you about the topics you’re most familiar with, Aardvark will actually look at your past blog posts, existing online profiles, and tweets to identify what topics you know about.
- If you seem to know about a topic and your friends do too, the system assumes you’re more knowledgeable than if you were the only one in a group of friends to know about that topic.
- Aardvark concludes that while the amount of trust users place in information on engines like Google is related to a source website’s authority, the amount they trust a source on Aardvark is based on intimacy, and how they’re connected to the person giving them information.
- Some parts of the search process are actually easier for Aardvark’s technology than they are for traditional search engines. On Google, when you type in a query, the engine has to pair you up with exact websites that hold the answer to your query. On Aardvark, it only has to pair you with a person who knows about the topic — it doesn’t have to worry about actually finding the answer, and can be more flexible with how the query is worded.
- As of October 2009, Aardvark had 90,361 users, of whom 55.9% had created content (asked or answered a question). The site’s average query volume was 3,167.2 questions per day, with the median active user asking 3.1 questions per month. Interestingly, mobile users are more active than desktop users. The Aardvark team attributes this to users wanting quick, short answers on their phones without having to dig for anything. They also think people are more accustomed to natural language patterns on their phones.
- The average query length was 18.6 words (median of 13) versus 2.2-2.9 words on a standard search engine. Some of this difference comes from the more natural language people use (with words like “a”, “the”, and “if”). It’s also because people tend to add more context to their queries, with the knowledge that it will be read by a human and will likely lead to a better answer.
- 98.1% of questions asked on Aardvark were unique, compared with between 57 and 63% on traditional search engines.
- 87.7% of questions submitted were answered, and nearly 60% of them were answered within 10 minutes. The median answering time was 6 minutes and 37 seconds, with the average question receiving two answers. 70.4% of answers were rated ‘good’, 14.1% ‘OK’, and 15.5% ‘bad’.
- 86.7% of Aardvark users had been asked by Aardvark to answer a question, of whom 70% actually looked at the question and 38% could answer. 50% of all members had answered a question (including 75% of all users who had ever actually interacted with the site), though 20% of users accounted for 85% of answers.
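The points above suggest that Aardvark routes a question by weighing how much a person knows about a topic against how closely they are connected to the asker. Here is a minimal sketch of that idea. Multiplying the two scores, the `Candidate` class, the `rank_answerers` function, and all names and numbers are assumptions made for illustration, not the formula from the paper.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    expertise: dict    # topic -> score in [0, 1] (hypothetical values)
    connection: float  # closeness to the asker in [0, 1] (hypothetical)

def rank_answerers(topic, candidates):
    """Order candidates by topic expertise times closeness to the asker.

    People with no known expertise on the topic are left out entirely.
    """
    scored = [(c.expertise.get(topic, 0.0) * c.connection, c.name) for c in candidates]
    scored = [(score, name) for score, name in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]

candidates = [
    Candidate("close_friend", {"hiking": 0.6}, 0.9),
    Candidate("distant_expert", {"hiking": 0.9}, 0.3),
    Candidate("stranger", {"cooking": 0.8}, 0.1),
]

print(rank_answerers("hiking", candidates))
```

In this toy data the close friend outranks the more knowledgeable but distant expert, which matches the paper’s point that trust on Aardvark comes from intimacy rather than authority. Note that the function only picks people; it never tries to find the answer itself.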
Why Job Seekers Should Worry About Their Online Reputation
Source: http://www.labnol.org/internet/online-reputation-important-for-jobs/12582/
If you are looking for a job or are a potential job-seeker, be very careful of what you write or share online because HR departments and recruitment professionals are scanning tweets, blog posts, photos, and other online profiles of job candidates before offering them positions.
Why Online Reputation Management is Important
Around 70% of hiring managers in the US have rejected candidates just because of their online reputation. The chart looks at the various types of online information that have led companies to reject candidates.

Tomorrow is Data Privacy Day and this research (download PPT) was originally commissioned by Microsoft as part of the same initiative.
Besides Microsoft, Google, Intel and AT&T are also part of the Data Privacy Day group. You should also check their site as it contains some excellent resources on how companies, students and parents can better protect their online information.
Originally published at Digital Inspiration by Amit Agarwal.
Twitter Stats
Source: http://www.sysomos.com/insidetwitter/
Summary
Over the past few months, Twitter has experienced explosive growth, attracting celebrity users such as Oprah, and a growing mountain of media and blog coverage. Sysomos Inc., one of the world’s leading social media analytics companies, conducted an extensive study to document Twitter’s growth and how people are using it. After analyzing information disclosed on 11.5 million Twitter accounts, we discovered that:
- 72.5% of all users joined during the first five months of 2009
- 85.3% of all Twitter users post less than one update/day
- 21% of users have never posted a Tweet
- 93.6% of users have fewer than 100 followers, while 92.4% follow fewer than 100 people
- 5% of Twitter users account for 75% of all activity (see the report on analysis of top-5% users)
- New York has the most Twitter users, followed by Los Angeles, Toronto, San Francisco and Boston, while Detroit was the fastest-growing city over the first five months of 2009
- More than 50% of all updates are published using tools, mobile and Web-based, other than Twitter.com. TweetDeck is the most popular non-Twitter.com tool with 19.7% market share.
- There are more women on Twitter (53%) than men (47%)
- Of the people who identify themselves as marketers, 15% follow more than 2,000 people. This compares with 0.29% of overall Twitter users who follow more than 2,000 people.
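To get a feel for how lopsided these figures are, the short calculation below turns two of them into ratios. It uses only numbers from the list above. The function name `activity_ratio` is made up for illustration, and the calculation assumes activity is spread evenly within each group.

```python
def activity_ratio(user_share, activity_share):
    """How many times more active the average user in the top group is
    than the average user outside it."""
    top_rate = activity_share / user_share
    rest_rate = (1 - activity_share) / (1 - user_share)
    return top_rate / rest_rate

# 5% of Twitter users account for 75% of all activity.
print(round(activity_ratio(0.05, 0.75)))  # 57

# 15% of self-identified marketers follow more than 2,000 people,
# compared with 0.29% of Twitter users overall.
print(round(15 / 0.29))  # 52
```

Put differently, an average user in the top 5% is roughly 57 times as active as an average user in the other 95%, and marketers are about 52 times as likely as the overall user base to follow more than 2,000 people.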
Contextual Help Bubble – Dictionary, Thesaurus, Wikipedia, Amazon, Google Translate, Clip2Send
Dead simple, handy tool for adding contextual help to any web page or entire site. It is installed on this blog — so go ahead and select something with your mouse.
Install on any webpage or blog by way of 1 line of code:
<script src="http://64.202.162.213/bubble/bubble.js"></script>
Select any text and a contextual bubble appears; click Wikipedia to get more information about the selected text.


When more than 5 words are selected, other options are grayed out and clip2send is the link to click to send the selected part of the page via email. Type in the email address; the subject line is autofilled, but editable; the source URL is automatically cited.


Select text and a contextual bubble appears; click the Amazon link to bring up results on Amazon.

