
Source: http://gizmodo.com/5882027/sharing-with-friends-of-friends-on-facebook-exposes-you-to-150000-people

Sharing with "Friends of Friends" on Facebook Exposes You to 150,000 People

Well, this is mildly terrifying: according to a new Pew study, the Facebook privacy mode a lot of us rely on for photos and status updates is, on average, anything but private. Time to reconsider your settings, everyone.

The finding is staggering. At the extreme, a Friends of Friends audience can reach more than seven million people:

Facebook users can reach an average of more than 150,000 Facebook users through their Facebook friends; the median user can reach about 31,000 others. At two degrees of separation (friends-of-friends), Facebook users in our sample can on average reach 156,569 other Facebook users. However, the relatively small number of users with very large friends lists, who also tended to have lists that are less interconnected, overstates the reach of the typical Facebook user. In our sample, the maximum reach was 7,821,772 other Facebook users. The median user (the middle user from our sample) can reach 31,170 people through their friends-of-friends.
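To make the idea of reach at two degrees of separation concrete, here is a minimal sketch of counting a friends-of-friends audience from a friendship list. The graph, the names, and the simple set-union approach are assumptions for illustration only; the Pew numbers above come from real Facebook data, not from anything like this toy example.

```python
# Toy illustration (not the Pew methodology): count everyone reachable
# within two hops of "you" in a small, made-up friendship graph.
friends = {
    "you":  {"ana", "ben", "cara"},
    "ana":  {"you", "ben", "dave", "eve"},
    "ben":  {"you", "ana", "fay"},
    "cara": {"you", "gil", "hana"},
    "dave": {"ana"},
    "eve":  {"ana"},
    "fay":  {"ben"},
    "gil":  {"cara"},
    "hana": {"cara"},
}

def two_degree_reach(graph, user):
    """People reachable in one or two hops, excluding the user themselves."""
    one_hop = graph.get(user, set())
    two_hop = set()
    for friend in one_hop:
        two_hop |= graph.get(friend, set())
    return (one_hop | two_hop) - {user}

print(len(two_degree_reach(friends, "you")))  # 8 people in this toy graph
```

With a few hundred friends who each have a few hundred friends of their own, that set balloons into the tens or hundreds of thousands, which is exactly the effect the study measured.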

When you think "friend of a friend," the IRL analogue comes to mind. Your buddy's buddy. That guy you met at a bar who seems okay. Your girlfriend's pals from college. They must be okay people, right? They're only a step removed from you, so why not share all your photos with them?

Because 150,000+ people includes a hell of a lot of strangers you probably shouldn’t trust, and certainly don’t (and will never) know personally. You can read the study in its entirety below. [Pew]

PIP Facebook Users 2.3.12


Friday, February 3rd, 2012 Uncategorized

How Google Crunches All That Data

Source: http://gizmodo.com/5495097/how-google-crunches-all-that-data

If data centers are the brains of an information company, then Google is one of the brainiest there is. Though always evolving, it is, fundamentally, in the business of knowing everything. Here are some of the ways it stays sharp.

For tackling massive amounts of data, the main weapon in Google’s arsenal is MapReduce, a system developed by the company itself. Whereas other frameworks require a thoroughly tagged and rigorously organized database, MapReduce breaks the process down into simple steps, allowing it to deal with any type of data, which it distributes across a legion of machines.

Looking at MapReduce in 2008, Wired imagined the task of determining word frequency in Google Books. As its name would suggest, the MapReduce magic comes from two main steps: mapping and reducing.

The first of these, the mapping, is where MapReduce is unique. A master computer evaluates the request and then divvies it up into smaller, more manageable “sub-problems,” which are assigned to other computers. These sub-problems, in turn, may be divided up even further, depending on the complexity of the data set. In our example, the entirety of Google Books would be split, say, by author (but more likely by the order in which they were scanned, or something like that) and distributed to the worker computers.

Then the data is saved. To maximize efficiency, it remains on the worker computers’ local hard drives, as opposed to being sent, the whole petabyte-scale mess of it, back to some central location. Then comes the second central step: reduction. Other worker machines are assigned specifically to the task of grabbing the data from the computers that crunched it and paring it down to a format suitable for solving the problem at hand. In the Google Books example, this second set of machines would reduce and compile the processed data into lists of individual words and the frequency with which they appeared across Google’s digital library.

The finished product of the MapReduce system is, as Wired says, a “data set about your data,” one that has been crafted specifically to answer the initial question. In this case, the new data set would let you query any word and see how often it appeared in Google Books.
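To make the two phases concrete, here is a minimal single-machine sketch of the word-count pattern. It illustrates the general map/reduce idea, not Google's implementation; the function names, the toy "books," and the in-process setup are assumptions.

```python
# Minimal single-process sketch of the MapReduce word-count pattern.
# In Google's system the map and reduce phases run on many machines;
# here both run locally to show the data flow.
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document chunk."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1

def reduce_phase(pairs):
    """Reduce: group the intermediate pairs by word and sum the counts."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

books = ["the quick brown fox", "the lazy dog", "the fox and the dog"]
word_frequencies = reduce_phase(map_phase(books))
print(word_frequencies["the"])  # 4
```

The dictionary returned by `reduce_phase` is the "data set about your data": you can now look up any word's frequency without re-reading the original texts.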

MapReduce is one way in which Google manipulates its massive amounts of data, sorting and resorting it into different sets that reveal new meanings and have unique uses. But another Herculean task Google faces is dealing with data that’s not already on its machines. It’s one of the most daunting data sets of all: the internet.

Last month, Wired got a rare look at the “algorithm that rules the web,” and the gist of it is that there is no single, set algorithm. Rather, Google rules the internet by constantly refining its search technologies, charting new territories like social media and refining the ones in which users tread most often with personalized searches.

But of course it’s not just about matching the terms people search for to the web sites that contain them. Amit Singhal, a Google Search guru, explains, “you are not matching words; you are actually trying to match meaning.”

Words are a finite data set. And you don’t need an entire data center to store them—a dictionary does just fine. But meaning is perhaps the most profound data set humanity has ever produced, and it’s one we’re charged with managing every day. Our own mental MapReduce probes for intent and scans for context, informing how we respond to the world around us.

In a sense, Google’s memory may be better than any one individual’s, and complex frameworks like MapReduce ensure that it will only continue to outpace us in that respect. But in terms of the capacity to process meaning, in all of its nuance, any one person could outperform all the machines in the Googleplex. For now, anyway. [Wired, Wikipedia, and Wired]

Image credit CNET

Memory [Forever] is our week-long consideration of what it really means when our memories, encoded in bits, flow in a million directions, and might truly live forever.


Wednesday, March 17th, 2010 news

Aardvark Publishes A Research Paper Offering Unprecedented Insights Into Social Search

Source: http://feedproxy.google.com/~r/Techcrunch/~3/IMDRrISRf-8/

In 1998, Larry Page and Sergey Brin published a paper [PDF] titled The Anatomy of a Large-Scale Hypertextual Web Search Engine, in which they outlined the core technology behind Google and the theory behind PageRank. Now, twelve years after that paper was published, the team behind social search engine Aardvark has drafted its own research paper that looks at the social side of search. Dubbed Anatomy of a Large-Scale Social Search Engine, the paper has just been accepted to WWW2010, the same conference series where the classic Google paper appeared.

Aardvark will be posting the paper in its entirety on its official blog at 9 AM PST, and they gave us the chance to take a sneak peek at it. It’s an interesting read to say the least, outlining some of the fundamental principles that could turn Aardvark and other social search engines into powerful complements to Google and its ilk. The paper likens Aardvark to a ‘Village’ search model, where answers come from the people in your social network; Google is part of ‘Library’ search, where the answers lie in already-written texts. The paper is well worth reading in its entirety (and most of it is pretty accessible), but here are some key points:

  • On traditional search engines like Google, the ‘long-tail’ of information can be acquired with the use of very thorough crawlers. With Aardvark, a breadth of knowledge is totally reliant on how many knowledgeable users are on the service. This leads Aardvark to conclude that “the strategy for increasing the knowledge base of Aardvark crucially involves creating a good experience for users so that they remain active and are inclined to invite their friends”. This will likely be one of Aardvark’s greatest challenges.
  • Beyond asking you about the topics you’re most familiar with, Aardvark will actually look at your past blog posts, existing online profiles, and tweets to identify what topics you know about.
  • If you seem to know about a topic and your friends do too, the system assumes you’re more knowledgeable than if you were the only one in a group of friends to know about that topic.
  • Aardvark concludes that while the amount of trust users place in information on engines like Google is related to a source website's authority, the amount they trust a source on Aardvark is based on intimacy and how they're connected to the person giving them the information.
  • Some parts of the search process are actually easier for Aardvark's technology than they are for traditional search engines. On Google, when you type in a query, the engine has to pair you up with the exact websites that hold the answer. On Aardvark, it only has to pair you with a person who knows about the topic; it doesn't have to worry about actually finding the answer, and it can be more flexible with how the query is worded (see the sketch after this list).
  • As of October 2009, Aardvark had 90,361 users, of whom 55.9% had created content (asked or answered a question). The site’s average query volume was 3,167.2 questions per day, with the median active user asking 3.1 questions per month. Interestingly, mobile users are more active than desktop users. The Aardvark team attributes this to users wanting quick, short answers on their phones without having to dig for anything. They also think people are more used to using more natural language patterns on their phones.
  • The average query length was 18.6 words (median of 13) versus 2.2-2.9 words on a standard search engine.  Some of this difference comes from the more natural language people use (with words like “a”, “the”, and “if”).  It’s also because people tend to add more context to their queries, with the knowledge that it will be read by a human and will likely lead to a better answer.
  • 98.1% of questions asked on Aardvark were unique, compared with between 57 and 63% on traditional search engines.
  • 87.7% of questions submitted were answered, and nearly 60% of them were answered within 10 minutes.  The median answering time was 6 minutes and 37 seconds, with the average question receiving two answers.  70.4% of answers were deemed to be ‘good’, with 14.1% as ‘OK’ and 15.5% were rated as bad.
  • 86.7% of Aardvark users had been asked by Aardvark to answer a question, of whom 70% actually looked at the question and 38% could answer.  50% of all members had answered a question (including 75% of all users who had ever actually interacted with the site), though 20% of users accounted for 85% of answers.
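
To illustrate the routing idea from the bullets above (pairing a question with a person rather than a document), here is a minimal sketch. The user profiles, topic sets, and overlap scoring are invented for illustration; the actual Aardvark ranking described in the paper is considerably more sophisticated.

```python
import re

# Hypothetical sketch of "village" search routing: rank people, not pages,
# by how well their known topics overlap the words of a question.
users = {
    "alice": {"cooking", "wine", "restaurants"},
    "bob":   {"python", "databases", "search"},
    "carol": {"travel", "restaurants", "tokyo"},
}

def route_question(question, profiles):
    """Return candidate answerers sorted by topic overlap with the question."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    scored = sorted(
        ((len(topics & words), name) for name, topics in profiles.items()),
        reverse=True,
    )
    return [name for score, name in scored if score > 0]

print(route_question("Any good restaurants in Tokyo?", users))  # ['carol', 'alice']
```

Note that the router never needs the answer itself; it only needs a rough sense of who is likely to know, which is why looser, more conversational queries still work.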



Tuesday, February 2nd, 2010 digital

Facebook is going down – pageviews, average stay, pages per visit – why?

From the Compete charts below, it is clear that Facebook is seeing a decline in pageviews, average stay, and pages per visit.  But why?

I know that I have reduced the time I spend on Facebook, and I have also reduced the number of messages and other social actions I take. And I have deleted virtually all of my personal and family photos and will not upload any more. These may be the first signs of a waning of Facebook, due to a number of factors.

I can’t get my stuff back out

For example, Facebook has stated that it will not participate in OpenSocial because they do not want people to be able to export their content, conversations, photos, etc., out of Facebook and use them on another social network. I am concerned that I will not be able to retrieve or back up content which I believe is mine. I like to have control over my family photos, conversations with friends, etc. I am willing to accept, as a "cost" of using the Facebook system, the fact that they know who my friends are. But I am less willing, or even unwilling, to continue putting my content somewhere I cannot get it back out in its entirety. (Google Docs, for example, just launched a feature that lets you export everything out of Google Docs into Microsoft Office formats.)

Ads in the stream, erosion of trust

A second issue mentioned in a previous post is the increase in advertising on Facebook and also the more unscrupulous practice of injecting ads “into the stream” — ads masquerading as status updates. These are harmful to the overall trust built up in the community and I have un-friended quite a few people whose accounts were clearly used to promote events, products, etc.

Ad-effectiveness sucks

From a prior post – http://bit.ly/EhiW9 – Facebook advertising metrics are absolutely abysmal. They keep trying to sell advertisers on the hundreds of billions of pageviews they throw off. But advertisers are getting smarter, and more and more of them will buy ads on a cost-per-click basis (instead of a CPM, or cost-per-thousand-impressions, basis). This means that the ad revenues Facebook enjoyed from gross INefficiencies will be decimated.
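
A quick back-of-the-envelope comparison shows why that shift matters. The rates and the click-through figure below are illustrative assumptions, not Facebook's actual pricing:

```python
# Illustrative comparison of CPM vs CPC revenue on low click-through inventory.
# All rates here are assumptions for the arithmetic, not actual Facebook prices.
impressions = 1_000_000
cpm_rate = 0.50          # assumed $0.50 per 1,000 impressions
ctr = 0.0005             # assumed 0.05% click-through rate
cpc_rate = 0.25          # assumed $0.25 per click

cpm_revenue = impressions / 1000 * cpm_rate   # $500 sold by impression
cpc_revenue = impressions * ctr * cpc_rate    # 500 clicks * $0.25 = $125
print(cpm_revenue, cpc_revenue)
```

Under those assumed numbers, the same million impressions earn $500 when sold by impression but only $125 when the buyer pays per click, which is the revenue compression described above.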


[Compete charts: Facebook pageviews, average stay, and pages per visit]


Friday, October 30th, 2009 digital

Dr. Augustine Fou is Digital Consigliere to marketing executives, advising them on digital strategy and Unified Marketing(tm). Dr Fou has over 17 years of in-the-trenches, hands-on experience, which enables him to provide objective, in-depth assessments of their current marketing programs and recommendations for improving business impact and ROI using digital insights.

Augustine Fou portrait
http://twitter.com/acfou
Send Tips: tips@go-digital.net
Digital Strategy Consulting
Dr. Augustine Fou LinkedIn Bio
Digital Marketing Slideshares
The Grand Unified Theory of Marketing