Collecting New Media

Just another iSchool Blogs site

Archive for the 'Big Data' Category

13 February

Buying with #Hashtags

During our discussion yesterday, when Megan said that she thinks we won’t have USB ports in 5 years . . . I had a bit of a panic attack. I told the people in my group about it so we could have a little laugh, and to show how my life experiences have made it more difficult for me to fit into the iSchool community and how I often feel like an “other” when it comes to talk of technology. My immediate thought was . . . where will I put all of my photos? My downloaded financial papers? I’m 3/4 of the way to being an “old” myself, so the thought of my financial information aggregated out on a service like Dropbox, on someone else’s server, freaked me out. After a moment I calmed down, sure that there will be other options that I will be comfortable with.

Funny thing is . . . I had another panic moment this morning when I was listening to the news. I happened to hear the end of a segment about American Express’ new purchasing-with-#hashtags program, and I only heard part of the commercial (Amex Sync On Twitter – American Express – YouTube). I had heard about the Amex Sync program before, but the ability to make purchases on Twitter was news to me. The panic stemmed from thoughts about how this can possibly be safe. I was thinking . . . haven’t they seen the Farmers Insurance commercial where J.K. Simmons has all those dogs hanging off of him and he says that it is dangerous to leave boxes on your curb for trash pickup because it tells the thieves what you have that they can steal (Farmers Insurance “Dog Bites” | Great-Ads)? Doesn’t publicly tweeting your purchases say: I’m a techy . . . I buy lots of gadgets . . . come break into my home?

I still haven’t gotten past the danger of letting everyone on Twitter know when you buy things (the purchase tweets are public), but after looking into it some more I do feel a little better (Twitter and Amex to let you pay with a hashtag – CNN.com). They don’t release any of your private/card information (which seems as safe as any other online purchase), and they require a confirmation tweet within 15 minutes of their reply to ensure that it wasn’t an accidental tweet. Benefits include free two-day shipping, and supposedly the offers will all be great “deals”.

So this brings me to the conversation we had yesterday about the Library of Congress Twitter collection and how it should be curated and made accessible. A program like this can turn a seemingly unimportant Twitter feed like mine into an important marketing tool. My purchasing habits on Twitter can suddenly make every other tweet I’ve made (and anything those tweets might say about me) important to a corporation that wants to market its product to people in my precise demographic. Just yesterday we said that we cannot tell today what private information on Twitter might be important tomorrow . . . and this is a perfect example.

How will these companies possibly aggregate and analyze all of the information that they would get from the oodles of tweets from the people who will probably use this service? I have no idea . . . but I guarantee you that there are companies out there right now who are talking about solutions to that problem. If activity like this occurs more and more on Twitter (which I think is inevitable, given that Twitter’s only means of income so far has been advertising), I think we can expect to see large companies attempting to make deals with the Library of Congress to get useful access to its Twitter collection.

14 April

More Collecting of Personal Financial Information

As if you didn’t have enough to worry about, yet another credit score is now being collected by your friends in the private sector financial services industry! As if the three separate reports that you can only access freely once a year from Experian, Equifax and TransUnion weren’t enough, now CoreLogic has released a new type of deeper credit score that delves even further into your financial transactions. The previous three reports each kept separate records of your loan payments and bankruptcies. Now with the new and improved CoreLogic report, every bill you were ever late on, every missed property tax payment, every payday loan, every homeowners association due, every child support payment, every rental payment, every health care bill, whether your mortgage is upside down, every everything about your financial life will now be available to anyone with a thick checkbook through the CoreLogic report. So now we have yet another reason to worry about every minor detail of modern financial transactions and it’s all because CoreLogic wanted to provide lenders with more information about you!

…Um, yeah, right. CoreLogic provides residential property valuations, appraisals, renter risk screening, vehicle records, and loan origination programs, in addition to the new credit score… and all the data is publicly available! Thus CoreLogic is yet another scumbag financial services corporation aggregating publicly available data on people and packaging it for easy consumption so that they can turn a profit. Nothing more, nothing less. As a private corporation they don’t care about you or anyone, as long as they make a buck on your information. It’s in essence the same trend we see in advertising with Google and Facebook, although at least they provide us with information and social contact. As far as I can tell, CoreLogic provides the public with nothing more than another headache.

All of these data aggregation companies have one thing in common – they represent the dark side of record collection. We often think of record keeping for good, but really, little good can come from this type of record… and once negative or fraudulent information gets on your record, you have to pay just for the opportunity to have them review it for possible removal. Needless to say, this can affect your ability to buy or even rent a home and puts many people in precarious or even destitute situations. Although the official company line provided by CoreLogic is that the “added information could help consumers with thin credit files by illustrating positive behaviors elsewhere”, this is mostly just a way for another ethically questionable company to screw over poor people who can’t afford and/or don’t know how to fight their money-grubbing tactics when they collect information kept by supposedly diligent record keepers. I honestly believe that the best policy toward these types of records may come from Fight Club – we need to blow them all up so everyone can start from absolute zero!

31 March

The Ultimate Collection

I had a three-hour layover in Houston the other day and decided to pick up a copy of Wired Magazine. I used to subscribe to Wired in the late 1990s but became disenchanted with the rag after its ludicrous predictions of a never-ending economic boom failed to materialize. What can I say, I was in my mid-twenties and gullible. Older and wiser, I certainly harbor no such illusions these days.

Needless to say, such Pollyannaish predictions no longer grace the cover of any magazine either. The lead article in Wired’s April 2012 issue has a much darker theme – a new data center being built by the NSA (National Security Agency) in the Utah desert. This data center aims to collect every communication sent over the internet or any other channel, using satellite interception and taps on US telecom switches. The data center can collect and store over a yottabyte, or 10^24 bytes, of data – roughly the equivalent of 500 quintillion (500,000,000,000,000,000,000) pages of text. Code breakers inside the facility would then mine the data and crack encryption codes using processors created in Oak Ridge, Tennessee, a former Manhattan Project site. Although most experts believe that breaking the 128-bit encryption algorithms used by email programs with currently available commercial processors “would likely take longer than the age of the universe”, “more messages from a given target” make it “more likely…for the computers to detect telltale patterns”. Considering that the Utah facility could hold every email ever sent, everyone’s email could potentially be monitored here.

Moreover, increased processing speeds should allow code breakers to perform brute force attacks on encrypted data to discover the contents within. The processors being created at Oak Ridge are capable of performing a quadrillion (10^15) operations a second, or a petaflop; by comparison, a top-of-the-line Intel i7 3930 processor performs only 150 gigaflops (150 × 10^9 operations a second). Researchers at the Oak Ridge facility are working to make the processors even faster. According to the Wired article, their “next goal is to reach exaflop speed, one quintillion (10^18) operations a second, and eventually zettaflop (10^21) and yottaflop” (FLOPS). So the NSA facility in Utah can store a seemingly endless amount of data, while researchers in Tennessee create ultrafast processors to decrypt any individual’s, group’s or nation’s communications. And of course this taxpayer-funded surveillance is all justified by the US Congress due to the prior regime’s nebulous war on terror and its continued implementation by the current regime.
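To get a feel for the “longer than the age of the universe” claim, here is a rough back-of-the-envelope sketch in Python. It is only an illustration, not anything from the Wired article: it assumes a 128-bit key means 2^128 possible keys, that a machine can test one key per operation (which is generous to the code breaker), and that the universe is roughly 13.8 billion years old. The names in the code are my own.

```python
# Back-of-the-envelope estimate of exhaustive key search time.
# Assumptions (not from the article): one key tested per operation,
# a keyspace of 2**128, and a universe about 13.8 billion years old.

SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10

KEYSPACE_128 = 2 ** 128

# Operations per second for each speed mentioned in the post.
RATES = {
    "petaflop": 10 ** 15,
    "exaflop": 10 ** 18,
    "zettaflop": 10 ** 21,
    "yottaflop": 10 ** 24,
}


def years_to_exhaust(keyspace, ops_per_second):
    """Years needed to try every key once at the given rate."""
    return keyspace / ops_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    for name, rate in RATES.items():
        years = years_to_exhaust(KEYSPACE_128, rate)
        ratio = years / AGE_OF_UNIVERSE_YEARS
        print(f"{name:>9}: {years:.2e} years ({ratio:.2e} x age of universe)")
```

Under these assumptions, a petaflop machine would still need on the order of 10^16 years to try every key, and only at yottaflop speed does the figure drop to around ten million years. That is why the quoted point about detecting “telltale patterns” in many messages matters more than raw speed.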

Creating faster processors and collecting the world’s knowledge might be a noble service, but record collecting can also be extremely nefarious… and I sincerely doubt the NSA is interested in sharing its knowledge for the benefit of humanity. As someone who finds the NSA’s fascist tendencies of information monitoring beyond disturbing, I pray for my children’s sake and the sake of humanity that Wired is as wrong about this project and its feasibility as they were about the non-existent economic boom more than a decade ago.

31 March

Collecting the World’s Data… and Then What?

The New York Times published a profile of Gil Elbaz last weekend with the title “Just the Facts. Yes, All of Them.” Elbaz has founded a company called Factual with the intention of gathering and storing every fact in the world.

What exactly does he mean by “every fact in the world”?  From the article:

Geared to both big companies and smaller software developers, it includes available government data, terabytes of corporate data and information on 60 million places in 50 countries, each described by 17 to 40 attributes. Factual knows more than 800,000 restaurants in 30 different ways, including location, ownership and ratings by diners and health boards. It also contains information on half a billion Web pages, a list of America’s high schools and data on the offices, specialties and insurance preferences of 1.8 million United States health care professionals. There are also listings of 14,000 wine grape varietals, of military aircraft accidents from 1950 to 1974, and of body masses of major celebrities. Odd facts matter too, Mr. Elbaz notes.

Factual’s website allows visitors to sample some of their data sets. Some, such as the DonorsChoose.org Resources list, are impressively detailed. Some, like the American Idol Finalists or the List of Christmas television specials, are oddly incomplete and vague (no “Abed’s Uncontrollable Christmas”? For shame). For the American Idol list, what is the definition of a “noteworthy song,” who makes that call, and for what purposes might that information be more useful than a comprehensive list of every song performed? I don’t know why Factual would want to use these lists as examples of their data prowess when they have so many gaps. Maybe I’m unfairly picking on the Entertainment category. The Places data, which includes local business information from 50 countries, is fascinating.

The idea of gathering the world’s data is all very interesting, but I’m not sure I totally get Factual’s mission. Elbaz says, “What if you could spot any error, as soon as you wrote it?” and the article states that Factual’s plan is to “build the world’s chief reference point for thousands of interconnected supercomputing clouds.” That concept as I understand it—an interlinked reference database—is something I would like to see. I don’t know if that’s what Factual is, though. The uses described in the article are mostly commercial, which seems kind of a waste for such a powerful dataset. I’m much more interested in the potential for data analysis and identifying data trends and, in the future, open access.

Maybe licensing data for apps is just a means to an end for now—the company is not making a profit yet because Elbaz is reinvesting the money in more datasets, and they still have a long way to go before they get all the facts in the world. It seems like they have some brilliant minds and lots of money invested in the problem, so I’m interested to see how the company develops.

Finally, Elbaz says he hopes that people one day leave their data to science, which I think is an interesting idea for this class—is our legacy a set of data points?

26 March

What Kind of Museum is This?

On the theme of museums this week, I read about the Proteus Gowanus museum in Brooklyn. Admittedly they do not strictly collect new media, but they do create some pretty interesting exhibits. In one exhibit, which I saw only in a picture on Foursquare and could not find referenced on their website, the didactic materials appear to be backed with Velcro so that visitors can move and rearrange them. Their rotating exhibits focus on particular themes, such as Mend, Travel, and Libraries. Their space is part museum, part archive, part library, and part hacker space. This is their mission:

Proteus Gowanus seeks to create an alternative, culturally rich environment designed to stimulate the creative process; a place where the boundaries between the artist and non-artist fade, where images and ideas from disparate disciplines are juxtaposed to create new meanings. Through exhibitions, programs and publications, Proteus Gowanus invites artists, writers, workers in other disciplines and community members to participate in open-ended explorations. Proteus Gowanus acts as an interpreter of culture and place, deepening the community’s sense of context and connection.

As a part of their current exhibition on migration, artist Sal Randolph has created an experiential project in which participants receive a “Psychogeographic Destination Kit” (which includes a pencil and a notebook) and a train ticket with instructions for an excursion. Travel information was disseminated and collected by the Bureau of Unknown Destinations at the museum, and they are keeping an archive of travelers’ experiences on their adventures. Unfortunately, they have already given out all of the train tickets they can (and we do not live in New York), but if you want you can download a traveler’s kit from their website and begin to wander on your own. All they ask is that you return evidence of your trip.

This project got me thinking about our discussions of collecting experiences, which seems to be a large part of what the museum is trying to do. Instead of trying to collect the experience of viewing a certain show or playing a certain video game, they are trying to make an archive of experience. By initiating, storing, arranging, and providing access to a number of people’s personal adventures that are all based on the same premise and recorded in a variety of media, the museum will have amassed a collection that records what can happen when people let themselves wander.

The project at Proteus Gowanus is more art installation than collection policy, but it seems to me to demystify the idea of collecting experiences. Namely, if collecting experiences is our goal, then we should actively seek to make it happen. In the vast sea of big data, meaning can come from collecting one or two people’s social media experiences. It does not seem that different to me from collecting a few diaries of emigration in addition to a ship’s log, just on a much larger scale. A well-rounded collection policy would combine the two in order to get a more complete picture of any specific historical moment or phenomenon.

26 February

Infographics as narrative

Starting in 2005, Nicholas Felton self-reported details of his daily life and compiled them at the end of each year into pretty, bound annual reports (also available in poster form), using clean infographics to represent different aspects of his life. There have been numerous interviews and write-ups about him, but listening to Radiolab was the first I’d heard of him.

Felton documents encounters with people, modes and times in transit, groceries, meals, etc. Building this data set allows him to tell the story of his year through “different lenses.” He can ask himself:
How many waffles did I eat last year?
How much money did I spend on toothpaste?
How often did I dry clean that jacket?

Some of his data collection might be similar to what is documented in a journal, but rather than looking back on a diary to remember the past week/month/year of one’s life, he is aggregating the entries and summarizing them visually.

But, unlike a journal, most of the things Felton documents are not things people would typically record for later use (who wants to know how many plants they killed in a year?). It is here, too, that Felton sees a huge benefit and an intriguing opportunity in self-reporting.

Reporting and representing this data is, in his own words, his “favorite way of telling stories now, making things that are invisible or too large to be comprehended, making them visible…. A compelling and memorable way of revealing invisible stories.”

Felton attempted to encourage others to do the same with Daytum, an app that he and Ryan Case developed. The idea was to help people discover new things about themselves, which, to me, also sounds like the purpose of journaling: self-discovery.

In 2010, Felton’s father passed away, and he decided the 2010 report would be a representation of his father’s life. So Felton wrote a “biography in a different format, a more valid one, rooted in facts.” Felton used 4348 records and told the story using the data his father left behind.

An example of this might be:
Number of photos – 93
Percentage of photos where he wears a tie – 18%.
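
As a small illustration of how summary lines like these fall out of raw records, here is a minimal Python sketch. The records are entirely hypothetical: I am assuming 93 photo records with 17 marked as showing a tie, a count I picked only because it rounds to the 18% above. The field name `wearing_tie` and the function `summarize_photos` are made up for this example and are not from Felton’s report.

```python
# Hypothetical records: 93 photos, 17 of which are assumed to show a tie.
# The count 17 is an assumption chosen so the result rounds to 18%.
photos = [{"wearing_tie": i < 17} for i in range(93)]


def summarize_photos(records):
    """Return (number of photos, whole-number percentage with a tie)."""
    total = len(records)
    with_tie = sum(1 for record in records if record["wearing_tie"])
    percentage = round(100 * with_tie / total) if total else 0
    return total, percentage


if __name__ == "__main__":
    total, percentage = summarize_photos(photos)
    print(f"Number of photos – {total}")
    print(f"Percentage of photos where he wears a tie – {percentage}%")
```

The interesting decision is not the arithmetic but which field gets counted at all, which is the curatorial choice the interview gets at.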

In the Radiolab interview, Felton addresses how the data he chose to include in his analysis and representation inherently shows his bias as a curator and the story he chose to tell about his father’s life.

All of this, combined with Ford’s article about the lack of narrative in Facebook, had me thinking about infographics and data visualization as a new form of narrative.

Aaron Koblin’s TED talk describes different data visualization projects and the “interface” as a storytelling mechanism, which seems to take infographics and make them dynamic and interactive. Koblin has worked on projects visualizing flight patterns across the US, with manipulations of the data to focus on aircraft type or altitude level. There was also a project tracking cell phone data to show the geographic movement of people into the city center to celebrate a holiday in Amsterdam. Are these new ways to represent information developing new forms of narrative?

Researching a bit into Facebook’s timeline feature/interface led me to discover that Felton and Ryan Case had been hired and were the source of inspiration for the new design. I am unsure about the motivations behind ‘timeline’. It seems related to telling one’s own story, but maybe it’s just about collecting more data.
