Friday, December 5, 2014

Week 14: Security, Privacy, and Cloud Computing

1. O'Harrow, R. (2005). Chapter 10. In No Place to Hide: Behind the Scenes of Our Emerging Surveillance Society (281-300). New York: Free Press.

  • electronic surveillance: future of data collection
  • transit cards monitor traffic, travel activity
  • hand readers @ workplaces instead of traditional punch cards
  • GPS, CCTV
  • tollbooths as security points
    • e-toll credits to verify location
  •  RFID (radio frequency id) @ heart of system
    • "virtual borders"
  • "why worry if you have nothing to hide"? --> awkward logic?
  • surveillance as defense/security ---> but v. what/who?

2. Jaeger, P., Lin, J., Grimes, J., & Simmons, S. (2009). Where is the cloud? Geography, economics, environment, and jurisdiction in cloud computing. First Monday, 14(5). http://firstmonday.org/ojs/index.php/fm/article/view/2456/2171

3. Library Data in the Cloud - National Information Standards Organization. (n.d.). Retrieved November 21, 2014, from http://www.niso.org/news/events/2014/virtual/data_in_the_cloud/

4.  Cloud Computing Online Training. (2014, Mar 3) Learning Cloud Computing With Amazon Web Services What Is The Cloud. Retrieved from https://www.youtube.com/watch?v=Neys3rci14o

  • cloud computing: large data centers with enough dynamism to make scalable for users
    • functionality depends on size and continuity:
    • efficient flow of data
  • although not familiar w/ term or unaware of own use, many ppl already involved in it
    • ex. Gmail, Flickr
  • "cloud" not just physical machines
    • also raises policy issues
  •  diff components
    • infrastructure
      • computational resources
      • storage
      • ex. Amazon Elastic Compute Cloud
    • platform
      • software stack
      • ex. Google App Engine
    • application
      • Web services running on top of cloud computing component


What is...?
  • cloud computing offers possible solutions to "Web-scale" challenges in processing data
  • commercialization of "utility computing" services and development
    • addtl revenues
    • consolidation: overall reduced costs
  • liberates users from maintaining infrastructure


Who uses...?
  • app hosting
    • cloud provider w/ maintenance tasks
  • batch processing
    • large amt of data
  • temporary use x existing IT infrastructure, aka cloud bursting
    • temporary/seasonal peaks
  • user data + apps in cloud cluster
    • owned and maintained by provider
    • legal issues?


Where is...?
  • centralization of info + countless computing resources
  • location of data centers a major issue: possibility of portable dc?
    • suitable physical space (at least warehouse-sized)
    • near high-capacity Internet connections
    • lots of affordable electricity/other energy resources
    • laws of jurisdiction 
      • adjudication of cases?
      • govt intervention?
      • costs

Rules and policies
  • users expect reliable, high-speed 24/7 access
  • also secure and private connections
  • liability + intellectual property + ownership of data
  • easy transfer of data
  • for corporations: ability to be audited

Week 12: Muddiest points

1. I'm familiar with the concept of folksonomy as an active user of the photo-sharing site Flickr, but I'm wondering how extensive the use is as a supplement to the controlled vocabulary provided by other institutions, or whether adoption of what was before a folksonomic term depends on the frequency/popularity of that term.

Friday, November 21, 2014

Week 12: Web 2.0, Social Media, and Libraries

1. Kaplan, A. M., & Haenlein, M. (2010). Users of the world, unite! The challenges and opportunities of Social Media. Business Horizons, 53(1), 59-68. doi: 10.1016/j.bushor.2009.09.003

social media popular but still unclear definition
  • difference from web 2.0 and user-generated content?
  • some cos. remain uncomfortable with "freer" customer/client interaction 
    • less "control" on part of co.
  • but s.m. <> www as platform for exchanging info
    • form +++ powerful than 1970s BBS

what it is/n't
  • 1979: Usenet (Duke)
  • 1998: Open Diary, (we)blog
  • 2000s: high-speed Internet access
    • 03: MySpace
    • 04: Facebook
  • 2004: Web 2.0 = ideological + technical foundation
    • new way that software devs + users collab in www
    • content + apps continuously modified
    •  basic fxnalities: Flash, RSS, AJAX (.js)
  • 2005: User-generated content (UGC)
    • published on publicly-accessible site or social networking site
      • excludes emails/IMs
    • creative 
      • excludes existing content
    • "amateur"
      • excludes commercial purpose
  • s.m. = Internet-based apps combining Web 2.0 + UGC
    • apps heterogeneous
    • but no systematic way s.m. apps can be categorized
    • possibly: "richness" of medium + degree of social presence

Challenges and opportunities of s.m.
  • collaborative projects
    • joint outcome may be better than individual efforts
    • wikis v. social bookmarking
  •  blogs
    • usu. by 1 indiv., but can provide forum for interaxn
    • increasingly adopted by firms
  • content communities
    • media content between users
    • YouTube, Flickr, Slideshare
    • copyright??
  • social networking sites
    • personal info, but also brand communities
    • Facebook, MySpace
  • virtual game worlds
    • highest level of richness + social presence
    • World of Warcraft, Everquest
  • virtual social worlds
    • similar to game worlds, except no rules for possible interaxns
    • Second Life

Companies and social media
  • choose appropriate medium for purpose
  • select app or make own
  • ensure s.m. activities align w/ each other
  • also w/ firm's overall media strategy
  • access for all employees
  • stay active, interesting, humble, slightly informal, honest

2. Lankes, R. D., Silverstein, J., & Nicholson, S. (2007). Participatory networks: The library as conversation. American Library Association. Available at http://quartz.syr.edu/rdlankes/Publications/Others/ParticiaptoryNetworks.pdf

Libs in "convo business"
  • knowledge business --> "convo business"
    • ppl learn through convo
    • info lit + critical thinking
    • convo w/in individual: metacognition
  • how can web 2.0, social media further facilitate ideas traditionally provided by brick-and-mortar lib?
  • tech --> new possibilities for reaching ideals

Tech integration
  • usefulness of tech must be measured against lib. mission
  • social networks
  • wikis: mass decision-making
  • loosely coupled APIs (application programming interface)
    • "convo" b/w apps
    • Google Maps
  • mashups: ease of incorporation
  • permanent betas
    • Google Labs, MIT Libs
  • +++ users, improved software
  • folksonomies: UG classification

Core new tech: AJAX and Web services
  • AJAX: Asynchronous JavaScript and XML 
    • browser < data > server w/o refreshing entire page
    • open-source, light programming skills
  • Web services
    • software-software interaxns
    • e.g. ISBN no. to search multiple catalogs
    • lightweight, aggregate for +++ fxnality
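
The ISBN example can be sketched roughly as below. This is only an illustration under my own assumptions: the two catalogs are hypothetical in-memory dictionaries standing in for remote Web services, the ISBNs are placeholders, and `make_lookup` and `search_catalogs` are made-up names, not part of any real library API.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-ins for remote catalog services. A real aggregator
# would send the ISBN to each service over HTTP and parse the reply.
CATALOG_A: Dict[str, str] = {"0000000000001": "Example Title (Catalog A record)"}
CATALOG_B: Dict[str, str] = {
    "0000000000001": "Example Title (Catalog B record)",
    "0000000000002": "Another Title (Catalog B record)",
}

Lookup = Callable[[str], Optional[str]]


def make_lookup(catalog: Dict[str, str]) -> Lookup:
    """Wrap one catalog as a small 'service' that answers ISBN queries."""
    return lambda isbn: catalog.get(isbn)


def search_catalogs(isbn: str, services: Dict[str, Lookup]) -> Dict[str, str]:
    """Send one ISBN to every service and aggregate the hits by service name."""
    hits = {}
    for name, lookup in services.items():
        record = lookup(isbn)
        if record is not None:
            hits[name] = record
    return hits


services = {"A": make_lookup(CATALOG_A), "B": make_lookup(CATALOG_B)}
results = search_catalogs("0000000000002", services)
```

Each service stays lightweight (one ISBN in, one record out); the added functionality comes from aggregating their answers.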

Library 2.0
  • which apps for which purposes? strategies?
  • choose appropriately for user participation
  • social networking sites

Participatory librarianship in axn
  • connect w/ constituencies and other institutions
  • Worldcat
  • informalize the catalog 
    • enhance info provided
    • incorporate folksonomies
  • reference x community involvement
    • develop online knowledge base
    • offer + meeting spaces
    • + access points
    • community repositories?
  • institutional, digital repositories

3. Salomon, D. (2013). Moving on from Facebook Using Instagram to connect with undergraduates and engage in teaching and learning. College & Research Libraries News, 74(8), 408-412. Available at http://crln.acrl.org/content/74/8/408.full

Study at UCLA Powell Library
  • use of Instagram to reflect undergrad pop.
    • students doc. time in lib via app
    • even w/ low no. of followers @ beginning, + interactive than FB
  • Instagram 3rd most pop. in U.S.
    • still visual, but move away from text stimulation?
  • allow integration of lib activities and uni curriculum
  • social media: addtl factor for measuring impact on student success?
  • another way for lib to be engaged, to reject stereotypes of "stuffiness"?
 

Week 11: Muddiest points

1. The following question, I think, is beyond the scope of the class, but I will ask anyway: since I'm interested in audiovisual collections, I was wondering about the barriers not just in access and continued (or any) use, but in funding and sustaining such materials. Here I am thinking about finding and then maintaining the equipment required for digitization, or even just playback.

2. Regarding institutional repositories, it seems as though they are mainly geared toward faculty, and even then, some faculty may not be interested in or aware of such a resource for their preprints, etc. I'm not quite sure whether there is the same push for students---especially, for instance, undergraduates working on their senior theses---to deposit their work in the IR.

Friday, November 14, 2014

Week 11: Digital library and web search

1. Paepcke, A., García-Molina, H., & Wesley, R. (2005). Dewey Meets Turing: Librarians, Computer Scientists, and the Digital Libraries Initiative. D-Lib Magazine, 11. Retrieved from http://www.dlib.org/dlib/july05/paepcke/07paepcke.html


NSF --> Digital Libraries Initiative (1994)
  • collaboration librarians x CSists
    • research x daily affairs, aka theory x practice
    • shared values
      • need to share w/ wider community
      • linkage of reliable info not just for "info pros" but also CS
  • Google one of many results 
  • how to access, share funding?
    • misconceptions from both parties
  • "hubs" as new framework for collections online
  • connections b/w librarians <> scholarly authors


2. Lynch, C. A. (2003). Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age. Association of Research Libraries, 226. Retrieved from http://www.arl.org/storage/documents/publications/arl-br-226.pdf

Institutional repositories (2002)
  • definition
    • provides services to uni community for mgmt and dissemination of digital mats.
      • work by both fac & students
      • research & teaching
    • stewardship of such mats.
      • also data
    • supported by diff. techs.
  • ++ accountability for unis
    • ++ active role in scholarly publishing
    • forging more strategic, mutually beneficial alliances

New patterns in access/dissemination
  • decrease in online storage costs
  • standards for metadata --> interop.

MIT DSpace x HP (2003)
  • model for other reps both in the U.S. and internationally
  • open-source software
    • esp. important for institutions w/ significantly lower endowments/resources

Strategic importance
  • near-term & long-term preservation of scholarly works, esp. by faculty
  • supplementary materials
    • preprints? "first access"
  • also affiliation w/ institution
  • what is worth collecting?
  • encourage faculty to use institution resources
    • complement to disciplinary repositories

Potential dangers
  • institutional control over intell. property
  • centralization (inst.) v. decentralization (discipline/dept)
    • risk of inappropriate policy constraints?
  • too fashionable?
    • hasty implementation w/o judging merits or sustained commitment?

Networked info standards and infrastructure
  • preservable formats
  • identifiers
    • persistent and consistent reference to mats.
  • rights doc. and mgmt
    • again, metadata
    • but also controlled vocab (?) 

3. Hawking, D. (2006). How Things Work: Web Search Engines: Parts 1 and 2. IEEE Computer. Retrieved from http://web.mst.edu/~ercal/253/Papers/WebSearchEngines-1.pdf

Data processing
  • tools and interfaces have many of same data structures and algorithms in common
  • search engines can't/shouldn't index all pgs
    • b/c no. of pgs is infinite
  • more useful to
    • reject "low-value content"
    • ignore huge vols. of accessible data

Problems and techniques
  • multiple locations for data centers
    • provides redundancy and fault tolerance
    • PC type depends on factors like price, speed, memory, physical size, etc.
    • clusters can target specialized functions
      • ex. crawling, indexing, replication

Crawling algorithms
  • queue of unvisited URLs
    • started by 1 or more "seed" URLs, then HTTP request
    • huge data structure required
  •  real crawlers
    • different speeds
    • risk of server overload 
      • only 1 req/server at a time
      • "politeness" delay b/w requests
  • excluded content
    • check site's robots.txt file 
    • to see whether parts or all of site should be crawled
  • duplicate content
    • unrecognized duplicates may contain links to whole families of further duplicates
    • early detection necessary
  • continuous crawling
    • full crawls at fixed intervals might slow processing
    • instead install priority queue
  • spam rejection
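
The crawling steps above (seed URLs, queue of unvisited URLs, robots.txt check, politeness delay, duplicate detection) can be sketched as follows. This is a toy under stated assumptions, not how a production crawler works: `PAGES` is a hypothetical in-memory "web" so that no HTTP requests are made, duplicate detection is a naive exact-text match, and spam rejection and priority queues are left out.

```python
import time
from collections import deque
from urllib import robotparser

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would issue an HTTP request for each URL instead.
PAGES = {
    "http://example.org/": ("home", ["http://example.org/a", "http://example.org/private/x"]),
    "http://example.org/a": ("page a", ["http://example.org/b", "http://example.org/"]),
    "http://example.org/b": ("page a", []),  # same text as /a, i.e. a duplicate
    "http://example.org/private/x": ("secret", []),
}
ROBOTS_TXT = ["User-agent: *", "Disallow: /private/"]


def crawl(seeds, politeness_delay=0.0):
    """Breadth-first crawl from the seed URLs; returns URLs in crawl order."""
    robots = robotparser.RobotFileParser()
    robots.parse(ROBOTS_TXT)          # normally fetched from the site's /robots.txt
    queue = deque(seeds)              # queue of unvisited URLs
    seen_urls = set(seeds)
    seen_texts = set()                # crude duplicate-content check
    crawled = []
    while queue:
        url = queue.popleft()
        if not robots.can_fetch("*", url):
            continue                  # excluded content
        if url not in PAGES:
            continue                  # dead link in the toy web
        time.sleep(politeness_delay)  # wait between requests to the same server
        text, links = PAGES[url]
        if text in seen_texts:
            continue                  # duplicate: neither index nor follow it
        seen_texts.add(text)
        crawled.append(url)
        for link in links:
            if link not in seen_urls:
                seen_urls.add(link)
                queue.append(link)
    return crawled


order = crawl(["http://example.org/"])
```

Because the loop is single-threaded, it never has more than one request per server in flight; real crawlers have to enforce that across many parallel workers.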

Indexing algorithms
  • use inverted files to rapidly find docs containing a given term
  • 2 phases
    • scan text of each doc
    • inversion (?)
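
A minimal sketch of the two phases, assuming "inversion" means sorting the scanned postings by term (then by doc number) so that each term ends up with its own list of docs. The three sample docs and the function names are my own.

```python
from collections import defaultdict

# Hypothetical mini-collection: doc number -> text.
docs = {1: "cloud data cloud", 2: "library data", 3: "digital library"}


def scan(docs):
    """Phase 1: emit one (term, doc_id) posting per word occurrence."""
    postings = []
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings.append((term, doc_id))
    return postings


def invert(postings):
    """Phase 2: sort postings by term, then doc id, and group them per term."""
    index = defaultdict(list)
    for term, doc_id in sorted(postings):
        # Skip repeats of the same term within one doc.
        if not index[term] or index[term][-1] != doc_id:
            index[term].append(doc_id)
    return dict(index)


index = invert(scan(docs))
```

Looking up `index["data"]` now answers "which docs contain this word?" without rescanning any text.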

Real indexers
  • store addt'l info in postings
    • ex. term frequency, positions
  •  scaling up
    • doc partitioning
  • term lookup
  • compression for key structures
  • precomputing for common phrases
  • indexing anchor text w/ target & source (?)
    • useful for descriptions
  • popularity score of pages
    • derived from frequency of incoming links
    • ex. PageRank
  • query-independent score
    • internal ranking
    • ++ score, ++ retrieval probability
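
To make the link-based popularity score concrete, here is a simplified PageRank-style power iteration. It is a sketch only: the damping factor of 0.85, the fixed iteration count, and the four-page link graph are assumptions of mine, and real engines combine this with many other signals.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: page -> list of pages it links to. Returns page -> score."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline score, plus shares from its in-links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            targets = outgoing if outgoing else pages  # dangling page: spread evenly
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: a, b, and d all link to c; c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(links)
```

The score does not depend on any query, so it can be computed ahead of time and stored with the index.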

Query-processing algorithms
  • most common type of query
    • avg length 2.3 words
  • return docs containing all query words
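
"Return docs containing all query words" amounts to intersecting the postings lists of the query terms. A minimal sketch, assuming each list holds sorted doc numbers; the index contents and function names are made up for illustration.

```python
def intersect(list_a, list_b):
    """Merge two sorted postings lists, keeping doc ids present in both."""
    i = j = 0
    out = []
    while i < len(list_a) and j < len(list_b):
        if list_a[i] == list_b[j]:
            out.append(list_a[i])
            i += 1
            j += 1
        elif list_a[i] < list_b[j]:
            i += 1
        else:
            j += 1
    return out


def and_query(index, query):
    """Return the doc ids that contain every word of the query."""
    terms = query.lower().split()
    if not terms:
        return []
    # Start with the shortest list so intermediate results stay small.
    lists = sorted((index.get(term, []) for term in terms), key=len)
    result = lists[0]
    for postings in lists[1:]:
        result = intersect(result, postings)
    return result


# Hypothetical inverted index: term -> sorted doc numbers.
index = {"digital": [1, 3, 4], "library": [2, 3, 4, 7], "search": [4, 9]}
hits = and_query(index, "digital library")
```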

Real processors
  • simple-query processor usu. = poor results
  • increase in quality
    • scans to end and sorts lists by relevance
    • but too computationally time-consuming, expensive

Increasing speed
  • skipping
  • early termination
    • can stop processing after short scan
  • better assignment of doc numbers (??)
  • caching
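
Two of these speed-ups, early termination and caching, can be illustrated together. This sketch assumes the postings for a term are already stored in descending score order, so the scan can stop after the first k entries; `RANKED_POSTINGS`, the scores, and `top_k` are hypothetical.

```python
from functools import lru_cache

# Hypothetical postings, pre-sorted by descending query-independent score.
RANKED_POSTINGS = {
    "library": (("doc7", 0.9), ("doc2", 0.6), ("doc5", 0.3), ("doc1", 0.1)),
}
calls = {"count": 0}  # counts how often the postings are actually scanned


@lru_cache(maxsize=1024)
def top_k(term, k=2):
    """Return the k best docs for a term, caching results for repeat queries."""
    calls["count"] += 1
    results = []
    for doc_id, _score in RANKED_POSTINGS.get(term, ()):
        results.append(doc_id)
        if len(results) == k:
            break  # early termination: the rest of the list cannot rank higher
    return tuple(results)


first = top_k("library")
second = top_k("library")  # served from the cache, no second scan
```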


4. Shreeves, S. L., Habing, T. G., Hagedorn, K., & Young, J. A. (2005). Current developments and future trends for the OAI Protocol for Metadata Harvesting. Library Trends, 53. Retrieved from http://hdl.handle.net/2142/1754

Open Archives Initiative Protocol for Metadata Harvesting (2001)
  • scalable solution for community metadata needs
  • implementation nonspecific
    • facilitate use in wide variety of institutions and domains
  • min. use: DC schema
    • other schemas possible
  • access to "invisible web" + aggregate sources from diff collections
  • 2 "entities" who use protocol
    • data providers, aka repositories 
    • service providers, aka harvesters
      • can build value-added services
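
A rough sketch of the harvester side. The `ListRecords` verb, the `oai_dc` metadata prefix, and the XML namespaces come from the OAI-PMH and Dublin Core specifications; the base URL, the identifier, and the trimmed-down sample response are placeholders of mine (a real response also carries elements such as `responseDate` and `request`), and nothing is fetched over the network here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the request a harvester would send to a repository."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


# Hypothetical, heavily trimmed response from a data provider.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample record title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def harvest_titles(xml_text):
    """Pull (identifier, title) pairs out of a ListRecords response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.iter(OAI_NS + "record"):
        identifier = record.findtext(OAI_NS + "header/" + OAI_NS + "identifier")
        title = record.findtext(".//" + DC_NS + "title")
        pairs.append((identifier, title))
    return pairs


url = list_records_url("http://example.org/oai")
records = harvest_titles(SAMPLE_RESPONSE)
```

A service provider would run this against many repositories and build its value-added services (search, browsing) on the aggregated records.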

Current trends and developments
  • user group-specific service providers
  • diff comms develop diff standards in addition to protocol
  • Open Language Archives Community
    • language resources
  • Sheet Music Consortium
    • particular problem b/c of sheet music, cover art, lyrics, etc.
    • allows users to annotate metadata
  • National Science Dig Lib
    • OAI protocol primary means
    • build + aggregate collections and services/infrastructure to support activities

Shortcomings of existing registries
  • usu. very sparse recs about indiv. reps
  • no search mechanism
  • ltd browsing
  • few registers have complete list of all available reps

Developing experimental OAI registry (UIUC)
  • completeness
    • inventory of existing registries
    • following and exploring links
    • search Google for OAI reps
  • discoverability
    • allow for diff views w/o any manual cataloging of OAI reps
    • automation of data harvesting and indexing
  • machine processing
    • turn registry into OAI rep

Future work
  • for better search and discovery, enhance collection-level desc
  • increase in automated maintenance of registry
  • increase in automated discovery of other registries
  • delegate creation and maintenance of virtual collections, incl. metadata
  • improve view of search results (contextualization)

ERRoL resolution (Extensible Repository Resource Locators)
  • "cool URLs" (Berners-Lee) to content and services linked to info in OAI rep
  • OAI-id for item 

Challenges
  • data provider implementations
    • many potentially useful features underutilized
  • metadata
    • ways of using encoding standards differ
    • leads to diff relevance for users
    • ++ formats, ++ complex metadata
  • lack of communication b/w service and data providers

Future directions
  • development of best practices
  • Static Repository Gateway (Los Alamos Natl Lab)
    • low technical entry barrier
  • mod_oai project
    • makes content on Apache open-source servers accessible via the protocol
  • OAI rights
    • means of structured lang w/in protocol
  • controlled vocabs
  • gateway to ERRoL service

Week 10: Muddiest points

1. Must XML attribute values always be quoted? In HTML, for example, one can code the link as:

<a href = http://www.url.com/>site</a>

 or

<a href="http://www.url.com/">site</a>
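
As far as I can tell, XML (unlike HTML) requires every attribute value to be quoted, with either single or double quotes; element names are never quoted. A quick check with Python's standard-library parser, where `is_well_formed` is just a helper name I made up:

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


double_quoted = '<a href="http://www.url.com/">site</a>'
single_quoted = "<a href='http://www.url.com/'>site</a>"
unquoted = "<a href = http://www.url.com/>site</a>"

checks = {
    "double": is_well_formed(double_quoted),
    "single": is_well_formed(single_quoted),
    "unquoted": is_well_formed(unquoted),
}
```

Only the unquoted version is rejected, so the first link above would pass in lenient HTML but not in XML.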

2. What are some interoperability issues when using XML -- for instance, in using Unicode v. ASCII?  

Friday, November 7, 2014

Week 10: XML

1. Martin Bryan.  Introducing the Extensible Markup Language (XML): http://www.is-thought.co.uk/xmlintro.htm   
2. Extending your Markup: an XML tutorial by Andre Bergholz : http://xml.coverpages.org/BergholzTutorial.pdf 

3. XML Schema Tutorial http://www.w3schools.com/Schema/default.asp   


XML: subset of SGML (Standard Gen. Markup Lang.)
  • clearly mark boundaries of elements in DTD (Doc Type Def)
    • dec: <!DOCTYPE>
    • con: namespaces + DTD don't work well together
  • this delineation enforces strict implementation
    • ex. 1st-level heading implemented before 2nd-level, etc. 
  • extends link capabilities w/ 3 supp. lang
    • Xlink: 2 docs
    • XPointer: individual parts of XML doc
    • XPath: used by previous to describe loc paths
      • loc path: axis, node test, predicate
  • XML not designed to be one standardized way of coding text; rather a formal lang. for describing a doc's parts
    • multiple files for compound docs
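
A small illustration of a location path with its three parts: the axis (here the default child axis), the node test (`reading`), and the predicate (`[@week='10']`). The sample document is invented, and Python's ElementTree supports only a subset of XPath, so this is a sketch rather than a full XPath demo.

```python
import xml.etree.ElementTree as ET

# Hypothetical reading list used only for this example.
doc = ET.fromstring("""
<readings>
  <reading week="10"><title>Introducing XML</title></reading>
  <reading week="11"><title>Web Search Engines</title></reading>
  <reading week="10"><title>XML Schema Tutorial</title></reading>
</readings>
""")

# child axis (implicit) / node test "reading" / predicate on the week attribute
path = "./reading[@week='10']/title"
titles = [title.text for title in doc.findall(path)]
```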

XML docs: formal syntax for series of entities
  • ea. entity can contain 1+ elements
  • ea. element can contain 1+ attributes (process)
  • 3 types of markup
    • document instance (what kind)
    • optional: processing instruction (how to read)
    • optional: doc type declaration (formal markup declarations)

Use
  • markup tags (defined by trade org or other body)
    •  e.g. <to> content </to>
  • possible to define own sets
    • create DTD w/ formal id of relationships b/w elements
    • and also define attributes

Standard and non-standard text elements (??)
  • commonly used text: text entity
  • non-standard: system-dependent entities can be declared
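
To see a DTD and a text entity working together, the sketch below declares a tiny document type with its own elements and one entity for a commonly used phrase, then parses it. The element names and the entity `lib` are invented for illustration. Note that Python's standard-library parser expands internal entities but does not validate the document against the DTD.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset: three element declarations plus one text entity.
xml_text = """<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ENTITY lib "University Library">
]>
<note><to>&lib;</to><body>Welcome to the &lib;.</body></note>"""

note = ET.fromstring(xml_text)
recipient = note.find("to").text    # entity reference replaced by its text
message = note.find("body").text
```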

Illustrations and other special elements
  • special notation either as entity or attribute
  • notation declaration
    • to designate action for unparsed data in ref file

 

XML schema
  • allows user to define data types
  • goal: to replace DTDs
  • 4 schema
    • DDML: doc def markup lang
    • DCD: doc content desc
    • SOX: schema for object-oriented XML
    • XML-Data (replaced by DCD)

Example

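<?xml version="1.0"?>
<!-- the xs prefix must be bound to the W3C XML Schema namespace -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<!-- declares a <content> element that holds plain text -->
<xs:element name="content" type="xs:string">
</xs:element>
</xs:schema>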