Wednesday, 7 December 2011

Information Architecture

In our final DITA lecture we looked at web site design, and more specifically at information architecture. We explored what makes a good web site by considering what end users love and hate about the web. Our lecturers listed the following:
Loves: Good Design; Aesthetics; Big Ideas; Utility; Findability; Personalisation
Hates: Can't Find It; Gratuitous Bells and Whistles; Inappropriate Tone; Designer-Centredness; Under Construction; No Attention to Detail; Waiting

Much of this is common sense - for example, people will get frustrated if they cannot find what they need, or if they are overwhelmed with unnecessary graphics. I have certainly been irritated in the past by websites that make you watch (or click a button to skip) a video before going to the main site - especially if you intend to use it a lot.
We considered why it is important to organise information. For you info pros out there, the answer is incredibly obvious, but for the sake of argument let's spell it out. Not organising your information might seem quicker when you create it. A common cry where I work is 'why can't we just put things in our personal drives and be done with it?' The answer is twofold: first, when you come to look for something it will be like searching for a needle in a haystack. Second, if it's on your personal drive, we can't help you as much, because we can't see it. (I risk digressing here into the importance of a good EDRMS in an organisation, so perhaps I will move on.) If this argument doesn't work, one can point to financial reasons: Jakob Nielsen was quoted in Rosenfeld and Morville's book 'Information Architecture' saying, 'low intranet usability costs the world economy $100 billion per year in lost employee productivity' (p.xii). This was back in 2002, so nearly a decade later I suspect this figure is a lot larger.

Organising information on web pages is actually very flexible. For example, it is possible (although perhaps not recommended!) to link a page to an unlimited number of other pages. In the 'real world' this is more difficult: if you compare information architecture to architecture (buildings), you would not be able to connect a room to an infinite number of other rooms. A database is slightly different, as entities tend not to be linked directly to a very large number of other entities. (This is due to the requirement for unique identifiers; instead, other tables get added in - see the database blog for more details on this!)

Above we looked at a list of web loves and hates. So how do we avoid falling into these traps? A cynical response, perhaps, is to not let the graphic design team get over-excited(!) However, our lecturer pointed out that another trap people fall into is handing the website design over to the geeks, who make it over-technical. To avoid this, it is important to have people from lots of backgrounds on the design team, including graphic designers, technical types (for dealing with databases, programming, etc.), project managers and information architects. Information architects generally have a background in information management; for librarians it can be a natural move, as the work is basically creating a structure, providing labels and making things organised.

When you've got your team you need to consider how you are going to achieve the web loves listed above (and avoid the web hates!). There are lots of guides on the web describing how to put together a good web site (for example, Smashing Magazine and Sharpened Productions). A common theme is to be user focused. It seems obvious, but think about who is using your website, why they are using it, and how they might use it. A general rule of thumb is to avoid making people think too much; navigation should be intuitive. This is where good architecture comes in. An information architect considers the relationships between pages, optimises access to information and of course will know all about indexing! In addition, they will be familiar with consistent labelling, for example not mixing up their nouns and verbs.

So what can we conclude? Good web design isn't just about having lots of graphics and helpful information (although these are still important!). With the expansion of the Web, people are increasingly impatient and fast-moving, so good navigation systems and searchable web pages are a must. People visiting your web page shouldn't be forced to think too much, and consistent labelling helps to achieve this. (To think about it another way, people shouldn't be scrutinising your labels too closely; they should just be using them.)

Saturday, 3 December 2011

Mobile Information, the Semantic Web and the World of Open

Hello bloggees! My apologies for having fallen off the map of late, I have been very busy trying to get some reading done for my coursework. I know, excuses, excuses, but I am going to attempt to catch up on what's been happening in my DITA module in this post, which covers Web 2.0 technologies, potential Web 3.0 and related policy.

First of all, I will give you a quick overview of mobile information. Then I will touch on the semantic web, which is sometimes described as 'Web 3.0'. I will conclude with a look at the world of open, which covers open source systems and open data.

Mobile Information

Mobile devices offer the advantage that they are context-aware. By this, I mean that they can identify where they are. This includes, but is not limited to, GPS (Global Positioning System) on mobile devices. Having GPS allows your phone to offer map services, and you can find companies near to where you are. Additionally, mobile devices often have a magnetic compass and an accelerometer, which means that the phone 'knows' which way it is facing.
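As a rough illustration of how a GPS position could be used to find nearby places, here is a minimal JavaScript sketch. It assumes the phone has already reported its latitude and longitude, and it uses the haversine formula to measure straight-line ('as the crow flies') distances. The café names and coordinates are hypothetical, and real map services use their own, more sophisticated methods.

```javascript
// Hypothetical sketch: find places near a GPS position, nearest first.
const EARTH_RADIUS_KM = 6371; // mean radius of the Earth

function toRadians(degrees) {
  return (degrees * Math.PI) / 180;
}

// Haversine formula: great-circle distance in km between two { lat, lon } points.
function distanceKm(a, b) {
  const dLat = toRadians(b.lat - a.lat);
  const dLon = toRadians(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(a.lat)) * Math.cos(toRadians(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// Keep only the places within maxKm of `here`, sorted nearest first.
function nearby(here, places, maxKm) {
  return places
    .map((p) => ({ ...p, km: distanceKm(here, p) }))
    .filter((p) => p.km <= maxKm)
    .sort((x, y) => x.km - y.km);
}

// Hypothetical data: the phone's reported position and three cafés.
const here = { lat: 51.5273, lon: -0.1025 };
const places = [
  { name: "Cafe A", lat: 51.529, lon: -0.101 },
  { name: "Cafe B", lat: 51.5, lon: -0.12 },
  { name: "Cafe C", lat: 51.528, lon: -0.104 },
];

console.log(nearby(here, places, 1).map((p) => p.name)); // [ 'Cafe C', 'Cafe A' ]
```

Cafe B is about 3 km away, so it is filtered out before the remaining cafés are sorted by distance.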

Having cameras in your mobile device can also allow context awareness. Face and building recognition technology is inaccurate, but it is improving. If you combine this with GPS, you could add more detailed metadata to your photographs: in theory you could tag where, when and who you are photographing.

Bluetooth also offers context awareness on mobile devices. Bluetooth allows devices to receive information, and also to broadcast, for a distance of about 10 metres. It has been proposed as a technology for 'information fountains' - for example offering tourist information in the vicinity of attractions.

One downside to mobile devices is their size; the screens are very small (certainly compared with the screen on a laptop or PC). Applications are therefore often stripped down so that they can be viewed on a smaller screen. Usually the same content is presented, but in a different or collapsed way. For example, if you open a Wikipedia page on a mobile device, the sections are collapsed into menus, so the page loads more quickly and you do not have to scroll through a lot of information unnecessarily. Keyboards on mobile devices are also small, which can cause problems: fiddly keys lead to spelling mistakes and reduced typing speed. This can be avoided to an extent on some devices, as there is the option to have different keypads for different applications.

The Semantic Web

The Semantic Web is one vision of what Web 3.0 might be like. While there is no fixed definition of Web 3.0 (some argue that it is merely a marketing buzzword), it is generally assumed that it will need to go one step further than Web 2.0. So if Web 1.0 was readable and Web 2.0 was readable and writable, then Web 3.0 should be readable, writable and executable (or interact-able). The semantic web offers a way of doing this.

As we know, computers aren't that bright on their own. Unlike people, they are not intuitive and need strict instructions to follow. XML 'tells' the computer more about a piece of information than HTML does, and the semantic web is a lot like this. However, it goes further than XML, which only 'tells' the computer that something is a title; the semantic web also explains what a title is. One example of a semantic web technology is RDF, the Resource Description Framework.

At its most basic, RDF creates metadata. But it also creates meta-metadata. An RDF statement (often called a 'triple') is made up of a subject, a predicate and an object. The subject is the thing itself: for example, it could be an image, a journal article, a video, and so on. The predicate is the relationship, and the object is the metadata value. For example:

Subject: this video
Object: Matt Damon
Predicate: has starring actor

Typically the subject is a URL (which makes sense as we are discussing Internet resources). Sometimes the predicate and object are also URLs. However, these need not point to anything specific: they could be URIs (Uniform Resource Identifiers). Unlike a URL, a URI does not necessarily point to a tangible 'thing' on the Internet. A predicate that is a URI will describe the relationship: for instance, in the example above, rather than just saying 'has starring actor', it will explain to the computer unequivocally what this actually means, so that it can be 'understood'.
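To make the triple idea concrete, here is a minimal JavaScript sketch that stores the Matt Damon example as a subject-predicate-object triple made of URIs, and prints it in N-Triples, one of the standard plain-text ways of writing RDF. The example.org URIs and the hasStarringActor predicate are made up for illustration; a real application would reuse an existing, shared vocabulary.

```javascript
// Hypothetical URIs: example.org is reserved for examples and identifies nothing real.
const EX = "http://example.org/";

// Each RDF statement is a triple: subject, predicate, object.
const triples = [
  {
    subject: EX + "video/123",
    predicate: EX + "vocab/hasStarringActor",
    object: EX + "person/MattDamon",
  },
];

// Write a triple in N-Triples syntax: <subject> <predicate> <object> .
function toNTriples(t) {
  return `<${t.subject}> <${t.predicate}> <${t.object}> .`;
}

for (const t of triples) {
  console.log(toNTriples(t));
}
// <http://example.org/video/123> <http://example.org/vocab/hasStarringActor> <http://example.org/person/MattDamon> .
```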

From this you can build a map of relationships using the Web Ontology Language (OWL). An ontology is a bit like a taxonomy with rules applied, arranged in a sort of web. This allows the computer to make links that you might not have been able to spot yourself, especially if you have a lot of information in the ontology. So, using the example above, perhaps you could look at all the videos that have Matt Damon as a starring actor, and map his career. Perhaps this is not the most useful thing you could do with OWL, but hopefully you get the idea!
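To give a flavour of how a computer might 'spot' a link nobody wrote down, here is a toy JavaScript sketch. It is not a real OWL reasoner: it stores statements as subject-predicate-object triples, applies one rule in the spirit of OWL's owl:inverseOf (if a video hasStarringActor someone, then that person starsIn the video), and then queries the result to list an actor's films by year. The short identifiers, predicates and release years are all hypothetical.

```javascript
// Hypothetical data, using short names instead of full URIs.
const triples = [
  ["video:1", "hasStarringActor", "person:MattDamon"],
  ["video:2", "hasStarringActor", "person:MattDamon"],
  ["video:3", "hasStarringActor", "person:SomeoneElse"],
  ["video:1", "releasedIn", "2004"],
  ["video:2", "releasedIn", "2002"],
];

// One rule, in the spirit of owl:inverseOf: X hasStarringActor Y implies Y starsIn X.
const inverses = { hasStarringActor: "starsIn" };

// Return the original triples plus every triple the rule lets us infer.
function infer(store) {
  const inferred = [];
  for (const [s, p, o] of store) {
    if (inverses[p]) inferred.push([o, inverses[p], s]);
  }
  return store.concat(inferred);
}

// Pattern matching over triples: null acts as a wildcard.
function match(store, s, p, o) {
  return store.filter(
    ([ts, tp, to]) =>
      (s === null || ts === s) && (p === null || tp === p) && (o === null || to === o)
  );
}

const store = infer(triples);

// Nobody stated "Matt Damon starsIn ..." directly; these triples were inferred.
const films = match(store, "person:MattDamon", "starsIn", null).map(([, , o]) => o);

// Sort by release year to sketch a career timeline (assumes every video has a year).
const year = (v) => match(store, v, "releasedIn", null)[0][2];
films.sort((a, b) => year(a).localeCompare(year(b)));

console.log(films); // [ 'video:2', 'video:1' ]
```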

The World of Open

In this lecture we considered open access publishing, open source software and open data. It was reiterated that the Internet is a disruptive technology, and has caused a change in the way people interact with information. It is unlikely that the replication and distribution of digital information would have been so widespread without the Internet.

The world of open is as much an ideological movement as a technical one. There are academics and programmers who are passionate about removing barriers to openness, including in digital publishing, source code, government information, and so on.

Open access publishing is where full research articles are available to anyone for free, across the web. The impact of open access on the publishing world is debatable. There is an obvious advantage to using journals which are free for immediate access, but many of these journals have not yet been around long enough to build up a reputation. The real impact of open access publishing remains to be seen.

Open source software development aims to build infrastructure rather than finished products for commercial sale. The Internet has made it possible for software developers around the world to work collaboratively, uniting programmers with similar interests regardless of where they are. In particular, the Internet makes asynchronous working possible, as developers can communicate via email, for example.


In our lecture we discussed the motivation for open source software; after all, it seems odd to be giving software away for free! Motivation can be considered at two levels, micro and macro. At micro level, developers may be building the software because it doesn't exist and they need it. There is also a financial incentive, as individual developers can build a reputation and make a lot of money if their products are used widely. At macro level, open source allows development and maintenance costs to be spread across several companies (this can also work with individuals).


Finally, I should touch on Open Data. The best way to look at this is with an example. The British Government have run several initiatives over the last few years, all driving at openness. The Freedom of Information Act (2000) requires departments to have a publication scheme, which is a list of available documents and information. Departments are usually quick to publish information once it has been released in response to a request. The first initiative was Making Public Data Public, which was championed by Tim Berners-Lee and the Minister for Digital Britain, The Rt Hon Stephen Timms MP. Over time (and under various governments!) the openness agenda has been referred to by several names, and has branched out (at the moment it is unclear whether there is a huge difference between Open Data and Transparency), but the underlying message is the same: public data should be made open and available for use unless there is a good reason not to release it. The idea is that the default should be openness, rather than only releasing certain data. Naturally there is still a lot of work to be done, but there is already an awful lot of data available online.


This concludes a rather brief overview of some Web 2.0 technologies and related policies. Mobile technology is developing at a rapid pace (despite the latest iPhone exploding a few times), and it will be interesting to see what is dreamed up next. The semantic web and open data go together well, and I think they will perhaps develop together. At the Cabinet Office they are already using semantic web technology on their Transparency website.

Sunday, 13 November 2011

Web Services and APIs

'Like this on Facebook.' 'Read my Twitter feed.' 'Have a look at the Google map below to see where we are.' These are the kinds of statements that web users are getting used to in a Web 2.0 environment. Usually, however, not a lot of thought goes into how these buttons and feeds are actually integrated into web pages, and this is what we discussed on Monday in our DITA lecture.

Our lecturer described APIs as an underlying pillar of Web 2.0. An API (Application Programming Interface) is a programming tool: it provides an interface to a piece of software, so that you can use it without needing to understand what goes on in the background. The ideal is that people will be able to create APIs with no programming background whatsoever. PC Mag (no date) describes what APIs are and how they work:

A language and message format used by an application program to communicate with the operating system or some other control program such as a database management system (DBMS) or communications protocol. APIs are implemented by writing function calls in the program, which provide the linkage to the required subroutine for execution. 
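To make the idea of an interface concrete, here is a minimal JavaScript sketch of a tiny, made-up 'like' service. The names createLikeService, likePage and likeCount are hypothetical and are not Facebook's real API; the point is that code using the service only calls the two public functions and never touches the storage hidden behind them.

```javascript
// Hypothetical "like" service. likePage() and likeCount() are the public interface;
// the storage is a private variable that outside code cannot reach.
function createLikeService() {
  const likes = new Map(); // private: page URL -> Set of user IDs

  return {
    likePage(pageUrl, userId) {
      if (!likes.has(pageUrl)) likes.set(pageUrl, new Set());
      likes.get(pageUrl).add(userId); // a Set ignores repeat likes from the same user
      return likes.get(pageUrl).size;
    },
    likeCount(pageUrl) {
      return likes.has(pageUrl) ? likes.get(pageUrl).size : 0;
    },
  };
}

const api = createLikeService();
api.likePage("http://example.org/mashup.html", "alice");
api.likePage("http://example.org/mashup.html", "bob");
api.likePage("http://example.org/mashup.html", "alice"); // duplicate, not counted twice
console.log(api.likeCount("http://example.org/mashup.html")); // 2
```

If the service later switched to storing likes in a database, programs calling likePage and likeCount would not need to change, which is much of the appeal of an API.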


A web service is a type of API. O'Reilly (2009) states that 'a web service is any piece of software that makes itself available over the Internet and uses a standardized XML messaging system'. A web service, as the name implies, is a service (software) which is available over the Internet. Web services use the same technology as web pages (the client-server relationship), and their XML messages even look a bit like HTML, because XML also uses tags. However, web services are content only: unlike a web page, an XML message says nothing about how its content should be displayed. Additionally, whereas web pages are designed to be read by humans, web services are designed to be read by machines.

While XML stands for eXtensible Markup Language, it is not actually a language in itself. Instead, it provides conventions for creating your own language with tags; it is sometimes described as a meta-language. The W3C does publish a specification for XML's basic syntax, but it does not say which tags you should use, so to quote Pirates of the Caribbean, 'the code is more what you'd call "guidelines" than actual rules.'
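As a small illustration of making up your own tags, here is a JavaScript sketch that turns a simple record into an XML message. The book, title, publisher and year tags are invented for this example (which is exactly what XML allows), the helper names escapeXml and toXml are hypothetical, and the sketch assumes every key is already a valid XML tag name. It does escape the characters that XML reserves, such as & and <.

```javascript
// Escape the five characters that have special meaning in XML.
// The ampersand must be replaced first, or the other escapes would be double-escaped.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build an XML message from a flat object: each key becomes a tag we invent ourselves.
function toXml(rootTag, record) {
  const children = Object.entries(record)
    .map(([tag, value]) => `  <${tag}>${escapeXml(value)}</${tag}>`)
    .join("\n");
  return `<${rootTag}>\n${children}\n</${rootTag}>`;
}

const message = toXml("book", {
  title: "Organising Information",
  publisher: "Facet Publishing",
  year: 2008,
});
console.log(message);
```

Nothing in the output says how the book should be displayed; a receiving program simply reads the content between the tags it expects.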

An example of APIs

Let's say I don't understand the way that Facebook works, but I want to let people 'like' my page. The nice people at Facebook have put the code on the web so that you can do just that, and I have done so on my 'mash up' web page. You don't need to understand exactly what the code means for it to work, but if you are interested, the code you need is:

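<!-- Facebook's JavaScript SDK expects an element with this id on the page -->
<div id="fb-root"></div>
<script>(function(d, s, id) {
  // Find the first <script> element already on the page
  var js, fjs = d.getElementsByTagName(s)[0];
  // If the SDK has already been loaded, don't load it twice
  if (d.getElementById(id)) {return;}
  // Otherwise create a new <script> element that loads the SDK from Facebook
  js = d.createElement(s); js.id = id;
  js.src = "//connect.facebook.net/en_GB/all.js#xfbml=1";
  // Insert it before the first script so that the browser downloads and runs it
  fjs.parentNode.insertBefore(js, fjs);
}(document, 'script', 'facebook-jssdk'));</script>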

Then, wherever you want the button to appear on your web page, add:

<div class="fb-like" data-href="http://www.student.city.ac.uk/~abkb846/public_html/MashUp.html" data-send="true" data-width="450" data-show-faces="true"></div>

You can put this on any web page; the only bit you need to change is the URL in the data-href attribute. The script above is a bit confusing for humans to read, but easy for machines to understand. It is 'asking' Facebook to run its 'Like' process on the web page, without the owner of the web page needing to understand how Facebook does this in the first place. The Facebook API is the interface which allows the applications to communicate with each other (usually without the user being aware of it at all).

Conclusion


To sum up, the use of APIs could potentially open up programming to users, as the need for coding knowledge is reduced. APIs are incredibly useful for social media as they are great for communication: they open up communication between different programs and allow people to improve the functionality of web sites.



O'Reilly Media (2009) Top Ten FAQs for Web Services. Available at http://www.oreillynet.com/lpt/a/webservices/2002/02/12/webservicefaqs.html [Accessed 7 November 2011]
PC Mag (no date) Definition of API. Available at http://www.pcmag.com/encyclopedia_term/0,2542,t=application+programming+interface&i=37856,00.asp#fbid=9qFJ8KQn0H3 [Accessed 7 November 2011]

Wednesday, 2 November 2011

Some thoughts on Knowledge

There has been a bit of debate about what ‘knowledge management’ actually is. Some argue that it is the same thing as information management, whereas others see it as something different. The usual distinction is that knowledge is in people’s heads, whereas information is recorded.

TS Eliot

Where is the Life we have lost in living?
Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?
(Choruses from The Rock, 1934)

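The 'Eliot' model separates knowledge from information, and is often drawn as a pyramid with data at the bottom, followed by information and knowledge, and wisdom at the top.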
In our lecture, we discussed as a group what the pyramid could represent. There was a general consensus that, in this model, wisdom comes from knowledge, which in turn comes from information, which comes from data. The best way to represent this is to use an example.
120
This is data. On its own, how useful is it?
It is 120 miles to Birmingham from London.
This is information, which is arguably more useful than the data.
It is 120 miles to Birmingham, and the quickest route by car is to take the M40.
This is knowledge, which builds upon the information.
It is 120 miles to Birmingham, and the quickest route by car is to take the M40. However, if you are travelling during a busy period you might do better taking the train!
This is wisdom, which builds further on the knowledge.

So how does one acquire knowledge and wisdom with this model? Can it be shared? There was some debate about this. We eventually decided that wisdom seems to come with experience, and we couldn’t easily establish whether it could be shared – can someone else’s experiences become our wisdom, or is it then only knowledge or information?  This theory states that you can share knowledge – people can write it down, for example. There are those who do not subscribe to this theory, and argue that tacit knowledge cannot be passed on...

Karl Popper

Here we are going to dip a toe into the rather big pool of philosophy, and look at Popper’s worlds as states of information. I will try to simplify this if I can. He argued the case for three worlds:
  1. ‘Real’ physical World
  2. Inner World, our subjective, personal view
  3. Communicable Information between the two, or the sum total of worlds 1 and 2
He argues that knowledge is in world 2, as it is tied up with one's own experiences, and is mental not physical. It is impossible to share actual knowledge, because it is unique to a person in world 2. Instead, it must be converted into information to be passed on. It can then be ‘converted’ back into knowledge by someone else, but their knowledge will be different from the original knowledge. This has also been expressed by Michael Polanyi, who said that we know more than we can tell. Another way of thinking about this is to try to explain to someone how you brush your teeth. It sounds easy, but strictly speaking you need to tell them how to hold the brush, how to brush and even when to breathe.

This has been a very brief thought, and I’m still not decided on knowledge management. It strikes me as a very difficult area, as knowledge is not a tangible thing – it’s difficult to measure, to value or even to prove whether it’s there. For now, I’ll finish with a video which I think demonstrates the difficulties involved in knowledge sharing. It’s a bit of fun really – 'expert' gamers trying to teach non-gamers how to play Battlefield. Be warned – it does contain some bad language and violence (as well as heavy product placement!).


Sunday, 30 October 2011

The World Wide Web and HTML


My blog, Information Overload!, can be accessed at http://kaysafus.blogspot.com/


In this blog I will discuss the World Wide Web (WWW, or Web) and Hypertext Markup Language (HTML) as a technology for the digital representation of information, in relation to Web 1.0. I will examine the technical details with a short background history, before considering the opportunities and limitations of using the Web and HTML.

The Web is not the same thing as the Internet. The Internet is the channel through which the Web can function. Email also uses the Internet as a channel. The Web was designed to link information stored on computers. Tim Berners-Lee, its inventor, worked at CERN and originally designed it to enable him to remember connections between people, computers and projects. His vision was to have a ‘single, global information space’ (Berners-Lee, 1999, p. 5). Specifically, his solution was to use hypertext to allow users to move between documents. In his proposal, Berners-Lee offered the solution of a ‘universal linked information system’ (Berners-Lee, 1990). He developed HTML as the primary language for web documents, allowing users to instruct browsers how to display content on a webpage using tags (Chowdhury, 2008).

HTML works by using tags to inform the Web browser how information is to be portrayed on the page. A tag is enclosed in angle brackets: for example, <HTML> informs the browser that this is where the HTML begins, and </HTML> informs it that the HTML ends. I have created a website which demonstrates some of the different tags available.
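The idea that every opening tag such as <HTML> is paired with a closing tag such as </HTML> can be checked mechanically with a stack. Below is a simplified JavaScript sketch of that check, not a real HTML parser: it ignores comments and scripts, treats a short, assumed list of tags (like <br>) as having no closing tag, and is stricter than real browsers, which tolerate omitted end tags such as </p>.

```javascript
// Tags assumed to have no closing tag (a short, incomplete list for this sketch).
const VOID_ELEMENTS = new Set(["br", "hr", "img", "meta", "link", "input"]);

// Check that every opening tag has a matching closing tag, in the right order.
function tagsBalanced(html) {
  const stack = [];
  const tagPattern = /<\/?([a-zA-Z][a-zA-Z0-9]*)[^>]*>/g;
  for (const m of html.matchAll(tagPattern)) {
    const name = m[1].toLowerCase(); // HTML tag names are case-insensitive: <HTML> = <html>
    if (m[0].startsWith("</")) {
      // A closing tag must match the most recently opened tag.
      if (stack.pop() !== name) return false;
    } else if (!VOID_ELEMENTS.has(name) && !m[0].endsWith("/>")) {
      stack.push(name);
    }
  }
  return stack.length === 0; // anything still open means the tags are unbalanced
}

console.log(tagsBalanced("<HTML><BODY><P>Hello</P></BODY></HTML>")); // true
console.log(tagsBalanced("<html><body><p>Hello</body></html>")); // false
```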

The term Web 1.0 describes the Web in its early stages, when it was primarily designed to display and share information, rather than allowing users to add their own content, for example by contributing to wikis. Early webpages tended to be rather plain, and merely displayed information and provided hyperlinks to relevant documents elsewhere on the Web. Over time, presentational tags were developed to improve the appearance of webpages, which I will discuss later.

The Web and HTML provide a number of opportunities for the representation of information. The Web was designed to end incompatibility between different computers (Berners-Lee, 1999). Webpages can be viewed on different machines, presenting information consistently and allowing users to view and navigate between information in the same way. This is still important today, with a huge range of devices and programs used to access the Web. The Web and HTML have been used to share information effectively around the world in all manner of fields, including Information Science. Most webpages use navigation with hyperlinks to assist users, as on The National Archives webpages, which link to guidance on using the catalogue and on Information Management policies.

Using HTML is fairly simple; therefore it is popular (Chowdhury, 2008). Unlike other computer languages, HTML is in plain English and does not require users to have an in-depth knowledge of programming. Unlike more complex technologies for representing information, such as using SQL to create databases, HTML offers users an easy method to display and share digital information.

Another reason that HTML can be very useful for information professionals is that it allows metadata tags (or meta tags) to improve the experience for a user without them even being aware of it. For example, adding keywords to a webpage describes the page’s content. Using meta tags effectively can offer powerful navigation and improve information retrieval (Rosenfeld & Morville, 2002).
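To show how keywords in a meta tag might support retrieval, here is a minimal JavaScript sketch that reads the keywords from each page's <meta name="keywords"> tag and returns the pages matching a search term. The pages and URLs are hypothetical, the regular expression assumes the name attribute comes before content, and many modern web search engines give this particular tag little weight, so treat it as an illustration of the principle rather than of how any real search engine works.

```javascript
// Hypothetical pages, each with the <head> markup containing its keywords meta tag.
const pages = [
  {
    url: "http://example.org/catalogue.html",
    head: '<meta name="keywords" content="catalogue, archives, search">',
  },
  {
    url: "http://example.org/policy.html",
    head: '<meta name="keywords" content="information management, policy">',
  },
];

// Pull the keywords out of a meta tag (simplified: assumes name comes before content).
function keywords(head) {
  const m = head.match(/<meta\s+name="keywords"\s+content="([^"]*)"/i);
  return m ? m[1].split(",").map((k) => k.trim().toLowerCase()) : [];
}

// Return the URLs of pages whose keywords include the search term (case-insensitive).
function search(term) {
  return pages
    .filter((p) => keywords(p.head).includes(term.toLowerCase()))
    .map((p) => p.url);
}

console.log(search("Archives")); // [ 'http://example.org/catalogue.html' ]
```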

However, there are also limitations to the Web and HTML. In Web 1.0, HTML was not multilingual. Many languages use scripts which cannot be represented in standard HTML, because it is ‘based on a very limited coded character set’ (Yergeau, no date). For example, the Omniglot webpage uses images to demonstrate Sanskrit writing, which means that users cannot copy and paste the characters, or interact with them as easily as with text in the Latin alphabet.
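The character set problem can be seen by looking at code points. The JavaScript sketch below assumes, as in early HTML, a Latin-1 (ISO 8859-1) character set, which only covers code points up to U+00FF, and checks whether a string fits into it. Devanagari, the script commonly used for Sanskrit today, lives at U+0900 to U+097F, far outside that range. Modern HTML uses Unicode (usually encoded as UTF-8), which is what now allows such scripts to be displayed as text. The function names are hypothetical.

```javascript
// Can every character in a string be represented in ISO 8859-1 (Latin-1)?
// Latin-1 covers code points U+0000 to U+00FF only.
function fitsLatin1(text) {
  for (const ch of text) {
    if (ch.codePointAt(0) > 0xff) return false;
  }
  return true;
}

// List each character's Unicode code point, e.g. "U+0938".
function codePoints(text) {
  return Array.from(
    text,
    (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

console.log(fitsLatin1("Caf\u00e9")); // true: "Café", where é is U+00E9
console.log(fitsLatin1("संस्कृतम्")); // false: Devanagari is outside Latin-1
console.log(codePoints("सं")); // [ 'U+0938', 'U+0902' ]
```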

Additionally, while HTML was originally composed of elements for describing the structure of information, such as paragraphs, hyperlinks and headings, it evolved to include presentational tags, including fonts, colours and tables. This has been attributed to the introduction of Mosaic (Meyer, 2000). Presentational tags made HTML coding confusing and untidy, and caused interoperability problems (Chowdhury, 2008). This disadvantage of HTML was recognised, and the World Wide Web Consortium (W3C) introduced a recommendation called CSS. CSS stands for Cascading Style Sheets, and offers rich styling of web documents (Meyer, 2000). This removes the need for tags like <B> (for making text bold). The use of CSS created another opportunity, as HTML can return to its original purpose as a structural language, while the presentational aspect of webpages is handled by CSS. This is useful for information professionals, as it helps to separate out the design aspect. For example, it strengthens the argument that search is an information specialist area rather than an IT or web design area, as the emphasis is on the structure and content of pages.

I have concluded that HTML and the Web are appropriate technologies for the digital representation of information, in relation to Web 1.0. With my webpage, I demonstrated how HTML tags and hyperlinks can be applied technically for the management of digital information. Today the Information Profession exploits HTML and the Web extensively. In particular, the Web is used for information sharing, for structuring information and for ease of navigation, as demonstrated on The National Archives webpages. Meta tagging is also incredibly useful to information professionals, notably for powerful searching and retrieval functions. HTML is an effective tool to provide access to information because it permits information to be displayed on a range of devices. Furthermore, HTML is relatively easy to learn as it uses plain English and does not require a thorough knowledge of programming, therefore allowing more people to use it.

I investigated the constraints of HTML and the Web as Web 1.0 technologies. I recognised the issue with displaying non-Latin scripts. This was a problem for basic Web 1.0 webpages; it seems that such scripts could only be represented on a webpage with images, which reduced options for manipulating the information. I highlighted the limitations of tagging, particularly how HTML coding became cluttered with presentational tags. I concluded that this problem became less significant with the introduction of Cascading Style Sheets, which allowed programmers to separate structural and presentational code.


Berners-Lee, T. (1990) Information Management: A Proposal [online] Available at http://www.w3.org/History/1989/proposal.html [Accessed 22 October 2011]
Berners-Lee, T. (1999) Weaving the Web: the origins and future of the World Wide Web, London: Orion Business
Chowdhury, G. G. and Chowdhury, S. (2008) Organising Information: From the Shelf to the Web, London: Facet Publishing
Meyer, E.A. (2000) Cascading Style Sheets: The Definitive Guide, Sebastopol: O’Reilly Media Inc.
Omniglot (no date) Sanskrit [online] Available at http://www.omniglot.com/writing/sanskrit.htm [Accessed 22 October 2011]
Rosenfeld, L. and Morville, P. (2002) Information Architecture for the World Wide Web, Second Edition. Sebastopol: O’Reilly Media Inc.
The National Archives (no date) Catalogue [online] Available at http://www.nationalarchives.gov.uk/catalogue/default.asp?j=1 [Accessed 22 October 2011]
Yergeau, F. (no date)  A world-wide World Wide Web. [online] Available at http://www.w3.org/International/francois.yergeau.html [Accessed 22 October 2011]

Thursday, 27 October 2011

Web 2.0: an introduction

In this week's lecture we discussed Web 2.0. Strictly speaking, there isn't a hard and fast definition of Web 2.0 (which is awkward), but there seems to be a general feeling that it involves social media and people interacting with each other. Tim O'Reilly popularised the term with a conference back in the mid-2000s, so it seems sensible to go with his view on the subject. To summarise his blog, he says that:

  1. The Web is a platform (rather than working on their own PCs, people go online)
  2. It harnesses collective intelligence (through blogs, social networking, reviews, and so on)
  3. There is a focus on Data (hopefully that is fairly self-explanatory)
  4. It is an end to the software release cycle (Software is a service not a product and users should be treated as co-developers)
  5. It uses Lightweight Programming Models (keep it simple!)
  6. Software is above the level of a single device (you aren't limited to a PC)
  7. Rich user experiences (hopefully this one is self-explanatory too)

His blog goes into a lot of detail, and is an interesting read. I would certainly recommend it if you would like more depth!

There is some controversy (for want of a better word) around using the term Web 2.0, as Tim Berners-Lee argues that this was always the intention of the World Wide Web. The move to a focus on a network of people rather than of computers was always the grand plan. I can see his point, as he originally wanted to join up information and people with the invention of the Web. The difference now, perhaps, is that it has become so cheap to use the technologies (which were always available) that the Web has been opened up to many more people. High-speed Internet must have helped as well: I remember the days of signing in with a modem and being limited to one hour on the Internet because my Mum wanted to use the telephone. I think I would struggle now to limit my Internet usage like that, as I grab my laptop sporadically to look things up and check Facebook.

It's pretty amazing I've got this far without talking about Facebook! It seems to be commonly associated with Web 2.0, and I suspect everyone has at least heard of Facebook even if they don’t have an account. On Monday, I found it interesting that out of a whole lecture room, only 2 people (3 including our self-styled dissident lecturer, Andy) did not have a Facebook account. We talked about why people choose to have, and not to have, Facebook accounts. The crux of it seemed to be the privacy issue. We did touch on Google+, but for all the excitement about how their privacy settings are allegedly better, only a handful of people actually had accounts. In a way it's difficult to understand - if it does all the same stuff but they care more about your privacy, why doesn't everybody move? That's a big question for a little blog, so I'm not going to go into that here!

My notes for the session describe Facebook as a ‘standard bearer for social networking’. Perhaps that is a bit strong, but then again I did just say that nearly everyone has heard of Facebook and a lot of people have accounts… I think the jury is still out on such a bold statement! But let’s have a quick look at how it works (just in case you have been living on the moon and haven’t heard of it). It’s based on mutual agreement of people to be in contact with each other, so unlike blogs and web pages (in general) you agree with other people that you will be connected as ‘friends’ and have access to each other’s profiles. This is of course generalised, because you can play around with settings to change who can see what on your profile, but for the most part you need to agree to be connected to interact with each other. We discussed Facebook in a group in the lab session (which this week was more of a discussion session) and decided that Facebook was a virtual enhancement of your real life interactions. So you can comment on your friends’ photos, like their comments and chat with them on their wall, via private messaging, or via instant messaging. The other thing about Facebook is that it encourages you to use your real name. Some sites do not (think of Twitter, for instance, where your name tends not to be your name exactly – mine is @kaysafus for example).

In our discussion session, our group were tasked with comparing Facebook to Friends Reunited. The first thing we established was that everyone had heard of Facebook, but not many of us remembered Friends Reunited. None of us had actually used it, and we had to ask our lecturer to show us his profile to try and get an idea of what it was like. This made our discussion a little tricky at first – we were trying to establish where Friends Reunited had got it wrong, and it wasn’t immediately obvious because they seemed to do things in a similar way to Facebook. When we really started analysing it though, we realised that all of us had envisaged Friends Reunited as a site for slightly older people, as the name suggests you are looking for people you are no longer in touch with. Many of us were still at University (or not long out of it) when we joined Facebook, and we saw it as a way of keeping in touch with people we knew at the time, both at home and at University.

Another thing we liked about Facebook was that we never saw any advertising for it. Every single person in the room who had a Facebook account had been invited by a friend. With Friends Reunited, we got the impression that you had to set up a profile in order to be found – if someone is adding you as a friend then surely that defeats the object of Friends Reunited, as you do not need reuniting.

Naturally Web 2.0 is not limited to Facebook. Other examples include Twitter (a sort of mini interactive blog), Wikipedia (an online encyclopaedia where users can contribute) and blogs. I don’t think I really have space in this blog to explore all of them in depth, but it is worth noting that, as with Facebook, collaboration is the key – everyone can get involved. In particular with reference to Wikipedia, you don’t need to have any specific qualifications, or be a subject expert, to contribute. Naturally this is a double-edged sword: you can get a rich information source, but it could contain errors. A fellow student admitted to making some erroneous changes to a page, which have not yet been spotted.

How this will all play out will be interesting. There are those (my lecturers included) who rather darkly suggest there may be another bubble-burst, as with the dot com bubble in the 1990s. Personally I’d prefer to be more optimistic – perhaps Web 2.0 will naturally evolve into whatever comes next, rather than collapsing. Aspects of Web 2.0 have become such a part of life for many users that it is difficult to imagine not having them: people use Facebook, Twitter, YouTube, Wikipedia and other Web 2.0 sites frequently, and this has changed the way people behave online.

Something I find quite interesting is companies’ reactions to Web 2.0, and their policies (or lack thereof). Perhaps it would be interesting to investigate how different types of workplaces approach their staff’s access to Web 2.0. Many businesses seem to be keen to have a presence on these sites – lots of them have Twitter feeds, Facebook pages and so on, but I wonder what they think about staff using these sites at work. Above I mentioned that people’s online behaviour has changed; perhaps their behaviour at work has changed too. Do people expect to be able to access these sites during work hours? In my current workplace we have a Twitter feed (which is Comms led, not KIM led) but no clear policies about whether staff can access their own Twitter accounts during work hours on work PCs. My company also has a YouTube account and a Flickr account, for which there are no clear policies either. Then of course there is the issue of staff accessing these sites on their mobile phones (whether via company wifi or not), which it would be difficult to have any control over. One company I used to work for blocked Internet access almost completely, and forbade staff from using mobile phones during office hours. Needless to say it was not a fun place to work. I would be interested to know whether this is still the case, but unfortunately I no longer keep in touch with any colleagues from that company.


I have offered a brief introduction to Web 2.0, with an emphasis that Web 2.0 is about networks of people, and interaction between them. I looked at Facebook as an example of a Web 2.0 site. Everyone has heard of it, lots of people use it, and to be honest it's a bit addictive (you won't be surprised, I'm sure, to know I have it open in another tab whilst writing this). I hinted that it has changed the way we behave online, which I hope to explore in another blog. If you would be particularly interested to read about it, leave a comment or send me a message and I'll try and put something together. So what comes next? As I said, I hope there won't be another bubble-burst; rather, I would like to think we can build on Web 2.0 and make it even better. After all, if Web 2.0 is based on a network of people, perhaps we have a say in its future.

Information Retrieval: a very quick introduction

Information Retrieval, or IR, is to do with information seeking behaviour: searching for something relevant to the task at hand. In IR terms, relevance is defined as fulfilling the user's information need (this is not so in philosophy, which I hope to blog about soon if I get time!). The difference between IR and querying a database is that IR returns results ranked by how likely they are to match your search, whereas a database will return an exact match (or nothing). Lancaster (1968) says that an information retrieval system does not inform (i.e. change the knowledge of) the user on the subject of his inquiry. It merely informs on the existence (or non-existence) and whereabouts of documents relating to his request. (Taken from the article on Information Retrieval here.) It has also been defined as 'finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers)' (see more here).
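To make the ranked-versus-exact distinction concrete, here is a toy sketch in Python. The four documents are made up, and the scoring rule (count how many query words a document shares) is a deliberately crude stand-in for the relevance estimates real IR systems use, so treat it as an illustration rather than how any search engine actually works.

```python
# Toy contrast between database-style exact matching and IR-style ranking.
# The documents and the word-overlap score are illustrative assumptions only.

docs = {
    "d1": "goldfish care and feeding",
    "d2": "feeding tropical fish",
    "d3": "goldfish",
    "d4": "renting a flat in london",
}

def exact_match(query: str) -> list[str]:
    """Database-style lookup: only documents whose text equals the query."""
    return [doc_id for doc_id, text in docs.items() if text == query]

def ranked(query: str) -> list[tuple[str, int]]:
    """IR-style lookup: score each document by the number of query words it
    contains, drop documents scoring zero, and list the best matches first."""
    terms = set(query.lower().split())
    scores = [(doc_id, len(terms & set(text.split()))) for doc_id, text in docs.items()]
    scores = [s for s in scores if s[1] > 0]
    return sorted(scores, key=lambda s: (-s[1], s[0]))

print(exact_match("goldfish feeding"))  # [] : no document is exactly this string
print(ranked("goldfish feeding"))       # [('d1', 2), ('d2', 1), ('d3', 1)]
```

The exact-match lookup gives up entirely, while the ranked version still returns partial matches, best first.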


There are lots of different ways of defining IR, but there are three formal definitions, each based on a different perspective on the process. First, there is the user view, which is the user's 'anomalous state of knowledge' or ASK: they have knowledge gaps and are seeking information. Second is the systems view, which is the hardware and software for IR. Finally there is the sources view, which is the presentation of the information (usually not your own information) using technology.


Searching for information happens in lots of different ways. Searching for a website is not the same as searching for a new flat, which is not the same as searching for some information about goldfish. Or so says Broder, who categorises these as Navigational, Transactional and Informational queries respectively.


We experimented with this in the lab, trying out different types of queries using Google and Bing. We also tried Boolean logic: operators like AND, OR and NOT can make a huge difference to your results. The same is true of using quotation marks to search for an exact phrase. It is suggested that many users will instead phrase their searches in natural language, although personally I wouldn't state that categorically without doing a bit more research!
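One way to picture what AND, OR and NOT do is as set operations on an inverted index, which maps each word to the set of documents containing it. The sketch below is a minimal Python illustration under that assumption, using an invented four-document collection; real search engines have their own query syntax and rank their results, so this only shows the underlying Boolean logic. It also shows why a phrase in quotation marks is stricter than AND.

```python
# Minimal Boolean retrieval: an inverted index plus set operations.
# The document collection is invented purely for illustration.

docs = {
    1: "cheap flat to rent in london",
    2: "flat shoes for sale",
    3: "london flat for sale",
    4: "goldfish for sale",
}

# Inverted index: each word maps to the set of document IDs containing it.
index: dict[str, set[int]] = {}
for doc_id, text in docs.items():
    for word in text.split():
        index.setdefault(word, set()).add(doc_id)

def term(word: str) -> set[int]:
    """Documents containing the word (empty set if it never appears)."""
    return index.get(word, set())

def phrase(p: str) -> set[int]:
    """Exact-phrase search, like wrapping the words in quotation marks."""
    return {doc_id for doc_id, text in docs.items() if f" {p} " in f" {text} "}

flat_and_london = term("flat") & term("london")     # AND = intersection
flat_or_goldfish = term("flat") | term("goldfish")  # OR = union
flat_not_sale = term("flat") - term("sale")         # NOT = difference

print(sorted(flat_and_london))          # [1, 3]
print(sorted(flat_or_goldfish))         # [1, 2, 3, 4]
print(sorted(flat_not_sale))            # [1]
print(sorted(phrase("london flat")))    # [3] : document 1 has both words, but not side by side
```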


Sometimes your first search doesn't give you results that you deem relevant. This is when it becomes necessary to modify your query: adding in extra words, removing or changing operators, or trying synonyms to see if you get different results. You can evaluate the effectiveness of a search qualitatively or quantitatively. Qualitative analysis is from a user perspective: does it satisfy their information need? Establishing this is quite time-consuming, as you need to run user surveys and questionnaires to find out. Quantitative experiments test the speed of retrieval (efficiency) and how many relevant documents were retrieved (effectiveness).
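Adding synonyms is one way of modifying a query. The sketch below assumes a small hand-made thesaurus (the `synonyms` dictionary and the documents are hypothetical) and expands each query word into an OR group, so a search for flat AND rent also accepts apartment and let.

```python
# Query modification by synonym expansion (OR within a group, AND between groups).
# The synonym list and the documents are hypothetical examples.

docs = {
    1: "flat to rent in london",
    2: "apartment to let in london",
    3: "london flat for sale",
}

synonyms = {"flat": ["apartment"], "rent": ["let"]}  # hypothetical thesaurus

def matches(text: str, groups: list[set[str]]) -> bool:
    """True if the text contains at least one word from every group."""
    words = set(text.split())
    return all(words & group for group in groups)

def search(query: str, expand: bool = False) -> set[int]:
    groups = []
    for word in query.split():
        group = {word}
        if expand:
            group.update(synonyms.get(word, []))  # add any known synonyms
        groups.append(group)
    return {doc_id for doc_id, text in docs.items() if matches(text, groups)}

print(search("flat rent"))               # {1}
print(search("flat rent", expand=True))  # {1, 2}
```

Expanding the query finds the 'apartment to let' document that the original wording missed.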


There are two ways to measure effectiveness: precision and recall. Precision is the proportion of retrieved documents which are relevant. For example, if you return 5 results and 3 are relevant, you have 60% precision. The formula is relevant documents retrieved / total documents retrieved. For recall, the formula is relevant documents retrieved / total number of relevant documents in the database. This is more difficult to calculate, because you need to know how many relevant documents exist in total, but it is useful to know as it tells you how good the system is at detecting relevant documents in the index.
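The two formulas translate directly into code. This is a minimal Python sketch; the document IDs are placeholders, and the number of relevant documents in the whole collection (six) is an assumption added so that recall can be worked out, since the example in the text only gives the 5 results and 3 relevant ones.

```python
def precision(retrieved: set[str], relevant: set[str]) -> float:
    """Relevant documents retrieved / total documents retrieved."""
    if not retrieved:
        return 0.0  # nothing retrieved: treat precision as zero
    return len(retrieved & relevant) / len(retrieved)

def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Relevant documents retrieved / total relevant documents in the collection."""
    if not relevant:
        return 0.0  # no relevant documents exist: treat recall as zero
    return len(retrieved & relevant) / len(relevant)

# The example from the text: 5 results, 3 of them relevant.
# Assumption: the collection holds 6 relevant documents in total.
retrieved = {"a", "b", "c", "d", "e"}
relevant = {"a", "b", "c", "x", "y", "z"}

print(precision(retrieved, relevant))  # 0.6
print(recall(retrieved, relevant))     # 0.5
```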


The Information Retrieval Nirvana would be 100% recall with 100% precision. Unfortunately, this tends not to happen! Instead, there is an inverse relationship between the two. If a user searches for something very specific, they are likely to get a good level of precision but they might not get such good recall because they are not capturing everything they want. Likewise, if a user casts their net very widely they will retrieve a lot of relevant information, but they might also get a lot that they are not interested in, so precision is low.
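One way to see the trade-off is to walk down a single ranked result list and measure both values at different cut-off points, as if the user only looked at the top k results. The ranking and relevance judgements below are invented for illustration.

```python
# Precision and recall at different cut-offs (k) in one ranked result list.
# The ranking and the set of relevant documents are invented for illustration.

ranking = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]
relevant = {"d1", "d2", "d4", "d7"}  # 4 relevant documents in the whole collection

results = []
for k in (1, 2, 5, 10):
    top_k = ranking[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    p = hits / k              # precision within the top k
    r = hits / len(relevant)  # share of all relevant documents found so far
    results.append((k, p, r))
    print(f"k={k:2d}  precision={p:.2f}  recall={r:.2f}")
```

With these made-up numbers, looking at only the top result gives perfect precision but 25% recall, while taking all ten results reaches 100% recall at the cost of precision dropping to 40%.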


This has been a bit of a crash course in Information Retrieval (I mean, I didn't even mention indexing), but hopefully readers are inspired to go and find out more. If you have a knowledge gap, this would be a great opportunity to practise IR as a user. Do a Google search (or Bing, or Yahoo, or whatever) and play around with Boolean operators. I should be covering IR in a lot more detail next semester, so keep an eye out for another, more detailed, blog then!

Thursday, 13 October 2011

Databases and SQL

Monday's lecture was about databases, in particular SQL (pronounced either as the letters or as 'sequel', depending on how you feel about it). We had an introduction to databases to bring everybody up to speed - I was surprised how much I remembered from GCSE IT. My teacher would have been proud. We discussed when a database is a good idea, and when it's not such a good idea. A rule of thumb seems to be that it's a good plan when it's your own data as opposed to someone else's. For other people's data it might be a better idea to think about information retrieval (that's next week's lecture!).

It is a good idea to have a centrally managed and designed database. Problems of the past still haunt our offices today: it's now so easy to make a database that a lot of people still use local ones rather than a nicely linked-up central database. This causes all manner of problems due to inaccurate data (not to mention legal issues like Data Protection!). Instead, organisations should have their data stored in one place, with each department accessing only the data that is relevant to them. The Database Management System (DBMS) controls access to the data by various departments / people.

So to be clear, the DBMS is the software which lets the users get at the data, whereas a database, to quote my course handout, is an 'integrated collection of data shareable between users and application systems'. So it's the stuff behind the scenes rather than the shiny interface the users get to see. Using a DBMS is advantageous as it assures information integrity (insofar as it gets updated everywhere when it's updated once). It also removes the reliance on a certain program as it is a collection of data which can be used by more than one application.

So what should it look like? Perhaps the way to explain this best is to say what you don't want. You don't want duplication within a table, and the best way to make sure this doesn't happen is to look at relationships and break up your data into different tables. I'll give you an example. Let's say we have some form of music database, with song titles, album names, artists, release years and genres. So you'd have a table that looks like this:


Song_Title | Album_Name | Artist | Release_Year | Genre
Don't Marry Her | Blue is the Colour | The Beautiful South | 1996 | Pop
Perfect 10 | Quench | Beautiful South | 1998 | Pop
Be Here to Love Me | Feels Like Home | Norah Jones | 2004 | Chill Out
Sunrise | Feels Like Home | Jones, Norah | 2004 | Chillout
Never Gonna Give You Up | Greatest Hits | Risk Astley | 2002 | 80s

There are a few things to notice about this table. Firstly, did you notice how I put underscores in the column names? This is because it's generally not a good idea to put spaces in your field names - lots of programs don't like it. Just because Microsoft Access lets you do it, it doesn't mean it's a good idea! So use underscores or make it all one word; remember, the user doesn't see this stuff.

This is probably a good time to mention primary keys. Did you notice that there isn't any one field that you could use to uniquely identify a song? Every table needs one of these, and the good news is most DBMSs can create these for you. Primary keys are important when it comes to making separate tables, which I will come to in a minute!

Did you also notice the errors? The Beautiful South perform two of these songs, but the band name is written differently in the two rows (and the same goes for Norah Jones). This is why having duplication in one table can be such a bad idea, and why we need more than one table.

So back to the music, as it were. There is a lot of duplication: the album name, the artist, the release year and the genre are all repeated! This can be fixed by dividing the data up into the following tables:

Song

Song_ID | Song_Title | Album_ID
1 | Don't Marry Her | 1
2 | Perfect 10 | 2
3 | Be Here to Love Me | 3
4 | Sunrise | 3
5 | Never Gonna Give You Up | 4

Album


Album_ID | Album_Name | Artist_ID | Release_Year
1 | Blue is the Colour | 1 | 1996
2 | Quench | 1 | 1998
3 | Feels Like Home | 2 | 2004
4 | Greatest Hits | 4 | 2002
5 | Nobody's Prefect | 5 | 2001

Artist

Artist_ID | Artist_Name | Genre_ID
1 | The Beautiful South | 1
2 | Norah Jones | 3
3 | Jimmy Eat World | 2
4 | Rick Astley | 5
5 | The Cheapskates | 4

Genre

Genre_ID | Genre
1 | Pop
2 | Rock
3 | Chill Out
4 | Ska
5 | 80s

This removes the duplication issue. Each table has a primary key, or unique identifier. In some of the tables, a primary key from a different table is stored in a column to link the tables together. For example, in the table 'Artist' there is a column for 'Genre_ID'. When one table's primary key is used in another table like this, it is known as a foreign key.
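To check the design, the normalised tables can be built in a real DBMS. This sketch uses SQLite through Python's built-in sqlite3 module (an assumption; any relational DBMS would do, and the lab did not necessarily use SQLite). It declares each primary key, marks the linking columns as foreign keys with REFERENCES, and shows the DBMS rejecting a song that points at an album which does not exist. The 'Mystery Track' row is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on

conn.executescript("""
CREATE TABLE Genre  (Genre_ID  INTEGER PRIMARY KEY, Genre TEXT);
CREATE TABLE Artist (Artist_ID INTEGER PRIMARY KEY, Artist_Name TEXT,
                     Genre_ID  INTEGER REFERENCES Genre(Genre_ID));
CREATE TABLE Album  (Album_ID  INTEGER PRIMARY KEY, Album_Name TEXT,
                     Artist_ID INTEGER REFERENCES Artist(Artist_ID),
                     Release_Year INTEGER);
CREATE TABLE Song   (Song_ID   INTEGER PRIMARY KEY, Song_Title TEXT,
                     Album_ID  INTEGER REFERENCES Album(Album_ID));

INSERT INTO Genre  VALUES (1,'Pop'),(2,'Rock'),(3,'Chill Out'),(4,'Ska'),(5,'80s');
INSERT INTO Artist VALUES (1,'The Beautiful South',1),(2,'Norah Jones',3),
  (3,'Jimmy Eat World',2),(4,'Rick Astley',5),(5,'The Cheapskates',4);
INSERT INTO Album  VALUES (1,'Blue is the Colour',1,1996),(2,'Quench',1,1998),
  (3,'Feels Like Home',2,2004),(4,'Greatest Hits',4,2002),
  (5,'Nobody''s Prefect',5,2001);
INSERT INTO Song   VALUES (1,'Don''t Marry Her',1),(2,'Perfect 10',2),
  (3,'Be Here to Love Me',3),(4,'Sunrise',3),(5,'Never Gonna Give You Up',4);
""")

# A foreign key pointing at an album that does not exist is refused.
rejected = False
try:
    conn.execute("INSERT INTO Song VALUES (6, 'Mystery Track', 99)")
except sqlite3.IntegrityError as err:
    rejected = True
    print("Rejected:", err)
```

Because each band name is now stored once in Artist, it can no longer be spelled two different ways across rows.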

Now to look at some SQL and query our database! This is the most important bit. When searching you want to get out helpful data, and that means getting what you want and not getting what you don't want. It sounds obvious, but people often sift through lots of rubbish. Think about searching Google - it's a great tool but if you don't put in a sensible search you can get a lot of irrelevant results. If you don't believe me, try searching 'tablet' - you want to see what you could get instead of an iPad, but you end up with lots of things you didn't want, don't you?

Anyway, before I get sidetracked and start thinking about iPads, let's get back to SQL. To search, the basic syntax you need looks like this:

SELECT fields
FROM tables;

So if we were searching the database above for an album and its release date, we'd need:

SELECT Album_Name, Release_Year
FROM Album;

Which would give you:


Album_Name | Release_Year
Blue is the Colour | 1996
Quench | 1998
Feels Like Home | 2004
Greatest Hits | 2002
Nobody's Prefect | 2001

When using select, there is no limit to the number of fields you can put in there. However, if you want everything then you can use an asterisk:

SELECT *
FROM Artist;

Which would give you:


Artist_ID | Artist_Name | Genre_ID
1 | The Beautiful South | 1
2 | Norah Jones | 3
3 | Jimmy Eat World | 2
4 | Rick Astley | 5
5 | The Cheapskates | 4



This is good if you want to see everything, but realistically an end user doesn't want to see the foreign keys in this table; they want the data, and this is where you need to start making your queries a bit more interesting. It isn't as obvious as it first looks, which can cause people some problems. If you wanted the genre of each artist, you might think of typing in:

SELECT Artist_Name, Genre
FROM Artist, Genre;

It seems logical, but this would actually give you a table with everything in it: every artist next to every genre (5 artists times 5 genres, so 25 rows). This is known as a Cartesian product, or cross join. The computer isn't smart enough to know that you actually want each artist's own genre, so it pairs every row of one table with every row of the other and dumps the lot in front of you. Instead, you need to tell it what you want by typing in this:


SELECT Artist_Name, Genre
FROM Artist, Genre
WHERE Artist.Genre_ID = Genre.Genre_ID;

This will bring back a list where the Genre ID matches in both tables, and you get your answer. There's some new syntax in this code: the comparison operator '=' in the WHERE clause. There are a few of these, including (but not limited to):

= equal to
< less than
> greater than
<= less than or equal to
>= greater than or equal to
!= not equal to
<> not equal to (the SQL standard spelling)
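The difference between the two versions of the genre query is easy to see if you run them. This sketch loads the Artist and Genre tables into SQLite via Python's sqlite3 module (an assumption; the lab may have used a different DBMS) and counts the rows each query returns. ORDER BY is added only so that the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Genre  (Genre_ID INTEGER PRIMARY KEY, Genre TEXT);
CREATE TABLE Artist (Artist_ID INTEGER PRIMARY KEY, Artist_Name TEXT, Genre_ID INTEGER);
INSERT INTO Genre  VALUES (1,'Pop'),(2,'Rock'),(3,'Chill Out'),(4,'Ska'),(5,'80s');
INSERT INTO Artist VALUES (1,'The Beautiful South',1),(2,'Norah Jones',3),
  (3,'Jimmy Eat World',2),(4,'Rick Astley',5),(5,'The Cheapskates',4);
""")

# No join condition: every artist is paired with every genre (5 x 5 = 25 rows).
cross = conn.execute("SELECT Artist_Name, Genre FROM Artist, Genre").fetchall()
print(len(cross))  # 25

# With the join condition: one row per artist, next to its own genre.
joined = conn.execute("""
    SELECT Artist_Name, Genre
    FROM Artist, Genre
    WHERE Artist.Genre_ID = Genre.Genre_ID
    ORDER BY Artist.Artist_ID
""").fetchall()
for row in joined:
    print(row)
```

Many DBMSs, SQLite included, also accept the explicit form FROM Artist JOIN Genre ON Artist.Genre_ID = Genre.Genre_ID, which does the same job as the WHERE condition.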

The important thing to know about using the WHERE clause is that numbers do not require quotation marks, but text does. So if you are looking for a year you would only type in:

WHERE Release_Year = 1998

If you are looking for text then you need quotation marks:

WHERE Song_Title = 'Sunrise'

Another thing to be aware of when using the WHERE clause is a wildcard search. This means you can truncate a word, so if you know that 'Nobody's Prefect' is in there somewhere, but you aren't sure whether the database uses the apostrophe or not, you could search for this (note that wildcards need LIKE rather than '=', which would look for the % sign literally):

WHERE Album_Name LIKE 'Nobody%'

This brings you everything which has Nobody at the beginning. You can use the % at the beginning and end if you like, so for song titles with the letter P in them you could search for '%p%' (although I'm not sure why you would need to!).
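Here is a quick check of the difference between = and LIKE, assuming SQLite via Python's sqlite3 module and using the Album table from above. SQLite's LIKE ignores case for plain ASCII letters by default, which is why '%p%' also finds a capital P; other DBMSs may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Album (Album_ID INTEGER PRIMARY KEY, Album_Name TEXT, Release_Year INTEGER)"
)
conn.executemany(
    "INSERT INTO Album VALUES (?, ?, ?)",
    [(1, "Blue is the Colour", 1996), (2, "Quench", 1998),
     (3, "Feels Like Home", 2004), (4, "Greatest Hits", 2002),
     (5, "Nobody's Prefect", 2001)],
)

# '=' treats % as an ordinary character, so nothing matches.
equals = conn.execute(
    "SELECT Album_Name FROM Album WHERE Album_Name = 'Nobody%'").fetchall()

# LIKE treats % as 'any run of characters'.
like = conn.execute(
    "SELECT Album_Name FROM Album WHERE Album_Name LIKE 'Nobody%'").fetchall()

# % at both ends: any title containing a p (case-insensitive in SQLite).
with_p = conn.execute(
    "SELECT Album_Name FROM Album WHERE Album_Name LIKE '%p%'").fetchall()

print(equals)  # []
print(like)    # [("Nobody's Prefect",)]
print(with_p)  # [("Nobody's Prefect",)]
```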

Finally, you can combine more than one condition in the WHERE clause using AND. For example, you might want albums by a certain artist from a certain period: is there anything in the database by The Beautiful South from 1997 onwards? The syntax would be:

SELECT Artist_Name, Album_Name, Release_Year
FROM Album, Artist
WHERE Artist.Artist_ID = Album.Artist_ID
AND Artist_Name = 'The Beautiful South'
AND Release_Year >= 1997;

Did you notice how some of the fields say which table they are in and some don't? This is needed when the same field is in more than one table (remember foreign keys?). You need to define where you want the data from, otherwise you will confuse the computer and it will probably tell you it can't do it.

So to conclude querying two tables, the syntax generally looks something like:

SELECT (whatever fields you want)
FROM tableA, tableB
WHERE tableA.primarykey = tableB.foreignkey;

If you forget to do this then you will probably get a lot more results than you had counted on, so be careful!

A final note on results: if you get results where the same thing appears on more than one row (this is possible even with neat data tables), then you can make sure it only appears once. For example, if you wanted to know which artists released albums before 2000, you would get The Beautiful South twice, as both their albums were released before 2000. To prevent this from happening, you can add in DISTINCT, which means each result is only listed once, whether the query brings it back twice or two thousand times! So the syntax would be:

SELECT DISTINCT Artist_Name
FROM Album, Artist
WHERE Artist.Artist_ID = Album.Artist_ID
AND Release_Year < 2000;
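Both the multi-condition query and the DISTINCT query can be tried out the same way. This sketch assumes SQLite via Python's sqlite3 module and loads only the Artist and Album tables (Genre_ID is left out of Artist because these queries do not need it).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artist (Artist_ID INTEGER PRIMARY KEY, Artist_Name TEXT);
CREATE TABLE Album  (Album_ID INTEGER PRIMARY KEY, Album_Name TEXT,
                     Artist_ID INTEGER, Release_Year INTEGER);
INSERT INTO Artist VALUES (1,'The Beautiful South'),(2,'Norah Jones'),
  (3,'Jimmy Eat World'),(4,'Rick Astley'),(5,'The Cheapskates');
INSERT INTO Album VALUES (1,'Blue is the Colour',1,1996),(2,'Quench',1,1998),
  (3,'Feels Like Home',2,2004),(4,'Greatest Hits',4,2002),
  (5,'Nobody''s Prefect',5,2001);
""")

# Several conditions combined with AND: The Beautiful South, 1997 onwards.
recent = conn.execute("""
    SELECT Artist_Name, Album_Name, Release_Year
    FROM Album, Artist
    WHERE Artist.Artist_ID = Album.Artist_ID
    AND Artist_Name = 'The Beautiful South'
    AND Release_Year >= 1997
""").fetchall()
print(recent)    # [('The Beautiful South', 'Quench', 1998)]

# Artists with albums before 2000: without DISTINCT the band appears twice.
plain = conn.execute("""
    SELECT Artist_Name
    FROM Album, Artist
    WHERE Artist.Artist_ID = Album.Artist_ID
    AND Release_Year < 2000
""").fetchall()
distinct = conn.execute("""
    SELECT DISTINCT Artist_Name
    FROM Album, Artist
    WHERE Artist.Artist_ID = Album.Artist_ID
    AND Release_Year < 2000
""").fetchall()
print(plain)     # [('The Beautiful South',), ('The Beautiful South',)]
print(distinct)  # [('The Beautiful South',)]
```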

Apologies for the length of this post; it's all those tables! I have tried to present this clearly, but the best way to understand SQL is to have a go. When doing this in the lab myself I got very cross about commas: they only go between items, not at the end (so in the query above you would not write 'FROM Album, Artist,' with a comma after the last table, because the computer doesn't like it).

Monday, 10 October 2011

Cascading Style Sheets

Cascading style sheets, or CSS, describe how documents written in markup languages such as html should be formatted (see my previous post for a brief introduction to html). The idea of CSS is to separate a document's content from its format. This is advantageous as it means that the coding for the design doesn’t have to be mixed up with the content of a web page. It is also useful as you can standardise a selection of web pages by using the same style sheet, or even apply your own formatting to existing web pages. This would be useful should you consistently need text to be larger, for example.

The W3 web page has a helpful introduction to CSS here. It explains the three different ways of adding style information to documents:
  • Linking Style Sheet
  • Internal Style Sheet
  • Inline Style Sheet1
The difference between these three ways of applying style sheets is where the coding is ‘kept’. Internal and Inline styles are both within the same page: Internal styles sit in the head of the document inside a <style> tag, while Inline styles are written directly on individual html elements using the style attribute. Linking Style Sheets are completely separate documents, which makes them the best candidates for standardising web pages (as mentioned above). This means you can edit the formatting across a number of documents, so if you were in a corporate environment and your logo changed you could apply it universally rather than page by page. This has been described as 'brand cohesiveness' (Nielsen, 1997)2.

W3 has some guidelines on using CSS, including a list of terminology. The recommendation states that CSS1 is 'human readable and writable, and expresses style in common desktop publishing terminology'3. For example, if you wanted to apply a certain colour to a title or paragraph, you would use the property 'color' (note the American spelling!). This is easy for a human to understand. The value can be written in hexadecimal (perhaps less easy for a human to understand), which looks something like this: #b22222 (this is 'firebrick' red). For a full list of colours on the w3 website, click here.
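To take some of the mystery out of hexadecimal colours: the six digits are three pairs, one each for red, green and blue, and each pair is a number from 0 to 255 written in base 16. This small Python sketch decodes a colour value; it only handles the six-digit form, not the three-digit shorthand. CSS also accepts the equivalent rgb() notation, so #b22222 can equally be written rgb(178, 34, 34).

```python
def hex_to_rgb(colour: str) -> tuple[int, int, int]:
    """Convert a six-digit CSS hex colour such as '#b22222' to (red, green, blue)."""
    digits = colour.lstrip("#")
    # Each pair of hex digits is one channel: b2 = 11*16 + 2 = 178, 22 = 2*16 + 2 = 34.
    red, green, blue = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return red, green, blue

print(hex_to_rgb("#b22222"))  # (178, 34, 34)
```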

To explain what I mean by 'property' and 'value' I have borrowed a helpful diagram from the W3 wiki:

[Diagram: CSS Syntax]

The selector is the thing you are styling, for example p would be a paragraph.
The property name/property is the format item, so 'color' would be colour, 'font-size' would be font size, etc.
The value is what you are assigning to the property. For font size it is recommended that you use a percentage. This means the size is worked out relative to the inherited font size (ultimately the size a user normally uses) and made larger or smaller, for example font-size:80%

For example, if you would like the body of the text to be in italics, that would look like this:

BODY {font-style: italic}

The best way really to demonstrate all of this would be with an actual style sheet. I have managed to get a web site working on the City server. I have applied a style sheet which I adapted from a guide sheet we were provided with in the lab session. My website is here. The CSS for the page is here. The bit of code that tells my web page to use the CSS is in the <head> and it looks like this:

<LINK REL="stylesheet" HREF="http://www.student.city.ac.uk/~abkb846/public_html/kaystyle.css" TYPE="text/css">

Using CSS certainly seems useful, as you can apply formatting rules quite easily without getting mixed up in the main information. Blogspot, for example, does not seem to do it this way: if you change a font it makes the html really messy (if you don't believe me, have a look at the source code. All I did was change the font!) It's certainly something I will be trying to learn more about and have a go at!

1What is CSS? CSS Training, W3 Wiki [online] accessed 9 October 2011
2Nielsen, J. Effective Use of Style Sheets, Use It [online] accessed 9 October 2011
3Cascading Style Sheets, level 1, W3C Recommendation 17 Dec 1996, revised 11 Apr 2008 [online] accessed 9 October 2011

Tuesday, 4 October 2011

DITA week 2: The Internet, html and the WWW

Hello bloggees! I have no idea if that is an actual term, but I have decided to use it anyway. If you have a better suggestion as to how I should address you, please comment below as I am still learning netiquette. As we all are, since it is ever developing. Maybe they should start a finishing school for the net...


I digress. So, to get back on topic: today's DITA lecture was about the Internet and the World Wide Web, with a focus on html in the lab. Please note that while many people use 'the Internet' and 'the World Wide Web' as interchangeable terms, I will not be doing so. I will explain why below. I will then go on to explain what we did in the lab, including some examples, which I hope will be suitably snazzy.


Anyone who has studied a whirlwind history of information or computer science will know the origins of the Internet and of the World Wide Web. However, I feel it would be beneficial to assume that my readers are new to the topic. So if you don't want to read about the history and background then you can skip ahead to the html bit here.


The Internet dates back to the 1960s, when it was developed as a way of sharing information via a network of networks. According to ISOC, in August 1962 J.C.R. Licklider 'envisioned a globally interconnected set of computers through which everyone could quickly access data', his "Galactic Network" concept.1 He worked at DARPA (Defense Advanced Research Projects Agency) at the time, so the Internet actually had military origins. He was inspired by 'Project Lincoln', an early warning network to guard against Soviet nuclear bomb attack in the 1950s. The project eventually resulted in SAGE, Semi-Automatic Ground Environment, which included 'the world’s first long-distance network, which allowed the computers to transfer data'.2 I won't go into huge detail here because this post would get very long (I could probably dedicate an entire blog to the history of the Internet!), but if you are interested the DARPA site is worth a look and I have put a link in my references at the bottom of the post. So, to cut a long story short, a networking project was launched, the concept of sending packets was dreamed up, an architectural description of a network of networks was put together in the 1970s by Kahn and Cerf, people started picking up on it, and it was deemed a beautiful thing.


The World Wide Web, by contrast, has 'only' been around since the 1990s (although Tim Berners-Lee started working on the idea in the 1980s). In his book, Weaving the Web, Berners-Lee describes his vision of the Web as 'a universal, all-encompassing space'.3 It was originally designed as a directory and method for information sharing at CERN. It was soon picked up by academics, and then spread until it was in common usage, as it is today.


To make it really clear I will borrow an analogy our lecturer shared with us. The Internet is the road, and the World Wide Web is like a car 'travelling' on it. There are other 'vehicles', the most common example being email.


1Leiner, B.M., Cerf, V.G., Clark, D.D., Kahn, R.E., Kleinrock, L., Lynch, D.C., Postel, J., Roberts, L.G., Wolff, S. A Brief History of the Internet. [online] Accessed 4 October 2011

2Waldrop, M. DARPA and the Internet Revolution. [online] Accessed 4 October 2011

3Berners-Lee, T. (1999) Weaving the Web : the origins and future of the World Wide Web. London: Orion Business, p. 5