Information Overload!: October 2011

Sunday, 30 October 2011

The World Wide Web and HTML

My blog, Information Overload!, can be accessed at http://kaysafus.blogspot.com/

My webpage, Using HTML, can be accessed at http://www.student.city.ac.uk/~abkb846/public_html/HTMLindex.html

In this blog I will discuss the World Wide Web (WWW, or Web) and Hypertext Markup Language (HTML) as a technology for the digital representation of information, in relation to Web 1.0. I will examine the technical details with a short background history, before considering the opportunities and limitations of using the Web and HTML.

The Web is not the same thing as the Internet. The Internet is the channel through which the Web can function. Email also uses the Internet as a channel. The Web was designed to link information stored on computers. Tim Berners-Lee, its inventor, worked at CERN and originally designed it to enable him to remember connections between people, computers and projects. His vision was to have a ‘single, global information space’ (Berners-Lee, 1999, p. 5). Specifically, his solution was to use hypertext to allow users to move between documents. In his proposal, Berners-Lee offered the solution of a ‘universal linked information system’ (Berners-Lee, 1990). He developed HTML as the primary language for web documents, allowing users to instruct browsers how to display content on a webpage using tags (Chowdhury, 2008).

HTML works by using tags to inform the Web browser how information is to be portrayed on the page. A tag is enclosed in angle brackets: for example, <HTML> informs the browser that this is where the HTML begins, and </HTML> informs it that the HTML ends. I have created a website which demonstrates some of the different tags available.
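To make the pairing of opening and closing tags a bit more concrete, here is a minimal sketch in Python using the standard library's html.parser module. The sample page and the TagLogger class are made up for illustration (they are not taken from my webpage), and a real browser does far more than this.

```python
from html.parser import HTMLParser

# A tiny sample page (hypothetical, for illustration only).
SAMPLE_PAGE = """<html>
<head><title>Using HTML</title></head>
<body>
<h1>Using HTML</h1>
<p>Tags tell the browser how to <a href="https://www.w3.org/">display</a> content.</p>
</body>
</html>"""


class TagLogger(HTMLParser):
    """Records each opening and closing tag in the order it is met."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("open", tag))

    def handle_endtag(self, tag):
        self.events.append(("close", tag))


logger = TagLogger()
logger.feed(SAMPLE_PAGE)
for kind, tag in logger.events:
    print(kind, tag)
```

The first event is the opening html tag and the last is its closing partner, with every other tag opened and closed in between.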

The term Web 1.0 describes the Web in its early stages, when it was primarily designed to display and share information, rather than allowing users to add their own content, for example by contributing to wikis. Early webpages tended to be rather plain, and merely displayed information and provided hyperlinks to relevant documents elsewhere on the Web. Over time, presentational tags were developed to improve the appearance of webpages, which I will discuss later.

The Web and HTML provide a number of opportunities for the representation of information. The Web was designed to end incompatibility between different computers (Berners-Lee, 1999). Webpages can be viewed on different machines, presenting information consistently, and allowing users to view and navigate between information in the same way. This is still important today, with a huge range of devices and programs used to access the Web. The Web and HTML have been applied effectively to share information globally in all manner of fields, including Information Science. Most webpages use hyperlinks to help users navigate, such as the guidance on using the catalogue and on Information Management policies on The National Archives webpages (The National Archives, no date).

Using HTML is fairly simple; therefore it is popular (Chowdhury, 2008). Unlike other computer languages, HTML is in plain English and does not require users to have an in-depth knowledge of programming. Unlike more complex technologies for representing information, such as using SQL to create databases, HTML offers users an easy method to display and share digital information.

Another reason that HTML can be very useful for information professionals is that it allows metadata tags (or meta tags) to improve the experience for a user without them even being aware of it. For example, adding keywords to a webpage describes the page’s content. Using meta tags effectively can offer powerful navigation and improve information retrieval (Rosenfeld & Morville, 2002).
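As a rough sketch of how a program (a search tool, say) might read meta tags, the Python below pulls the keywords out of a page's head section. The sample markup, the keyword list and the MetaReader class are all assumptions of mine for illustration; real search engines decide for themselves how much weight, if any, to give keywords.

```python
from html.parser import HTMLParser

# Hypothetical head section with two meta tags.
PAGE_HEAD = """<head>
<title>Using HTML</title>
<meta name="keywords" content="HTML, tags, hyperlinks, Web 1.0">
<meta name="description" content="A demonstration of basic HTML tags.">
</head>"""


class MetaReader(HTMLParser):
    """Collects name/content pairs from meta tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            if "name" in attributes and "content" in attributes:
                self.meta[attributes["name"]] = attributes["content"]


reader = MetaReader()
reader.feed(PAGE_HEAD)
keywords = [word.strip() for word in reader.meta["keywords"].split(",")]
print(keywords)
```

The user never sees these keywords on the page, but a program reading the markup can.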

However, there are also limitations to the Web and HTML. In Web 1.0, HTML was not a multilingual language. Many languages use scripts which cannot be represented in standard HTML, because it is ‘based on a very limited coded character set’ (Yergeau, no date). For example, the Omniglot webpage uses images to demonstrate Sanskrit writing (Omniglot, no date), which means that users cannot copy and paste characters, or interact with them as easily as with languages written in Latin script.
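The character set problem can be illustrated with a short Python sketch. I am assuming here that the 'very limited coded character set' behaves like ASCII or Latin-1, and using UTF-8 as an example of a modern encoding that does cover Devanagari; the can_encode helper is my own name.

```python
SANSKRIT = "संस्कृतम्"  # the word "Sanskrit" written in Devanagari script


def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for encoding in ("ascii", "latin-1", "utf-8"):
    print(encoding, can_encode(SANSKRIT, encoding))
```

With a limited character set the text simply cannot be stored as characters, which is why a page would fall back on images.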

Additionally, while HTML was originally composed of elements for describing the structure of information, such as paragraphs, hyperlinks and headings, it evolved to include presentational tags, including fonts, colours and tables. This has been attributed to the introduction of Mosaic (Meyer, 2000). Presentational tags made HTML coding confusing and untidy, and caused interoperability problems (Chowdhury, 2008). This disadvantage of HTML has been recognised, and the World Wide Web Consortium (W3C) introduced a recommendation called CSS. CSS stands for Cascading Style Sheets, and offers rich styling of web documents (Meyer, 2000). This removes the need for tags like <b> (for making text bold). The use of CSS created another opportunity, as HTML can return to its original purpose as a structural language, while the presentational aspect of webpages is handled by CSS. This is useful for information professionals, as it helps to separate out the design aspect. For example, it strengthens the argument that searching is an information specialist area rather than an IT or web design area, as the emphasis is on the structure and content of pages.

I have concluded that HTML and the Web are appropriate technologies for the digital representation of information, in relation to Web 1.0. With my webpage, I demonstrated how HTML tags and hyperlinks can be applied technically for the management of digital information. Today the Information Profession exploits HTML and the Web extensively. In particular, the Web is used for information sharing, for structuring information and for ease of navigation, as demonstrated on The National Archives webpages. Meta tagging is also incredibly useful to information professionals, notably for powerful searching and retrieval functions. HTML is an effective tool to provide access to information because it permits information to be displayed on a range of devices. Furthermore, HTML is relatively easy to learn as it uses plain English and does not require a thorough knowledge of programming, therefore allowing more people to use it.

I investigated the constraints of HTML and the Web as Web 1.0 technologies. I recognised the issue with displaying non-Latin scripts as text. This was a problem for basic Web 1.0 webpages; it seems that these scripts could only be represented on a webpage with images, which reduced options for manipulation of information. I highlighted the limitations of tagging, particularly how HTML coding became cluttered with presentational tags. I concluded that this problem became less significant with the introduction of Cascading Style Sheets, which allowed programmers to separate structural and presentational code.

Berners-Lee, T. (1990) Information Management: A Proposal [online] Available at http://www.w3.org/History/1989/proposal.html [Accessed 22 October 2011]

Berners-Lee, T. (1999) Weaving the Web: the origins and future of the World Wide Web, London: Orion Business

Chowdhury, G. G. and Chowdhury, S. (2008) Organising Information: From the Shelf to the Web, London: Facet Publishing

Meyer, E.A. (2000) Cascading Style Sheets: The Definitive Guide, Sebastopol: O’Reilly Media Inc.

Omniglot (no date) Sanskrit [online] Available at http://www.omniglot.com/writing/sanskrit.htm [Accessed 22 October 2011]

Rosenfeld, L. and Morville, P. (2002) Information Architecture for the World Wide Web, Second Edition. Sebastopol: O’Reilly Media Inc.

The National Archives (no date) Catalogue [online] Available at http://www.nationalarchives.gov.uk/catalogue/default.asp?j=1 [Accessed 22 October 2011]

Yergeau, F. (no date) A world-wide World Wide Web. [online] Available at http://www.w3.org/International/francois.yergeau.html [Accessed 22 October 2011]

Thursday, 27 October 2011

Web 2.0: an introduction

In this week's lecture we discussed Web 2.0. Strictly speaking, there isn't a hard and fast definition of Web 2.0 (which is awkward) but there seems to be a general feeling that it involves social media, and that it involves people interacting with each other. Tim O'Reilly popularised the term through a conference in 2004, so it seems sensible to go with his view on the subject. His blog covers the subject (see it here), but to summarise, he says that:

The Web is a platform (people go online rather than relying on their individual PC)
It harnesses collective intelligence (through blogs, social networking, reviews, and so on)
There is a focus on Data (hopefully that is fairly self-explanatory)
It is an end to the software release cycle (Software is a service not a product and users should be treated as co-developers)
It uses Lightweight Programming Models (keep it simple!)
Software is above the level of a single device (you aren't limited to a PC)
Rich user experiences (hopefully this one is self-explanatory too)

His blog goes into a lot of detail, and is an interesting read. I would certainly recommend it if you would like more depth!

There is some controversy (for want of a better word) around using the term Web 2.0, as Tim Berners-Lee argues that this was always the intention of the World Wide Web. The move to a focus on a network of people rather than of computers was always the grand plan. I can see his point, as he originally wanted to join up information and people with the invention of the Web. The difference now, perhaps, is that it has become so cheap to use the technologies (which were always available) that the Web has been opened up to many more people. High-speed Internet must have helped as well - I remember the days of signing in with a modem and being limited to one hour on the Internet because my Mum wanted to use the telephone. I think I would struggle now to limit my Internet usage like that, as I grab my laptop sporadically to look things up and check Facebook.

It's pretty amazing I've got this far without talking about Facebook! It seems to be commonly associated with Web 2.0, and I suspect everyone has at least heard of Facebook even if they don’t have an account. On Monday, I found it interesting that out of a whole lecture room, only 2 people (3 including our self-styled dissident lecturer, Andy) did not have a Facebook account. We talked about why people choose to have, and not to have, Facebook accounts. The crux of it seemed to be the privacy issue. We did touch on Google+, but for all the excitement about how their privacy settings are allegedly better, only a handful of people actually had accounts. In a way it's difficult to understand - if it does all the same stuff but they care more about your privacy, why doesn't everybody move? That's a big question for a little blog, so I'm not going to go into that here!

My notes for the session describe Facebook as a ‘standard bearer for social networking’. Perhaps that is a bit strong, but then again I did just say that nearly everyone has heard of Facebook and a lot of people have accounts… I think the jury is still out on such a bold statement! But let’s have a quick look at how it works (just in case you have been living on the moon and haven’t heard of it). It’s based on mutual agreement of people to be in contact with each other, so unlike blogs and web pages (in general) you agree with other people that you will be connected as ‘friends’ and have access to each other’s profiles. This is of course generalised, because you can play around with settings to change who can see what on your profile, but for the most part you need to agree to be connected to interact with each other. We discussed Facebook in a group in the lab session (which this week was more of a discussion session) and decided that Facebook was a virtual enhancement of your real life interactions. So you can comment on your friends’ photos, like their comments and chat with them on their wall, via private messaging, or via instant messaging. The other thing about Facebook is that it encourages you to use your real name. Some sites do not (think of Twitter, for instance, where your name tends not to be your name exactly – mine is @kaysafus for example).

In our discussion session, our group were tasked with comparing Facebook to Friends Reunited. The first thing we established was that everyone had heard of Facebook, but not many of us remembered Friends Reunited. None of us had actually used it, and we had to ask our lecturer to show us his profile to try and get an idea of what it was like. This made our discussion a little tricky at first – we were trying to establish where Friends Reunited had got it wrong, and it wasn’t immediately obvious because they seemed to do things in a similar way to Facebook. When we really started analysing it though, we realised that all of us had envisaged Friends Reunited as a site for slightly older people, as the name suggests you are looking for people you are no longer in touch with. Many of us were still at University (or not long out of it) when we joined Facebook, and we saw it as a way of keeping in touch with people we knew at the time, both at home and at University.

Another thing we liked about Facebook was that we never saw any advertising for it. Every single person in the room who had a Facebook account had been invited by a friend. With Friends Reunited, we got the impression that you had to set up a profile in order to be found – if someone is adding you as a friend then surely that defeats the object of Friends Reunited, as you do not need reuniting.

Naturally Web 2.0 is not limited to Facebook. Other examples include Twitter (a sort of mini interactive blog), Wikipedia (an online encyclopaedia where users can contribute) and blogs. I don’t think I really have space in this blog to explore all of them in depth, but it is worth noting that, as with Facebook, collaboration is the key – everyone can get involved. In particular with reference to Wikipedia, you don’t need to have any specific qualifications, or be a subject expert to contribute. Naturally this is a double-edged sword as you can get a rich information source, but it could contain errors. A fellow student admitted to making some erroneous changes to a page, which has not yet been recognised.

How this will all play out will be interesting. There are those (my lecturers included) who rather darkly suggest there may be another bubble-burst, as with the dot com bubble of the late 1990s. Personally I’d prefer to be more optimistic – perhaps Web 2.0 will naturally evolve into whatever comes next, rather than collapsing. Aspects of Web 2.0 have become such a part of life for many users that it is difficult to imagine not having them – people use Facebook, Twitter, YouTube, Wikipedia and other Web 2.0 sites frequently, and these sites have changed the way people behave online.

Something I find quite interesting is company reactions to Web 2.0, and their policies (or lack thereof). Perhaps it would be interesting to investigate how different types of workplaces approach their staff’s access to Web 2.0. Many businesses seem to be keen to have a web presence on these sites – lots of them have Twitter feeds, Facebook pages and so on, but I wonder what they think about staff using these sites at work. Above I mentioned that people’s online behaviour has changed, perhaps their behaviour at work has changed too – do people expect to be able to access these sites during work hours? In my current workplace we have a Twitter feed (which is Comms led, not KIM led) but no clear policies about whether staff can access their own Twitter accounts during work hours on work PCs. My company also has a YouTube account and Flickr account, for which there are no clear policies either. Then of course there is the issue of staff accessing these sites on their mobile phones (either via company wifi or not) which it would be difficult to have any control over. One company I used to work for blocked Internet access almost completely, and forbade staff from using mobile phones during office hours. Needless to say it was not a fun place to work. I would be interested to know whether this is still the case, but unfortunately I no longer keep in touch with any colleagues from that company.

I have offered a brief introduction to Web 2.0, with an emphasis that Web 2.0 is about networks of people, and interaction between them. I looked at Facebook as an example of a Web 2.0 site. Everyone has heard of it, lots of people use it, and to be honest it's a bit addictive (you won't be surprised, I'm sure, to know I have it open in another tab whilst writing this). I hinted that it has changed the way we behave online, which I hope to explore in another blog. If you would be particularly interested to read about it, leave a comment or send me a message and I'll try and put something together. So what comes next? As I said, I hope there won't be another bubble-burst, rather I would like to think we can build on Web 2.0 and make it even better. After all, if Web 2.0 is based on a network of people, perhaps we have a say in its future.

Information Retrieval: a very quick introduction

Information Retrieval, or IR, is to do with information seeking behaviour, or searching for something relevant to the task at hand. In IR terms, relevance is defined as fulfilling the user's information needs (this is not so in philosophy, which I hope to blog about soon if I get time!). The difference between IR and querying a database is that an IR system returns results ranked by how likely each one is to match your search, whereas a database will return an exact match (or nothing). Lancaster (1968) says that an information retrieval system 'does not inform (i.e. change the knowledge of) the user on the subject of his inquiry. It merely informs on the existence (or non-existence) and whereabouts of documents relating to his request' (taken from an article on Information Retrieval here). It has also been defined as 'finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers)' (see more here).
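To show the difference between an exact match and a ranked list, here is a minimal Python sketch. The three documents are invented, and counting shared words is only a crude stand-in of my own for 'probability of a match'; real IR systems use much more sophisticated weighting.

```python
# Hypothetical mini collection of documents.
DOCUMENTS = {
    "doc1": "goldfish care and feeding guide",
    "doc2": "feeding tropical fish",
    "doc3": "goldfish",
}


def exact_match(query):
    """Database-style lookup: the whole text must equal the query."""
    return [doc_id for doc_id, text in DOCUMENTS.items() if text == query]


def ranked_search(query):
    """IR-style lookup: score by shared words, best match first."""
    query_terms = set(query.split())
    scored = []
    for doc_id, text in DOCUMENTS.items():
        overlap = len(query_terms & set(text.split()))
        if overlap:
            scored.append((overlap, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


print(exact_match("goldfish feeding"))    # nothing matches exactly
print(ranked_search("goldfish feeding"))  # partial matches, best first
```

The exact lookup comes back empty, while the ranked search still returns all three documents with the closest one at the top.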

There are lots of different ways of defining IR, but there are 3 formal definitions based on the perspective of the process. First, there is the user view, which is the user's 'anomalous state of knowledge' or ASK. This means they have knowledge gaps and are seeking information. Second is the systems view, which is the hardware and software for IR. Finally there is a sources view, which is the presentation of the information (usually not your information) using technology.

Searching for information happens in lots of different ways. Searching for a website, is not the same as searching for a new flat, is not the same as searching for some information about goldfish. Or so says Broder, who argues that these queries are split up into Navigational queries, Transactional queries and Informational queries respectively.

We experimented with this in the lab, trying out different types of queries using Google and Bing. We experimented with different searches, using Boolean logic - terms like AND, OR and NOT can make a huge difference to your search. The same is true of using quotation marks to search for an exact phrase. It is suggested that many users will try to formulate this in natural language, although personally I wouldn't state that categorically without doing a bit more research!
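A rough way to picture what AND, OR and NOT do is as set operations on the documents containing each term. The index below is invented for illustration (document numbers 1 to 5), and I am not suggesting this is how Google or Bing work internally.

```python
# Hypothetical index: each term maps to the set of documents containing it.
INDEX = {
    "goldfish": {1, 2, 5},
    "tank": {2, 3, 5},
    "pond": {4, 5},
}

goldfish_and_tank = INDEX["goldfish"] & INDEX["tank"]  # both terms present
goldfish_or_pond = INDEX["goldfish"] | INDEX["pond"]   # either term present
goldfish_not_pond = INDEX["goldfish"] - INDEX["pond"]  # first term without the second

print(sorted(goldfish_and_tank))
print(sorted(goldfish_or_pond))
print(sorted(goldfish_not_pond))
```

AND narrows the results, OR widens them, and NOT removes anything mentioning the unwanted term.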

Sometimes your first search doesn't give you results that you deem relevant. This is when it becomes necessary to modify your query. This can mean adding extra words, removing or changing operators, or trying synonyms to see if you get different results. You can evaluate the effectiveness of a search qualitatively or quantitatively. Qualitative analysis is from a user perspective - does it satisfy their information need? Establishing this is quite time consuming, as you need to carry out consumer surveys and questionnaires. Quantitative experiments test the speed of retrieval (efficiency) and how many relevant documents were retrieved (effectiveness).

There are two ways to measure effectiveness - precision and recall. Precision is the proportion of retrieved documents which are relevant. For example, if you return 5 results and 3 are relevant, you have 60% precision. The formula is relevant documents retrieved / total documents retrieved. For recall, the formula is relevant documents retrieved / total number of relevant documents in database. This is more difficult to calculate, but is useful to know as it tells you how good the system is at detecting relevant documents in the index.
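The two formulas can be sketched in a few lines of Python. The precision figures match the example above (5 retrieved, 3 relevant); the total of 10 relevant documents in the database is a number I have assumed purely so that recall can be worked out.

```python
def precision(retrieved, relevant):
    """Relevant documents retrieved / total documents retrieved."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved, relevant):
    """Relevant documents retrieved / total relevant documents in the database."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


# Hypothetical search: 5 results come back, 3 of them relevant,
# out of 10 relevant documents in the whole database.
retrieved = {"d1", "d2", "d3", "d4", "d5"}
relevant = {"d1", "d2", "d3", "d6", "d7", "d8", "d9", "d10", "d11", "d12"}

print(precision(retrieved, relevant))
print(recall(retrieved, relevant))
```

Here precision is 3 out of 5 and recall is 3 out of 10, so the search found mostly relevant results but missed most of what was available.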

The Information Retrieval Nirvana would be 100% recall with 100% precision. Unfortunately, this tends not to happen! Instead, there is an inverse relationship between the two. If a user searches for something very specific, they are likely to get a good level of precision but they might not get such good recall because they are not capturing everything they want. Likewise, if a user casts their net very widely they will retrieve a lot of relevant information, but they might also get a lot that they are not interested in, so precision is low.

This has been a bit of a crash course in Information Retrieval (I mean, I didn't even mention indexing), but hopefully readers are inspired to go and find out more. If you have a knowledge gap, this would be a great opportunity to practice IR as a user. Do a Google search (or Bing, or Yahoo, or whatever) and play around with Boolean operators. I should be covering IR in a lot more detail next semester, so keep an eye out for another, more detailed, blog then!

Thursday, 13 October 2011

Databases and SQL

Monday's lecture was about databases, in particular looking at SQL (pronounced either as the letters, or as 'sequel' depending on how you feel about it). We had an introduction to databases to bring everybody up to speed - I was surprised how much I remembered from GCSE IT. My teacher would have been proud. We discussed when a database is a good idea, and when it's not such a good idea. A rule of thumb seems to be that it's a good plan when it's your data as opposed to someone else's. For that it might be a better idea to think about information retrieval (that's next week's lecture!).

It is a good idea to have a centrally managed and designed database. Problems of the past still haunt our offices today - it's now so easy to make a database that a lot of people still use local ones rather than having a nicely linked up central database. This causes all manner of problems due to inaccurate data (not to mention legal issues like Data Protection!). Instead, organisations should have their data stored in one place, with each department accessing only the data that is relevant to them. The Database Management System (DBMS) controls access to the data by various departments / people.

So to be clear, the DBMS is the software which lets the users get at the data, whereas a database, to quote my course handout, is an 'integrated collection of data shareable between users and application systems'. So it's the stuff behind the scenes rather than the shiny interface the users get to see. Using a DBMS is advantageous as it assures information integrity (insofar as it gets updated everywhere when it's updated once). It also removes the reliance on a certain program as it is a collection of data which can be used by more than one application.

So what should it look like? Perhaps the way to explain this best is to say what you don't want. You don't want duplication within a table, and the best way to make sure this doesn't happen is to look at relationships and break up your data into different tables. I'll give you an example. Let's say we have some form of music database, with song titles, album names, artists, release years and genres. So you'd have a table that looks like this:

Song_Title	Album_Name	Artist	Release_Year	Genre
Don't Marry Her	Blue is the Colour	The Beautiful South	1996	Pop
Perfect 10	Quench	Beautiful South	1998	Pop
Be Here to Love Me	Feels Like Home	Norah Jones	2004	Chill Out
Sunrise	Feels Like Home	Jones, Norah	2004	Chillout
Never Gonna Give You Up	Greatest Hits	Risk Astley	2002	80s

There are a few things to notice about this table. Firstly, did you notice how I put underscores in the field names? This is because it's generally not a good idea to put spaces in your field names - lots of programs don't like it. Just because Microsoft Access lets you do it, it doesn't mean it's a good idea! So use underscores or make it all one word, and remember the user doesn't see this stuff.

This is probably a good time to mention primary keys. Did you notice that there isn't any one field that you could use to uniquely identify a song? Every table needs one of these, and the good news is most DBMSs can create these for you. Primary keys are important when it comes to making separate tables, which I will come to in a minute!

So back to the music, as it were. There are lots of duplications, for example the album name, the artist, the release date and the genre!

Did you also notice the errors? The Beautiful South perform two of these songs but the band name is different in the two rows. This is why having duplication in one table can be such a bad idea, and this is why we need more than one table. This can be fixed by dividing the data up into the following tables:

Song

Song_ID	Song_Title	Album_ID
1	Don't Marry Her	1
2	Perfect 10	2
3	Be Here to Love Me	3
4	Sunrise	3
5	Never Gonna Give You Up	4

Album

Album_ID	Album_Name	Artist_ID	Release_Year
1	Blue is the Colour	1	1996
2	Quench	1	1998
3	Feels Like Home	2	2004
4	Greatest Hits	4	2002
5	Nobody's Prefect	5	2001

Artist

Artist_ID	Artist_Name	Genre_ID
1	The Beautiful South	1
2	Norah Jones	3
3	Jimmy Eat World	2
4	Rick Astley	5
5	The Cheapskates	4

Genre

Genre_ID	Genre
1	Pop
2	Rock
3	Chill Out
4	Ska
5	80s

This removes the duplication issue. Each table has a primary key, or unique identifier. In some of the tables, a primary key from a different table is in a column to match tables together. For example, in the table 'Artist' there is a column for 'Genre_ID'. When an identifier is in another table, it is known as a foreign key.
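Here is a minimal sketch of how the Genre and Artist tables and their keys might be declared, using Python's built-in sqlite3 module so it can be run without installing a DBMS. The column types, the REFERENCES clause and the 'Unknown Band' row are my own assumptions; other DBMSs have slightly different syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when asked to.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE Genre (Genre_ID INTEGER PRIMARY KEY, Genre TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE Artist (
        Artist_ID   INTEGER PRIMARY KEY,
        Artist_Name TEXT NOT NULL,
        Genre_ID    INTEGER NOT NULL REFERENCES Genre (Genre_ID)
    )
""")
conn.executemany(
    "INSERT INTO Genre VALUES (?, ?)",
    [(1, "Pop"), (2, "Rock"), (3, "Chill Out"), (4, "Ska"), (5, "80s")],
)
conn.execute("INSERT INTO Artist VALUES (1, 'The Beautiful South', 1)")

# Genre_ID 99 does not exist in Genre, so the foreign key should reject it.
try:
    conn.execute("INSERT INTO Artist VALUES (6, 'Unknown Band', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)
```

The point of the foreign key is that an artist can only be given a genre that actually exists in the Genre table.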

Now to look at some SQL and query our database! This is the most important bit. When searching you want to get out helpful data, and that means getting what you want and not getting what you don't want. It sounds obvious, but people often sift through lots of rubbish. Think about searching Google - it's a great tool but if you don't put in a sensible search you can get a lot of irrelevant results. If you don't believe me, try searching 'tablet' - you want to see what you could get instead of an iPad, but you end up with lots of things you didn't want, don't you?

Anyway, before I get sidetracked and start thinking about iPads, let's get back to SQL. To search, the basic syntax you need looks like this:

SELECT fields
FROM tables;

So if we were searching the database above for an album and its release date, we'd need:

SELECT Album_Name, Release_Year
FROM Album;

Which would give you:

Album_Name	Release_Year
Blue is the Colour	1996
Quench	1998
Feels Like Home	2004
Greatest Hits	2002
Nobody's Prefect	2001

When using select, there is no limit to the number of fields you can put in there. However, if you want everything then you can use an asterisk:

SELECT *
FROM Artist;

Which would give you:

Artist_ID	Artist_Name	Genre_ID
1	The Beautiful South	1
2	Norah Jones	3
3	Jimmy Eat World	2
4	Rick Astley	5
5	The Cheapskates	4

This is good if you want to see everything, but realistically an end user doesn't want to see the foreign keys in this table, they want the data, and this is where you need to start making your queries a bit more interesting. It isn't as obvious as it first looks, which can cause people some problems. If you wanted the genre of each artist, you might think of typing in:

SELECT Artist_Name, Genre
FROM Artist, Genre;

It seems logical, but this would actually give you a table with everything in it - every artist next to every genre. This is because computers aren't smart enough to know that you actually want the relevant Genre. It just pulls everything out of the database and dumps it in front of you. Instead, you need to tell it what you want by typing in this:

SELECT Artist_Name, Genre
FROM Artist, Genre
WHERE Artist.Genre_ID = Genre.Genre_ID;

This will bring back a list where the Genre ID matches on both tables, and you get your answer. There's some new syntax in this code. There is a comparison operator '=' in the WHERE clause. There are a few of these, including (but not limited to)

= equal to
< less than
> greater than
<= less than or equal to
>= greater than or equal to
!= not equal to

The important thing to know about using the WHERE clause is that numbers do not require quotation marks, but any other characters do. So if you are looking for a year you would only type in:

WHERE Release_Year = 1998

If you are looking for text then you need quotation marks:

WHERE Song_Title = 'Sunrise'

Another thing to be aware of when using the WHERE clause is a wildcard search, which uses LIKE instead of the equals sign. This means you can truncate a word, so if you know that 'Nobody's Prefect' is in there somewhere, but you aren't sure if the database uses the apostrophe or not, you could search for this:

WHERE Album_Name LIKE 'Nobody%'

This brings you everything which has Nobody at the beginning. You can use the % at the beginning and end if you like, so for song titles with the letter P in them you could search for '%p%' (although I'm not sure why you would need to!).
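A wildcard search with % needs the LIKE keyword, and a quick way to try it is with Python's built-in sqlite3 module. This is a minimal sketch with cut-down tables (just the ID and name columns). One thing I am assuming you will want to check in your own DBMS is case sensitivity: SQLite's LIKE ignores case for ordinary letters, so '%p%' also finds a capital P.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Album (Album_ID INTEGER PRIMARY KEY, Album_Name TEXT)")
conn.execute("CREATE TABLE Song (Song_ID INTEGER PRIMARY KEY, Song_Title TEXT)")
conn.executemany(
    "INSERT INTO Album VALUES (?, ?)",
    [(1, "Blue is the Colour"), (2, "Quench"), (3, "Feels Like Home"),
     (4, "Greatest Hits"), (5, "Nobody's Prefect")],
)
conn.executemany(
    "INSERT INTO Song VALUES (?, ?)",
    [(1, "Don't Marry Her"), (2, "Perfect 10"), (3, "Be Here to Love Me"),
     (4, "Sunrise"), (5, "Never Gonna Give You Up")],
)

# Everything that starts with 'Nobody'.
starts_with_nobody = [row[0] for row in conn.execute(
    "SELECT Album_Name FROM Album WHERE Album_Name LIKE 'Nobody%'")]

# Everything with a letter P anywhere in the title.
contains_p = [row[0] for row in conn.execute(
    "SELECT Song_Title FROM Song WHERE Song_Title LIKE '%p%' ORDER BY Song_ID")]

print(starts_with_nobody)
print(contains_p)
```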

Finally, you can combine more than one condition in a WHERE clause using AND, for example when you want albums from a certain artist from a certain year. Say you want to know if there is anything in the database by The Beautiful South from 1997 onwards. The syntax would be:

SELECT Artist_Name, Album_Name, Release_Year
FROM Album, Artist
WHERE Artist.Artist_ID = Album.Artist_ID
AND Artist_Name = 'The Beautiful South'
AND Release_Year >= 1997;

Did you notice how some of the fields say the table they are in and some don't? This is needed when the same field name appears in more than one table (remember foreign keys?). You need to define where you want the data from, otherwise you will confuse the computer and it will probably tell you it can't do it.

So to conclude querying two tables, the syntax generally looks something like:

SELECT (whatever fields you want)
FROM tableA, tableB
WHERE tableA.primarykey = tableB.foreignkey;

If you forget to do this then you will probably get a lot more results than you had counted on, so be careful!
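To see how many extra results you get, here is a minimal sketch using Python's built-in sqlite3 module with the Artist and Genre tables from earlier. The column types are my own assumption; the data is the same as in the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Genre (Genre_ID INTEGER PRIMARY KEY, Genre TEXT)")
conn.execute(
    "CREATE TABLE Artist (Artist_ID INTEGER PRIMARY KEY, Artist_Name TEXT, Genre_ID INTEGER)")
conn.executemany(
    "INSERT INTO Genre VALUES (?, ?)",
    [(1, "Pop"), (2, "Rock"), (3, "Chill Out"), (4, "Ska"), (5, "80s")],
)
conn.executemany(
    "INSERT INTO Artist VALUES (?, ?, ?)",
    [(1, "The Beautiful South", 1), (2, "Norah Jones", 3),
     (3, "Jimmy Eat World", 2), (4, "Rick Astley", 5),
     (5, "The Cheapskates", 4)],
)

# No WHERE clause: every artist is paired with every genre.
unmatched = conn.execute("SELECT Artist_Name, Genre FROM Artist, Genre").fetchall()

# With the WHERE clause: each artist is paired only with their own genre.
matched = conn.execute("""
    SELECT Artist_Name, Genre
    FROM Artist, Genre
    WHERE Artist.Genre_ID = Genre.Genre_ID
    ORDER BY Artist.Artist_ID
""").fetchall()

print(len(unmatched))
print(matched)
```

Five artists and five genres give 25 rows without the WHERE clause, but only 5 rows (one per artist) with it.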

A final note on results: if you get results where the same thing appears on more than one row (this is possible even with neat data tables) then you can make sure it only appears once. For example, if you wanted to know which artists released albums before 2000, you would get The Beautiful South twice, as both their albums were released before 2000. To prevent this from happening, you can add in 'distinct', which means it will only list it once, whether the results bring it back twice, or two thousand times! So the syntax would be:

SELECT DISTINCT Artist_Name
FROM Album, Artist
WHERE Artist.Artist_ID = Album.Artist_ID
AND Release_Year < 2000;
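Here is a minimal sketch of the same query run with and without DISTINCT, using Python's built-in sqlite3 module. The column types and the QUERY template are my own assumptions; the data comes from the Album and Artist tables above (minus the Genre_ID column, which isn't needed here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Artist (Artist_ID INTEGER PRIMARY KEY, Artist_Name TEXT)")
conn.execute("""
    CREATE TABLE Album (
        Album_ID INTEGER PRIMARY KEY, Album_Name TEXT,
        Artist_ID INTEGER, Release_Year INTEGER
    )
""")
conn.executemany(
    "INSERT INTO Artist VALUES (?, ?)",
    [(1, "The Beautiful South"), (2, "Norah Jones"), (3, "Jimmy Eat World"),
     (4, "Rick Astley"), (5, "The Cheapskates")],
)
conn.executemany(
    "INSERT INTO Album VALUES (?, ?, ?, ?)",
    [(1, "Blue is the Colour", 1, 1996), (2, "Quench", 1, 1998),
     (3, "Feels Like Home", 2, 2004), (4, "Greatest Hits", 4, 2002),
     (5, "Nobody's Prefect", 5, 2001)],
)

# The same query, with the selected columns filled in each time.
QUERY = """
    SELECT {columns}
    FROM Album, Artist
    WHERE Artist.Artist_ID = Album.Artist_ID
    AND Release_Year < 2000
"""

with_repeats = conn.execute(QUERY.format(columns="Artist_Name")).fetchall()
without_repeats = conn.execute(QUERY.format(columns="DISTINCT Artist_Name")).fetchall()

print(with_repeats)
print(without_repeats)
```

Without DISTINCT the band appears once per album; with it, the band appears once.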

Apologies for the length of this post - it's all those tables! I have tried to present this clearly, but the best way to understand using SQL is to have a go. When doing this in the lab myself I got very cross about adding in commas - they only go in between items not at the end (so above you would not put FROM Album, Artist, because there is a comma at the end and the computer doesn't like it).

Monday, 10 October 2011

Cascading Style Sheets

Cascading style sheets, or CSS, are used to format document content written in markup languages such as HTML (see my previous post for a brief introduction to HTML). The idea of CSS is to separate the document content and document format. This is advantageous as it means that the coding for the design doesn’t have to be mixed up with the content of a web page. It is also useful as you can standardise a selection of web pages by using the same style sheet, or even apply your own formatting to existing web pages. This would be useful should you consistently need text to be larger, for example.

The W3 web page has a helpful introduction to CSS here. It explains the three different ways of adding style information to documents:

Linking Style Sheet
Internal Style Sheet
Inline Style Sheet¹

The difference between these three ways of applying style sheets is where the coding is ‘kept’. Internal and Inline style sheets are both within the same page – Internal is within the head of the document using the tag <style>; Inline is written directly on an individual tag in the body of the html document, using the style attribute. Linking Style Sheets are completely separate documents, which makes them the best candidates for standardising web pages (as mentioned above). This means you can edit the formatting across a number of documents at once, so if you were in a corporate environment and your logo changed you could apply the change universally rather than page by page. This has been described as 'brand cohesiveness' (Nielsen, 1997)².
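To make the three locations a bit more concrete, here is a rough Python sketch using the standard library's html.parser. It scans a page and reports where its style information is kept. The sample page, the file name site.css and the class name StyleLocator are all invented for illustration.

```python
from html.parser import HTMLParser

class StyleLocator(HTMLParser):
    """Record which of the three ways of attaching CSS a page uses."""

    def __init__(self):
        super().__init__()
        self.found = {"linked": [], "internal": 0, "inline": []}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Linking style sheet: <link rel="stylesheet" href="...">
        if tag == "link" and (attrs.get("rel") or "").lower() == "stylesheet":
            self.found["linked"].append(attrs.get("href"))
        # Internal style sheet: a <style> element
        elif tag == "style":
            self.found["internal"] += 1
        # Inline style: a style attribute on any tag
        if "style" in attrs:
            self.found["inline"].append((tag, attrs["style"]))

# A made-up page that uses all three approaches at once.
page = """
<html>
<head>
<link rel="stylesheet" href="site.css" type="text/css">
<style>h1 {color: #b22222}</style>
</head>
<body>
<p style="font-size: 80%">Hello</p>
</body>
</html>
"""

locator = StyleLocator()
locator.feed(page)
locator.close()
print(locator.found)
```

Only the linked entry points outside the page, which is why it is the one that can be shared between lots of pages.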

W3 has some guidelines on using CSS, including a list of terminology. The recommendation describes CSS1 as 'human readable and writable, and expresses style in common desktop publishing terminology'³. For example, if you wanted to apply a certain colour to a title or paragraph, you would use the property 'color' (note the American spelling!). This is easy for a human to understand. The value would be written in hexadecimal (perhaps less easy for a human to understand) which looks something like this: #b22222 (this is 'firebrick' red). For a full list of colours on the w3 website, click here.
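If the hexadecimal looks mysterious, it helps to know that the six digits are really three pairs: one each for red, green and blue, each running from 00 (none) to ff (255, the maximum). A short Python sketch (hex_to_rgb is just a name I have chosen) shows the conversion:

```python
def hex_to_rgb(colour):
    """Convert a CSS hex colour such as '#b22222' to (red, green, blue)."""
    digits = colour.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected six hexadecimal digits, e.g. #b22222")
    # Read the digits two at a time and interpret each pair as base 16.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))

print(hex_to_rgb("#b22222"))  # (178, 34, 34)
```

So firebrick is mostly red (178 out of 255) with a little green and blue mixed in.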

To explain what I mean by 'property' and 'value' I have borrowed a helpful diagram from the W3 wiki:

CSS Syntax

The selector is the thing you are styling, for example p would be a paragraph.
The property name/property is the format item, so 'color' would be colour, 'font-size' would be font size, etc.
The value is what you are assigning to the property. For font size it is recommended that you use a percentage. This means that it will take the font size that a user normally uses, and make it larger/smaller, for example font-size:80%

So an example would be that you would like the body of the text to be in italic. That would look like this:

BODY {font-style: italic}
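To show how the selector, property and value fit together, here is a small Python sketch that pulls a rule apart. It is a toy that assumes the simple 'selector {property: value}' shape shown above, not a real CSS parser, and parse_rule is a made-up name.

```python
def parse_rule(rule):
    """Split one simple CSS rule into its selector and a dict of declarations.

    Handles only the basic 'selector {property: value; ...}' shape.
    """
    selector, _, rest = rule.partition("{")
    body = rest.rsplit("}", 1)[0]
    declarations = {}
    for declaration in body.split(";"):
        if ":" in declaration:
            prop, _, value = declaration.partition(":")
            declarations[prop.strip()] = value.strip()
    return selector.strip(), declarations

print(parse_rule("BODY {font-style: italic}"))
# ('BODY', {'font-style': 'italic'})

print(parse_rule("p {color: #b22222; font-size: 80%}"))
# ('p', {'color': '#b22222', 'font-size': '80%'})
```

The second call shows that one selector can carry several property and value pairs, separated by semicolons.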

The best way really to demonstrate all of this would be with an actual style sheet. I have managed to get a web site working on the City server. I have applied a style sheet which I adapted from a guide sheet we were provided with in the lab session. My website is here. The CSS for the page is here. The bit of code that tells my web page to use the CSS is in the <head> and it looks like this:

<LINK REL="stylesheet" HREF="http://www.student.city.ac.uk/~abkb846/public_html/kaystyle.css" TYPE="text/css">

Using CSS certainly seems useful, as you can apply formatting rules quite easily without getting mixed up in the main information. Blogspot, for example, does not seem to do it this way - if you change a font it makes the html really messy (if you don't believe me have a look at the source code. All I did was change the font!) It's certainly something I will be trying to learn more about and have a go at!

¹What is CSS? CSS Training, W3 Wiki [online] accessed 9 October 2011

²Nielsen, J. Effective Use of Style Sheets, Use It [online] accessed 9 October 2011

³Cascading Style Sheets, level 1, W3C Recommendation 17 Dec 1996, revised 11 Apr 2008 [online] accessed 9 October 2011

Tuesday, 4 October 2011

DITA week 2: The Internet, html and the WWW

Hello bloggees! I have no idea if that is an actual term, but I have decided to use it anyway. If you have a better suggestion as to how I should address you, please comment below as I am still learning netiquette. As we all are, as it is ever developing. Maybe they should start a finishing school for the net...

I digress. So, to get back on topic: today's DITA lecture was about the Internet and the World Wide Web, with a focus on html in the lab. Please note that while many people use 'the Internet' and 'the World Wide Web' as interchangeable terms, I will not be doing so. I will explain why below. I will then go on to explain what we did in the lab, including some examples, which I hope will be suitably snazzy.

Anyone who has studied a whirlwind history of information or computer science will know the origins of the Internet, and of the World Wide Web. However, I feel it would be beneficial to assume that my readers are new to the topic. So if you don't want to read about the history and background then you can skip ahead to the html bit here.

The Internet dates back to the 1960s, when it was developed as a way of sharing information via a network of networks. According to ISOC, in August 1962 J.C.R. Licklider 'envisioned a globally interconnected set of computers through which everyone could quickly access data', his "Galactic Network" concept.¹ He became the first head of the computer research programme at DARPA (Defense Advanced Research Projects Agency) later that year, so the Internet actually had military origins. He was inspired by 'Project Lincoln', an early warning network to guard against Soviet nuclear bomb attack in the 1950s. The project eventually resulted in SAGE, Semi-Automatic Ground Environment, which included 'the world’s first long-distance network, which allowed the computers to transfer data'.² I won't go into huge detail here because I appreciate that this blog will get very long (I could probably dedicate an entire blog to the history of the Internet!), but if you are interested the DARPA site is worth a read and I have put a link in my references at the bottom of the blog. So, to cut a long story short, a networking project was launched, the concept of sending packets was dreamed up, an architectural description of a network of networks was put together in the 1970s by Kahn and Cerf, people started picking up on it, and it was deemed a beautiful thing.

The World Wide Web, by contrast, has 'only' been around since the 1990s (although Tim Berners-Lee started working on the idea in the 1980s). In his book, Weaving the Web, Berners-Lee describes his vision of the Web as 'a universal, all-encompassing space'.³ It was originally designed as a directory and method for information sharing at CERN. It was soon picked up by academics, and then spread until it was in common usage, as it is today.

To make it really clear I will borrow an analogy our lecturer shared with us. The Internet is the road, and the World Wide Web is like a car 'travelling' on it. There are other 'vehicles', the most common example being email.

So how does it work? The Internet uses a client-server architecture. Clients send requests via the network to servers, which respond. So for example, if you are trying to access a document over the web, you send a request to the server. One way to look at this is to examine a URL (Uniform Resource Locator) to see how it is constructed:

http://www.cscan.org/presentations/08-11-06-MikeEvans-Web.pdf

(Incidentally this is a good presentation about the evolution of the Web!)

The first part of the address, http://, identifies that the link is a WWW document which should be transferred using HTTP (hypertext transfer protocol).

Next, www.cscan.org is the domain name of the server, which DNS (the domain name system) translates into the server's numeric address.

Finally, presentations/08-11-06-MikeEvans-Web.pdf is the file path to where the document is.
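The same three parts can be pulled out automatically. Here is a short Python sketch using urlsplit from the standard library, followed by a hypothetical helper (build_get_request is my own name) that writes out the kind of request a client would send to the server. Nothing is actually sent over the network, and a real browser would add more headers than this.

```python
from urllib.parse import urlsplit

url = "http://www.cscan.org/presentations/08-11-06-MikeEvans-Web.pdf"
parts = urlsplit(url)

print(parts.scheme)  # http
print(parts.netloc)  # www.cscan.org
print(parts.path)    # /presentations/08-11-06-MikeEvans-Web.pdf

def build_get_request(url):
    """Return the text of a minimal HTTP/1.1 GET request for the URL (not sent)."""
    pieces = urlsplit(url)
    path = pieces.path or "/"
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {pieces.netloc}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )

print(build_get_request(url))
```

Notice that the request names the file path and the server separately, which mirrors the way the URL is built.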

In the lab part of the session (sadly only for an hour this week) we looked at html and had a go at creating web pages. Unfortunately my work is not yet available as a web page, but never fear! There are some examples of html coding (with some explanation of what they do) below, but first I feel I should give a little introduction to explain what all the brackets are about!

Html is made up of tags, which look something like this: <strong> </strong>. Note the difference between these tags - the one on the left opens and the one on the right closes whatever the instruction is. In this example, <strong> makes the text strong, which is usually (but not always) bold in the browser. Most html tags require closing, so a recommended way to ensure you do this is to open and close when typing, then go back and put in the stuff in the middle. So, let's look at what all the tags do:

<html> - this specifies where the html code begins and ends
<head> - this contains information which is not displayed, and is useful for metadata
<title> - a fairly self explanatory one, this changes what you see at the top of your browser window
<body> - this begins and ends all the stuff in the web page
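To see how these four tags nest inside each other, here is a little Python sketch using the standard library's html.parser. The page text and the class name TagOutline are invented for the example, and the sketch assumes every tag is properly closed.

```python
from html.parser import HTMLParser

class TagOutline(HTMLParser):
    """Record each opening tag, indented by how deeply it is nested."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []

    def handle_starttag(self, tag, attrs):
        self.outline.append("  " * self.depth + f"<{tag}>")
        self.depth += 1

    def handle_endtag(self, tag):
        # A closing tag takes us back up one level.
        self.depth -= 1

# A made-up minimal page using the four tags described above.
page = """<html>
<head><title>My first page</title></head>
<body><p>Hello, world!</p></body>
</html>"""

parser = TagOutline()
parser.feed(page)
parser.close()
print("\n".join(parser.outline))
```

The outline shows <head> and <body> sitting side by side inside <html>, with <title> tucked inside <head>. If the depth is back at zero at the end, every tag that was opened has been closed.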

Adding links and images is slightly more fancy. The best way to demonstrate this, I think, is to show you what it will look like as well. Isn't that fancy? So to start with, let's do a link. The plan is to link to a BBC news article about pandas. But the link itself isn't very visually appealing: http://www.bbc.co.uk/news/world-12160538

To solve this problem, I'm going to put a link behind the words of the title, in this case 'Why do we love pandas?' So, here we are:

Why do we love pandas?

But how did I do that? Well, it's not too complicated. All you need is your link and the html know-how. It looks like this:

<a href="http://www.bbc.co.uk/news/world-12160538">Why do we love pandas?</a>

As you can see, the web address goes in the href attribute of the anchor tag <a>, inside quotation marks "". The text I want to show comes after it, between the opening <a> and the closing </a>.
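To check that the two halves of a link really are separate things, here is a Python sketch using the standard library's html.parser that pulls the address and the visible text back out of the anchor. LinkExtractor is a made-up name, and the sketch assumes the anchor is closed with </a>.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect (address, visible text) pairs for every link in a snippet."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # address of the link we are currently inside
        self._text = []     # pieces of visible text seen inside it

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

snippet = '<a href="http://www.bbc.co.uk/news/world-12160538">Why do we love pandas?</a>'

extractor = LinkExtractor()
extractor.feed(snippet)
extractor.close()
print(extractor.links)
# [('http://www.bbc.co.uk/news/world-12160538', 'Why do we love pandas?')]
```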

What about an image? There are two choices - you can get one off the web or you can get one locally. I'm going to post a picture of a smiley chair which I've borrowed from here: http://pleated-jeans.com/wp-content/uploads/2011/09/thecomicrelief.tumblr.png

In order to do this, I'm going to use the following html:

<img src="http://pleated-jeans.com/wp-content/uploads/2011/09/thecomicrelief.tumblr.png" width="200" height="170" alt="Image of a smiley chair">

No doubt you've spotted the address, and I'm sure you can use your detective skills to identify what 'width' and 'height' are for. But what is alt for? It's what will come up if the picture doesn't load for any reason. It's also handy for software which reads the web page for you out loud. So without further ado, here is the smiley chair:

Image of a smiley chair
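To illustrate what alt is for, here is a rough Python sketch of a 'text-only' reader built on the standard library's html.parser. It cannot show pictures, so it substitutes the alt text instead, much as a screen reader would read it out. The class name TextOnlyRenderer, the surrounding paragraph and the fallback wording are all my own invention.

```python
from html.parser import HTMLParser

class TextOnlyRenderer(HTMLParser):
    """Collect page text, substituting each image's alt text for the image."""

    def __init__(self):
        super().__init__()
        self.pieces = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            # Fall back to a generic note if the image has no alt text.
            alt = dict(attrs).get("alt") or "image with no description"
            self.pieces.append(f"[{alt}]")

    def handle_data(self, data):
        if data.strip():
            self.pieces.append(data.strip())

snippet = (
    '<p>Here is the chair:</p>'
    '<img src="http://pleated-jeans.com/wp-content/uploads/2011/09/thecomicrelief.tumblr.png" '
    'width="200" height="170" alt="Image of a smiley chair">'
)

renderer = TextOnlyRenderer()
renderer.feed(snippet)
renderer.close()
text = " ".join(renderer.pieces)
print(text)  # Here is the chair: [Image of a smiley chair]
```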

But maybe you're interested in formatting your web page and making it shiny? Of course you are. There's more than one way to do this. You can apply html coding in the web page code itself, for example you can use <strong> and <em> (emphasis). Using bold and italic is frowned upon by 'proper' web designers; strong and emphasis usually come up looking the same way, but will also be accessible to other users, for example those with visual impairments.

Another way of formatting your web page is to use CSS (Cascading Style Sheets) but I'm afraid that will have to wait for another blog. I had every intention of getting into the habit of writing 1000 words or less as practice for my coursework, but unfortunately I seem to have been unsuccessful (1,288 words and counting). So for now, I will bid you farewell. In my next blog I hope we can pick up on CSS, but I haven't planned that far ahead, and I may delve into some other Information Science topics before then. We will have to wait and see!

¹Leiner, B.M., Cerf, V.G., Clark, D.D., Kahn, R.E., Kleinrock, L., Lynch, D.C., Postel, J., Roberts, L.G., Wolff, S. A Brief History of the Internet. [online] Accessed 4 October 2011

²Waldrop, M. DARPA and the Internet Revolution. [online] Accessed 4 October 2011

³Berners-Lee, T. (1999) Weaving the Web : the origins and future of the World Wide Web. London: Orion Business, p. 5