Information Overload!: City University

Showing posts with label City University. Show all posts

Thursday, 13 October 2011

Databases and SQL

Monday's lecture was about databases, in particular looking at SQL (pronounced either as the letters, or as 'sequel' depending on how you feel about it). We had an introduction to databases to bring everybody up to speed - I was surprised how much I remembered from GCSE IT. My teacher would have been proud. We discussed when a database is a good idea, and when it's not such a good idea. A rule of thumb seems to be that it's a good plan when it's your data as opposed to someone else's. For that it might be a better idea to think about information retrieval (that's next week's lecture!).

It is a good idea to have a centrally managed and designed database. Problems of the past still haunt our offices today - it's now so easy to make a database a lot of people still use local ones rather than having a nicely linked up central database. This causes all manner of problems due to inaccurate data (not to mention legal issues like Data Protection!). Instead, organisations should have their data stored in one place, with each department accessing only the data that is relevant to them. The Database Management System (DBMS) controls access to the data by various departments / people.

So to be clear, the DBMS is the software which lets the users get at the data, whereas a database, to quote my course handout, is an 'integrated collection of data shareable between users and application systems'. So it's the stuff behind the scenes rather than the shiny interface the users get to see. Using a DBMS is advantageous as it assures information integrity (insofar as it gets updated everywhere when it's updated once). It also removes the reliance on a certain program as it is a collection of data which can be used by more than one application.

So what should it look like? Perhaps the way to explain this best is to say what you don't want. You don't want duplication within a table, and the best way to make sure this doesn't happen is to look at relationships and break up your data into different tables. I'll give you an example. Let's say we have some form of music database, with song titles, album names, artists and release date. So you'd have a table that looks like this:

Song_Title	Album_Name	Artist	Release_Year	Genre
Don't Marry Her	Blue is the Colour	The Beautiful South	1996	Pop
Perfect 10	Quench	Beautiful South	1998	Pop
Be Here to Love Me	Feels Like Home	Norah Jones	2004	Chill Out
Sunrise	Feels Like Home	Jones, Norah	2004	Chillout
Never Gonna Give You Up	Greatest Hits	Risk Astley	2002	80s

There's a few things to notice about this table. Firstly, did you notice how I put underscores in the titles? This is because it's generally not a good idea to put space in your titles - lots of programs don't like it. Just because Microsoft Access lets you do it, it doesn't mean it's a good idea! So use underscores, make it all one word, remember the user doesn't see this stuff.

This is probably a good time to mention primary keys. Did you notice that there isn't any one field that you could use to uniquely identify a song? Every table needs one of these, and the good news is most DBMSs can create these for you. Primary keys are important when it comes to making separate tables, which I will come to in a minute!

So back to the music, as it were. There are lots of duplications, for example the album name, the artist, the release date and the genre! This can be fixed by dividing the data up into the following tables:

Did you also notice the errors? The Beautiful South perform two of these songs but the band name is different in two rows. This is why having duplication in one table can be such a bad idea, and this is why we need more than one table.

Song

Song_ID	Song_Title	Album_ID
1	Don't Marry Her	1
2	Perfect 10	2
3	Be Here to Love Me	3
4	Sunrise	3
5	Never Gonna Give You Up	4

Album

Album_ID	Album_Name	Artist_ID	Release_Year
1	Blue is the Colour	1	1996
2	Quench	1	1998
3	Feels Like Home	2	2004
4	Greatest Hits	4	2002
5	Nobody's Prefect	5	2001

Artist

Artist_ID	Artist_Name	Genre_ID
1	The Beautiful South	1
2	Norah Jones	3
3	Jimmy Eat World	2
4	Rick Astley	5
5	The Cheapskates	4

Genre

Genre_ID	Genre
1	Pop
2	Rock
3	Chill Out
4	Ska
5	80s

This removes the duplication issue. Each table has a primary key, or unique identifier. In some of the tables, a primary key from a different table is in a column to match tables together. For example, in the table 'Artist' there is a column for 'Genre_ID'. When an identifier is in another table, it is known as a foreign key.

Now to look at some SQL and query our database! This is the most important bit. When searching you want to get out helpful data, and that means getting what you want and not getting what you don't want. It sounds obvious, but people often sift through lots of rubbish. Think about searching Google - it's a great tool but if you don't put in a sensible search you can get a lot of irrelevant results. If you don't believe me, try searching 'tablet' - you want to see what you could get instead of an iPad, but you end up with lots of things you didn't want, don't you?

Anyway, before I get sidetracked and start thinking about iPads, let's get back to SQL. To search, the basic syntax you need looks like this:

SELECT fields
FROM tables;

So if we were searching the database above for an album and its release date, we'd need:

SELECT Album_Name, Release_Year
FROM Album;

Which would give you:

Album_Name	Release_Year
Blue is the Colour	1996
Quench	1998
Feels Like Home	2004
Greatest Hits	2002
Nobody's Prefect	2001

When using select, there is no limit to the number of fields you can put in there. However, if you want everything then you can use an asterisk:

SELECT *
FROM Artist;

Which would give you:

Artist_ID	Artist_Name	Genre_ID
1	The Beautiful South	1
2	Norah Jones	3
3	Jimmy Eat World	2
4	Rick Astley	5
5	The Cheapskates	4

This is good if you want to see everything, but realistically an end user doesn't want to see the foreign keys in this table, they want the data, and this is where you need to start making your queries a bit more interesting. It isn't as obvious as it first looks, which can cause people some problems. If you wanted the genre of all the albums, you might this of tying in:

SELECT Artist_Name, Genre,
FROM Artist, Genre;

It seems logical, but this would actually give you a table with everything in it - every artist next to every genre. This is because computers aren't smart enough to know that you actually want the relevant Genre. It just pulls everything out of the database and dumps it in front of you. Instead, you need to tell it what you want by typing in this:

SELECT Artist_Name, Genre,
FROM Artist, Genre
WHERE Artist.Genre_ID = Genre.Genre_ID;

This will bring back a list where the Genre ID matches on both tables, and you get your answer. There's some new syntax in this code. There is a comparison operator '=' in the WHERE clause. There are a few of these, including (but not limited to)

= equal to
< less than
> greater than
<= less than or equal to
>= greater than or equal to
!= not equal to

The important thing to know about using the WHERE clause is that numbers do not require quotation marks, but any other characters do. So if you are looking for a date you would only type in:

WHERE Release_Year = 1998

If you are looking for text then you need quotation marks:

WHERE Song_Title = 'Sunrise'

Another thing to be aware of when using the WHERE clause is a wildcard search. This means you can truncate a word, so if you know that 'Nobody's Prefect' is in there somewhere, but you aren't sure if the database uses the apostrophe or not, you could search for this:

WHERE Album_Name = 'Nobody%'

This brings you everything which has Nobody at the beginning. You can use the % at the beginning and end if you like, so for song titles with the letter P in them you could search for '%p%' (although I'm not sure why you would need to!).

Finally, you can use WHERE more than once, for example you want songs from a certain artist from a certain year. You want to know if there is anything in the database by The Beautiful South after 1997. The syntax would be:

SELECT Artist_Name, Album_Name, Release_Year
FROM Album, Artist
WHERE Artist.Artist_ID = Album.Artist_ID
AND Artist_Name = 'The Beautiful South'
AND Release_Year >= 1997;

Did you notice how some of the fields say the table they are in and some don't? This is when the same field is in more than one table (remember foreign keys?) You need to define where you want the data from otherwise you will confuse the computer and it will probably tell you it can't do it.

So to conclude querying two tables, the syntax generally looks something like:

SELECT (whatever fields you want)
FROM tableA, tableB
WHERE tableA.primarykey = tableB.foreignkey;

If you forget to do this then you will probably get a lot more results than you had counted on, so be careful!

A final note on results; if you get results where the same thing appears on more than one row (this is possible even with neat data tables) then you can make sure it only appears once. For example, if you wanted to know which albums were released before 2000, you would get The Beautiful South twice, as both their albums were release before 2000. To prevent this from happening, you can add in 'distinct', which means it will only list it once, whether the results bring them back twice, or two thousand times! So the syntax would be:

SELECT DISTINCT Artist_Name
FROM Album, Artist
WHERE Artist.Artist_ID = Album.Artist_ID
AND Release_Year < 2000;

Apologies for the length of this post - it's all those tables! I have tried to present this clearly, but the best way to understand using SQL is to have a go. When doing this in the lab myself I got very cross about adding in commas - they only go in between items not at the end (so above you would not put FROM Album, Artist, because there is a comma at the end and the computer doesn't like it).

Monday, 10 October 2011

Cascading Style Sheets

Cascading style sheets, or CSS, can format document content using markup languages, such as html script (see my previous post for a brief introduction to html). The idea of CSS is to separate the document content and document format. This is advantageous as it means that the coding for the design doesn’t have to be mixed up with the content of a web page. It is also useful as you can standardise a selection of web pages by using the same style sheet, or even apply your own formatting to existing web pages. This would be useful should you consistently need text to be larger, for example.

The W3 web page has a helpful introduction to CSS here. It explains the three different ways of adding style information to documents:

Linking Style Sheet
Internal Style Sheet
Inline Style Sheet¹

The differences between these three ways of applying style sheets is where the coding is ‘kept’. Internal and Inline style sheets are both within the same page – Internal is within the head of the document using the tag <style>; Inline is directly within the html document. Linking Style Sheets are completely separate documents, which would make them the best candidates for standardising web pages (as mentioned above). This means you can edit the formatting across a number of documents, so if you were in a corporate environment and your logo changed you could apply it universally rather than page by page. This has been described as 'brand cohesiveness' (Nielsen, 1997)².

W3 has some guidelines on using CSS, including a list of terminology. The recommendations define that CSS1 is 'human readable and writable, and expresses style in common desktop publishing terminology'³. For example, if you wanted to apply a certain colour to a title or paragraph, you would use the property 'color' (note the American spelling!). This is easy for a human to understand. The value would be written in hexadecimal (perhaps less easy for a human to understand) which looks something like this: #b22222 (this is 'frebrick' red). For a full list of colours on the w3 website, click here.

To explain what I mean by 'property' and 'value' I have borrowed a helpful diagram from the W3 wiki:

CSS Syntax

The selector is the thing you are styling, for example p would be a paragraph.
The property name/property is the format item, so 'color' would be colour, 'font-size' would be font size, etc.
The value is what you are assigning to the property. For font size it is recommended that you use a percentage. This means that it will take the font size that a user normally uses, and make it larger/smaller, for example font-size:80%

So an example would be that you would like the body of the text to be in italic. That would look like this:

BODY {font-style: italic}

The best way really to demonstrate all of this would be with an actual style sheet. I have managed to get a web site working on the City server. I have applied a style sheet which I adapted from a guide sheet we were provided with in the lab session. My website is here. The CSS for the page is here. The bit of code that tells my web page to use the CSS is in the <head> and it looks like this:

<LINK REL="stylesheet" HREF="http://www.student.city.ac.uk/~abkb846/public_html/kaystyle.css" TYPE="text/css">

Using CSS certainly seems useful, as you can apply formatting rules quit easily without getting mixed up in the main information. Blogspot, for examples, does not seem to do it this way - if you change a font it makes the html really messy (if you don't believe me have a look at the source code. All I did was change the font!) It's certainly something I will be trying to learn more about and have a go at!

¹What is CSS? CSS Training, W3 Wiki [online] accessed 9 October 2011

²Nielsen, J. Effective Use of Style Sheets, Use It [online] accessed 9 October 2011

³Cascading Style Sheets, level 1, W3C Recommendation 17 Dec 1996, revised 11 Apr 2008 [online] accessed 9 October 2011

Tuesday, 4 October 2011

DITA week 2: The Internet, html and the WWW

Hello bloggees! I have no idea if that is an actual term, but I have decided to use it anyway. If you have a better suggestion as to how I should address you, please comment below as I am still learning netiquette. As we all are, as it is ever developing. Maybe they should start a finishing school for the net...

I digress. So, to get back on topic: today's DITA lecture was about the Internet and the World Wide Web, with a focus on html in the lab. Please note, that while many people use 'the Internet' and 'the World Wide Web' as interchangeable terms, I will not be doing so. I will explain why below. I will then go on to explain what we did in the lab, including some examples, which I hope will be suitably snazzy.

Anyone who has studied a whirlwind history of information or computer science will know the origins on the Internet, and of the World Wide Web. However, I feel it would be beneficial to assume that my readers are new to the topic. So if you don't want to read about the history and background then you can skip ahead to the html bit here.

The Internet dates back to the 1960s, where it was developed as a way of sharing information via a network of networks. According to ISOC, in August 1962 J.C.R. Licklider 'envisioned a globally interconnected set of computers through which everyone could quickly access data', his "Galactic Network" concept.¹ He worked at DARPA (Defense Advanced Research Projects Agency) at the time, so the Internet actually had military origins. He was inspired by 'Project Lincoln', an early warning network to guard against Soviet nuclear bomb attack in the 1950s. The project eventually resulted in SAGE, Semi-Automatic Ground Environment, which included 'the world’s first long-distance network, which allowed the computers to transfer data'.² I won't go into huge detail here because I appreciate that this blog will get very long (I could probably dedicate an entire blog to the history of the Internet!), but if you are interested the DARPA site is interesting and I have put a link in my references at the bottom of the blog. So, to cut a long story short, a networking project was launched, the concept of sending packets was dreamed up, an architectural description of a network of networks was put together in the 1970s by Kahn and Cerf, people started picking up on it, and it was deemed a beautiful thing.

The World Wide Web, by contrast, has 'only' been around since the 1990s (although Tim Berners-Lee started working on the idea in the 1980s). In his book, Weaving the Web, Berners-Lee describes his vision of the Web as 'a universal, all-encompassing space'.³ It was originally designed as a directory and method for information sharing at CERN. It was soon picked up by academics, and then spread until it was in common usage, as it is today.

To make it really clear I will borrow an analogy our lecturer shared with us. The Internet is the road, and the World Wide Web is like a car 'travelling' on it. There are other 'vehicles', the most common example being email.

So how does it work? The Internet uses a client-server architecture. Clients send requests via the network to servers, which respond. So for example, if you are trying to access a document over the web, you send a request to the server. One way to look at this is to examine a URL (Uniform Resource Locator) to see how it is constructed:

http://www.cscan.org/presentations/08-11-06-MikeEvans-Web.pdf

(Incidentally this is a good presentation about the evolution of the Web!)

The first part of the address, http:// identifies the link is a WWW document which should be transferred using HTTP (hypertext transfer protocol).

Next, www.cscan.org is the server DNS (domain name system).

Finally, presentations/08-11-06-MikeEvans-Web.pdf is the file path to where the document is.

In the lab part of the session (sadly only for an hour this week) we looked at html and had a go at creating web pages. Unfortunately my work is not yet available as a web page, but never fear! There are some examples of html coding (with some explanation of what they do) below, but first I feel I should give a little introduction to explain what all the brackets are about!

Html is made up of tags, which look something like this: . Note the difference between these tags - the one on the left opens and the one on the right closes whatever the instruction is. In this example, makes the text strong, which is usually (but not always) bold in the browser. Most html tags require closing, so a recommended way to ensure you do this is to open and close when typing, then go back and put in the stuff in the middle. So, let's look at what all the tags do:

<html> - this specifies where the html code begins and ends
<head> - this contains information which is not displayed, and is useful for metadata
<title> - a fairly self explanatory one, this changes what you see at the top of your browser window
<body> - this begins and ends all the stuff in the web page

Adding links and images is slightly more fancy. The best way to demonstrate this, I think, is to show you what it will look like as well. Isn't that fancy? So to start with, let's do a link. The plan is to link to a BBC news article about pandas. But the link itself isn't very visually appealing: http://www.bbc.co.uk/news/world-12160538

To solve this problem, I'm going to put a link behind the words of the title, in this case 'Why do we love pandas?' So, here we are:

Why do we love pandas?

But how did I do that? Well, it's not too complicated. All you need is your link and the html know-how. It looks like this:

<a href="http://www.bbc.co.uk/news/world-12160538"> Why do we love pandas?>

As you can see, the web link is the anchor <a> referenced href in quotation marks "". The text I want to show is at the end.

What about an image? There are two choices - you can get one off the web or you can get one locally. I'm going to post a picture of a smiley chair which I've borrowed from here: http://pleated-jeans.com/wp-content/uploads/2011/09/thecomicrelief.tumblr.png

In order to do this, I'm going to use the following html:

<img src="http://pleated-jeans.com/wp-content/uploads/2011/09/thecomicrelief.tumblr.png" width="200" height="170" alt="Image of a smiley chair">

No doubt you've spotted the address, and I'm sure you can use your detective skills to identify what 'width' and 'height' are for. But what is alt for? It's what will come up if the picture doesn't load for any reason. It's also handy for software which reads the web page for you out loud. So without further ado, here is the smiley chair:

Image of a smiley chair

But maybe you're interested in formatting your web page and making it shiny? Of course you are. There's more than one way to do this. You can apply html coding in the web page code itself, for example you can use and (emphasis). Using bold and italic are frowned upon but 'proper' web designers, as strong and emphasis often come up this way, but will also be accessible to other users, for example those with visual impairments.

Another way of formatting your web page is to use CSS (Cascading Style Sheets) but, I'm afraid that will have to wait for another blog. I had every intention of getting into the habit of writing 1000 words or less as practice for my coursework, but unfortunately I seem to have been unsuccessful (1,288 words and counting). So for now, I will bid you farewell. In my next blog I hope we can pick up on CSS, but I haven't planned that far ahead, and I may delve into some other Information Science topics before then, we will have to wait and see!

¹Leiner, B.M., Cerf, V.G., Clark, D.D., Kahn, R.E., Kleinrock, L., Lynch, D.C., Postel, J., Roberts, L.G., Wolff, S. A Brief History of the Internet. [online] Accessed 4 October 2011

²Waldrop, M. DARPA and the Internet Revolution. [online] Accessed 4 October 2011

³Berners-Lee, T. (1999) Weaving the Web : the origins and future of the World Wide Web. London: Orion Business, p. 5

Monday, 26 September 2011

And so it begins...

Welcome to my very first blog post! It's an exciting time.

Why am I blogging now? I'm back at university studying for an MSc in Information Science, and part of the coursework involves blogging. So it seemed wise to get a bit of practice in. Who knows, maybe I will keep it up!

Our lecturer recommended we get in the habit of blogging what we learned this week. So, what have I learned in my first lecture?

An Introduction to Computers and Digital Information

The aim of the first lecture was to 'introduce the nature and potential of computer and digital information represented at a number of levels and to provide practical experience of using and managing digital data represented in a number of forms and formats.'

To begin with, we discussed the impact of information. After all, we live in an 'information society', and terms are thrown around in popular culture and the news. So it's important to try and figure out what everybody is talking about. Especially when we consider how much information is 'out there', just on the web. Up to 5 exabytes (that's 1 million, million, million bytes if you're curious) were generated in a year in 2003 (Lyman et al, 2003). And in digital terms, 2003 was a long time ago, so this is only increasing. As an information scientist (in the making) it is vital to understand the technology which helps us to organise, manipulate and use data.

Studying at City University, I will be learning about the 'information' part of IT. So I'll be looking at computers and software as tools. Furthermore, I will be looking at architectures not just individual machines and programs. The analogy we were given is comparing a bricklayer to an architect - the information scientist (the 'architect') needs a wider understanding of the impact of IT on information (whereas computer scientists are the 'bricklayers' building the programs).

Looking at bits, bytes and so on was a nice recap from Computer Science A level (which feels like a long time ago, but apparently the knowledge is all still in there somewhere). In our lecture we looked at bits (binary digits) and formats (the agreed structures like ANSI/ASCII). Fortunately I won't have to add up in binary like I did at A level!

As well as formats we considered files - named collections of related digital information. Every file name has meaning attached to it, 'telling' the computer which program to use to read the file. A file-centred approach considers files as single entities with extensions to show the format. Alternatively, a document-centred approach is when documents are built from files, for example a blog post made up from text, video, images, etc. More on this next week!

All in all an enjoyable lecture! And hopefully an enjoyable first blog. Perhaps I will have to put my serious face on in future as I will be throwing in references to related reading. But for now, here is a picture of the sunshine I drew in paint.