Showing posts with label City University. Show all posts
Showing posts with label City University. Show all posts

Thursday, 13 October 2011

Databases and SQL

Monday's lecture was about databases, in particular looking at SQL (pronounced either as the letters, or as 'sequel' depending on how you feel about it). We had an introduction to databases to bring everybody up to speed - I was surprised how much I remembered from GCSE IT. My teacher would have been proud. We discussed when a database is a good idea, and when it's not such a good idea. A rule of thumb seems to be that it's a good plan when it's your data as opposed to someone else's. For that it might be a better idea to think about information retrieval (that's next week's lecture!).

It is a good idea to have a centrally managed and designed database. Problems of the past still haunt our offices today - it's now so easy to make a database a lot of people still use local ones rather than having a nicely linked up central database. This causes all manner of problems due to inaccurate data (not to mention legal issues like Data Protection!). Instead, organisations should have their data stored in one place, with each department accessing only the data that is relevant to them. The Database Management System (DBMS) controls access to the data by various departments / people.

So to be clear, the DBMS is the software which lets the users get at the data, whereas a database, to quote my course handout, is an 'integrated collection of data shareable between users and application systems'. So it's the stuff behind the scenes rather than the shiny interface the users get to see. Using a DBMS is advantageous as it assures information integrity (insofar as it gets updated everywhere when it's updated once). It also removes the reliance on a certain program as it is a collection of data which can be used by more than one application.

So what should it look like? Perhaps the way to explain this best is to say what you don't want. You don't want duplication within a table, and the best way to make sure this doesn't happen is to look at relationships and break up your data into different tables. I'll give you an example. Let's say we have some form of music database, with song titles, album names, artists and release date. So you'd have a table that looks like this:


Song_Title Album_Name Artist Release_Year Genre
Don't Marry Her Blue is the Colour The Beautiful South 1996 Pop
Perfect 10 Quench Beautiful South 1998 Pop
Be Here to Love Me Feels Like Home Norah Jones 2004 Chill Out
Sunrise Feels Like Home Jones, Norah 2004 Chillout
Never Gonna Give You Up Greatest Hits Risk Astley 2002 80s

There's a few things to notice about this table. Firstly, did you notice how I put underscores in the titles? This is because it's generally not a good idea to put space in your titles - lots of programs don't like it. Just because Microsoft Access lets you do it, it doesn't mean it's a good idea! So use underscores, make it all one word, remember the user doesn't see this stuff.

This is probably a good time to mention primary keys. Did you notice that there isn't any one field that you could use to uniquely identify a song? Every table needs one of these, and the good news is most DBMSs can create these for you. Primary keys are important when it comes to making separate tables, which I will come to in a minute!

So back to the music, as it were.  There are lots of duplications, for example the album name, the artist, the release date and the genre! This can be fixed by dividing the data up into the following tables:


Did you also notice the errors? The Beautiful South perform two of these songs but the band name is different in two rows. This is why having duplication in one table can be such a bad idea, and this is why we need more than one table.

Song

Song_ID Song_Title Album_ID
1 Don't Marry Her 1
2 Perfect 10 2
3 Be Here to Love Me 3
4 Sunrise 3
5 Never Gonna Give You Up 4

Album


Album_ID Album_Name Artist_IDRelease_Year
1 Blue is the Colour 1 1996
2 Quench 1 1998
3 Feels Like Home 2 2004
4 Greatest Hits 4 2002
5 Nobody's Prefect 5 2001

Artist

Artist_ID Artist_Name Genre_ID
1 The Beautiful South 1
2 Norah Jones 3
3 Jimmy Eat World2
4 Rick Astley5
5 The Cheapskates 4

Genre

Genre_IDGenre
1Pop
2Rock
3Chill Out
4Ska
580s

This removes the duplication issue. Each table has a primary key, or unique identifier. In some of the tables, a primary key from a different table is in a column to match tables together. For example, in the table 'Artist' there is a column for 'Genre_ID'. When an identifier is in another table, it is known as a foreign key.

Now to look at some SQL and query our database! This is the most important bit. When searching you want to get out helpful data, and that means getting what you want and not getting what you don't want. It sounds obvious, but people often sift through lots of rubbish. Think about searching Google - it's a great tool but if you don't put in a sensible search you can get a lot of irrelevant results. If you don't believe me, try searching 'tablet' - you want to see what you could get instead of an iPad, but you end up with lots of things you didn't want, don't you?

Anyway, before I get sidetracked and start thinking about iPads, let's get back to SQL. To search, the basic syntax you need looks like this:

SELECT fields
FROM tables;

So if we were searching the database above for an album and its release date, we'd need:

SELECT Album_Name, Release_Year
FROM Album;

Which would give you:


Album_NameRelease_Year
Blue is the Colour1996
Quench1998
Feels Like Home2004
Greatest Hits2002
Nobody's Prefect2001

When using select, there is no limit to the number of fields you can put in there. However, if you want everything then you can use an asterisk:

SELECT *
FROM Artist;

Which would give you:


Artist_IDArtist_NameGenre_ID
1The Beautiful South1
2Norah Jones3
3Jimmy Eat World2
4Rick Astley5
5The Cheapskates4



This is good if you want to see everything, but realistically an end user doesn't want to see the foreign keys in this table, they want the data, and this is where you need to start making your queries a bit more interesting. It isn't as obvious as it first looks, which can cause people some problems. If you wanted the genre of all the albums, you might this of tying in:

SELECT Artist_Name, Genre,
FROM Artist, Genre;

It seems logical, but this would actually give you a table with everything in it - every artist next to every genre. This is because computers aren't smart enough to know that you actually want the relevant Genre. It just pulls everything out of the database and dumps it in front of you. Instead, you need to tell it what you want by typing in this:


SELECT Artist_Name, Genre,
FROM Artist, Genre
WHERE Artist.Genre_ID = Genre.Genre_ID;

This will bring back a list where the Genre ID matches on both tables, and you get your answer. There's some new syntax in this code. There is a comparison operator '=' in the WHERE clause. There are a few of these, including (but not limited to)

= equal to
< less than
> greater than
<= less than or equal to
>= greater than or equal to
!= not equal to

The important thing to know about using the WHERE clause is that numbers do not require quotation marks, but any other characters do. So if you are looking for a date you would only type in:

WHERE Release_Year = 1998

If you are looking for text then you need quotation marks:

WHERE Song_Title = 'Sunrise'

Another thing to be aware of when using the WHERE clause is a wildcard search. This means you can truncate a word, so if you know that 'Nobody's Prefect' is in there somewhere, but you aren't sure if the database uses the apostrophe or not, you could search for this:

WHERE Album_Name = 'Nobody%'

This brings you everything which has Nobody at the beginning. You can use the % at the beginning and end if you like, so for song titles with the letter P in them you could search for '%p%' (although I'm not sure why you would need to!).

Finally, you can use WHERE more than once, for example you want songs from a certain artist from a certain year. You want to know if there is anything in the database by The Beautiful South after 1997. The syntax would be:

SELECT Artist_Name, Album_Name, Release_Year
FROM Album, Artist
WHERE Artist.Artist_ID = Album.Artist_ID
AND Artist_Name = 'The Beautiful South'
AND Release_Year >= 1997;

Did you notice how some of the fields say the table they are in and some don't? This is when the same field is in more than one table (remember foreign keys?) You need to define where you want the data from otherwise you will confuse the computer and it will probably tell you it can't do it.

So to conclude querying two tables, the syntax generally looks something like:

SELECT (whatever fields you want)
FROM tableA, tableB
WHERE tableA.primarykey = tableB.foreignkey;

If you forget to do this then you will probably get a lot more results than you had counted on, so be careful!

A final note on results; if you get results where the same thing appears on more than one row (this is possible even with neat data tables) then you can make sure it only appears once. For example, if you wanted to know which albums were released before 2000, you would get The Beautiful South twice, as both their albums were release before 2000. To prevent this from happening, you can add in 'distinct', which means it will only list it once, whether the results bring them back twice, or two thousand times! So the syntax would be:

SELECT DISTINCT Artist_Name
FROM Album, Artist
WHERE Artist.Artist_ID = Album.Artist_ID
AND Release_Year < 2000;

Apologies for the length of this post - it's all those tables! I have tried to present this clearly, but the best way to understand using SQL is to have a go. When doing this in the lab myself I got very cross about adding in commas - they only go in between items not at the end (so above you would not put FROM Album, Artist, because there is a comma at the end and the computer doesn't like it).

Monday, 10 October 2011

Cascading Style Sheets

Cascading style sheets, or CSS, can format document content using markup languages, such as html script (see my previous post for a brief introduction to html). The idea of CSS is to separate the document content and document format. This is advantageous as it means that the coding for the design doesn’t have to be mixed up with the content of a web page. It is also useful as you can standardise a selection of web pages by using the same style sheet, or even apply your own formatting to existing web pages. This would be useful should you consistently need text to be larger, for example.

The W3 web page has a helpful introduction to CSS here. It explains the three different ways of adding style information to documents:
  • Linking Style Sheet
  • Internal Style Sheet
  • Inline Style Sheet1
The differences between these three ways of applying style sheets is where the coding is ‘kept’. Internal and Inline style sheets are both within the same page – Internal is within the head of the document using the tag <style>; Inline is directly within the html document. Linking Style Sheets are completely separate documents, which would make them the best candidates for standardising web pages (as mentioned above). This means you can edit the formatting across a number of documents, so if you were in a corporate environment and your logo changed you could apply it universally rather than page by page. This has been described as 'brand cohesiveness' (Nielsen, 1997)2.

W3 has some guidelines on using CSS, including a list of terminology. The recommendations define that CSS1 is 'human readable and writable, and expresses style in common desktop publishing terminology'3. For example, if you wanted to apply a certain colour to a title or paragraph, you would use the property 'color' (note the American spelling!). This is easy for a human to understand. The value would be written in hexadecimal (perhaps less easy for a human to understand) which looks something like this: #b22222 (this is 'frebrick' red). For a full list of colours on the w3 website, click here.

To explain what I mean by 'property' and 'value' I have borrowed a helpful diagram from the W3 wiki:

CSS Syntax

The selector is the thing you are styling, for example p would be a paragraph.
The property name/property is the format item, so 'color' would be colour, 'font-size' would be font size, etc.
The value is what you are assigning to the property. For font size it is recommended that you use a percentage. This means that it will take the font size that a user normally uses, and make it larger/smaller, for example font-size:80%

So an example would be that you would like the body of the text to be in italic. That would look like this:

BODY {font-style: italic}

The best way really to demonstrate all of this would be with an actual style sheet. I have managed to get a web site working on the City server. I have applied a style sheet which I adapted from a guide sheet we were provided with in the lab session. My website is here. The CSS for the page is here. The bit of code that tells my web page to use the CSS is in the <head> and it looks like this:

<LINK REL="stylesheet" HREF="http://www.student.city.ac.uk/~abkb846/public_html/kaystyle.css" TYPE="text/css">

Using CSS certainly seems useful, as you can apply formatting rules quit easily without getting mixed up in the main information. Blogspot, for examples, does not seem to do it this way - if you change a font it makes the html really messy (if you don't believe me have a look at the source code. All I did was change the font!) It's certainly something I will be trying to learn more about and have a go at!

1What is CSS? CSS Training, W3 Wiki [online] accessed 9 October 2011
2Nielsen, J. Effective Use of Style Sheets, Use It [online] accessed 9 October 2011
3Cascading Style Sheets, level 1, W3C Recommendation 17 Dec 1996, revised 11 Apr 2008 [online] accessed 9 October 2011

Tuesday, 4 October 2011

DITA week 2: The Internet, html and the WWW

Hello bloggees! I have no idea if that is an actual term, but I have decided to use it anyway. If you have a better suggestion as to how I should address you, please comment below as I am still learning netiquette. As we all are, as it is ever developing. Maybe they should start a finishing school for the net...


I digress. So, to get back on topic: today's DITA lecture was about the Internet and the World Wide Web, with a focus on html in the lab. Please note, that while many people use 'the Internet' and 'the World Wide Web' as interchangeable terms, I will not be doing so. I will explain why below. I will then go on to explain what we did in the lab, including some examples, which I hope will be suitably snazzy.


Anyone who has studied a whirlwind history of information or computer science will know the origins on the Internet, and of the World Wide Web. However, I feel it would be beneficial to assume that my readers are new to the topic. So if you don't want to read about the history  and background then you can skip ahead to the html bit here.


The Internet dates back to the 1960s, where it was developed as a way of sharing information via a network of networks. According to ISOC, in August 1962 J.C.R. Licklider 'envisioned a globally interconnected set of computers through which everyone could quickly access data', his "Galactic Network" concept.1 He worked at DARPA (Defense Advanced Research Projects Agency) at the time, so the Internet actually had military origins. He was inspired by 'Project Lincoln', an early warning network to guard against Soviet nuclear bomb attack in the 1950s. The project eventually resulted in SAGE, Semi-Automatic Ground Environment, which included 'the world’s first long-distance network, which allowed the computers to transfer data'.2 I won't go into huge detail here because I appreciate that this blog will get very long (I could probably dedicate an entire blog to the history of the Internet!), but if you are interested the DARPA site is interesting and I have put a link in my references at the bottom of the blog. So, to cut a long story short, a networking project was launched, the concept of sending packets was dreamed up, an architectural description of a network of networks was put together in the 1970s by Kahn and Cerf, people started picking up on it, and it was deemed a beautiful thing.


The World Wide Web, by contrast, has 'only' been around since the 1990s (although Tim Berners-Lee started working on the idea in the 1980s). In his book, Weaving the Web, Berners-Lee describes his vision of the Web as 'a universal, all-encompassing space'.3 It was originally designed as a directory and method for information sharing at CERN. It was soon picked up by academics, and then spread until it was in common usage, as it is today.


To make it really clear I will borrow an analogy our lecturer shared with us. The Internet is the road, and the World Wide Web is like a car 'travelling' on it. There are other 'vehicles', the most common example being email.


1Leiner, B.M., Cerf, V.G., Clark, D.D., Kahn, R.E., Kleinrock, L., Lynch, D.C., Postel, J., Roberts, L.G., Wolff, S. A Brief History of the Internet. [online] Accessed 4 October 2011

2Waldrop, M. DARPA and the Internet Revolution. [online] Accessed 4 October 2011

3Berners-Lee, T. (1999) Weaving the Web : the origins and future of the World Wide Web. London: Orion Business, p. 5

Monday, 26 September 2011

And so it begins...

Welcome to my very first blog post! It's an exciting time.

Why am I blogging now? I'm back at university studying for an MSc in Information Science, and part of the coursework involves blogging. So it seemed wise to get a bit of practice in. Who knows, maybe I will keep it up!

Our lecturer recommended we get in the habit of blogging what we learned this week. So, what have I learned in my first lecture? 

An Introduction to Computers and Digital Information

The aim of the first lecture was to 'introduce the nature and potential of computer and digital information represented at a number of levels and to provide practical experience of using and managing digital data represented in a number of forms and formats.'

To begin with, we discussed the impact of information. After all, we live in an 'information society', and terms are thrown around in popular culture and the news. So it's important to try and figure out what everybody is talking about. Especially when we consider how much information is 'out there', just on the web. Up to 5 exabytes (that's 1 million, million, million bytes if you're curious) were generated in a year in 2003 (Lyman et al, 2003). And in digital terms, 2003 was a long time ago, so this is only increasing. As an information scientist (in the making) it is vital to understand the technology which helps us to organise, manipulate and use data.

Studying at City University, I will be learning about the 'information' part of IT. So I'll be looking at computers and software as tools. Furthermore, I will be looking at architectures not just individual machines and programs. The analogy we were given is comparing a bricklayer to an architect - the information scientist (the 'architect') needs a wider understanding of the impact of IT on information (whereas computer scientists are the 'bricklayers' building the programs).

Looking at bits, bytes and so on was a nice recap from Computer Science A level (which feels like a long time ago, but apparently the knowledge is all still in there somewhere). In our lecture we looked at bits (binary digits) and formats (the agreed structures like ANSI/ASCII). Fortunately I won't have to add up in binary like I did at A level!

As well as formats we considered files - named collections of related digital information. Every file name has meaning attached to it, 'telling' the computer which program to use to read the file. A file-centred approach considers files as single entities with extensions to show the format. Alternatively, a document-centred approach is when documents are built from files, for example a blog post made up from text, video, images, etc. More on this next week!

All in all an enjoyable lecture! And hopefully an enjoyable first blog. Perhaps I will have to put my serious face on in future as I will be throwing in references to related reading. But for now, here is a picture of the sunshine I drew in paint.