Feedback post

Well, here is the last post you will be seeing in this blog.

First of all, I want to point out that it’s a tad strange that we’re supposed to give feedback on the course in our blogs, because it cannot be given anonymously. That might of course lead to somewhat dishonest feedback, since we are all human and don’t want it to affect our grades negatively. Anyway, I’ll try to be as honest as possible.

To begin with, I had quite high expectations of the course when I first walked into the classroom. They were not, to be honest, really met. I was expecting more on how to use digital material, what kinds of it should be used, and some philosophizing on its impact on the humanities. We got some of this, I’m not denying that, but to me, personally, it was left in the shadow of talk about technical details of how the Internet and the resources on it work. Obviously some information about this was necessary, but there needn’t have been this much of it. If we were trying to understand the technical details of each and every part of a website and the resources found on it, we would have started studying something entirely different. And I believe that most of us have a rudimentary understanding of how things work anyway; at least I do.

Anyhow, what was good about the course was, firstly, that there was some musing on the fact that academic research, and the sources and resources for it, are changing. Showing us, the students that is, what kinds of resources the Internet holds for research will probably prove to have been quite useful. Some of the resources shown to us during the course I had no idea existed. Secondly, the discussions on SEO and on building one’s own Internet profile, and the benefits and importance of these, were a very good thing to include in the program for the course.

It must be stated that I was very much against the idea of blogging as a course assignment in the beginning, but I’ve since come to the conclusion that I’d much rather do that than write a “lecture diary” or whatever it should be called. And it’s mostly been done in a timely fashion too, which wouldn’t have been the case with a “lecture diary”, which would have been written the night before the deadline. Added to that, it has been quite efficient considering the nature of the course and the assignments, since they had to be done on the computer anyway. And writing in English for the first time in ages has also been quite rewarding, since it has served to keep me in touch with the language.

Anyhow, it’s been interesting. Live long and prosper!
/T

Search engine optimization

Extra assignment, since I missed a lecture again. This will be somewhat long and tedious, not to mention technical, so if you want to read the short version, it can be found in my earlier post. Anyway, here goes.

According to Wikipedia, search engine optimization (SEO) is “the process of improving the visibility of a website or a web page in search engines’ “natural,” or un-paid (“organic” or “algorithmic”), search results”. This is a good place to start the analysis, even though it’s quite a simplified statement. What it actually means is that SEO is meant to result in one’s website appearing “higher up”, or as early as possible, in the list of pages a search engine returns for a search on a certain word or phrase. In general, the higher up a site is on the list, the more visitors it gets, which is obviously important for the creator of the site. Thus, SEO is important for anyone who wants to be seen on the Internet, or wants to control what is seen. SEO is in effect an Internet marketing strategy, which takes into account how search engines work, what people search for and with what words and phrases, and which search engines are preferred by the target audience.

A short(ish) history of SEO follows.

SEO started in the 1990s, when webmasters and content providers began optimizing the content of their websites to show up more often, and more prominently, in the results of the search engines of the day. These were primarily AltaVista (presently owned by Yahoo) and Infoseek (which no longer exists). The algorithms controlling these search engines were quite primitive compared to the search engines of today, basing the search results on metadata about the site in question (early on provided by the webmasters themselves), primarily keywords. Keywords were also at some point taken from the content of the pages, but this was easily exploited, as webmasters could fill the code of a page with whatever keywords they wanted to be found with.

Then we arrive at the point one usually arrives at when discussing search engines, which seems like a cataclysmic event: in 1998 the creators of Google built a more complex algorithm. This algorithm was no longer based solely on metadata provided by the page, but also on other factors, the most important of which was inbound links between sites. They called the number calculated by the algorithm “PageRank“. PageRank made it harder for webmasters to manipulate the search engines, but true nerds as they were, they eventually came up with ways to do it anyway.
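
To make the idea concrete, here is a minimal sketch of the kind of calculation PageRank performs: a page’s score depends on the scores of the pages linking to it. The tiny link graph, the damping factor and the iteration count are just illustrative assumptions; the real algorithm is considerably more involved.

```python
# A toy PageRank: repeatedly hand each page's score out along its links.
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for page, outgoing in links.items():
            if not outgoing:
                continue  # a page with no outgoing links shares nothing here
            share = damping * rank[page] / len(outgoing)
            for target in outgoing:
                new_rank[target] += share
        rank = new_rank
    return rank

# A made-up three-page web: c is linked to most, so it ranks highest.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
for page, score in sorted(pagerank(graph).items()):
    print(page, round(score, 3))
```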

Alas, the companies behind the largest search engines were also stuffed with true nerds. By 2004, the largest search engines had developed ranking systems that used hundreds of different signals, and in 2007 Google used more than 200 signals per site. Nowadays the search engine providers don’t disclose how the algorithms work, which, added to their complexity, has made abusing them much, much harder. It still happens, though, since some parts of how the algorithms work are known to webmasters.

Back to the actual subject

The point of this post is, however, not to say that SEO is wrong per se. Abusing the search engines, on the other hand, is. SEO is completely natural, and when done “ethically” it helps Internet users find what they are actually looking for.

How does one then get one’s site noticed? How to optimize the search results?

If one is a company and money is not an issue, the largest search engines (Google, Bing, Yahoo!…) provide a paid service which ensures that one’s site is found on the search engine for certain keywords. This does not, however, ensure any particular rank for the site. Ranking is based on, among other things, how interlinked the page in question is, and how often it is visited. Pages ranked high among the search results usually include Wikipedia, large companies, official institutions, blogs, Twitter and such. Obviously Facebook. Thus, if one wants to get noticed on the Internet, one should google some other names, check which sites pop up, and start using those. They will of course be Twitter, Facebook, LinkedIn, some much-used blog site, etc. Cross-linking between pages on one’s own site also affects the ranking of a certain page.

White hats and black hats

There are two types of SEO, as classified by industry commentators: white hat and black hat. White hat SEO consists essentially of optimization methods which conform to the search engines’ guidelines and involve no deception. The latter is a very important point, since the search engine guidelines are not all-encompassing. No deception means that the content the search engine ranks is the same content the user sees. White hat SEO is therefore considered to be more about making content available to users than about tricking search engines, and it is the preferred way to market oneself. White hat SEO also does not result in bans from search engines.

Black hat SEO, on the other hand, is the very opposite. It aims at improving ranking by methods the search engines disapprove of, or involves deception. Some black hat methods include using hidden or invisible text and serving different pages to a human user and to a search engine (known as cloaking). Black hat SEO may result in a site’s rank being lowered, or in the site being removed altogether from a search engine’s list of results.
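
To illustrate what the deception looks like in practice, here is a toy sketch of the logic behind cloaking, keyed on the browser’s User-Agent header. The crawler names are real, everything else is made up, and this is exactly the sort of thing that gets a site banned, not a recommendation.

```python
# Cloaking in miniature: decide what to serve based on who seems to be asking.
def choose_page(user_agent: str) -> str:
    crawler_names = ("googlebot", "bingbot")  # a couple of real crawler names
    if any(name in user_agent.lower() for name in crawler_names):
        return "keyword-stuffed page meant only for the crawler"
    return "the page human visitors actually see"

print(choose_page("Mozilla/5.0 (compatible; Googlebot/2.1)"))
print(choose_page("Mozilla/5.0 (Windows NT 10.0; rv:109.0) Firefox/115.0"))
```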

Conclusions and endnotes

Even though not every aspect of SEO was discussed here, I believe I gave a pretty good overview of what it is all about. Of course, this focused more on the technical and businesslike aspects of the phenomenon than on the private ones. All the same, the point is: one should use SEO, but think about how one uses it and to what ends. One would do well to remember that SEO is very important for a private person as well, not only for companies, since it may affect future employment, the success of an academic career and a myriad of other things. So, to sum up, keep in mind that not everything is OK on the Internet, but everything stays there!

That might be enough for this time, bye!
/T

Internet profiles

Alas, I was once again unable to attend the lecture, which is a shame since it was the last one of the course. Nonetheless, writing on the subjects discussed is obviously required. The actual point of the lecture, I gather, was more or less how to optimize the results people find when googling one’s name (or searching for it in other ways). More on search engine optimization here.

Googling people turns up very diverse results, ranging from very professional and, for the person in question, beneficial things to quite embarrassing ones. And of course there are those people who don’t turn up at all, or so far back in the listed results as to not be generally found. I seem to be one of those persons; at least I didn’t find anything about myself on the first page of results Google came up with for my name. Anyway, that’s beside the point. The point is, it can make a huge difference what kind of results come up when googling a person. It may, for example, have an impact on possible employers. In theory, as far as I know, it’s illegal to google a possible employee when he or she applies for a job, and at the very least it’s illegal to let such a search affect the outcome of the application. This is of course all theoretical, since it can’t be proven that a possible employer has googled each and every one of the applicants for a job. Thus, it’s common practice.

Since it has an effect on one’s future career and whatnot, it’s quite vital to keep up a “good” Internet profile. If one doesn’t make an effort to control what the Internet says about oneself, someone else will make sure it says something. And this someone doesn’t necessarily affect one’s Internet profile in a positive way; more likely, what turns up is more or less embarrassing. Because of this, it’s important to know what kinds of pages Google ranks highly in searches, so one can affect the search results for one’s name. Some of these pages would be Twitter, LinkedIn and various blog sites. Personal websites which are actually visited by people and contain one’s contact information also usually come up high in the search results. Not to mention Facebook. This is one of the myriad reasons why one should keep one’s Facebook account as closed as possible, so that not everyone can see the embarrassing pictures of oneself from 4 AM outside a bar with a cigarette in one’s mouth, zigzagging along the pavement towards a taxi. Not a very good impression to make on a possible employer.

Another subject which was apparently discussed during the lecture is how one should write on the Internet. To make a long story short, one should remember to: 1. write short texts, 2. in a language people understand, 3. get straight to the point, 4. come up with good headlines (for googling etc.), and last but not least 5. write on subjects which interest people! Also, advertising oneself to people is extremely important, since nobody will find your blog, or whatever it may be, unless someone else has already read it.

In any case, that’s pretty much that. Over and out.
/T

Text mining and Babylon

Here again, possibly for the last time? Not promising anything, though; we might still get some “homework”.

Anyhow, last time we discussed, among other things, text mining and academic publishing. What we mostly talked about were the problems with these things. Text mining is pretty much the analysis of text using computers, to find, among other things, patterns and trends in which words are used, how sentences are structured and so on. This is quite an interesting tool for the study of languages, but it has a few downsides. First and foremost, it’s computers after all, which means they operate on information put in by a human being and can’t think of anything new by themselves. Thus, some information is bound to get lost; for example, the computer won’t be able to put words and sentences in their context, but will only analyze them as such.
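
To give a concrete feel for the most basic building block of text mining, here is a minimal word-frequency sketch; the snippet of Hamlet is just sample input, and real tools of course do far more (collocations, parsing, trends over time).

```python
# Count which words occur most often in a text: text mining at its simplest.
from collections import Counter
import re

text = """To be, or not to be, that is the question:
Whether 'tis nobler in the mind to suffer
The slings and arrows of outrageous fortune."""

words = re.findall(r"[a-z']+", text.lower())  # crude tokenization
for word, count in Counter(words).most_common(5):
    print(f"{word}: {count}")
```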

Another problem with text mining, and with the Google Books tool Ngram Viewer (for the results of a search on the word “kill”, click here), is that the farther one goes back in time, the less representative the published literature is of the general population. In the 19th century, for example, the publishing of books was very much reserved for the intellectual elite of society. Thus, the language used by these writers cannot be taken to represent the actual language of the time. Studies of how a certain language functioned can then not be based (at least not entirely) on results gained from this kind of work.

The third major problem with text mining is obviously more linked to the present time. Think copyright. The Google Books project (on which the Ngram Viewer bases its information) has been unable to digitize a huge amount of contemporary literature, because it is protected by copyright. Thus, the results one might get from text mining with the Ngram Viewer are not even representative of contemporary literature.

Conclusion about text mining and the Ngram Viewer: fun to play around with, and it may give indications of what one could find if one did actual research on a subject. But it must not be seen as a tool which can, in and of itself, give credible results.

These are good examples of digital tools designed to help in academic research. None of them should be used without critical thought about how they work, what they actually do, and what the implications of this are. The results gained from using tools like these must be analyzed by a human being who is aware of the problems and limitations inherent in all such tools. Otherwise the information gained, and the results of the research, will be distorted, if not outright false!

And a short note about the other subject already touched on here: copyright. When publishing, think about what sort of rights you reserve for yourself and what rights you give for the usage of your material, and remember that publishers are really not watching out for your interests as a writer. They want to make money, period. Babylon.

/T

Tools on the Internet

Some tools and how they may be used in the humanities:

Textdiff: Useful for checking what has been changed in a document between versions. Not much else about this tool strikes me as useful. Since the same thing can be done in MS Office, I don’t see the point in using this if one owns Office. (A quick sketch of the same diff idea in code follows after this list.)

Dipity: Can be used to create a timeline for oneself, or perhaps for a project. Somewhat useful for marketing yourself and showing other people what you have done and when, but otherwise quite pointless. Especially now that Facebook has its timeline.

Wordie: A fun little tool for creating, for example, your own booklet covers. Could also be used for homepages or the like. More fun than useful, though.

Wraggelabs emporium: This actually seems worth using, if one is interested in the material they have concentrated on. There are many interesting tools there, some of which seem useful, though the site and the tools are mainly developed for the Australian historian and researcher.
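
As promised above, here is a minimal sketch of the diff idea behind a tool like Textdiff, using Python’s standard library; the two sample texts are made up.

```python
# Compare two versions of a text and print what changed, unified-diff style.
import difflib

old = ["The cat sat on the mat.", "It was a sunny day."]
new = ["The cat sat on the sofa.", "It was a sunny day.", "Birds sang."]

for line in difflib.unified_diff(old, new, lineterm=""):
    print(line)  # lines starting with - were removed, + were added
```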

So as one can see, there are quite a few different types of tools available out there. Some of these tools are more useful than others, but this of course depends on who is using them and to what end. Personally, I don’t find most of these very useful, but in time some of them will most likely expand and become something more than they are now. While waiting for that…

And my sincerest apologies for this post being a day late.

/T

Digital archives vs. books

Booyah!

Last lecture, we were divided into pairs and small groups, each given a specific Internet database to do some quick research on. We checked out who had made them, when, why… All the usual stuff. And on top of that we gave some thought to how we would refer to a resource on the site, how long the site is likely to last, and suchlike questions.

My group got Stockholmskällan as our subject, and we concluded that the site will most likely last for quite some time, since it’s backed by official institutions and their archives. We also pondered how we would refer to a resource on the site, and concluded that the guidelines for using the Internet as a source are still somewhat mixed up and don’t really reflect reality.

The question of how long Internet sites last is an interesting one, though. Most of us can probably remember a number of sites which don’t exist anymore, whether we know it or not. And there is of course the related question of how long a site keeps getting updated. Some sites out there might still exist but haven’t been updated for the last, say, five years. “Michele” writes about this in her blog about crafts, in a post from 2009. She comes to the conclusion that 50% of the websites she has linked to, mostly in 2006–2007, are still out there. Most likely a higher percentage than average, she says, because she’s quite picky about her sites.
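
Checking for this kind of link rot is easy to automate, by the way. Here is a minimal sketch that tries a list of URLs and reports which ones still respond; the URLs are placeholders, and a real check should also handle redirects, timeouts and polite crawling delays.

```python
# A crude link-rot check: request each URL and see whether anything answers.
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError

urls = [
    "https://example.com/",
    "https://example.com/page-that-may-be-gone",
]

for url in urls:
    try:
        request = Request(url, headers={"User-Agent": "link-rot-check"})
        with urlopen(request, timeout=10) as response:
            print(f"alive ({response.status}): {url}")
    except (HTTPError, URLError) as error:
        print(f"dead? ({error}): {url}")
```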

An interesting comparison one might make is with books. If, while reading Book A, you find a footnote referring to Book B, the probability that Book B still exists and is obtainable is much higher. I would even go as far as to say (without any empirical data, so this is just an assumption) that it’s almost guaranteed that you will be able to find the book in some library or bookstore out there.

This is, of course, completely natural. Books are physically printed, and once that’s done, they exist until someone destroys them. Internet sites, on the other hand, exist only on the Internet; once they are no longer maintained, they disappear. If one wants to store a website, or the information on it, after it’s no longer maintained, it’s a huge project. Books are simply out there once they are printed; not much effort is needed to preserve them for quite some time. Books are also generally more highly regarded, which has much to do with the history of writing and reading. But that is a subject best left untouched here, or this post will never end.

Books constitute capital. A library book lasts as long as a house, for hundreds of years. It is not, then, an article of mere consumption but fairly of capital, and often in the case of professional men, setting out in life, it is their only capital.

Thomas Jefferson

/T

Finding literature in practice

Extra assignment: look for literature on the subject you were opposing in the proseminar. Mine concerned the Swedish People’s Party in Finland (SFP) and its connections to the church. We were told to search for literature with a web search, in databases on the Internet, and by actually going to the library and physically looking.

WEB SEARCH

Googling pretty much turned up Wikipedia pages for the SFP and the church, plus some contemporary news articles. A couple of books also turned up on Amazon when I googled around.

DATABASES

Searching on HELKA turned out to be quite effective, and gave me pretty much the same books that were used in the proseminar paper I was opposing. ARTO was somewhat helpful, but it either turned up gazillions of results or very, very few. Some could have been useful, though. Nelli was simply… frustrating. Then again, I’ve hated Nelli since I first used it.

PHYSICALLY WALKING TO THE LIBRARY

I went to the National Library (NL) to look for literature, but had to conclude that I would either have to ask the librarian, or spend hours upon hours before finding anything useful. So I asked, and the librarian told me where I should look if I wanted to find something I could use. Even though I now knew which shelves I might find something in, browsing the books was somewhat frustrating, since most of them were utterly useless to me. Finally I found some literature that could have been useful, but it was pretty much the same stuff I had found on HELKA in two minutes.

CONCLUSION

Googling is a good way to get a basic view of a subject, but it won’t give you much depth. Unless you use Google Books or Google Scholar, maybe. Then you might find something actually useful. The library databases are usually a good way to find something, if you know what you’re looking for, and they often pay off. Walking to the library is a slow way to find something, unless you can tell the librarian precisely what you’re looking for. But you will have to walk there eventually anyway! Haha!

/T

Search engines and digital natives

The internet is the first thing that humanity has built that humanity doesn’t understand

Eric Schmidt

Last Monday we discussed the usage of search engines for academic research, namely how they should be used for the best results and how they differ from each other. I believe the quote from Schmidt above applies to this topic as well. People of our generation are often called “digital natives”, but even though we are assumed to know almost instinctively how to use resources on the Internet, we often find ourselves clueless.

An example of the aforementioned phenomenon is how students use search engines when looking for sources for writing papers and doing research, which Steve Kolowich writes about in his blog post What students don’t know. He tells us about a study of the habits of students at the University of Illinois. They don’t really know how to use search engines, but are expected to. Because of this, nobody really teaches them how, and many of them don’t know they can go and ask a librarian, or they feel stupid if they go and ask for help. Which is kind of sad.

In some respects I can identify with the problems these students have. Now and then I get extremely frustrated when I can’t find literature or sources on a subject I’m supposed to study. It’s by no means unusual, however, that when I complain to somebody about it, they find the stuff I need in an instant. Which makes it, if possible, even more frustrating to realize I simply didn’t have the skills needed to find a certain book (or whatever it might be). Every time this happens I learn something new, though, and one day I will probably be proficient at using the library and archive search engines. Every time I need to ask a librarian, I too feel stupid, and inside my head I can see the librarian looking at me as if to say “how stupid are you, actually?”. Which is silly of course; they are there partly to assist us.

I have to disagree with people who think our generation is dumber than earlier ones because we were born in the Internet age. Nicholas Carr writes in his article Is Google Making Us Stupid? that his way of thinking has changed as he has come to use the Internet more and more for reading, and that he is now unable to concentrate on long texts. He also assumes this to be the case with the Internet generation. I, for one, don’t find it hard to concentrate on reading a book (unless it’s boring).

Carr seems to think that changed thought patterns (which he believes are caused by use of the Internet) are a bad thing, and a sign of people getting stupid. This is quite an assumption to make, especially since he seems to have no empirical data whatsoever to support his claim. I personally think that he’s just one of those people who are afraid of change and take a hostile stance toward everything new that they don’t understand.

To sum up, I still find myself learning how to use search engines and databases. Quite often I find myself failing when using these. BUT! This is no reason to think they are, in themselves, harmful to people. We merely need to learn how to use them, and spread that knowledge to younger people, not assuming they instinctively know how to use everything that can be found on the Internet. Neither should failure at using a tool discourage us from trying again!

Success consists of going from failure to failure without loss of enthusiasm.

Winston Churchill

/T

Relational databases

Last time I was regrettably not able to attend the lecture, which always has consequences. This time the consequence was an extra assignment. Those of us who didn’t attend were told to create a model of a relational database and write a post on the subject. Some of the stuff on relational databases I didn’t quite understand, but here goes:

Relational databases are, in principle, databases built on tables. A relation is basically the table in question. The relation consists of tuples (rows) and attributes (columns). One tuple consists of the information on a single individual, document or whatever other subject the database is built for. One attribute, on the other hand, is a specified type of information about the subject in question.

Let’s take a simplified example. Imagine a personnel database for a company. In this database, each tuple (row) would represent one of the company’s employees. Each attribute (column) would represent an important piece of information about the employee, for example the employee number, social security number or name. An example might look like this (the data is made up for illustration):
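
Employee number   Social security number   Name
1001              010101-123X              Anna Andersson
1002              020202-456Y              Bertil Bengtsson
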
Now, one needs to define a key, which is something unique one can use to find a specific subject in the database. In this case, one could use either the employee number or the social security number (assuming no two subjects can ever have identical information in these attributes). Defining keys for the search functions in the database is crucial to success. If this fails, the database will not work as it’s supposed to. Period.

Several keys may of course be used, so that one can search for subjects on different bases: if one doesn’t know a person’s employee number, for example, one can find the person using his/her social security number. Secondary keys may also be added, such as on names, which (if the database and search function are built well) may give several results when searched on.
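
To make this concrete, here is a minimal sketch of the personnel example as an actual relational table, using SQLite from Python’s standard library. The table, column names and rows are the made-up ones from the example above.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # a throwaway in-memory database
db.execute("""
    CREATE TABLE employees (
        employee_number INTEGER PRIMARY KEY,   -- the unique key
        ssn             TEXT UNIQUE NOT NULL,  -- another candidate key
        name            TEXT NOT NULL          -- an ordinary attribute
    )
""")
# A secondary key on names: speeds up searches, but may match several rows.
db.execute("CREATE INDEX employees_name ON employees (name)")

db.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1001, "010101-123X", "Anna Andersson"),
     (1002, "020202-456Y", "Bertil Bengtsson")],
)

# Looking someone up by the primary key:
row = db.execute(
    "SELECT name FROM employees WHERE employee_number = ?", (1001,)
).fetchone()
print(row[0])  # Anna Andersson
```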

When all this is done, and linked properly, one is well on one’s way to creating a working relational database. Or at least the foundation is built. Sh*tloads of work is still ahead, but the assignment was not to create a step-by-step guide on relational databases. I hope this little talk still shows that I’ve done my reading on the subject.

The following are examples of tools for building your own relational database:

4D
Microsoft Access

Thank you, and good night!
/T

Digitally Born Material

Salut.

On Monday a few new topics were discussed, mostly concerning material which is “born digital”, meaning it was first, or only, published on the Internet. This brings up completely different issues from the ones discussed regarding digitized material. Examples of born-digital material: e-mails, Facebook messages, blogs, YouTube videos, 3D models, computer games… The list is endless.

Now, the issues with digitally born material (DBM) are that the digital world develops much faster than old-fashioned writing and publishing, and that even digital material rots over time. DBM gets outdated extremely quickly: firstly because the physical medium lasts perhaps 10 years (at most) before it’s out of date. Take LPs or floppy disks. Secondly because file formats get outdated just as quickly, or even more so. That’s why one has to keep in mind, if one wants to store DBM, that one either has to migrate the material to current formats or emulate the old environment to succeed.
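
As a trivial illustration of the “migrate” option, here is a minimal sketch that re-encodes an old Latin-1 text file as UTF-8; the file names are made up, and real migrations (old word-processor formats, say) need far more careful conversion.

```python
# Migration in miniature: move content out of an aging encoding into a
# current one before the old form becomes unreadable.
from pathlib import Path

old_file = Path("memo_1995.txt")  # hypothetical legacy file
text = old_file.read_text(encoding="latin-1")
Path("memo_1995_utf8.txt").write_text(text, encoding="utf-8")
```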

There is also the risk of bit rot, which is yet to be fully explained; nobody really knows exactly how and why it happens.

It should also be noted that traditional archiving is pretty much out of the question when it comes to DBM. It would be incredibly stupid to print out gazillions of pages of information that is already out there and can be stored much more easily and accessibly than in folders. Add to that the fact that some information found on the Internet, such as sound and video, can’t be printed on paper in any sensible way.

Auf Wiederhören.