Getting the data

In addition to traditional data-gathering methods such as interviewing and observing, the Internet, digitalization, and the recent development of computational techniques have opened up new possibilities. Mobile phones, credit cards, customer-loyalty cards, social media, and web browsing all leave behind digital traces that can be used as research material on our social behavior. The massive accumulation of this type of data has led some to talk about big data and its potential to change our society. Not just anyone can get access to such data files, but there are many intriguing new ways of gathering data from the Internet that are accessible to all. For example, data from social media forms an important part of communication research today.

We will not cover the techniques for computational data mining here, but point to some examples of easy ways to collect digital data. Data on real social networks can be collected from social network services such as Twitter and Facebook. For instance, an Excel plug-in called NodeXL lets users connect directly to sources such as Twitter, Facebook, YouTube, Flickr, wikis, and email, and mine data from them (take a look at these handouts on how to use NodeXL). An easy option for mining network data from Facebook is also provided by an app called Netvizz. Many more tools and techniques for data collection can be found at the DIRT (Digital Research Tools) wiki, which has gathered together hundreds of tools for conducting research in the digital environment. It lists tools ranging from data mining to reference management, from quantitative to qualitative data, and from commercial to open software.

If one is interested in extracting information from websites, web scraping is the way to go. OutWit Hub is an example of a scraping tool. With this tool a researcher can collect data from web pages directly and automatically, without painful copy-pasting. Scrapers can be used directly in a web browser. OutWit Hub is available for Mozilla Firefox, and for Google Chrome there is another option that can be used. Software that requires programming skills, such as R, offers several possibilities for gathering different types of online content. Take a look at this nice example of gathering data on consumer attitudes towards an airline company through Twitter.
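To give a rough idea of what a scraper does under the hood, here is a minimal sketch in Python that uses only the standard library. It is not how OutWit Hub or any other tool mentioned here works internally; it simply illustrates the general idea of pulling structured items (here, links and their visible text) out of a page's HTML. The names `LinkCollector`, `extract_links`, and `SAMPLE_PAGE` are hypothetical, and the sample page is made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the target and visible text of every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the link currently open, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, text) pairs found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# A made-up page standing in for real downloaded HTML.
SAMPLE_PAGE = """
<html><body>
  <h1>Reports</h1>
  <ul>
    <li><a href="/reports/2012.html">Annual report 2012</a></li>
    <li><a href="/reports/2013.html">Annual report 2013</a></li>
  </ul>
</body></html>
"""

if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_PAGE):
        print(href, text)
```

For a live page, the HTML string could be downloaded first, for instance with `urllib.request.urlopen`, and then passed to `extract_links`. Before scraping a real site, it is worth checking its terms of use.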

Last but not least, there is a great deal of open and free data online that has already been gathered by someone and made available for anyone to analyze. Data collected by governments, NGOs, and think tanks is readily available. Sources for open data sets can be found through governmental data pages and, for example, through a platform maintained by the Open Knowledge Foundation, an organization promoting the idea of open data.