Online data collection strategies used in qualitative research in the health field: a scoping review



Objective:  To identify and map the online data collection strategies used in qualitative research in the health field.

Methods:  This is a scoping review guided by the Joanna Briggs Institute methodology and reported according to the Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR). We analyzed scientific articles, theses and dissertations retrieved from 12 databases. The data were analyzed using descriptive statistics.

Results:  The final sample consisted of 121 studies. The number of publications increased sharply in the last five years, with a predominance of studies from the United Kingdom. The most represented fields were psychology (28.1%), medicine (25.6%) and nursing (12.4%). The publications used 10 online data collection strategies: online questionnaires, online forums, Facebook, websites, blogs, e-mail, online focus groups, Twitter, chats, and YouTube.

Conclusions:  Online data collection strategies are constantly expanding and are increasingly used in the health field.

Keywords: Qualitative research; Health sciences; Internet; Internet access; Online social networking; Social media




Qualitative methodology has been widely used in health research because it can incorporate meanings and intentions as inherent to the acts, relationships and social structures of the subjects studied. It thus makes a detailed analysis of human constructs and relationships possible1.

Researchers who use qualitative methods therefore seek a detailed understanding of the subjectivity of the subjects, and of the theme being studied, within their context.

With the advance of knowledge, the emergence of information and communication technologies (ICTs) and easier access to digital resources, online media and computer-mediated communication have become increasingly common in research. The Internet is a prime example: it has transformed behavior and communication and, because it is easy to use, has become a resource for collecting qualitative data2.

Through the Internet, it is possible to carry out in-depth studies of relationships in virtual space, which has become an interface of people's daily lives. It can therefore generate new types of knowledge and of data collection, and allow researchers to explore the daily lives of specific groups that share certain features3.

In this interface, the emergence of virtual communities and social networks stands out, expanding the understanding of the communication field and of the use of cyberspace.

This type of communication is essentially electronic and based mainly on words (text) and/or images. This helps studies that use it as a data collection strategy, since it makes it possible to study themes such as online identity and sociability.

It is also possible to collect data in online discussion forums, sites where a group with common features gathers to debate a given theme. This makes possible a dialogic approach, one that focuses on the meaning of the field of interest being studied4. This type of data collection is especially pertinent to health and nursing research, since online forums and communities are increasingly used as sources of data collection. They are also used by patients and their relatives as sources of therapeutic support2.

Data collection thus reaches a new standard: respondents can access the research in an online space whenever they wish, and more comfortably. In asynchronous strategies, the researcher and the research subject do not need to be connected at the same time. Comfort is also a factor in synchronous strategies, which do require simultaneous access, because the simultaneity happens online rather than in a physical space, so the subject can be in an environment of their choosing3-4. Additionally, the researcher can monitor the progress of the research directly, as data are uploaded to digital platforms1-2,4.

The online space is thus, simultaneously, a new space for collecting qualitative data and a necessary field of investigation for understanding how human relations happen in digital environments, especially regarding its use as a source of health information. Qualitative researchers therefore need to understand this theme, and the first step is understanding how each online data collection strategy has been incorporated into qualitative investigations.

Researchers have highlighted that investigations based on online data collection strategies bring both benefits and challenges1-4. However, no study was found showing how researchers conducting qualitative health investigations have been incorporating these innovations into their work, which demonstrates the need for the present investigation.

It is therefore pertinent to map the online data collection strategies that have been used in qualitative health research. This mapping can serve as a basis for researchers to incorporate these resources effectively into their investigations.

This research therefore seeks to answer the following question: what online data collection strategies are used in qualitative research in the health field? Accordingly, this study aimed to identify and map the online data collection strategies used in qualitative health research.


This is a scoping review (a type of literature review aimed at mapping the main concepts and limitations of a given research field, as well as the evidence for professional practice), guided by the Reviewer's Manual5 of the Joanna Briggs Institute (JBI) and reported according to the recommendations of the Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR): Checklist and Explanation. Its protocol was registered on the Open Science Framework.

A research protocol was created with the following items: type of study, objective, sample composition, research question formulation, inclusion criteria, exclusion criteria, data collection, data extraction and data synthesis.

The sample was made up of qualitative studies in the health field that used digital means as data collection mechanisms.

The research question was formulated using the PCC mnemonic, in which P (Population) is qualitative research; C (Concept), online data collection strategies; and C (Context), the health field. The resulting research question was: "what online data collection strategies have been used in qualitative research in the health field?".

To find scoping reviews or protocols similar to the objectives of this study, a search was carried out in November 2018 in the databases JBI, Clinical Online Network of Evidence for Care and Therapeutics (COnNECT+), Database of Abstracts of Reviews of Effects (DARE), The Cochrane Library and the International Prospective Register of Ongoing Systematic Reviews (PROSPERO). No study was found whose objective was to identify and map online data collection strategies in qualitative health research.

To develop the search strategy, the main English descriptors for studies addressing the theme were mapped in PubMed Central (PMC) and the Cumulative Index to Nursing and Allied Health Literature (CINAHL), using the Medical Subject Headings (MeSH) website. To identify the descriptors in Portuguese, the Descritores em Ciências da Saúde (DeCS) of the Biblioteca Virtual em Saúde (BVS) was used.

The following search strategies were therefore used: 1) MeSH: [(“Qualitative Research” OR “Qualitative Studies”) AND (“Online research” OR “Online focus groups” OR “Online interview” OR Internet OR “Online forum”)], with C (Health Sciences) as a search filter; 2) DeCS: [(“Pesquisa Qualitativa” OR “Método qualitativo”) AND (“Método online” OR Online OR “Grupo focal online” OR Internet OR “Entrevista online” OR “Comunidades Virtuais” OR “Pesquisa online”)], with C (Ciências da Saúde) as a search filter.
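Both search strings follow the same pattern: synonyms for one concept are joined with OR, and the concept blocks are joined with AND. The minimal Python sketch below reproduces that pattern. The helper names `or_block` and `build_query` are hypothetical and are not a tool used by the authors; the subject filter (C, Health Sciences) is assumed to be applied separately in each database interface.

```python
def or_block(terms: list[str]) -> str:
    """Join synonyms of one concept with OR, quoting multi-word terms."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(*concept_blocks: list[str]) -> str:
    """Combine concept blocks (each a list of synonyms) with AND."""
    return " AND ".join(or_block(block) for block in concept_blocks)


# MeSH-based strategy (English descriptors)
mesh_query = build_query(
    ["Qualitative Research", "Qualitative Studies"],
    ["Online research", "Online focus groups", "Online interview",
     "Internet", "Online forum"],
)

# DeCS-based strategy (Portuguese descriptors)
decs_query = build_query(
    ["Pesquisa Qualitativa", "Método qualitativo"],
    ["Método online", "Online", "Grupo focal online", "Internet",
     "Entrevista online", "Comunidades Virtuais", "Pesquisa online"],
)

print(mesh_query)
print(decs_query)
```

Keeping each concept as a list makes it easy to add or drop a synonym without hand-editing parentheses, which is where boolean strings usually break.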

In December 2018, data were collected from the databases CINAHL, Web of Science, Scopus, Literatura Latino-americana e do Caribe em Ciências da Saúde (LILACS) and Education Resources Information Center (ERIC). Grey literature (theses and dissertations) was searched in the Catálogo de Teses e Dissertações da Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Europe E-Theses Portal (DART), Electronic Theses Online Service (EThOS), Repositório Científico de Acesso Aberto de Portugal (RCAAP), the National ETD Portal and Theses Canada.

The review included qualitative studies, published in full in English, Portuguese, Spanish or French, that collected data from subjects using online strategies. It excluded editorials, experience reports, theoretical essays, integrative reviews, and studies that used other data collection mechanisms. No time limit was set, since the objective was to trace a timeline of online data collection strategies in health research.
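The eligibility criteria above can be read as a single conjunction of conditions. The Python sketch below expresses them that way, as an illustration only: the `Record` fields and the `is_eligible` helper are hypothetical, and the health-field condition is taken from the exclusion reasons reported later in the selection stage.

```python
from dataclasses import dataclass

# Criteria paraphrased from the text; the field names below are hypothetical
INCLUDED_LANGUAGES = {"English", "Portuguese", "Spanish", "French"}
EXCLUDED_TYPES = {"editorial", "experience report", "theoretical essay",
                  "integrative review"}


@dataclass
class Record:
    """Minimal description of one retrieved record (hypothetical fields)."""
    qualitative: bool        # qualitative design (mixed-approach studies included)
    health_field: bool       # belongs to the health field
    publication_type: str    # e.g. "article", "thesis", "editorial"
    language: str
    full_text: bool          # published in full
    online_collection: bool  # collected data from subjects through online strategies


def is_eligible(r: Record) -> bool:
    """Apply the inclusion and exclusion criteria; no date filter is applied."""
    return (
        r.qualitative
        and r.health_field
        and r.full_text
        and r.online_collection
        and r.language in INCLUDED_LANGUAGES
        and r.publication_type not in EXCLUDED_TYPES
    )


# Example: a Portuguese-language thesis that ran an online focus group
print(is_eligible(Record(True, True, "thesis", "Portuguese", True, True)))  # True
```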

In the study selection stage, titles and abstracts were first evaluated to verify whether the works met the inclusion and exclusion criteria. Two researchers did this independently, and a third researcher evaluated the work whenever their opinions conflicted.
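The dual independent screening with third-reviewer adjudication described above can be sketched in Python as follows. This is a minimal illustration, not the authors' procedure: the `screen_all` helper, the record identifiers and the `adjudicate` callback standing in for the third reviewer are all hypothetical.

```python
from typing import Callable, Iterable


def screen_all(
    votes: Iterable[tuple[str, bool, bool]],
    adjudicate: Callable[[str], bool],
) -> tuple[list[tuple[str, bool]], int]:
    """Combine two independent include/exclude votes per record.

    When both reviewers agree, their shared vote stands; otherwise the
    third reviewer's decision (``adjudicate``) is used. Returns the final
    decisions and the number of conflicts that needed adjudication.
    """
    decisions: list[tuple[str, bool]] = []
    conflicts = 0
    for record_id, vote_a, vote_b in votes:
        if vote_a == vote_b:
            decisions.append((record_id, vote_a))
        else:
            conflicts += 1
            decisions.append((record_id, adjudicate(record_id)))
    return decisions, conflicts


# Hypothetical votes (record id, reviewer A, reviewer B); True means include
decisions, conflicts = screen_all(
    [("r1", True, True), ("r2", True, False), ("r3", False, False)],
    adjudicate=lambda record_id: True,  # stand-in for the third reviewer
)
```

Counting conflicts separately is a small design choice that also makes it easy to report inter-reviewer agreement, although the review does not report that figure.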

Pre-selected studies were retrieved in full. At this point, the pool was checked for duplicates, and nine studies were excluded.

The studies were then read in full. Those that did not answer the research question were excluded, a total of 125 works: 64 had a quantitative approach, 32 did not use online data collection strategies, 24 did not belong to the health field (although the databases returned them despite the filters), and 5 were not research articles (they were reflections and reviews). Therefore, 121 studies made up the final sample of this scoping review.

Data were tabulated in Microsoft Excel spreadsheets with the following variables: type of study, year, country of origin, authors' field of knowledge, type of research, data collection procedures, online data collection strategy, data analysis procedures, research subjects, and the benefits and limitations of the online data collection strategy. The extracted data were then analyzed using simple descriptive statistics (n; %).
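The authors built their extraction spreadsheet in Excel. As an illustration only, the same structure and the (n; %) tabulation can be sketched in Python. The `Extraction` field names and the `n_percent` helper are hypothetical; the fields mirror the variables listed above, and multi-valued variables (such as the online strategies a study used) are stored as lists.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Extraction:
    """One spreadsheet row per study; field names are illustrative."""
    study_type: str           # scientific article, thesis or dissertation
    year: int
    country: str
    author_field: str         # authors' field of knowledge
    research_type: str        # qualitative or mixed approach
    collection_procedures: list[str] = field(default_factory=list)
    online_strategies: list[str] = field(default_factory=list)
    analysis_procedures: list[str] = field(default_factory=list)
    subjects: str = ""
    benefits: list[str] = field(default_factory=list)
    limitations: list[str] = field(default_factory=list)


def n_percent(rows: list[Extraction], attr: str) -> dict[object, tuple[int, float]]:
    """Tabulate one variable as (n; %) over the whole sample.

    List-valued variables count each distinct value once per study, so
    their percentages can add up to more than 100%.
    """
    counts = Counter()
    for row in rows:
        value = getattr(row, attr)
        counts.update(set(value) if isinstance(value, list) else [value])
    total = len(rows)
    return {key: (n, round(100 * n / total, 1)) for key, n in counts.items()}


# Two hypothetical rows, for illustration only
rows = [
    Extraction("article", 2017, "United Kingdom", "Psychology", "qualitative",
               online_strategies=["Facebook", "blogs"]),
    Extraction("thesis", 2015, "Brazil", "Nursing", "qualitative",
               online_strategies=["Facebook"]),
]
print(n_percent(rows, "online_strategies"))
```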


The final sample included 121 studies, representing 0.05% of all records initially found and 49.2% of the studies that passed pre-selection and were read in full (Figure 1).
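The 49.2% figure can be checked against the full-text exclusions reported in the Methods. The short Python calculation below assumes that the number of studies read in full equals the final sample plus the full-text exclusions; the text does not state that count directly.

```python
# Full-text exclusions reported in the Methods
excluded = {
    "quantitative approach": 64,
    "no online data collection": 32,
    "outside the health field": 24,
    "not a research article": 5,
}
final_sample = 121

total_excluded = sum(excluded.values())        # 125, as reported
read_in_full = final_sample + total_excluded   # assumed: 121 + 125 = 246
share = 100 * final_sample / read_in_full      # share of full texts retained

print(total_excluded, read_in_full, f"{share:.1f}%")  # 125 246 49.2%
```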

Figure 1 -
PRISMA-ScR flowchart (adapted) of the study selection process

Most works in the sample were scientific articles, whose number increased over the years. Publications from the last five years stand out, as do those by researchers in Psychology, Medicine and Nursing (Table 1).

Table 1 -
Characterization of the studies analyzed. Rio Grande do Norte, Brazil, 2019

The studies were carried out in 20 countries, most of them in the United Kingdom, the United States, Canada and Brazil (Figure 2).

Figure 2 -
Countries in which the studies analyzed were carried out (absolute numbers)

Regarding the type of research reported by the authors, most investigations had a qualitative approach (88.4%), while the others reported a mixed approach to data analysis (11.6%).

Some investigations (22.3%) also specified a research design in addition to their approach; among these, ethnographic studies stood out (9.9%).

Regarding data collection procedures, most studies used online data collection strategies exclusively (83.5%), while the others (16.5%) combined online strategies with other procedures: in-person interviews (10.8%); in-person focus groups (5.0%); telephone interviews (3.3%); document analysis (1.7%); and observation (0.8%).

Many studies used more than one data collection strategy, both among those that used online strategies exclusively and among those that combined them with other approaches, which explains why the percentages add up to more than 100%.

The online data collection strategies were: online questionnaires (27.3%); online forums (27.3%); Facebook (14.9%); websites (9.9%); blogs (9.1%); e-mails (8.3%); online focus groups (5.8%); Twitter (4.1%); chat rooms (2.5%); and YouTube (0.8%).
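Because a single study could use several strategies, these percentages all share the denominator of 121 studies and need not sum to 100%. The Python check below back-calculates the approximate number of studies behind each percentage. It assumes each percentage was rounded from a whole count out of 121; the counts are implied by the reported figures, not reported in the text.

```python
SAMPLE = 121  # studies in the final sample

# Percentages as reported for each online data collection strategy
reported = {
    "online questionnaires": 27.3, "online forums": 27.3, "Facebook": 14.9,
    "websites": 9.9, "blogs": 9.1, "e-mails": 8.3,
    "online focus groups": 5.8, "Twitter": 4.1, "chat rooms": 2.5,
    "YouTube": 0.8,
}

# Approximate number of studies implied by each percentage
implied = {name: round(pct * SAMPLE / 100) for name, pct in reported.items()}

total_mentions = sum(implied.values())
print(implied)
print(total_mentions, f"{sum(reported.values()):.1f}%")
```

The total of strategy mentions exceeds 121, which is exactly the multi-strategy effect described in the preceding paragraph.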

Content analysis (47.1%) and thematic analysis (38.8%) stood out as the most common data analysis procedures. The other techniques used were grounded theory (4.1%); discourse analysis (2.5%); phenomenological analysis (1.7%); comparative analysis (0.8%); and lexicographical analysis (0.8%). The remaining studies (4.1%) did not describe their analysis procedures.

Some studies (21.5%) used software to support the analysis of qualitative data: NVivo (18.2%); ATLAS.ti (1.7%); Dedoose (0.8%); and CQPweb (0.8%).

Regarding research participants, it is important to distinguish the studies that analyzed people (63.5%) from those that analyzed posts (33.1%), groups and sites (2.5%), and videos (0.8%).

Most investigations that analyzed people (38.0%) collected their data from online chats, forums and focus groups, with a mean of 57 subjects per study (minimum 5, maximum 250). The other studies in this group (24.8%) collected data through online questionnaires and e-mails, with a mean of 254 participants (minimum 4, maximum 1,740). One study did not state how many participants it had.

Investigations that analyzed posts were considered to include those that assessed posts, tweets, comments and testimonies on websites and blogs. In these studies, the mean sample size was 7,267 posts (minimum 10, maximum 228,130).
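The sample sizes above are summarized as a mean with its minimum and maximum. A minimal Python sketch of that summary is shown below. It also computes the median, which the review does not report, because a single very large study (such as the maximum of 228,130 posts) can pull the mean far above a typical study. The `summarize` helper and the input values are invented for illustration.

```python
from statistics import mean, median


def summarize(sizes: list[int]) -> dict[str, float]:
    """Describe per-study sample sizes (posts or participants)."""
    return {
        "mean": mean(sizes),
        "min": min(sizes),
        "max": max(sizes),
        "median": median(sizes),  # less sensitive to one very large study
    }


# Hypothetical post counts for four studies, for illustration only
print(summarize([10, 120, 900, 20000]))
```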

Chart 1 presents the benefits and limitations reported by the authors regarding the use of online data collection strategies in qualitative research.

Chart 1 -
Benefits and difficulties in the use of online data collection strategies in qualitative researches

In general, the benefits include access to large samples of people from different places and the neutrality of the investigation process, since there is no direct involvement between the researcher and the subjects. Regarding limitations, possible selection bias stands out, since only data from people with Internet access are collected.


The predominance of studies classified as white literature (scientific articles) is a positive aspect, since these documents are easy to find, disseminate and obtain. They also result from professional production and editing mechanisms, which makes them highly visible6. Research published about these resources will therefore be more accessible to other researchers, optimizing their use, a fundamental feature of investigations that adopt innovative data collection methods.

The publications, which have trended upward over the years (especially in the last five), come from 20 countries and were carried out by qualitative researchers from 15 fields of knowledge. This shows a willingness to innovate in qualitative research that is growing, contemporary and multidisciplinary, in line with the perception that proficiency in new investigation techniques is key to reaching new research standards7.

The use of non-conventional data collection methods requires creativity, planning and proficient use of technique. These elements help make research methodologically sound, which is paramount for producing consistent, reliable and replicable data8.

In this landscape, the Internet is increasingly seen as a valuable tool for collecting information through its navigation and interaction resources9. Quantitative data collection on the Internet is generally well documented, especially with electronic forms. However, the discussion of its capacity to generate qualitative data is still incipient, despite the growing trend observed in this scoping review.

Internet-based qualitative research has generally been ethnographic, using participatory methods9. Several authors called this strategy "netnography"10-15.

This is an emerging qualitative data collection method that allows researchers to obtain a naturalistic, immersive view of online interactions10. The netnographic approach allows a broad set of opinions to be gathered, requires significantly fewer resources than in-person interviews or focus groups, and substantially reduces the researcher's influence on the results, since no researcher is present when the comments and/or testimonies are posted4,10. This strategy was especially common in studies that used forums and blogs as data collection environments.

Regarding the approach of the studies evaluated, only 11.6% used a mixed data analysis approach16-29; the others were exclusively qualitative. This is not a problem in itself, but it shows that mixed approaches are still incipient in health research30.

Mixed approaches are understood in many ways and go by several names (mixed methods and combined methods, for instance); they consist of using quantitative and qualitative strategies in the same research project. This is justified when the phenomenon analyzed is complex and broad3.

However, the choice of a mixed data analysis approach must rest on a solid scientific basis, so that both the quantitative and the qualitative aspects of a study offer essential information for understanding the phenomenon investigated. This strategy was not used often in the studies reviewed.

Although combining data analysis approaches was uncommon, a significant number of studies combined data collection techniques, sometimes mixing online and in-person ones9,27,31-37. Researchers highlighted the complementary use of data from different techniques as a way to enrich analyses9. This strategy, which some investigators call method triangulation, stems from the understanding that adopting multiple methods can reveal multiple facets of a research problem38.

In this context, the complexity of the modern world, and therefore of its objects of study, demands data collection and analysis strategies that are complex enough to deal with the multiple points of view and perspectives on a phenomenon that can be considered in many ways and often cannot be seen as a whole from a single vantage point39.

Concern with analyses that are methodologically sound, regarding both scientific interpretation and the complexity of the data, can also be seen in how the studies evaluated explained their own data analysis procedures.

On one hand, content analysis stood out as a theoretical reference for the studies12-13,15-19,22,27,31-34,37,40-82. This internationally recognized method, disseminated by Laurence Bardin in her work L'analyse de contenu, reveals, systematizes and expresses the content of messages, aiming at logical deductions from the data analyzed83.

On the other hand, a large group of investigations used Computer-Assisted Qualitative Data Analysis Software (CAQDAS) to support data analysis9-10,18-19,22,32,36,42,45-46,50-51,53,55,57,61,63,65,72,78-79,84-88. This shows an interest in innovating research methods to meet current demands, driven by the methodological care of qualitative investigation and by the creativity of the researcher. This software also offers benefits, especially in organizing data and reducing the time needed to code large blocks of text, and it supports different types of analysis89.

The samples of the investigations (subjects, posts or other units) were consistent with the benefits each online data collection strategy was reported to offer.

Studies that used chat rooms, forums and online focus groups had smaller samples of subjects and stated that this strategy has a unique advantage: it allows sensitive subjects to be investigated, including private issues that are often difficult to address in person87.

Studies that collected data through online questionnaires, e-mails or posts, on the other hand, had samples with very large numbers of subjects or posts, and cited as their main benefit the possibility of accessing large samples with broad geographic coverage70,90.

Ten online data collection strategies were used in the studies analyzed: asynchronous tools (online questionnaires, online forums, websites, blogs, e-mail and YouTube); synchronous resources (online focus groups and chat rooms); and social networks (Facebook and Twitter).

Among the benefits of online data collection strategies, the following stood out: the possibility of accessing large samples from different places; lower cost than in-person techniques, both for the researcher and for the subjects; and the neutrality of the research process, which increases the internal reliability of the study, since data can be collected without potential influence from the researcher46.

As for limitations, potential selection bias was common, since participation is restricted to subjects with Internet access and, depending on the strategy used, to those who habitually use certain online tools (forums, social networks such as Facebook, etc.)10.

Superficial responses and the impossibility of obtaining the subjects' demographic data were also reported as limitations by studies that used different online data collection techniques43.

The benefits and limitations of synchronous and asynchronous online data collection strategies deserve separate attention. The main benefits of synchronous strategies were the possibility of interacting with subjects from different places and a reduced group influence effect, which generally occurs when group data are collected in person91.

On the other hand, slow and superficial responses were pointed out as limitations of this type of online data collection. Researchers9 reported that online interviews through chat rooms took twice as long as in-person ones and produced far fewer words: a 120-minute online interview produced nearly seven pages of text, while a 90-minute in-person interview produced 30 to 40 pages. The exchange of questions and answers was also clearly influenced by the respondents' reading, typing and reflection skills.
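Converting the reported figures into text produced per minute makes the gap concrete. The Python calculation below takes "nearly seven pages" as 7 pages, an approximation; with that assumption the in-person interview yielded between roughly 5.7 and 7.6 times more text per minute.

```python
# Figures reported for the chat-room vs in-person interview comparison
online_pages, online_minutes = 7, 120   # "nearly seven pages" taken as 7
in_person_pages = (30, 40)              # lower and upper bounds reported
in_person_minutes = 90

online_rate = online_pages / online_minutes  # pages per minute
in_person_rates = [p / in_person_minutes for p in in_person_pages]

for rate in in_person_rates:
    ratio = rate / online_rate
    print(f"{rate:.3f} vs {online_rate:.3f} pages/min -> {ratio:.1f}x")
```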

For asynchronous online data collection strategies, the main benefit mentioned was the time participants had to consider their responses. On the other hand, the impossibility of debate, the lack of spontaneity in the answers and the high non-response rate were reported as limitations36.

The importance of the planning stage of qualitative research stands out. Researchers should guide the investigation with coherent and adequate theoretical references, and it is paramount that they have detailed knowledge of the data collection process used.

Online data collection strategies are therefore fertile ground for qualitative research, in keeping with an era in which technology is increasingly part of people's lives. Qualitative investigators should use them creatively and with methodological care.


This review analyzed 121 studies published from 2003 to 2018, from 20 countries and 15 fields of knowledge. Ten online data collection tools were used: online questionnaires, online forums, Facebook, websites, blogs, e-mail, online focus groups, Twitter, chat rooms, and YouTube.

The studies highlighted as benefits of these strategies the possibility of accessing large samples, broad geographical coverage, and the neutrality of the research process. Among their limitations, possible selection bias stands out, since only data from people with Internet access are collected.

A limitation of this review is the quality of the abstracts analyzed in its first stage: abstracts lacking descriptions of methodological procedures may have caused some studies not to be selected. In addition, the results must be understood within the context of the databases used and the period in which data were collected.

This study hopes to contribute to the discussion of the theme by mapping the online data collection strategies that have been used in qualitative investigations in the health field and that represent new possibilities for qualitative researchers.

These findings may support qualitative studies that intend to use the strategies identified to build innovative knowledge in health and nursing. This could lead to improvements in the teaching of scientific methodology that incorporates this knowledge, to the production of new research based on online strategies, and to healthcare practice, which can benefit from the findings of such investigations.


