Processing and enrichment

The published database does not correspond to the raw data collected with the Maptionnaire tool: several filtering, processing and enrichment operations were carried out to produce it.

Cleaning of Raw Data

A major filtering of the respondents was carried out, followed by corrections to harmonise the names of the variables and the modalities entered.

Filtering Collected Data

Temporal Filtering

Only responses sent between 16 November 2021 and 30 June 2022, the official survey period, were retained. The various tests of the questionnaire carried out by members of the project before the official launch of data collection have therefore been deleted.
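The temporal filter can be illustrated with a short Python sketch. It is not the project's own code (the processing was done in R); it assumes that each response carries a submission timestamp stored as an ISO 8601 string, and the function name is hypothetical.

```python
from datetime import date, datetime

# Official survey period stated above (both bounds inclusive).
SURVEY_START = date(2021, 11, 16)
SURVEY_END = date(2022, 6, 30)


def in_survey_period(submitted_iso: str) -> bool:
    """Return True if an ISO 8601 timestamp falls within the survey period.

    Assumption: timestamps look like "2022-01-15T10:42:00".
    """
    submitted = datetime.fromisoformat(submitted_iso).date()
    return SURVEY_START <= submitted <= SURVEY_END


# Pre-launch tests are dropped, responses inside the period are kept.
timestamps = ["2021-11-02T09:00:00", "2021-11-16T08:30:00", "2022-06-30T23:59:59"]
kept = [t for t in timestamps if in_survey_period(t)]
```

Comparing calendar dates rather than full timestamps keeps both boundary days entirely inside the retained period.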

Deletion of Experimental Respondents

During the official collection period, tests or checks of the questionnaire were carried out by teachers from the various target universities. These responses were deleted automatically when they were easy to detect, or manually depending on the information provided by the various project members.

Deletion of Insufficiently Completed Responses

In order to ensure that the database is as homogeneous as possible and to produce meaningful results, respondents who did not answer certain questions were removed from the database. Several cases are concerned:

  • Students who did not indicate their home university
  • Students who left the questionnaire before the end
  • Students who did not answer all the questions
  • Students who did not answer the central question of the questionnaire on their region of the world (mental map)

Figure 1: Number of non-responses per question before and after filtering of respondents

Note: Impact of Filtering

This qualitative filtering of the database resulted in a very large number of responses being deleted: 1,190 responses were excluded from the database, which was reduced from 3,220 to 2,030 individuals surveyed.
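As an illustration only, a completeness filter covering three of the criteria listed above (home university, submission, mental map) could look like the Python sketch below. The two column names come from the variable dictionary, but the way missing values are encoded (None or an empty string) and the has_mental_map flag are assumptions, not a description of the project's R code.

```python
# Columns assumed to be mandatory for a response to be kept.
REQUIRED_FIELDS = ("A0405_cad_univer_lab_country", "K3601_met_submit_aut")


def is_complete(response: dict, has_mental_map: bool) -> bool:
    """Keep a response only if the required fields are filled and a map was drawn.

    Assumption: a missing answer is stored as None or as an empty string.
    """
    answered = all(response.get(field) not in (None, "") for field in REQUIRED_FIELDS)
    return answered and has_mental_map


example = {"A0405_cad_univer_lab_country": "FRA", "K3601_met_submit_aut": "2022-01-15T10:42:00"}
keep_example = is_complete(example, has_mental_map=True)
```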

Correction and Harmonisation

Several anomalies caused by the Maptionnaire tool and its administration had to be corrected. Two aspects of the raw database were concerned:

  • Column names (variable coding)
  • The coding of the modalities entered, whether in the input language or in the pivot language

Recoding of Variables

To make the database easier to manage, the variable names (codenames) were completely recoded in order to harmonise them and make them intelligible. Each codename is built from the following components:

  • A letter for the theme (according to the order of the questionnaire) – e.g. A
  • A two-digit number for the page number of the questionnaire – e.g. 03
  • A two-digit number for the question number on the page – e.g. 02
  • Three letters for the thematic category – e.g. cad (for framing [cadrage] questions)
  • Six characters for the unique identifier of the question – e.g. medium (question on the device used)
  • Three characters for the type of value (modality) entered in the database – e.g. opt
  • Two or more characters for the input language or the described modality – e.g. en or computer

For example, the answer “computer” to the question “What technical medium are you using to complete this questionnaire?” is stored in the column named A0302_cad_medium_opt_computer.

Underscores (“_”) have been used to simplify the reading of codenames. A dictionary of variables supplied with the database is used to map survey questions to variables (codenames) in the database.
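The naming scheme can be checked mechanically. The Python sketch below splits a codename into its components with a regular expression; the pattern and the field names are assumptions derived from the list above, and the last two components are treated as optional because some codenames in the dictionary (e.g. X0001_met_respID_aut) do not carry them.

```python
import re

# Hypothetical pattern derived from the naming rules described above.
CODENAME_PATTERN = re.compile(
    r"^(?P<theme_letter>[A-Z])(?P<page>\d{2})(?P<question>\d{2})"
    r"_(?P<theme>[a-z]{3})"
    r"_(?P<question_id>[A-Za-z0-9]+)"
    r"(?:_(?P<value_type>[a-z]{3}))?"
    r"(?:_(?P<detail>\w+))?$"
)


def parse_codename(codename: str) -> dict:
    """Split a codename into its components (missing ones are None)."""
    match = CODENAME_PATTERN.match(codename)
    if match is None:
        raise ValueError(f"unexpected codename: {codename!r}")
    return match.groupdict()


parts = parse_codename("A0302_cad_medium_opt_computer")
# parts["theme_letter"] is "A", parts["page"] is "03", parts["question"] is "02",
# parts["question_id"] is "medium", parts["detail"] is "computer".
```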

Table 1: Extract from the variable dictionary of the respondents database
CODENAME Variable_name_or_question Code_quest Code_thema Letter_thema Page_of_questionnaire Num_question_thema Question_ID Input_mode
X0001_met_respID_aut Respondent ID X0001 metadt X 00 01 respID automatic
X0004_met_firact_aut First Active X0004 metadt X 00 04 firact automatic
A0101_cad_langua Language chosen A0101 cadrag A 01 01 langua Drop-down list
A0302_cad_medium_lab_en What technical medium are you using to complete this questionnaire? A0302 cadrag A 03 02 medium Drop-down list
A0303_cad_enviro_lab_en Please describe the situational context in which you are completing this questionnaire: A0303 cadrag A 03 03 enviro Drop-down list
A0405_cad_univer_lab_city City of university A0405 cadrag A 04 05 univer Drop-down list
A0405_cad_univer_lab_country Country of university A0405 cadrag A 04 05 univer Drop-down list
A0406_cad_fields_lab_rec What is your principal field of study? A0406 cadrag A 04 06 fields Unique choice
A0407_cad_levels_lab_en What is your current level of academic studies? A0407 cadrag A 04 07 levels Drop-down list
A0508_cad_gender_lab_en You are: A0508 cadrag A 05 08 gender Unique choice
A0609_cad_birthc_lab_rec What is your country of birth? Recoded for pseudonymization A0609 cadrag A 06 09 birthc Recoding
A0610_cad_age_rec Recoded age categories for pseudonymization A0610 cadrag A 06 10 birthy Recoding
A0611_cad_iddoc1_lab_rec From what country(ies) do you hold ID documents ? Country 1: A0611 cadrag A 06 11 iddoc1 Recoding
A0612_cad_iddoc2_lab_rec From what country(ies) do you hold ID documents ? Country 2: A0612 cadrag A 06 12 iddoc2 Recoding
A0613_cad_belong_lab_rec Do you have another sense of national belonging? If so, please specify: A0613 cadrag A 06 13 belong Recoding
B0701_lan_langu5_lab_rec What language(s) did you speak before you were 5 years old? B0701 langue B 07 01 langu5 Multiple choices recoded
B0701_lan_otlan5_lab_rec What language(s) did you speak before you were 5 years old? B0701 langue B 07 01 otlan5 Open field recoded
B0702_lan_dailan_lab_rec Which of these languages do you speak on a daily basis? B0702 langue B 07 02 dailan Multiple choices recoded
B0702_lan_otdail_lab_rec Which of these languages do you speak on a daily basis? B0702 langue B 07 02 otdail Open field recoded
C0901_mob_living_lab_en Have you ever lived in a country other than the one you currently live (for more than 4 months)? C0901 mobiok C 09 01 living Unique choice
C1002_mob_livfam_lab_rec If for family reasons (C0901_mob_living_lab_en), specify in which country(ies): C1002 mobiok C 10 02 livfam Open field recoded
C1002_mob_livstu_lab_rec If for studies (C0901_mob_living_lab_en), specify in which country(ies): C1002 mobiok C 10 02 livstu Open field recoded
C1002_mob_livpro_lab_rec If for professional reasons (C0901_mob_living_lab_en), specify in which country(ies): C1002 mobiok C 10 02 livpro Open field recoded
C1002_mob_livoth_lab_rec If for others reasons (C0901_mob_living_lab_en), specify in which country(ies): C1002 mobiok C 10 02 livoth Open field recoded
C1103_mob_travel_lab_en Have you travelled to a country other than the one you currently live (and stayed there for more than 3 days)? C1103 mobiok C 11 03 travel Unique choice
C1204_mob_trlist_lab_rec If so (question « C1103_mob_travel_lab_en »), in which country(ies) ? (5 countries maximum) C1204 mobiok C 12 04 trlist Open field recoded
D1301_mob_living_lab_en Would you like to live in a country (for a period of more than 4 months) other than the one in which you currently live? D1301 mobimg D 13 01 living Unique choice
D1402_mob_livpro_lab_rec If so (question « D1301_mob_living_lab_en »), name the countries considered to work ? (5 countries maximum) D1402 mobimg D 14 02 livpro Open field recoded
D1402_mob_livstu_lab_rec If so (question « D1301_mob_living_lab_en »), name the countries considered to study ? (5 countries maximum) D1402 mobimg D 14 02 livstu Open field recoded
D1503_mob_travel_lab_en Would you like to visit a country (for a period of more than 3 days) other than the one in which you currently live? D1503 mobimg D 15 03 travel Unique choice
D1604_mob_trlist_lab_rec If so (question « D1503_mob_travel_lab_en »), in which country(ies) ? (5 countries maximum) D1604 mobimg D 16 04 trlist Open field recoded
E1801_map_cresid_lab_en Center the map on the country where you currently live: E1801 maping E 18 01 cresid Unique choice
F2101_med_topics_lab_en Which of the following topics have you been interested in during the last week? F2101 medias F 21 01 topics Multiple choices recoded
F2202_med_intern_lab_en Using the scales, indicate your interest in the international news F2202 medias F 22 02 intern Unique choice
F2202_med_locale_lab_en Using the scales, indicate your interest in the local news F2202 medias F 22 02 locale Unique choice
F2202_med_nation_lab_en Using the scales, indicate your interest in the national news F2202 medias F 22 02 nation Unique choice
F2303_med_medias_lab_en What media have you used to stay informed over the last week? F2303 medias F 23 03 medias Unique choice
F2304_med_langue_lab_rec Do you read the media in one or more of the following languages? F2304 medias F 23 04 langue Multiple choices recoded
F2304_med_lanoth_lab_rec If « other » (question « F2304_med_langue_lab_rec »), specify your answer : F2304 medias F 23 04 lanoth Open field recoded
G2501_cul_lgsort_lab_en During the last month, have you watched (film, series, video) or read any publication (book, comic, novel, etc G2501 cultur G 25 01 lgsort Unique choice
G2602_cul_lgview_lab_rec In which language(s) have you watched films, series or videos? G2602 cultur G 26 02 lgview Multiple choices recoded
G2602_cul_otview_lab_rec If « other » (question « G2602_cul_lgview_lab_rec »), specify your answer : G2602 cultur G 26 02 otview Open field recoded
G2603_cul_lgread_lab_rec In what language(s) have you read (books, comics, novels, etc.)? G2603 cultur G 26 03 lgread Multiple choices recoded
G2603_cul_otread_lab_rec If « other » (question « G2603_cul_lgread_lab_rec »), specify your answer : G2603 cultur G 26 03 otread Open field recoded
G2704_cul_orsort_lab_en During the last month, have you watched (film, series, video) or read publications (book, comic, novel, etc) written or produced by a foreign author* ? G2704 cultur G 27 04 orsort Unique choice
G2805_cul_orread_lab_rec From what country(ies) are the authors of the works you have read ? Origin(s) of books, comics, novels etc.: G2805 cultur G 28 05 orread Open field recoded
G2805_cul_orview_lab_rec From what country(ies) are the authors of the works you have watched ? Origin(s) of films, series or videos authors : G2805 cultur G 28 05 orview Open field recoded
H3001_eur_words1_lab_rec What words do you associate with 'Europe' (Word or expression n°1) H3001 europe H 30 01 words1 Open field recoded
H3001_eur_words2_lab_rec What words do you associate with 'Europe' (Word or expression n°2) H3001 europe H 30 01 words2 Open field recoded
H3001_eur_words3_lab_rec What words do you associate with 'Europe' (Word or expression n°3) H3001 europe H 30 01 words3 Open field recoded
H3001_eur_words1_lab_en What words do you associate with 'Europe' (Word or expression n°1) H3001 europe H 30 01 words1 Open field recoded
H3001_eur_words2_lab_en What words do you associate with 'Europe' (Word or expression n°2) H3001 europe H 30 01 words2 Open field recoded
H3001_eur_words3_lab_en What words do you associate with 'Europe' (Word or expression n°3) H3001 europe H 30 01 words3 Open field recoded
H3001_eur_words3_lab_fr What words do you associate with 'Europe' (Word or expression n°3) H3001 europe H 30 01 words3 Open field recoded
H3001_eur_words1_lab_fr What words do you associate with 'Europe' (Word or expression n°1) H3001 europe H 30 01 words1 Open field recoded
H3001_eur_words2_lab_fr What words do you associate with 'Europe' (Word or expression n°2) H3001 europe H 30 01 words2 Open field recoded
H3102_eur_images_lab_en In general, your view of the European Union (EU) is: H3102 europe H 31 02 images Drop-down list
I3301_cad_educp1_lab_en What is the highest level of studies of your parent 1 (mother or guardian 1)? I3301 cadrag I 33 01 educp1 Drop-down list
I3302_cad_educp2_lab_en What is the highest level of studies of your parent 2 (father or guardian 2)? I3302 cadrag I 33 02 educp2 Drop-down list
I3403_cad_paysp1_lab_rec What is the country of birth of your parent 1 (mother or guardian 1)? I3403 cadrag I 34 03 paysp1 Recoding
I3404_cad_paysp2_lab_rec What is the country of birth of your parent 2 (father or guardian 2)? I3404 cadrag I 34 04 paysp2 Recoding
K3601_met_submit_aut Submitted K3601 metadt K 36 01 submit automatic

Enrichment of the Database

The data collected with the Maptionnaire tool are split into two files. The first, called “respondents database”, contains all the variables except the mental maps. The second, called “geometries database”, only contains geographical data (mental maps) and the associated vocabulary. These two files have been enriched, and several variables have been added.

Respondents Database

Creation of Standard Variables

Variables for the country and city of the respondent’s university were created. These two new fields (A0405_cad_univer_city and A0405_cad_univer_country) facilitate comparative analyses based on country of birth, administrative nationality and the country in which students are studying.

Recoding Closed Questions

Other variables resulting from coding were added to the database, such as the recoding of cited countries (e.g. questions on desired mobility) based on the ISO3 code nomenclature (e.g. C1002_mob_livpro_lab_rec). The same work was carried out for the languages spoken by the respondents, coded according to the ISO 639-3 standard, and sometimes associated with an ISO3 code in order to specify a geographical sector (e.g. Algerian Arabic = ARA_DZA in the variable B0702_lan_dailan_lab_rec).

Recoding Open-Ended Questions

The open-ended questions required special attention, with the aim of harmonising the text and spelling in each language of the questionnaire and then extracting simplified information.

As with closed questions, open questions concerning countries were recoded using ISO3 codes and those concerning languages were recoded using ISO 639-3 and ISO3. In the case of regional areas with vague boundaries, giving rise to responses that could describe cities, states or economic, cultural or political organisations, we re-encoded using an entity dictionary developed as part of the project by the IMAGEUN Media group.

Like other dictionaries developed by specialists in geographic media analysis (such as the newsmap package, which offers a dictionary for recognising three levels of objects: countries and two levels of world regions), the IMAGEUN dictionary is a tool for detecting objects in character strings based on lists of syntagms. For example, the political organisation object “European Union” can be detected in French through the terms “union européenne”, “u.e.” and “ue”.
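To make the principle concrete, here is a minimal Python sketch of dictionary-based detection. It is not the IMAGEUN dictionary: the entity code ORG_EU, the data structure and the function name are hypothetical, and only the three French syntagms quoted above are taken from the text.

```python
import re

# Hypothetical miniature dictionary: entity code -> language -> syntagms.
ENTITY_DICTIONARY = {
    "ORG_EU": {"fr": ["union européenne", "u.e.", "ue"]},
}


def detect_entities(text: str, lang: str, dictionary=ENTITY_DICTIONARY) -> list:
    """Return the codes of the entities whose syntagms occur in the text."""
    lowered = text.lower()
    found = []
    for code, syntagms_by_lang in dictionary.items():
        for term in syntagms_by_lang.get(lang, []):
            # Lookarounds require the syntagm to stand alone, so that
            # "ue" is not detected inside a word such as "bleue".
            pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
            if re.search(pattern, lowered):
                found.append(code)
                break
    return found


codes = detect_entities("L'Union européenne et ses voisins", "fr")
```

Matching on whole syntagms rather than raw substrings matters most for very short terms such as “ue”, which would otherwise produce many false positives.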

The originality of this dictionary is that it can identify several types of object (states, supra-national political organisations, macro-regions of different types: continents, seas and oceans, state capitals). Built from the Wikidata database, this multilingual dictionary links syntagms in the different languages of the project (English, French, German, Tunisian Arabic / Derja and Turkish) to a unique ontology (identifiable by a unique code). This specific encoding enables cross-referencing with the project’s other databases, including that of the Media theme.

Example for a country
Frankreich FRA ISO3 code
Angleterre GBR_ENG ISO3 code
Martinique FRA_MTQ ISO3 code
Example for a language
Igbo, Nigerian Pidgin IBO PCM Here, two different languages
Darija(algerisch) ARA_DZA Here, one language with a country specification
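The recoded language values shown above can be unpacked programmatically. The following Python sketch assumes, from the two examples only, that several languages are separated by a space and that an optional country is appended to the language code with an underscore; the function name is hypothetical.

```python
def parse_language_codes(recoded: str) -> list:
    """Split a recoded language value into language/country pairs.

    Assumptions: codes are space-separated, and a country is attached
    to a language as LANGUAGE_COUNTRY (e.g. "ARA_DZA").
    """
    result = []
    for token in recoded.split():
        language, _, country = token.partition("_")
        result.append({"language": language, "country": country or None})
    return result


two_languages = parse_language_codes("IBO PCM")
with_country = parse_language_codes("ARA_DZA")
```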

Finally, when a category could not be coded using the ISO nomenclature, we harmonised the character string (case, spelling, spaces, etc.).

Example for belonging (country, identity), raw answer followed by harmonised string
JE SUIS MARTINIQUAIS. → je suis martiniquais
l’espèce humaine → l’espece humaine
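A harmonisation consistent with the two examples above can be sketched in Python as follows. The exact rules applied by the project are not documented here, so the steps (lower-casing, removal of diacritics, whitespace collapsing, stripping of surrounding punctuation) are assumptions inferred from those examples; removing diacritics in particular may not be appropriate for every language of the questionnaire.

```python
import re
import unicodedata


def harmonise(text: str) -> str:
    """Normalise case, diacritics, spaces and surrounding punctuation."""
    # Decompose accented letters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text.lower())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Collapse runs of whitespace and trim the ends.
    collapsed = re.sub(r"\s+", " ", stripped).strip()
    # Remove punctuation left at the start or end of the string.
    return collapsed.strip(".,;:!?").strip()


cleaned = harmonise("JE SUIS  MARTINIQUAIS.")
```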

Recoding the Words that Characterise Europe

The open-ended question (central to the project) on the words that students associate with Europe was also the subject of specific harmonisation work.

A multilingual recoding of the cited words was carried out collectively, followed by a harmonised translation of the 50 most frequent words into English and French.

Country of response Language of origin English French Frequency
Turkish euro euro euro 108
Turkish özgürlük freedom liberté 43
German union union union 156
German euro euro euro 127
Arabic اتحاد union union 157
Arabic ثقافة culture culture 51
English union union 183
English continent continent 176

Geometries Database

Construction of the Geometries

The file containing the mental maps drawn by the respondents has a different structure to the respondents’ database. As respondents were able to draw up to five zones and associate a word/expression with each zone, each line of this data file corresponds to a geometry (a space) drawn. Therefore, unlike the respondent database, a student’s answers can be stored in several lines (up to five).

Maptionnaire makes it possible to retrieve a very faithful record of the shapes drawn by the students. The tool automatically records a large number of points (nodes) while an area is being traced, which can be done continuously (with one click) or by a succession of clicks, as in a traditional geographic information system. These geometries are then made available in WKT and GeoJSON formats.

The geometries of the drawn polygons were then reconstructed from the coordinates stored as a character string (WKT). The reconstruction was automated using R, but was subject to systematic visual checks. The geometry database was then converted into a geographic layer and made available in GeoPackage format.
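For readers unfamiliar with WKT, the Python sketch below shows what reconstructing a polygon from such a character string involves. It is a deliberately minimal parser for simple POLYGON strings, written for illustration; it is not the project's R procedure, and a dedicated geospatial library would normally be preferred.

```python
import re


def parse_wkt_polygon(wkt: str) -> list:
    """Return the rings of a WKT POLYGON as lists of (x, y) tuples.

    Only plain 2D 'POLYGON ((x y, x y, ...))' strings are handled.
    """
    match = re.fullmatch(r"\s*POLYGON\s*\(\s*(.*)\s*\)\s*", wkt,
                         flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("not a WKT POLYGON")
    rings = []
    for ring_text in re.findall(r"\(([^()]*)\)", match.group(1)):
        ring = []
        for pair in ring_text.split(","):
            x, y = pair.split()[:2]
            ring.append((float(x), float(y)))
        rings.append(ring)
    return rings


def is_closed(ring: list) -> bool:
    """A valid ring ends on the point where it started."""
    return len(ring) >= 4 and ring[0] == ring[-1]


rings = parse_wkt_polygon("POLYGON ((0 0, 4 0, 4 3, 0 3, 0 0))")
outer_ring = rings[0]
```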

Geometries Characterisation

Once the geometries had been reconstructed, the drawn areas were characterised in two ways using a semi-automated procedure based on R (Figure 2), in order to:

  1. Assess the quality of the drawn polygons (detection of topological problems) in order to correct them
  2. Construct a typology of plotted areas in order to understand the spatial anchoring logic and propose specific cartographic procedures

This process of characterising geometries, both technical and thematic, is the result of collective work and arbitration supervised by several members of the project.

Figure 2: Geometries classification loop

Correcting Geometries

Several topological problems were detected in the polygons drawn by the respondents:

  • Self-intersection of the polygon’s contour. These problems are mainly due to the way the polygon was closed (Figure 3)
  • Assembly of several polygons drawn separately but designating one and the same space (sometimes with overlap, see Figure 4)
  • Anti-meridian problem
  • “Scribble” not removed by a respondent

Figure 3: Types of defects observed in geometries

With a view to mapping, these polygons were repaired or reworked. In total, 882 polygons underwent a minor topological correction, 39 a more specific correction and 142 (5%) were invalidated, for example because the drawing was a scribble to which no name had been given. The corrections only slightly altered the drawn areas: a few points were removed, but the shapes themselves were not changed. These are slight topological simplifications intended to improve the cartographic rendering and reduce the size of the database.

It is important to note that although these repairs were carried out semi-automatically using R, all the corrections made were checked visually to ensure that no mental maps had been distorted.
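Two of the defects listed above, self-intersection and the anti-meridian problem, lend themselves to automatic detection. The Python sketch below illustrates one possible check on a ring given as a list of (longitude, latitude) tuples. It is not the R procedure used in the project, and the 180-degree jump used to flag anti-meridian crossings is a heuristic assumption.

```python
def _orientation(p, q, r) -> int:
    """Sign of the cross product (q - p) x (r - p): 1, -1 or 0."""
    value = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (value > 0) - (value < 0)


def _segments_cross(a, b, c, d) -> bool:
    """True if segments ab and cd cross at an interior point."""
    return (_orientation(a, b, c) * _orientation(a, b, d) < 0
            and _orientation(c, d, a) * _orientation(c, d, b) < 0)


def ring_self_intersects(ring: list) -> bool:
    """Detect a contour that crosses itself (e.g. a bow-tie shape)."""
    points = ring[:-1] if ring[0] == ring[-1] else ring
    n = len(points)
    edges = [(points[i], points[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Neighbouring edges share a vertex and cannot cross properly.
            if j == i + 1 or (i == 0 and j == n - 1):
                continue
            if _segments_cross(*edges[i], *edges[j]):
                return True
    return False


def crosses_antimeridian(ring: list) -> bool:
    """Heuristic: a longitude jump above 180 degrees between two nodes."""
    return any(abs(b[0] - a[0]) > 180 for a, b in zip(ring, ring[1:]))


bow_tie = [(0, 0), (2, 2), (2, 0), (0, 2), (0, 0)]
square = [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
```

The pairwise edge test is quadratic in the number of nodes, which is acceptable for a sketch but may be slow for the very dense contours recorded by the tool.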

Figure 4: Union of two polygons representing a common space

In the above case, two geometries were drawn by a respondent and a single word (“Europe”) was associated with one of the two geometries. The two polygons were merged into a single polygon, characterised by the word “Europe”.

Classification of Geometries

Once all the geometries had been corrected, a typology of the polygons was drawn up based on their relationships to one another within a response. It comprises four major types of space and two additional types: one for drawings combining several of the major types, the other for invalid drawings.

Figure 5: Typologies of relationships between the geometries of a single respondent

Figure 5 shows the different spatial logics observed in the responses. The most common case is the drawing of a single polygon, which accounts for almost 60% of the responses.

This is followed by “nested” polygons, symbolising different regions, in a scalar logic, from the smallest to the largest or vice versa.

Multiple regions designate disjointed spaces with no intentional spatial overlap, and account for 6% of responses.

Intersected (or superimposed) regions are by far the least frequent. These are two distinct areas that overlap a common area (e.g. the Euro-Mediterranean area and the European Union).

Finally, the last case described in Figure 6 concerns responses combining several spatial logics. Although these cases were few in number, they were nevertheless included in the database provided.

Figure 6: Example of combined spatial logics
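The typology described above can be approximated automatically. The Python sketch below is a crude illustration that compares bounding boxes instead of true polygon shapes, so it will misclassify some cases; the actual classification was semi-automated in R and arbitrated by project members. The label strings and function names are assumptions.

```python
from itertools import combinations


def bbox(ring: list) -> tuple:
    """Bounding box (min_x, min_y, max_x, max_y) of a ring of (x, y) tuples."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def pair_relation(a: tuple, b: tuple) -> str:
    """Relation between two bounding boxes: multiple, nested or intersected."""
    if a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1]:
        return "multiple"  # disjoint spaces
    a_in_b = b[0] <= a[0] and b[1] <= a[1] and a[2] <= b[2] and a[3] <= b[3]
    b_in_a = a[0] <= b[0] and a[1] <= b[1] and b[2] <= a[2] and b[3] <= a[3]
    return "nested" if a_in_b or b_in_a else "intersected"


def classify_response(rings: list) -> str:
    """Classify all the polygons drawn by one respondent."""
    if not rings:
        raise ValueError("no geometry to classify")
    if len(rings) == 1:
        return "single"
    relations = {pair_relation(bbox(a), bbox(b)) for a, b in combinations(rings, 2)}
    # Several different pairwise relations mean combined spatial logics.
    return relations.pop() if len(relations) == 1 else "combined"


small = [(1, 1), (2, 1), (2, 2), (1, 2)]
big = [(0, 0), (5, 0), (5, 5), (0, 5)]
far = [(10, 10), (11, 10), (11, 11), (10, 11)]
overlapping = [(4, 4), (7, 4), (7, 7), (4, 7)]
```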

Adding Weighting Variables

With a view to mapping these mental maps, weights were computed for the drawn polygons and added to the database. Three weighting variables were established:

  • weight_geom: each polygon drawn has a weight of 1

  • weight_resp: each response has a weight of 1 (polygon weight = 1/nb. of polygons entered)

  • weight_scale: each response has a weight of 1 and nested geometries have a shared weight (total for a response = 1)
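The first two weights follow directly from their definitions and can be sketched in Python as below. The respondent_id key and the function name are hypothetical; weight_scale is left out because its computation depends on how nested geometries are identified, which is not specified here.

```python
from collections import Counter


def add_weights(geometries: list) -> list:
    """Attach weight_geom and weight_resp to each drawn polygon.

    Each item is a dict with a (hypothetical) "respondent_id" key.
    """
    polygons_per_respondent = Counter(g["respondent_id"] for g in geometries)
    weighted = []
    for geometry in geometries:
        count = polygons_per_respondent[geometry["respondent_id"]]
        weighted.append({
            **geometry,
            "weight_geom": 1.0,           # every polygon counts as 1
            "weight_resp": 1.0 / count,   # every response sums to 1
        })
    return weighted


rows = [{"respondent_id": "r1"}, {"respondent_id": "r1"},
        {"respondent_id": "r1"}, {"respondent_id": "r2"}]
weighted_rows = add_weights(rows)
```

With weight_resp, a respondent who drew three areas contributes as much to a map as a respondent who drew one, whereas weight_geom gives more influence to respondents who drew more polygons.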

Figure 7: Weights

Typology of Words Associated with Geometries

Each polygon is named by the respondent in an open question. This original response (E1903_map_rgname) required several forms of recoding: an initial correction and spelling harmonisation in the original language (E1903_map_rgname_rec), accompanied by translations into the two working languages of the project (English: E1903_map_rgname_lab_en and French: E1903_map_rgname_lab_fr).

In addition to recoding, several forms of enrichment have been proposed in order to categorise objects and process them independently of the original language of the response (objective of multilingual processing).

Manual enrichment via a double typology: In order to allow semantic grouping of objects, manual coding was also proposed to offer a double typology (the modalities of these variables are presented in Table 2).

  • Concept: a category that links the E1903_map_rgname_rec syntagms and any translations to an entity (E1903_map_rgname_concept_en for the English version, E1903_map_rgname_concept_fr for the French version). This variable can be used to link different phrases to the same entity (“République française” and “France”, for example, refer to the same entity: France).

  • Scalar typology (E1903_map_rgname_scale): Provides a categorisation of entities according to their scale (macro-regional, national, local, for example). This typology makes it possible to differentiate between objects that can be assimilated to macro-regions and other objects.
    Note: Not all the objects identified have a scale.

  • Typology by type (E1903_map_rgname_type): Beyond scale, a second typology was needed to identify the type of object (city, political organisation, urban region, continent, sub-continent, etc.). Two objects located at a similar scale (macro-region), such as “Europe” and “EU”, are thus categorised in different types (“continent” in the first case and “political organisation” in the second).

  • Parents: The variables E1903_map_rgname_parent1_en, E1903_map_rgname_parent2_en and E1903_map_rgname_parent3_en are used to group together objects linked by a scale relationship. For example, the term “Europe” is a parent of the term “Western Europe”. In practical terms, these variables make it possible to work at different levels of granularity.

Below is a list of all the variables constructed on the basis of the word associated with a drawn region.

Table 2: Variables of the geometries database describing region names
CODENAME Variable_name_or_question Variable_description Example_value
E1903_map_rgname In your opinion, in which region of the world is the country where you currently live located? Draw the limits of this region on the map (maximum: 5 areas) Original answer in the original language, with no orthographical correction or coding Asya ve Avrupayı bağlayan köprü
E1903_map_rgname_lab_rec In your opinion, in which region of the world is the country where you currently live located? Draw the limits of this region on the map (maximum: 5 areas) Original answer in the original language with orthographical corrections Asya ve Avrupayı köprüsü
E1903_map_rgname_lab_fr In your opinion, in which region of the world is the country where you currently live located? Draw the limits of this region on the map (maximum: 5 areas) Original answer translated into French (except answers initially in French) Pont entre l'Asie et l'Europe
E1903_map_rgname_lab_en In your opinion, in which region of the world is the country where you currently live located? Draw the limits of this region on the map (maximum: 5 areas) Original answer translated into English (except answers initially in English) Bridge between Asia and Europe
E1903_map_rgname_concept_fr In your opinion, in which region of the world is the country where you currently live located? Draw the limits of this region on the map (maximum: 5 areas) Conceptual entities detected (French) pont entre Asie et Europe
E1903_map_rgname_concept_en In your opinion, in which region of the world is the country where you currently live located? Draw the limits of this region on the map (maximum: 5 areas) Conceptual entities detected (English) bridge between Asia and Europe
E1903_map_rgname_scale In your opinion, in which region of the world is the country where you currently live located? Draw the limits of this region on the map (maximum: 5 areas) Geographical scale of the object identified by its name macroregional - sub-national - national - local - scalar - without scale - XXX - NA
E1903_map_rgname_type In your opinion, in which region of the world is the country where you currently live located? Draw the limits of this region on the map (maximum: 5 areas) Type of object identified by its name continent - archipelago - region - state - sub-continent - city - political organization - concept - over-continent - over-continent/concept - strait - XXX - NA
E1903_map_rgname_parent1_en In your opinion, in which region of the world is the country where you currently live located? Draw the limits of this region on the map (maximum: 5 areas) General entity related to the object 1 bridge
E1903_map_rgname_parent2_en In your opinion, in which region of the world is the country where you currently live located? Draw the limits of this region on the map (maximum: 5 areas) General entity related to the object 2 Asia
E1903_map_rgname_parent3_en In your opinion, in which region of the world is the country where you currently live located? Draw the limits of this region on the map (maximum: 5 areas) General entity related to the object 3 Europe

Assessment

Many Responses Deleted from the Database

More than a third of the responses (1,190 of 3,220, about 37%) were deleted from the database. These were mainly responses that did not include a mental map or that were too incomplete.

Numerous Enrichment Variables Created

Forty-six additional variables were added to the respondents database. Two of them were derived from pre-existing variables (country and city of university). The others are the result of recoding, harmonisation, translation or categorisation.

In the geometries database, 35 variables have been added, including variables for qualifying and weighting geometries useful for mapping, and