Wikipedia:GLAM/Auckland Museum/Tools and Resources

Auckland Museum Cultural Permissions Guidelines

Auckland Museum’s collections are open access by default, closed by exception. These exceptions include taonga Māori and Pacific collections. When uploading content to Wikimedia Commons, our staff will work within the guidelines of the Auckland Museum’s cultural permissions policy as outlined in He Korahi Māori: A Māori Dimension. To respect Māori cultural values, images relating to and depicting Māori subjects and content will not be uploaded to Wikimedia Commons, regardless of copyright status. This approach also applies to images that depict Pacific subjects and content, in line with the Museum’s Teu Le Va: the Pacific Dimension at Auckland Museum.

These guidelines are underpinned by our role as kaitiaki (guardians) to operate with good faith and respect.

The GLAM Wiki guidebook is a beginner’s resource for GLAMs to learn about Wikimedia platforms and how best to use them. It covers the basics of, and guidelines for, using Wikimedia, Wikipedia, Wikimedia Commons and Wikidata, as well as Wikimedia’s role in GLAMs and how to get started.

Wikimedia Commons for GLAMs

Auckland Museum is committed to sharing content that is Open Access and culturally permissible, and part of this effort is sharing our content on image-sharing platforms for public benefit. Alongside Flickr, Flickr Commons, Pinterest, Unsplash and other image-sharing sites, Wikimedia Commons is an important platform within our digital partnerships strategy. To keep our contributions sustainable, we have developed scalable, efficient workflows for managing metadata and uploading images in bulk, a process commonly called Batch Uploading.

PIDs, Linked Open Data (LOD), and Data Roundtripping

Auckland Museum’s approach to digital partnerships is grounded in a philosophy of connected, sustainable, and transparent data sharing. Rather than treating each upload or dataset as an isolated contribution, we view every platform (Wikimedia Commons, GBIF, BHL, Pinterest, and others) as part of a wider linked data ecosystem. Each image, record, or dataset is connected through persistent identifiers (PIDs), as well as bi-directional data linking between our source systems and external platforms. Where possible, we integrate identifiers from other digital partnerships that host the same material, ensuring that our contributions reinforce, rather than fragment, the broader web of cultural data.

We do not advocate for data dumping or for disseminating information detached from its source context. Our processes emphasise metadata stewardship, ensuring that attribution, provenance, and object relationships are maintained throughout. By reusing cleaned metadata and digitised images across multiple platforms, we operate sustainably and amplify the utility of our data. This interconnected approach to publishing acknowledges that our work contributes to an open, evolving knowledge network rather than a closed or self-contained system.

Glossary

Attribution

A formal acknowledgement of where a work came from and who created it. On Commons, attribution appears through fields like Author/Creator, Source, and Credit line, and through the licence header if required by a Creative Commons licence.

Example: Auckland War Memorial Museum Tāmaki Paenga Hira. Photograph by Tudor Collins.

Batch Upload

Uploading many images at once using tools such as OpenRefine, Pattypan, or a Python script, instead of uploading one file at a time.

Bi-directional Data Linking

A method of ensuring that your internal system (e.g., Vernon) links out to external platforms (Commons, GBIF, BHL, Pinterest, Internet Archive), and that those platforms link back to your internal record using the system ID or PID.
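Here is a minimal Python sketch of the idea. It assumes each record carries a system ID and its Commons file name. The Collections Online base URL is hypothetical, so substitute your own record URL pattern. One function produces the outbound link you would store in the CMS. The other produces the inbound link you would place in the Commons Source field.

```python
from urllib.parse import quote

# Hypothetical URL pattern: replace with your own Collections Online pattern.
COLLECTIONS_ONLINE = "https://collections.example.org/object/"
COMMONS_FILE = "https://commons.wikimedia.org/wiki/File:"


def commons_file_url(filename: str) -> str:
    """Outbound link for the CMS record, pointing at the Commons file page."""
    return COMMONS_FILE + quote(filename.replace(" ", "_"))


def source_url(system_id: str) -> str:
    """Inbound link for the Commons Source field, pointing back to the CMS record."""
    return COLLECTIONS_ONLINE + quote(system_id)


def link_record(record: dict) -> dict:
    """Return a copy of the record carrying links in both directions."""
    linked = dict(record)
    linked["commons_url"] = commons_file_url(record["commons_filename"])
    linked["source_url"] = source_url(record["system_id"])
    return linked
```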

CC Licence (Creative Commons licence)

A set of licences that tell people how they can reuse a work. Common ones on Commons include CC BY, CC BY-SA, and CC0 (strictly a public-domain dedication rather than a licence).

Category (Wikimedia Commons)

A folder-like tag that helps people find related images. Example: Images from Auckland Museum.

Commons (Wikimedia Commons)

A media repository that stores freely licensed and public-domain images, videos, and audio.

Credit Line

A short statement acknowledging the institution or donor. Example: Auckland War Memorial Museum Tāmaki Paenga Hira. Required in Commons templates. Often interchangeable with attribution conceptually.

Crosswalk

A table or mapping that shows how metadata fields from one system correspond to fields in another. In GLAM batch uploads, a crosswalk usually shows how your internal metadata (e.g., from Vernon CMS or a spreadsheet export) maps to:

  • Commons template fields
  • Structured Data on Commons (SDC) properties
  • External identifiers (PIDs)
  • Any required transformations or cleaning steps

Crosswalks help ensure consistency, reduce errors, and make it easier to automate uploads or maintain long-term documentation.

Example: “Creator → |author= (Commons) → P170 creator (SDC)”.

Data Model

A structured description of how data is organised and connected within a specific project.

Description (Metadata)

A short, human-readable explanation of what the image shows. This is a free-text field, not structured data.

Digital Object

The file (image, video, etc.) being uploaded.

Digital Surrogates

Digitised versions of physical objects such as photos, books, or artworks.

File Page (on Commons)

The page associated with an uploaded image. It includes metadata, licence, categories, and structured data.

GREL

A scripting language used in OpenRefine to transform metadata (e.g., trimming, splitting, merging fields).
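GREL itself runs inside OpenRefine, but you can sketch the same kinds of transformation in Python if you script your own pipeline. The functions below are rough analogues of GREL's trim(), split() and cell concatenation. They also include the “remove trailing punctuation; ensure sentence case” rule from the example crosswalk. They are illustrative only, not a port of GREL.

```python
import re
from typing import List


def clean_description(value: str) -> str:
    """Trim, collapse whitespace, drop trailing punctuation, capitalise the first letter."""
    text = re.sub(r"\s+", " ", value).strip()
    text = text.rstrip(".,;: ")  # keeps closing brackets and quotes intact
    return text[:1].upper() + text[1:]  # only the first letter, to protect proper nouns


def split_field(value: str, sep: str = ";") -> List[str]:
    """Like GREL value.split(sep), with each part trimmed and empty parts dropped."""
    return [part.strip() for part in value.split(sep) if part.strip()]


def merge_fields(*values: str, sep: str = " ") -> str:
    """Join the non-empty values, similar to concatenating cells in GREL."""
    return sep.join(v.strip() for v in values if v and v.strip())
```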

Information Template ( {{Information}} )

The default template used when describing an image on Commons, including title, description, date, source, author, permission, and other metadata. Supports the integration of structured data.

Licence Header

The part of the file page that shows how the image can be reused. Controlled by a template.

Linked Open Data (LOD)

Open, structured data that uses consistent identifiers to connect information across different systems. Wikidata is a major Linked Open Data platform used in the GLAM sector because it supports global identifiers for:

  • people
  • species
  • places
  • events
  • collections
  • artworks

LOD allows your image on Commons to be connected to wider cultural and scientific knowledge networks.

Metadata

Information about an image or object — structured or unstructured.
Examples include:

  • Title
  • Description
  • Photographer/Creator
  • Date
  • Object number / System ID
  • Rights or licence
  • Dimensions
  • Species names
  • Locality

For uploads, metadata is typically mapped into:

  • a Commons template (human-readable)
  • Structured Data on Commons (machine-readable)

Metadata quality determines how findable and reusable your images are.

OpenRefine

A tool used to prepare metadata for batch uploads, reconcile entities to Wikidata, and generate Commons templates automatically.

Pattypan

An uploader tool that works from a spreadsheet to upload multiple files to Commons.

Public Domain (PD)

A copyright status meaning the work is free to use for any purpose. Examples include PD-old, PD-US, etc. NB: this is not equivalent to No Known Copyright Restrictions.

PIDs (Persistent Identifiers)

Stable, unique identifiers used to track objects across platforms.

Reconciliation (OpenRefine)

Matching your metadata to Wikidata entries (people, species, places). Helps automate structured data.
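OpenRefine talks to reconciliation services through the Reconciliation Service API. That API accepts a batch of named queries as a JSON object in a queries parameter. This sketch only builds the request body. The endpoint shown is assumed to be the community Wikidata service that OpenRefine commonly uses, so check it against your own OpenRefine settings. Q5 (human) serves as an example type constraint.

```python
import json
from typing import Dict, List, Optional
from urllib.parse import urlencode

# Assumed endpoint: the community Wikidata reconciliation service; check the URL
# configured in your own OpenRefine installation.
ENDPOINT = "https://wikidata.reconci.link/en/api"


def build_queries(names: List[str], type_qid: Optional[str] = None,
                  limit: int = 3) -> Dict[str, dict]:
    """Build a batch of reconciliation queries keyed q0, q1, ..."""
    queries = {}
    for i, name in enumerate(names):
        query = {"query": name, "limit": limit}
        if type_qid:
            query["type"] = type_qid  # restrict candidates, e.g. Q5 (human)
        queries[f"q{i}"] = query
    return queries


def encode_body(queries: Dict[str, dict]) -> str:
    """Form-encode the batch as the 'queries' parameter of a POST request."""
    return urlencode({"queries": json.dumps(queries)})
```

You would then POST the encoded body to the endpoint with any HTTP client. Each entry in the response lists candidate items with a score and a match flag.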

Rights Statement

Information about whether and how an image can be reused (copyright status, licence, or public-domain statement).
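A rights-and-licensing summary can double as a lookup table in an upload script. In this sketch the internal rights values are hypothetical, while the licence templates are standard Commons ones. Unmapped values raise an error so that someone reviews them rather than licensing them silently. No Known Copyright Restrictions also raises an error, because it is not a public-domain statement.

```python
# Hypothetical internal rights values mapped to standard Commons licence templates.
RIGHTS_TO_TEMPLATE = {
    "CC BY 4.0": "{{cc-by-4.0}}",
    "CC BY-SA 4.0": "{{cc-by-sa-4.0}}",
    "CC0": "{{cc-zero}}",
}

# Rights values that a person must assess before any upload.
NEEDS_REVIEW = {"No Known Copyright Restrictions"}


def licence_section(rights: str) -> str:
    """Return the licence header block for a file page, or raise for review."""
    value = rights.strip()
    if value in NEEDS_REVIEW:
        raise ValueError(f"{value!r} is not a public-domain statement; review copyright first")
    if value not in RIGHTS_TO_TEMPLATE:
        raise ValueError(f"No licence template mapped for {value!r}")
    return "== {{int:license-header}} ==\n" + RIGHTS_TO_TEMPLATE[value]
```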

SDC (Structured Data on Commons)

Machine-readable statements stored on each Commons file, using the same Wikibase data model as Wikidata.

Example: depicts – Apteryx mantelli.
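If you work with the API directly, a “depicts” or “creator” statement is a small Wikibase JSON object. This sketch assumes you already know the target item's QID, for example from reconciliation. The JSON it produces is the kind of data a wbeditentity call on the file's MediaInfo entity accepts. That entity's ID is M followed by the file page's page ID. Check the Commons API documentation before sending any edits.

```python
import json
from typing import List


def item_statement(prop: str, qid: str) -> dict:
    """Build one Wikibase statement whose value is a Wikidata item (e.g. P180 depicts)."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": prop,
            "datavalue": {
                "value": {"entity-type": "item", "id": qid},
                "type": "wikibase-entityid",
            },
        },
        "type": "statement",
        "rank": "normal",
    }


def mediainfo_id(page_id: int) -> str:
    """Structured data on a Commons file belongs to the entity 'M' + its page ID."""
    return f"M{page_id}"


def edit_payload(statements: List[dict]) -> str:
    """Serialise statements as the JSON 'data' value for a wbeditentity request."""
    return json.dumps({"claims": statements})
```

For example, item_statement("P180", "Q123") builds a depicts statement for Q123. That QID is a placeholder, not the item for any particular species.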

Source

Where the image came from — the physical object, the photographer, the digital file, or the platform it was extracted from.

Template

A pre-formatted block of wikitext for describing an image.

Wikitext

The markup language used on Wikimedia projects for formatting pages and templates.

Further Reading

https://www.wikidata.org/wiki/Wikidata:Linked_open_data_workflow

How to set up a Batch Uploading Project

This guide outlines the essential steps for GLAM organisations preparing to batch upload images and metadata to Wikimedia Commons. It is designed for staff who may not be information professionals but are managing digital assets, community engagement, or online publication workflows.

Documentation and Project Pages

Wikimedia projects (Wikipedia, Wikimedia Commons, Wikidata, Wikisource, etc.) tend to be transparent and publicly viewable, and are committed to open access, education, and cooperation. This aligns well with the GLAM sector's culture of institutions assisting and supporting one another.

It is important to begin documenting work and contributions at the very start of a project. You may wish to create a general GLAM project page for your institution, and then create pages for more specific work (e.g., Wikimedia Commons, Wikidata, specific content on Wikipedia) on the platform hosting that work. For batch uploading projects, it is usually best to host the main project documentation on Wikimedia Commons. This keeps the documentation close to the media files being uploaded and makes it easier to link to works directly using short internal links (without needing the c: prefix).

What to include in your project documentation:

  • Project Overview Page: A single page explaining what collection is being uploaded, why, and who is involved.
  • Scope and Selection Criteria: What will be uploaded? (e.g., digitised photographs, specimen images, annual reports)
  • Data Sources: Outline where your core metadata originates (e.g., Vernon CMS, spreadsheet exports, API extracts). You may also wish to include a crosswalk showing how these source fields map to both Structured Data on Commons (SDC) and the fields in the Commons template you are using. Linking your source metadata to the target metadata format effectively creates a data model.
  • Rights & Licensing Summary: A simple table showing each rights category and the corresponding Commons licence template.
  • File Storage and Naming Conventions: Document where files are stored, naming rules, and how your System ID or PIDs are integrated.
  • Workflow Diagrams: Outline the steps: export → clean → map → prepare template → upload → verify → publish structured data.
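File naming rules are easiest to keep consistent when they live in one function. The convention below is hypothetical: a title followed by the System ID in brackets. The replaced characters are a conservative set. It covers what MediaWiki rejects in page titles and, by default, in file names.

```python
import re

# Characters MediaWiki rejects in page titles (# < > [ ] | { }) plus those it
# rejects in file names by default (: / \); this set is a conservative choice.
ILLEGAL_CHARS = re.compile(r"[#<>\[\]|{}:/\\]")


def commons_filename(title: str, system_id: str, ext: str = "jpg") -> str:
    """Hypothetical convention: '<Title> (<System ID>).<ext>'."""
    safe = ILLEGAL_CHARS.sub("-", title)              # replace forbidden characters
    safe = re.sub(r"\s+", " ", safe).strip(" -.")     # tidy whitespace and edges
    return f"{safe} ({system_id}).{ext.lstrip('.')}"
```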

Create an institution/project template

An institution template provides a standard way to credit your organisation on Wikimedia Commons and ensures that all files uploaded from your institution are consistently attributed. It also helps users find, understand, and reuse your content across Commons, Wikipedia, and other Wikimedia projects.

Institution templates typically include:

  • the full institution name
  • location
  • website or Collections Online link
  • Commons categories associated with your institution
  • optional identifiers (e.g., VIAF, Wikidata QID)
  • a standardised attribution line for reuse

Using an institution template (such as AWMM) ensures that every file uploaded under your institution’s name displays a uniform credit and links back to the relevant categories and pages on Commons.

Why create one?

  • Gives all uploads a consistent institutional identity
  • Provides a central landing page for users wanting to learn more about the GLAM
  • Simplifies attribution for batch uploads
  • Supports discoverability through categories and links
  • Helps other Wikimedia contributors understand the provenance of your files

Where to place it:

Create the template directly on Wikimedia Commons under your institution’s name (e.g., Template:AWMM).
Store any draft versions in your user space until ready to publish.

Best practice:

Link the institution template from…

  • your batch upload project pages
  • your Commons category (e.g., Category:Images from Auckland Museum)

Creating an Institution Template

An institution template provides a standardised way to credit your organisation on Wikimedia Commons. It ensures your uploads display consistent attribution, link back to your institution’s pages, and appear in the correct categories.

The following examples show how to create both:

  • a main institution template (e.g. Template:YourInstitution)
  • a layout subpage (e.g. Template:YourInstitution/layout)

You can adapt these for your own GLAM organisation.

Main Institution Template (copy and paste):

Paste the following into a new page such as:
Template:YourInstitution

{{YourInstitution/layout}}

<noinclude>
{{Documentation}}
[[Category:Source templates]]
[[Category:GLAM templates]]
</noinclude>

This creates the main template shell, calls the layout page, and adds documentation and categories.

Layout Subpage (copy and paste)

Paste the following into:
Template:YourInstitution/layout

{| {{Partnership-Layout}}
| style="width:100px;" | [[File:Example.jpg|100px|Your Institution]]
| <div style="padding-top:6px;">
<big>This file was donated to Wikimedia Commons as part of a batch-uploading project by ''Your Institution''. 
For more information about this project, visit [https://www.example.org example.org].</big>
----
''[[en:Wikipedia:GLAM/Your Institution]]''
</div>
|}

<noinclude>
[[Category:Layout templates|{{PAGENAME}}]]
</noinclude>

This layout determines how the attribution box appears on each file page.

How to Use Your Template

Once created, any file on Wikimedia Commons can display your institutional credit by adding:

{{YourInstitution}}

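This will show the layout box and apply any categories placed outside the <noinclude> tags. Categories inside <noinclude>, such as Source templates and GLAM templates in the example above, categorise only the template page itself.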
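In a batch workflow, a script usually writes the template call into each file page rather than a person adding it by hand. This sketch assembles a page from an already-cleaned record. The record keys are hypothetical. It also assumes values contain no unescaped pipe characters, which would break the template.

```python
def file_page_wikitext(record: dict, institution: str = "YourInstitution") -> str:
    """Assemble a Commons file page from a cleaned record (hypothetical keys)."""
    lines = [
        "== {{int:filedesc}} ==",
        "{{Information",
        " |description = {{en|1=" + record["description"] + "}}",
        " |date = " + record["date"],
        " |source = " + record["source_url"],
        " |author = " + record["author"],
        "}}",
        "{{" + institution + "}}",  # institution credit box
        "",
        "== {{int:license-header}} ==",
        record["licence_template"],
        "",
    ]
    lines += ["[[Category:" + c + "]]" for c in record["categories"]]
    return "\n".join(lines)
```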

Example from Wikimedia Commons

For a real working example, see:

  • AWMM – Auckland War Memorial Museum’s institution template
  • AWMM/layout – its layout subpage

Before uploading any files in bulk, it is essential to decide how your institution’s metadata will map to both the Wikimedia Commons file page and Structured Data on Commons (SDC). This mapping is best documented using a metadata crosswalk.

A crosswalk table shows:

  • the internal metadata field (e.g. Vernon field)
  • the cleaned or transformed version (e.g. OpenRefine column)
  • the Commons template parameter
  • the SDC property (if applicable)
  • external identifiers (PIDs)
  • notes and transformation rules

Example Crosswalk Table (Rendered)

Below is an example of how a crosswalk table may look once formatted:

Internal Field | Cleaned / Transformed | Commons Template Parameter | SDC Property | External Identifier / PID | Notes
Creator | primaryMaker | author= | P170 (creator) | Wikidata QID | Reconcile creator name to Wikidata using OpenRefine.
System ID | systemID | N/A | P217 (inventory number) | Institutional PID | Required for data round-tripping.

Copy-and-Paste Wikitext (Code Only)

The following block provides the table in raw wikitext for reuse. It adds a Brief Description row and maps System ID to the |accession number= parameter:

{| class="wikitable"
! Internal Field
! Cleaned / Transformed
! Commons Template Parameter
! SDC Property
! External Identifier / PID
! Notes
|-
| Brief Description
| cleanedDescription
| <code><nowiki>|description=</nowiki></code>
| N/A
| —
| Remove trailing punctuation; ensure sentence case.
|-
| Creator
| primaryMaker
| <code><nowiki>|author=</nowiki></code>
| P170 (creator)
| Wikidata QID
| Reconcile creator name to Wikidata using OpenRefine.
|-
| System ID
| systemID
| <code><nowiki>|accession number=</nowiki></code>
| P217 (inventory number)
| Institutional PID
| Required for data round-tripping.
|}

Crosswalk tables make it easier to review field mappings, collaborate with team members, and ensure consistent metadata preparation across batch uploads.
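You can also hold the same crosswalk as data so that a script applies it consistently. This sketch encodes the three example rows. The primaryMakerQID column is a hypothetical addition that holds the reconciled creator QID, since SDC expects an item rather than a name for P170.

```python
from typing import Dict, Tuple

# The example crosswalk rows as data:
# (internal field, cleaned column, Commons parameter, SDC property, SDC value column)
# "primaryMakerQID" is a hypothetical column holding the reconciled creator QID.
CROSSWALK = [
    ("Brief Description", "cleanedDescription", "description", None, None),
    ("Creator", "primaryMaker", "author", "P170", "primaryMakerQID"),
    ("System ID", "systemID", "accession number", "P217", "systemID"),
]


def apply_crosswalk(row: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split one cleaned row into Commons template parameters and SDC values."""
    params: Dict[str, str] = {}
    sdc: Dict[str, str] = {}
    for _field, column, param, prop, sdc_column in CROSSWALK:
        if row.get(column):
            params[param] = row[column]  # skip empty values rather than write blanks
        if prop and row.get(sdc_column):
            sdc[prop] = row[sdc_column]
    return params, sdc
```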

There are several tools used within the Wikimedia community to support batch uploads. Your choice of tool will depend on the scale of the project, technical skill level, and the amount of metadata preparation required.

OpenRefine

A powerful data-cleaning and transformation tool that can:

  • clean and standardise metadata
  • reconcile creators, places, species, or subjects to Wikidata
  • generate Commons-ready Wikitext
  • upload files directly to Commons using the Wikimedia extension
  • create SDC statements at scale

OpenRefine is recommended for medium to large projects, or whenever metadata needs extensive preparation.

Pattypan

A spreadsheet-driven uploader that:

  • works well for small to medium uploads
  • is easy for beginners to use
  • accepts custom templates
  • allows quick iteration and corrections

Pattypan is useful when metadata is already clean or when multiple contributors are working with a shared spreadsheet.

Python / Pywikibot scripts

Best suited for:

  • very large or automated projects
  • uploading from APIs or external repositories
  • synchronising institutional systems with Commons
  • recurring or programmatic uploads

Scripts can also be used for auditing, updating structured data, and reporting on reuse.
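Here is a minimal Pywikibot sketch of a batch upload loop. It assumes Pywikibot is installed and configured with a logged-in Commons account (user-config.py). By default it does a dry run that only lists the planned titles, and it skips files that already exist. It imports Pywikibot lazily, so the dry run works without it. FilePage.upload forwards keyword arguments to the site's upload method, so confirm the exact parameters against your installed Pywikibot version.

```python
from typing import Iterable, List, Tuple


def upload_batch(items: Iterable[Tuple[str, str, str]], summary: str,
                 dry_run: bool = True) -> List[str]:
    """Upload (local_path, target_filename, page_wikitext) triples to Commons.

    Returns the File: titles that were planned. With dry_run=True nothing is
    uploaded and Pywikibot is never imported.
    """
    planned = []
    site = None
    for path, filename, text in items:
        title = f"File:{filename}"
        planned.append(title)
        if dry_run:
            continue
        if site is None:
            import pywikibot  # lazy import: only needed for a real run
            site = pywikibot.Site("commons", "commons")
        page = pywikibot.FilePage(site, title)
        if page.exists():
            continue  # do not overwrite files that are already on Commons
        page.upload(path, comment=summary, text=text)
    return planned
```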

Workflow Overview

A typical batch upload follows these steps:

  1. export metadata from your CMS or API
  2. clean and normalise the dataset (OpenRefine)
  3. reconcile entities to Wikidata
  4. map fields to your Commons template and SDC structure
  5. generate Wikitext or an upload spreadsheet
  6. upload via OpenRefine, Pattypan, or a script
  7. verify file pages, categories, and SDC statements
  8. publish project documentation and record outcomes

Documenting this workflow helps ensure replicability and transparency for future contributors.
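You can partly automate step 7 with the MediaWiki Action API. This sketch builds a query for the categories on a set of file pages, then lists the pages missing the institution category. It omits API continuation, so split or page large batches. The API accepts up to 50 titles per request for most accounts.

```python
from typing import List
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"


def categories_query_url(filenames: List[str]) -> str:
    """API URL listing categories for a batch of file pages (no continuation)."""
    return API + "?" + urlencode({
        "action": "query",
        "prop": "categories",
        "titles": "|".join(f"File:{name}" for name in filenames),
        "cllimit": "max",
        "format": "json",
        "formatversion": "2",
    })


def missing_category(response: dict, category: str) -> List[str]:
    """Return titles of pages in the response that lack the given category."""
    wanted = f"Category:{category}"
    missing = []
    for page in response["query"]["pages"]:
        titles = {c["title"] for c in page.get("categories", [])}
        if wanted not in titles:
            missing.append(page["title"])
    return missing
```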

You can read more about our most recent Batch Uploading Project here.

Commons Categorisation for GLAMs

Categories are an important part of organising files on Wikimedia Commons and making GLAM collections visible to the public. They help users browse related material and support internal tracking of uploaded collections.

Institution Category: Every GLAM should maintain at least one overarching institutional category such as:
Category:Images from Your Institution

This provides a central place to browse all uploaded content.

Collection or Series Categories: Where appropriate, you may create categories for:

  • specific collections
  • projects
  • artists or photographers represented in your holdings
  • thematic subsets (e.g. “Botanical illustrations from …”)

These help structure large deposits of material and aid discoverability.

Subject Categories: Commons also supports subject-based browsing through categories such as:

  • natural history
  • architecture
  • historical events
  • artworks and media types

Assigning relevant subject categories improves visibility across Wikimedia projects. Auckland Museum uses both a flat tracking category with no nested subcategories, Images from Auckland Museum, and a descriptive category, Collections of Auckland Museum, whose collections and subjects are nested in subcategories.

Keep categories as simple and widely usable as possible, and use existing categories rather than creating new ones unless needed. It is also recommended to avoid over-categorisation, to link your institution category to your institution template, and to supplement categories with SDC “depicts” statements on the images themselves, which improve machine-readability, linking between platforms, and querying within Commons.

Reporting and Engagement Insight

Once your institution’s files are uploaded, it is useful to track how the material is being used across Wikimedia platforms and beyond. This provides insight into public engagement and helps quantify the impact of your digital collections.

Tools for Measuring Usage

GLAMorgan – monthly reports showing where your files appear on Wikipedia, including view counts

PetScan – allows filtering and reporting on specific categories

WhatLinksHere – identifies pages using specific files

Commons file statistics – provides basic per-file view and usage information

Your internal audit scripts – Python or API-driven reports tailored to your institution’s data

External platform analytics – Pinterest, GBIF, BHL, Internet Archive, Flickr etc.
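Here is an example of a small API-driven report. The MediaWiki API's categoryinfo property returns the number of files directly in a category. Because it counts direct members only, it suits a flat tracking category like the one described above. The User-Agent string is a placeholder; Wikimedia asks API clients to send a descriptive one.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://commons.wikimedia.org/w/api.php"
# Placeholder User-Agent: replace with your project name and contact details.
USER_AGENT = "ExampleGLAMReport/0.1 (contact@example.org)"


def category_count_url(category: str) -> str:
    """Build an API URL asking for the member counts of one category."""
    return API + "?" + urlencode({
        "action": "query",
        "prop": "categoryinfo",
        "titles": f"Category:{category}",
        "format": "json",
        "formatversion": "2",
    })


def fetch_json(url: str) -> dict:
    """Fetch and decode an API response (network access required)."""
    with urlopen(Request(url, headers={"User-Agent": USER_AGENT})) as resp:
        return json.load(resp)


def file_count(response: dict) -> int:
    """Read the direct file count from a categoryinfo response (0 if absent)."""
    page = response["query"]["pages"][0]
    return page.get("categoryinfo", {}).get("files", 0)
```

A live call would chain the helpers: file_count(fetch_json(category_count_url("Images from Auckland Museum"))).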

Why Reporting Matters

  • demonstrates the value of digitisation and open access
  • supports internal and external funding cases
  • informs future content priorities
  • identifies high-impact collections
  • highlights opportunities for further engagement or public programming
  • assists with long-term monitoring of your institution’s contribution to the Wikimedia ecosystem

Documenting Your Impact

Maintaining a central reporting page or dashboard helps track:

  • file counts and uploads over time
  • usage statistics
  • structured data completeness
  • cross-platform reuse
  • links back to your Collections Online platform

Consistent reporting provides measurable insight into how your content is being accessed and reused globally.
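You can report structured data completeness as the share of files that carry each key property. This sketch assumes the statements have already been fetched and flattened into one dictionary per file, mapping property IDs to lists of values. That input shape is hypothetical.

```python
from typing import Dict, List, Sequence


def sdc_completeness(files: List[Dict[str, list]],
                     properties: Sequence[str] = ("P180", "P170", "P217")) -> Dict[str, float]:
    """Share of files (0.0 to 1.0) with at least one value for each property."""
    total = len(files)
    if total == 0:
        return {prop: 0.0 for prop in properties}
    return {prop: sum(1 for f in files if f.get(prop)) / total for prop in properties}
```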

Templates Created

The following custom templates were created or significantly edited to support Auckland Museum uploads:

All templates are documented under:
Category:Auckland War Memorial Museum templates

Commons Categories Created

Each upload batch is organised into its own Commons category and linked to its corresponding Wikidata item.

Total Uploads

Current Uploads Total: As of November 2025, there are 182,726 files in Images from Auckland Museum. Since developing the revised Batch Uploading methodology, we have contributed 4,465 files.

  • These images span 19th- and 20th-century photography, object catalogue photography, biodiversity heritage illustrations, and official historic museum documents.
  • All uploads are tagged with source metadata and structured data where possible

In addition to serving as a long-term preservation and access platform, Wikimedia Commons functions as a practical image repository for derivative workflows. Auckland Museum has developed an approach showing how Commons URLs can be repurposed across other digital platforms. This reduces duplication of effort and keeps a single, openly licensed master file at the centre of distribution.

By using structured Commons URLs for each file, images can be referenced directly in bulk uploads to other platforms (such as Pinterest) without re-hosting or recompression. This method has been particularly valuable for GLAM organisations that do not yet have an internal API capable of delivering high-resolution images at scale. The process leverages the openness of Commons to streamline cross-platform publishing.
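Direct file URLs on Commons follow a predictable pattern. The original is served from upload.wikimedia.org under a path built from the MD5 hash of the normalised file name. Special:FilePath redirects to the same file and can optionally resize it with a width parameter. This Python sketch reproduces that pattern. It is an illustration, not the OpenRefine method in the guide listed below, and it only approximates MediaWiki's first-letter capitalisation.

```python
import hashlib
from urllib.parse import quote


def normalise(filename: str) -> str:
    """Approximate MediaWiki's stored form: underscores for spaces, first letter upper-case."""
    name = filename.strip().replace(" ", "_")
    return name[:1].upper() + name[1:]


def direct_url(filename: str) -> str:
    """URL of the original file on upload.wikimedia.org (hashed directory layout)."""
    name = normalise(filename)
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return ("https://upload.wikimedia.org/wikipedia/commons/"
            f"{digest[0]}/{digest[:2]}/{quote(name)}")


def filepath_url(filename: str, width: int = 0) -> str:
    """Special:FilePath redirect; a width asks Commons for a resized copy."""
    url = "https://commons.wikimedia.org/wiki/Special:FilePath/" + quote(normalise(filename))
    return f"{url}?width={width}" if width else url
```

Special:FilePath is the simpler option when a redirect is acceptable; the hashed URL avoids the redirect.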

Detailed guidance on this workflow is available here:

OpenRefine Direct Image Extraction – demonstrates how to generate direct file URLs for Commons images suitable for reuse or export workflows.

Roundtrip Workflow: Uploading Auckland Museum Images from Wikimedia Commons to Pinterest – outlines the CSV format, field mapping, and metadata retention approach for reusing Commons images in external environments.
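Here is a sketch of the CSV step. It assumes each record already holds a direct Commons URL and a Collections Online link. The column names are an assumption modelled on Pinterest's bulk-upload template, so check them against the current template. The Link column carries the record URL, which preserves the round trip back to the source.

```python
import csv
import io
from typing import Dict, List

# Assumed column names, modelled on Pinterest's bulk-upload CSV; check them
# against the template Pinterest currently provides.
COLUMNS = ["Title", "Media URL", "Pinterest board", "Description", "Link"]


def pinterest_csv(records: List[Dict[str, str]], board: str) -> str:
    """Write one pin per record, pointing at the Commons file and the source record."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for r in records:
        writer.writerow({
            "Title": r["title"],
            "Media URL": r["direct_url"],   # direct Commons file URL, no re-hosting
            "Pinterest board": board,
            "Description": r["description"],
            "Link": r["source_url"],        # Collections Online record, closing the loop
        })
    return out.getvalue()
```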

Together, these examples highlight Commons’ capacity to act as a sustainable digital infrastructure layer. It bridges GLAM repositories and external platforms, easing distribution and widening contributions to Open Access cultural data, whatever a GLAM’s budget, resourcing, or existing technical infrastructure.
