Documentation: Valossa Core API

Version of documentation: 2017-05-24

Current version of Valossa Core metadata JSON format: 1.2.1

See metadata JSON format version changelog.

Index:

API overview

Input data formats

Usage examples

Creating a new video analysis job

Getting status of a single job

Getting the results of a job

Listing statuses of all your jobs

Cancel a job

Understanding visualization of results in a heatmap

Output metadata JSON format

Code examples

Metadata JSON format version changelog

Persons in the celebrity face gallery

API overview

The Valossa Core API is a REST API for automatic video content analysis and metadata creation. The metadata contains detections of concepts from the video. The metadata describes the following detections:

  • humans with visible faces (with attributes such as gender and age information, and the possibly detected similarity to persons in our celebrity face gallery)
  • visual context (such as objects, animals, scenes and styles)
  • audio context (music styles and musical moods, instruments, sounds of human actions, animals, machines etc., and sound effects)
  • topical keywords from speech (the speech source being either the audio track or the SRT transcript file of the video)
  • topical keywords from the video description
  • face groups of co-occurring faces
  • IAB categories for content-related advertising
  • detected shot boundaries
  • explicit content detection (visual nudity detection, and offensive-word detection from audio and transcript)

Detections are also provided in different practical groupings: grouped by detection type (best concepts first, and including time-coded occurrences of the concept when applicable) and grouped by second ("What is detected at 00:45 in the video?").

The API is invoked using HTTP (HTTPS) requests. The responses from the API are in the JSON format. Please use the secure HTTPS transport in all API calls to protect your valuable data and confidential API key: unencrypted HTTP transport is not allowed by the API.

Get your API key from under "My profile" - "API keys" in the Valossa AI Portal. If you have several applications, please create a different API key for each of them, on the "API keys" page. Give a descriptive name for each API key, and if necessary, give access rights of the API key to different users of your organization.

Note: As the administrator user of your organization, you can create new users under "My profile" - "Manage users". If several people in your organization need to use the Portal, add them in "Manage users" so that they are all mapped to your organization and can view analysis results and post new jobs, depending on the rights you grant them. The permissions are mappings between users and API keys ("this user has read-write access to this API key, so she can both view results and make new job requests"), so please configure the permissions with this in mind; the API key permissions per user can be edited in "Manage users". Create only one customer account for your company/organization; there can be as many users under the customer account as you wish!

The API consists of 5 different functions:

  1. new_job [HTTP POST]
    This function is used to create a new video analysis job in our system. The job (including e.g. the URL of the video file to be analyzed) is defined by using a JSON formatted data structure that is included as the body of the HTTP POST request. This function returns the job_id (UUID value) of the created job. The job_id is used after this as the identifier when querying the status and the results of the job.
  2. job_status [HTTP GET]
    This function is used to query (poll) the status of a specific job, based on its job_id.
  3. job_results [HTTP GET]
    This function is used to fetch the resulting metadata JSON of a finished analysis job identified by its job_id.
  4. list_jobs [HTTP GET]
    This function lists all the jobs for a given API key.
  5. cancel_job [HTTP POST]
    This function cancels a job, if it is in a cancellable state.

You can conveniently monitor the status of your jobs in the Valossa AI Portal. In the Portal, you can also call the new_job function of the API with an easy API request generator.

Your API key is shown in the Valossa AI Portal on the request generator page and job results page. Keep the key confidential.

Please note regarding speech analysis:

  • If you already have the speech transcript of your video in the SRT format (for example the subtitles of your movie), please specify the transcript URL in the request, along with the video URL. The transcript content will be analyzed, and the detected concepts will be included in the "transcript" part of the metadata JSON.
  • Your existing transcript is, obviously, a more reliable source for speech information than audio analysis. So, if you have the transcript, please use it – it’s a valuable asset!
  • Audio detection will be performed only if you did not provide the SRT transcript.
  • The audio-related metadata generated by us will not contain an actual audio transcript. Instead, we provide you a uniquely descriptive set of keywords extracted from the speech. Whether the source of speech information is audio itself or your transcript file, the output format of the detected keywords is similar in the metadata.

Input data formats

Supported video formats: we support most typical video formats, including but not limited to MP4, MPEG, AVI, FLV, WebM, with various codecs. Currently, we cannot provide a fixed list of supported formats and codecs, but for example MP4 with the H.264 codec works.

Video file size limit: 5GB per video file.

Video duration limit: 7 hours of playback time per video file.

Video vertical resolution limit: 1080 pixels.

Currently, the only supported language for audio detection is English.

If the video file contains several video streams, only the first one is analyzed.

If the video file contains several audio streams, only the first one is analyzed (and only if the SRT transcript is not provided). The audio stream can be either mono or stereo.

Supported transcript format: SRT only.

File size limit: 5MB per SRT file.

Currently, the only supported transcript language is English.

Usage examples

Creating a new video analysis job

You need to have purchased enough Valossa Credit in order to run a video analysis job. See details in the credit buying section of the Valossa AI Portal.

Send an HTTP POST to the URL:

https://api.valossa.com/core/1.0/new_job

Example new_job request body in JSON format:

{
  "api_key" : "kkkkkkkkkkkkkkkkkkkk",
  "media": {
    "title": "The Action Movie",
    "description": "Armed with a knife, Jack Jackson faces countless dangers in the jungle.",
    "video": {
      "url": "https://example.com/content/Action_Movie_1989.mp4"
    },
    "transcript": {
      "url": "https://example.com/content/actionmovie.srt"
    }
  }
}

The video URL and transcript URL can be either http:// or https:// based.

The video URL is mandatory. The URL must directly point to a downloadable video file. Our system will download the file from your system.

The transcript URL is optional – but recommended, because an existing SRT transcript is a more reliable source of speech information than audio analysis. The URL must directly point to a downloadable transcript file. Our system will download the file from your system.

The title is optional – but recommended: a human-readable title makes it easy for you to identify the video on the results page of Valossa AI Portal, and will also be included in the metadata file.

The description is optional. It can be any free-form text, in English, that describes the video.

If title and/or description are provided in the call, the text in them will also be analyzed, and the detected concepts will be included in the analysis results (the "external" concepts in the metadata JSON).

If the analysis is technically successful (i.e. if the job reaches the "finished" state), the job consumes your Valossa Credit based on full minutes. Example: a video of length 19 minutes 40 seconds will consume Valossa Credit for the equivalent of 20 minutes. You can purchase Valossa Credit for your organization when you are logged in to the Valossa AI Portal. On the credit buying page, please note the handy Auto-Refill functionality, which automatically purchases more Valossa Credit when your balance is about to run out.

Here is an example new_job request body with only the mandatory fields present:

{
  "api_key": "kkkkkkkkkkkkkkkkkkkk",
  "media": {
    "video": {
      "url": "https://example.com/my-car-vid.mpg"
    }
  }
}

The response of a successful new_job call always includes the job_id of the created job.

Example response in an HTTP 200 OK message:

{
  "job_id": "6faefb7f-e468-43f6-988c-ddcfb315d958"
}

Jobs are identified by UUIDs, which appear in "job_id" fields in various messages. Your script that calls the API must, of course, save the job_id from the new_job response in order to be able to query for the status and results later.

Example test call with curl on the command line, assuming your request JSON is in a file you have created:

curl --header "Content-Type:application/json" -X POST -d @your_request.json https://api.valossa.com/core/1.0/new_job
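
If you prefer calling the API from code, a minimal Python 3 sketch of the same new_job call might look like the following. It uses the third-party "requests" library purely as one possible HTTP client (the API itself does not require it), and the API key and video URL are placeholders:

import requests

API_KEY = "kkkkkkkkkkkkkkkkkkkk"  # placeholder, use your own API key

request_body = {
    "api_key": API_KEY,
    "media": {
        "title": "The Action Movie",
        "video": {"url": "https://example.com/content/Action_Movie_1989.mp4"}
    }
}

response = requests.post("https://api.valossa.com/core/1.0/new_job", json=request_body)
if response.status_code == 200:
    job_id = response.json()["job_id"]  # save this for later job_status and job_results calls
    print("Created job:", job_id)
else:
    # Error responses are JSON with an "errors" array (see below)
    print("Error:", response.status_code, response.json())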

Error responses contain an error message, and can be automatically separated from 200 OK responses, because error responses are sent along with an HTTP error code (non-200). Error responses are also formatted as JSON, and they contain an "errors" array, where one or more errors are listed with the corresponding error messages.

An error message about missing Valossa Credit may appear either at this point (you cannot post a job with zero credits), or alternatively, the job's status may change to "error" only later, when the job is taken from the queue and the system notices the lack of required Valossa Credit at that time.

Example error response in an HTTP 400 message:

{
  "errors": [
    {
      "message": "Invalid API key"
    }
  ]
}

Getting status of a single job

The status of a single analysis job is polled using HTTP GET.

Example request:

https://api.valossa.com/core/1.0/job_status?api_key=kkkkkkkkkkkkkkkkkkkk&job_id=6faefb7f-e468-43f6-988c-ddcfb315d958

Example response in an HTTP 200 OK message:

{
  "status": "processing",
  "details": null,
  "poll_again_after_seconds": 600
}

Possible values for the "status" field: "queued", "processing", "finished" and "error". More status values may be introduced in the future.

The "details" field may contain some additional details about the status of the job.

If the job status is "queued" or "processing", you should poll the status again after some time. If the status is "finished", you can fetch the job results using the job_results function.

If the status is "error", something went wrong during the analysis process. If there is an explanation of the error in the "details" field, please see if the cause of the error is something you can fix for yourself (such as a non-video file in the video URL of the job request). Otherwise, contact us in order to resolve the issue.

The value in "poll_again_after_seconds" is just a suggestion about when you should poll the job status again (expressed as seconds to wait after the current job_status request).
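
As a sketch, a Python 3 polling loop that honors the suggested waiting time could look like this (the "requests" library is again just one possible HTTP client, and the API key and job_id values are placeholders):

import time
import requests

API_KEY = "kkkkkkkkkkkkkkkkkkkk"  # placeholder
JOB_ID = "6faefb7f-e468-43f6-988c-ddcfb315d958"  # placeholder

while True:
    response = requests.get(
        "https://api.valossa.com/core/1.0/job_status",
        params={"api_key": API_KEY, "job_id": JOB_ID},
    )
    response.raise_for_status()  # a non-200 response means the query itself failed
    status = response.json()
    if status["status"] == "finished":
        print("Job finished, fetch the metadata with job_results")
        break
    if status["status"] == "error":
        print("Job failed:", status["details"])
        break
    # "queued" or "processing": wait for the suggested time before polling again
    time.sleep(status.get("poll_again_after_seconds") or 60)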

If there was a problem with the job_status query itself, the error will be indicated in an HTTP non-200 response with a JSON body, similar to the error responses of the new_job function.

Getting the results of a job

After a job has finished, the resulting video metadata can be fetched using HTTP GET.

Example request:

https://api.valossa.com/core/1.0/job_results?api_key=kkkkkkkkkkkkkkkkkkkk&job_id=6faefb7f-e468-43f6-988c-ddcfb315d958

Response data is in the JSON format. For more details, see chapter "Output metadata JSON format".

Save the metadata and use it from your own storage or database for easy and quick access. We will not necessarily store the results indefinitely.
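
For example, a Python 3 sketch (same placeholder API key and job_id as above, and the "requests" library as one possible HTTP client) that fetches the metadata and stores it locally:

import json
import requests

API_KEY = "kkkkkkkkkkkkkkkkkkkk"  # placeholder
JOB_ID = "6faefb7f-e468-43f6-988c-ddcfb315d958"  # placeholder

response = requests.get(
    "https://api.valossa.com/core/1.0/job_results",
    params={"api_key": API_KEY, "job_id": JOB_ID},
)
response.raise_for_status()
core_metadata = response.json()

# Store the metadata in your own system; the results are not necessarily kept forever on the API side
with open(JOB_ID + "_core_metadata.json", "w") as outfile:
    json.dump(core_metadata, outfile)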

If there was a problem with the job_results query itself, the error will be indicated in an HTTP non-200 response with a JSON body, similar to the error responses of the new_job function.

Listing statuses of all your jobs

Convenience function for listing all your job statuses, using HTTP GET:

Example request:

https://api.valossa.com/core/1.0/list_jobs?api_key=kkkkkkkkkkkkkkkkkkkk&show_status=true

Example response in an HTTP 200 OK message:

{"jobs": [
  {
    "job_id": "6faefb7f-e468-43f6-988c-ddcfb315d958",
    "job_status":
    {
      "status": "finished",
      "details": null,
      "poll_again_after_seconds": null
    }
  },{
    "job_id": "36119563-4b3f-44c9-83c6-b30bf69c1d2e",
    "job_status":
    {
      "status": "processing",
      "details": null,
      "poll_again_after_seconds": 600
    }
  }
]}

By adding the optional GET parameter "n_jobs" to the request (example: n_jobs=500), you can control how many of your jobs will be listed if your job list is long. The default is 200. The maximum possible value for "n_jobs" is 10000.
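
A short Python 3 sketch of listing job statuses (placeholder API key, the "requests" library as one possible HTTP client, and n_jobs=500 chosen only for illustration):

import requests

API_KEY = "kkkkkkkkkkkkkkkkkkkk"  # placeholder

response = requests.get(
    "https://api.valossa.com/core/1.0/list_jobs",
    params={"api_key": API_KEY, "show_status": "true", "n_jobs": 500},
)
response.raise_for_status()
for job in response.json()["jobs"]:
    print(job["job_id"], job["job_status"]["status"])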

If there was a problem with the list_jobs query itself, the error will be indicated in an HTTP non-200 response with a JSON body, similar to the error responses of the new_job function.

Cancel a job

Cancel a job by sending an HTTP POST to the URL:

https://api.valossa.com/core/1.0/cancel_job

Example cancel_job request body:

{
  "api_key": "kkkkkkkkkkkkkkkkkkkk",
  "job_id": "be305b1e-3671-45b1-af88-1f052db3d1bb"
}

Example response in an HTTP 200 OK message:

{
  "job_status": "canceled"
}

The job must be in a cancellable state for this function to succeed. For example, a finished job is not cancellable.

If there was a problem with the cancel_job query itself, the error will be indicated in an HTTP non-200 response with a JSON body, similar to the error responses of the new_job function.

If you are interested in deleting job media from the service, contact us.

Understanding visualization of results in a heatmap

The Valossa AI Portal provides an easy-to-use visualization tool, called a heatmap, for getting a quick visual overview of the most prominent detections. On the Results page, there's a link to the heatmap of each successfully analyzed video.

A heatmap contains a timeline of the video, and detections of concepts are placed on the timeline. Each detection is shown on its own row (its own timeline). Detections are grouped by their detection type (e.g. faces; note that different colors are given to different detection types to distinguish them visually, but the colors have no other meaning). Within a detection type, detections are grouped by prominence: for example, the most prominent faces are shown first. With the heatmap controls, you can change the resolution of the timeline (how many distinct timeslots are shown) and the number of detections shown.

The colored spots on a detection's timeline show how many detections of that concept fall within each timeslot, or how strong the detections are. Click on a colored spot and the video player on the heatmap page will play back the video from the corresponding timecode. Thus, you are able to see the main concepts of the video arranged by time and prominence, and verify their correctness. With the main timeline and seek bar under the video player, you can also move to any time-position in the video.

Output metadata JSON format

Starting on 2017-02-24, i.e. starting with metadata format version 1.0.0, the Core metadata format has undergone major improvements, resulting in data structures that are completely different from the old format (0.6.*).

The Core metadata about your videos is hierarchical and straightforward to parse. High-level structure of the current Valossa AI video metadata JSON format, not showing detailed subfields:

{
  "version_info": { "metadata_format" : "...", "backend": "..." },
  "job_info": { "job_id": "...", "request": {...} },
  "media_info": { ... },
  "detections": { ... },
  "detection_groupings": { 
    "by_detection_type": { ... },
    "by_second": [ ... ]
  },
  "segmentations": { ... },
  "data_entities": { ... }
}

You will best understand the details of the metadata structure by viewing an actual metadata JSON file generated from one of your own videos!

All concept detections from the video are listed in "detections". This is an associative array, where the key is the detection ID and the value is the corresponding detection object.

The detection IDs are used in "detection_groupings" to refer to the specific detection, so the detailed information about each detection resides in one place in the JSON but may be referenced from multiple places using the ID. In "detection_groupings", several practical groupings of detections are given for you:

  • The subfield "by_detection_type" has detection type identifiers as the key and the value is an array of detection IDs; the array is sorted by relevance, most relevant detections first! Using "by_detection_type", you can easily for example list all the detected faces, or all the detected audio-based keywords. Want to find out whether there's a cat somewhere in your video? Just loop over the visual.context detections and match detections against the Google Knowledge Graph concept identifier of "cat" ("/m/01yrx"), or against the human-readable concept label if you're adventurous; see details below. (Note: Because some identifiers were inherited from the Freebase ontology to Google Knowledge Graph, some of the identifiers are Freebase-compatible such as in the example case of "cat", but not all identifiers are from Freebase; so you should use Google Knowledge Graph as the reference ontology.)
  • The subfield "by_second" contains an array, where each item corresponds to one second of the video. Using this array you can answer questions such as "What is detected at 00:45 in the video?". Under each second, there is an array of objects which contain at least the field "d" (detection ID). Using the detection ID as the index, you will find the detection from the "detections" list. If applicable, there is also the field "c" (confidence), currently available only for visual.context and audio.context detections. If the field "o" exists, it contains an array of occurrence identifiers that correspond to this detection in this second.

In "segmentations", the video is divided into time-based segments using different segmentation rules. Currently we support automatically detected shot boundaries, hence "segmentations" contains "detected_shots".

The field "data_entities" contains data entities that cannot be directly expressed in "detections". Currently, the only possible field under "data_entities" is "speech_to_text_srt", which exists if and only if a non-empty SRT-based speech transcript was created from the audio during the analysis. The field "speech_to_text_srt" is indexed with audio track identifiers. Currently, only one track is supported; the object is stored under the identifier "default", and the object contains the SRT in a subfield called "data" and may additionally contain metainformation about the speech transcript. Thus, if SRT speech transcript exists, the raw SRT data is available in the field core_metadata["data_entities"]["speech_to_text_srt"]["default"]["data"].

How to get an overview of the most prominent detections? That's easy: in "by_detection_type", start reading detections from the beginning of the lists under each detection type. Because the detections are sorted with the most relevant ones first, reading e.g. the 20 first detections from "human.face" gives you an overview of the most prominent faces in the video. For an easy and quick overview of detections, you may view the heatmap (visualization of detections) of the video in the Valossa AI Portal.

Every detection in the JSON has, at minimum, the fields "t" (detection type identifier) and "label". The "label" is just the default human-readable label of the detected concept, and for many detection types, more specific information is available in additional data fields. The following is the list of currently supported detection type identifiers.

visual.context
audio.context
human.face
human.face_group
transcript.keyword.novelty_word
transcript.keyword.name.person
transcript.keyword.name.location
transcript.keyword.name.organization
transcript.keyword.name.general
audio.keyword.novelty_word
audio.keyword.name.person
audio.keyword.name.location
audio.keyword.name.organization
audio.keyword.name.general
external.keyword.novelty_word
external.keyword.name.person
external.keyword.name.location
external.keyword.name.organization
external.keyword.name.general
topic.iab.transcript
topic.iab.visual
topic.iab.audio
explicit_content.nudity
explicit_content.audio.offensive
explicit_content.transcript.offensive

The identifiers are mostly self-explanatory. Please note that "visual.context" offers a broad range of visual detections such as objects; "audio.context" offers a broad range of audio-based detections; "topic.iab.*" are IAB categories for the entire video; "external.keyword.*" refers to keywords found in the video description or title; and "human.face_group" detections are groups of faces whose temporal co-occurrence is strong enough that the people probably have meaningful interaction with each other.

Depending on the detection type and situation, a detection may or may not include the fields "occs", "a", "ext_refs" and "cid".

The field "occs" contains the occurrence times of the detection. There is a start time and an end time for each occurrence. For example, a visual object "umbrella" might be detected 2 times: first occurrence from 0.3 seconds to 3.6 seconds, and another occurrence from 64.4 seconds to 68.2 seconds — so there would be 2 items in the "occs" array. Time values are given as seconds "ss" (seconds start) and "se" (seconds end), relative to the beginning of the video.

Detections that are not time-bound (e.g. topic.iab.* and external.keyword.*) cannot contain "occs".

If applicable to the detection type, occurrences have a maximum confidence ("c_max") detected during the occurrence period. (Because confidence varies at different moments during the occurrence, it makes sense to provide just the maximum value here. To find out the confidence during a particular moment, check out the "c" field of each second in the "by_second" data.) Currently, only visual.context detections have "c_max".

Please note that if you want to answer the question "What is in the video at time xx:xx?", then you should see the "by_second" array in the "detection_groupings". Occurrences, on the other hand, are good when you want to answer the question "At what time-sections is Y detected?"

If exists, the object-field "a" contains attributes of the detection. For example, the "human.face" detections may have attributes: "gender" that includes the detected gender and "s_visible" i.e. the total screen-time of the face, and "similar_to" i.e. possible visual similarity matches to celebrities in the face gallery. The "gender" and structure also contains the confidence field "c" (0.0 to 1.0).

If exists, the array-field "ext_refs" contains references to the detected concept in different ontologies. Most visual.context detections have "ext_refs", expressing the concept identity in the Google Knowledge Graph ontology. Inside "ext_refs", the ontology identifier for Google Knowledge Graph is "gkg" (see examples). As noted above, some of the concept IDs originate from the Freebase ontology, and Freebase concept IDs are contained as a subset the Google Knowledge Graph ontology.

If exists, the string-field "cid" contains the unique identifier of the concept in the Valossa Concept Ontology. Most visual.context detections and audio.context detections have "cid".

Faces-related information: use the detection ID as the unique identifier per face within each video. If and only if a face is similar to one or more faces in our face gallery, the "a" field of the specific face object will contain a "similar_to" field: an array of the closely matching faces (for example, a specific celebrity). Each item in the "similar_to" array has a "name" field providing the name of the visually similar person and a "c" field giving the confidence (0.0 to 1.0) that the face is actually the named person. In "similar_to", the matches are sorted best match first. Please note that a face occurrence does not directly contain "c"; confidence for faces is only available in the "similar_to" items.

The "explicit_content.nudity" has two possible detections: "bare skin" and "nsfw".

The "detected_shots" in segmentations provides shot boundaries with seconds-based start and end timepoints ("ss", "se") and with start and end frame numbers ("fs", "fe"). Note: frame-numbers are 0-based, i.e. the first frame in the video has the number 0.

An example of what is found in "detections", a visual.context detection:

...
"86": {
  "t": "visual.context",
  "label": "hair",
  "cid": "lC4vVLdd5huQ",
  "ext_refs": {
    "gkg": {
      "id": "/m/03q69"
    }
  },
  "occs": [
    {
      "ss": 60.227,
      "se": 66.191,
      "c_max": 0.80443,
      "id": "267"
    },
    {
      "ss": 163.038,
      "se": 166.166,
      "c_max": 0.72411,
      "id": "268"
    }
  ]
},
...

Another example from "detections", a human.face detection:

...
"64": {
  "t": "human.face",
  "label": "face",
  "a": {
    "gender": {
      "c": 0.929,
      "value": "female"
    },
    "s_visible": 4.4,
    "similar_to": [
      {
        "c": 0.92775,
        "name": "Molly Shannon"
      }
    ]
  },
  "occs": [
    {
      "ss": 28.333,
      "se": 33.567,
      "id": "123"
    }
  ]
},
...

An example of an audio.context detection:

...
"12": {
  "t": "audio.context",
  "label": "exciting music",
  "cid": "o7WLKO1GuL5r"
  "ext_refs": {
    "gkg": {
      "id": "/t/dd00035"
    }
  },
  "occs": [
  {
    {
      "ss": 15,
      "se": 49
      "c_max": 0.979,
      "id": "8",
    }
  ],
},
...

An example of an IAB category detection:

...
"173": {
  "t": "topic.iab.transcript",
  "label": "Personal Finance",
  "ext_refs": {
    "iab": {
      "labels_hierarchy": [
        "Personal Finance"
      ],
      "id": "IAB13"
    }
  }
},
...

An example of a keyword detection:

...
"132": {
  "t": "transcript.keyword.name.location",
  "label": "Chillsbury Hills",
  "occs": [
    {
      "ss": 109.075,
      "se": 109.975,
      "id": "460"
    }
  ]
}
...

Please note that transcript keyword occurrence timestamps are based on SRT timestamps. In the future, if a non-timecoded transcript is supported, transcript keywords might not have occurrences/timecoding.

The version number of the metadata format will be updated in the future, when the format changes. See the version changelog below. If only the last component of the version number (z in x.y.z) changes, the changes are purely additions to the structure, i.e. they cannot break existing parsing code.

Code examples

Example code snippet (in Python) that illustrates how to access the data fields in Valossa Core metadata JSON:

import json
metadata = None
with open("your_core_metadata.json", "r") as jsonfile:
    metadata = json.load(jsonfile)

# Loop over all detections so that they are grouped by the type
for detection_type, detections_of_this_type in metadata["detection_groupings"]["by_detection_type"].items():
    print("----------")
    print("Detections of the type: " + detection_type + ", most relevant detections first:")
    print()
    for det_id in detections_of_this_type:
        print("Detection ID: " + det_id)
        detection = metadata["detections"][det_id]
        print("Label: " + detection["label"])
        print("Detection, full info:")
        print(detection)

        # Example of accessing attributes (they are detection type specific)
        if detection_type == "human.face":
            attrs = detection["a"]
            print("Gender is " + attrs["gender"]["value"] + " with confidence " + str(attrs["gender"]["c"]))
            if "similar_to" in attrs:
                for similar in attrs["similar_to"]:
                    print("Face similar to person " + similar["name"] + " with confidence " + str(similar["c"]))

        # More examples of the properties of detections:

        if detection_type == "visual.context" or detection_type == "audio.context":
            if "ext_refs" in detection:
                if "gkg" in detection["ext_refs"]:
                    print("Concept ID in GKG ontology: " + detection["ext_refs"]["gkg"]["id"])

        if "occs" in detection:
            for occ in detection["occs"]:
                print("Occurrence starts at " + str(occ["ss"]) + "s from beginning of video, and ends at " + str(occ["se"]) + "s")
                if "c_max" in occ:
                    print("Maximum confidence of detection during this occurrence is " + str(occ["c_max"]))
                    # If you need the confidence for a particular time at second-level accuracy, see the by_second grouping of detections

        print()
    print()

# Example of listing only audio (speech) based word/phrase detections:
for detection_type, detections_of_this_type in metadata["detection_groupings"]["by_detection_type"].items():
    if detection_type.startswith("audio.keyword."):
        for det_id in detections_of_this_type:
            detection = metadata["detections"][det_id]
            print("Label: " + detection["label"])  # etc... You get the idea :)
print()

# Example of listing only detections of a specific detection type:
if "human.face" in metadata["detection_groupings"]["by_detection_type"]:
    for det_id in metadata["detection_groupings"]["by_detection_type"]["human.face"]:
        detection = metadata["detections"][det_id]  # etc...
print()

# Example of listing IAB categories detected from different modalities (visual/audio/transcript) of the video
for detection_type, detections_of_this_type in metadata["detection_groupings"]["by_detection_type"].items():
    if detection_type.startswith("topic.iab."):
        for det_id in detections_of_this_type:
            detection = metadata["detections"][det_id]  # etc...
            print("IAB label, simple: " + detection["label"])
            print("IAB ID: " + detection["ext_refs"]["iab"]["id"])
            print("IAB hierarchical label structure:")
            print(detection["ext_refs"]["iab"]["labels_hierarchy"])
print()

# Time-based access: Loop over time (each second of the video) and access detections of each second
sec_index = -1
for secdata in metadata["detection_groupings"]["by_second"]:
    sec_index += 1
    print("----------")
    print("Detected at second " + str(sec_index) + ":")
    print()
    for detdata in secdata:
        det_id = detdata["d"]
        if "c" in detdata:
            print("At this second, detection has confidence " + str(detdata["c"]))
        if "o" in detdata:
            # If for some reason you need to know the corresponding occurrence (time-period that contains this second-based detection)
            print("The detection at this second is part of one or more occurrences. The occurrence IDs, suitable for searching within the 'occs' list of the 'detection' object, are:")
            for occ_id in detdata["o"]:
                print(occ_id)
        print("Detection ID: " + det_id)
        detection = metadata["detections"][det_id]
        print("Label: " + detection["label"])
        print("Detection of the type " + detection["t"] + ", full info:")
        # Of course, also here you can access attributes, cid, occurrences etc. through the "detection" object
        # just like when you listed detections by their type. In other words, when you just know the ID
        # of the detection, it's easy to read the information about the detection by using the ID.
        print(detection)
        print()

Metadata JSON format version changelog

1.1.0: more compact format

1.0.0: large changes, completely deprecated old version 0.6.1.

Persons in the celebrity face gallery

Aaron Staton
Aarti Mann
Adam Baldwin
Adam Sandler
Adele
Aidan Gillen
Aidy Bryant
Alanna Masterson
Alan Tudyk
Alec Baldwin
Alfie Allen
Al Gore
Alicia Keys
Alison Brie
Al Pacino
Amanda Schull
Amy Poehler
Anabelle Acosta
Andrew J. West
Andrew Lincoln
Andrew Scott
Andy Samberg
Angela Merkel
Angela Sarafyan
Angelina Jolie
Anna Faris
Anna Paquin
Anthony Bourdain
Anthony Hopkins
Antonio Banderas
Ariel Winter
Arnold Schwarzenegger
Art Parkinson
Aubrey Anderson-Emmons
Audrey Hepburn
Aung San Suu Kyi
Aunjanue Ellis
Austin Nichols
Ban Ki-moon
Barack Obama
Bashar Assad
Beck Bennett
Ben Affleck
Ben Barnes
Benedict Cumberbatch
Ben Feldman
Benjamin Netanyahu
Ben Kinglsey
Ben Mendelson
Ben Stiller
Beyonce
Bill Gates
Bill Hader
Bill Murray
Billy Crystal
Bobby Moynihan
Bob Newhart
Bono
Brad Pitt
Britney Spears
Brooks Wheelan
Bruce Springsteen
Bruce Willis
Bryan Batt
Caleb McLaughlin
Cameron Diaz
Cara Buono
Carice van Houten
Carrie-Anne Moss
Carrie Fisher
Cary Grant
Catherine Zeta-Jones
Cecily Strong
Chad Coleman
Chandler Riggs
Charles Chaplin
Charles Dance
Charlie Heaton
Charlie Sheen
Charlie Watts
Chevy Chase
Christian Bale
Christian Serratos
Christina Hendricks
Christine Woods
Christopher Stanley
Christopher Walken
Christoph Walz
Clifton Collins Jr.
Clint Eastwood
Conan O'Brien
Conleth Hill
Constance Zimmer
Corey Stoll
Cristiano Ronaldo
Curtiss Cook
Daisy Ridley
Dalai Lama
Dana Carvey
Danai Gurira
Dan Aykroyd
Daniel Portman
Darrell Hammond
David Beckham
David Cameron
David Copperfield
David Harbour
David Letterman
David Morrissey
David Spade
Dean-Charles Chapman
Denzel Washington
Derek Cecil
Desmond Tutu
Diana Rigg
Diego Luna
Donald Trump
Donnie Yen
Don Pardo
Dr. Dre
Drew Barrymore
Dr. Phil McGraw
Dustin Hoffman
Dwayne Johnson
Eddie Murphy
Ed Harris
Ed O'Neill
Edward Snowden
Elisabeth Moss
Eliza Coupe
Elizabeth Marvel
Elizabeth Norment
Ellen Burstyn
Elon Musk
Elton John
Emilia Clarke
Emily Kinney
Eminem
Emma Bell
Emma Thompson
Emma Watson
Eric Stonestreet
Eugene Simon
Evan Rachel Wood
Ewan McGregor
Felicity Jones
Fernando Alonso
Fidel Castro
Finn Jones
Finn Wolfhard
Forest Whitaker
Francois Hollande
Fred Armisen
Gabriel Macht
Gary Oldman
Gaten Matarazzo
Gemma Whelan
George Bush Jr.
George Clooney
Gerald McRaney
Gilbert Gottfried
Gilda Radner
Gina Torres
Gordon Ramsay
Graham Rogers
Gwendoline Christie
Halle Berry
Hannah Murray
Harrison Ford
Heath Ledger
Heidi Klum
Hillary Clinton
Horatio Sanz
Howard Stern
Hugh Grant
Hugh Jackman
Hugh Laurie
Iain Glen
Ian Whyte
Indira Varma
Ingrid Bolsø Berdal
IronE Singleton
Isaac Hempstead Wright
Ivanka Trump
Iwan Rheon
J. Mallory McCree
Jack Gleeson
Jackie Chan
Jack Nicholson
Jacob Anderson
Jake McLaughlin
James Corden
James Marsden
James Mattis
Jamie Foxx
Jamie Lee Curtis
Jan Hooks
January Jones
Jared Harris
Jared Kushner
Jason Momoa
Jason Sudeikis
Jay Pharoah
Jay R. Ferguson
Jeff Bezos
Jeffrey Dean Morgan
Jeffrey DeMunn
Jeffrey Wright
Jennifer Aniston
Jennifer Lawrence
Jennifer Lopez
Jennifer Love Hewitt
Jeremy Bobb
Jeremy Holm
Jeremy Maguire
Jerome Flynn
Jerry Seinfeld
Jerry Springer
Jesse Tyler Ferguson
Jessica Alba
Jessica Lange
Jessica Paré
Jewel Staite
Jiang Wen
Jim Carrey
Jimmi Simpson
Jimmy Carter
Jimmy Fallon
Jimmy Kimmel
Jim Parsons
Jodie Foster
Joe Dempsie
Joe Keery
Johanna Braddy
John Belushi
John Boyega
John Bradley-West
John Cena
John Goodman
Johnny Depp
Johnny Galecki
John Ross Bowie
John Slattery
John Travolta
Jonathan Pryce
Jon Bernthal
Jon Hamm
Jon Stewart
Joseph Gordon-Levitt
Josh Hopkins
Josh McDermitt
Jude Law
Juha Sipilä
Julia Louis-Dreyfus
Julian Assange
Julian Glover
Julianne Moore
Julia Roberts
Julie Bowen
Justin Bieber
Justin Doescher
Justin Timberlake
Kaley Cuoco
Kanye West
Kate Hudson
Kate Mara
Kate McKinnon
Kate Micucci
Kate Winslet
Katy Perry
Keanu Reeves
Keith Richards
Kenan Thompson
Ken Arnold
Kevin Nealon
Kevin Rahm
Kevin Spacey
Kevin Sussman
Khloé Kardashian
Kiefer Sutherland
Kiernan Shipka
Kim Jong Un
Kim Kardashian
King Salman bin Abdulaziz al Saud
Kit Harington
Kourtney Kardashian
Kristen Connolly
Kristen Wiig
Kristian Nairn
Kristofer Hivju
Kunal Nayyar
Kyle Mooney
Kylie Minogue
Lady Gaga
Larry Ellison
Larry King
Larry Page
Laura Spencer
Lauren Cohan
Laurie Holden
Lena Headey
Lennie James
Leonardo DiCaprio
Leonardo Nam
Liam Cunningham
Linus Torvalds
Lionel Messi
Lorne Michaels
Louise Brealey
Luke Hemsworth
Madison Lintz
Madonna
Mads Mikkelsen
Maggie Siff
Mahershala Ali
Maisie Williams
Mao Zedong
Marcia Cross
Mariah Carey
Marilyn Monroe
Mark Falvo
Mark Gatiss
Mark Hamill
Mark Pellegrino
Mark Wahlberg
Mark Zuckerberg
Marlon Brando
Martha Stewart
Martin Freeman
Martin Sheen
Martti Ahtisaari
Mason Vale Cotton
Matt Damon
Matthew McConaughey
Max von Sydow
Mayim Bialik
Meghan Markle
Melania Trump
Mel Gibson
Melissa McBride
Melissa Rauch
Meryl Streep
Michael Caine
Michael Cudlitz
Michael Douglas
Michael Gladis
Michael Jackson
Michael Jordan
Michael Kelly
Michael McElhatton
Michael Phelps
Michael Rooker
Michael Warner
Michel Gill
Michelle Fairley
Michelle Obama
Michiel Huisman
Mick Jagger
Mike Myers
Mike O'Brien
Mike Pence
Miley Cyrus
Millie Bobby Brown
Molly Parker
Molly Shannon
Morena Baccarin
Morgan Freeman
Muhammad Ali
Naomi Watts
Nasim Pedrad
Natalia Dyer
Natalia Tena
Natalie Dormer
Natalie Portman
Nathalie Emmanuel
Nathan Darrow
Nathan Fillion
Nathan Lane
Nelson Mandela
Neve Campbell
Nicholas Cage
Nicki Minaj
Nicole Kidman
Nikolaj Coster-Waldau
Noah Schnapp
Noël Wells
Nolan Gould
Norman Reedus
Oprah Winfrey
Patrick J. Adams
Paul McCartney
Paul Sparks
Penélope Cruz
Penn
Pete Davidson
Peter Dinklage
Phil Hartman
Pope Francis
Prince Charles
Prince William
Priyanka Chopra
Ptolemy Slocum
Queen Elizabeth II
Rachel Brosnahan
Reese Witherspoon
Reg E. Cathey
Reid Ewing
Reince Priebus
Richard Madden
Rich Sommer
Rick Cosnett
Rick Hoffman
Rico Rodriguez
Rihanna
Riz Ahmed
Robert De Niro
Robert Downey, Jr.
Robert Morse
Robin Williams
Robin Wright
Rodrigo Santoro
Ronda Rousey
Ron Glass
Ronnie Wood
Rory McCann
Rose Leslie
Ross Marquand
Rupert Graves
Russell Crowe
Ryan Gosling
Ryan Reynolds
Saara Aalto
Sacha Baron Cohen
Sakina Jaffrey
Samuel L. Jackson
Sandra Bullock
Sandrine Holt
Sara Gilbert
Sarah Hyland
Sarah Rafferty
Sarah Wayne Callies
Sasheer Zamata
Sauli Niinistö
Scarlett Johansson
Scott Wilson
Sean Bean
Sean Connery
Sean Maher
Sean Penn
Sean Spicer
Sebastian Arcelus
Sebastian Vettel
Serena Williams
Seth Gilliam
Seth Meyers
Shakira
Shannon Woodward
Sharon Osbourne
Shinzo Abe
Sibel Kekilli
Sidse Babett Knudsen
Sigourney Weaver
Silvio Berlusconi
Simon Cowell
Simon Helberg
Simon Pegg
Sofía Vergara
Sonequa Martin-Green
Sophie Turner
Spencer Garrett
Stephen Colbert
Stephen Dillane
Stephen Hawking
Steve Bannon
Steve Jobs
Steven Yeun
Sting
Summer Glau
Susan Sarandon
Sylvester Stallone
Talulah Riley
Taran Killam
Tate Ellington
Taylor Swift
Teemu Selänne
Teller
Tessa Thompson
Thandie Newton
Theresa May
Tiger Woods
Tina Fey
Tom Cruise
Tom Hanks
Tommy Lee Jones
Tom Payne
Tom Wlaschiha
Tracy Morgan
Ty Burrell
Tyler James Williams
Tyra Banks
Uma Thurman
Una Stubbs
Usain Bolt
Vanessa Bayer
Vincent Kartheiser
Vin Diesel
Vladimir Putin
Warwick Davis
Will Ferrell
Will Forte
Will Smith
Winona Ryder
Woody Allen
Yasmine Al Massri
Zooey Deschanel