The SMPTE ST 2064 A-V Timing Measurement Standard is intended for live, real-time measurement of lipsync, sampled at a known-good reference point and at other points downstream. Core to this process are the fingerprinting algorithms, which reduce audio and video to lightweight fingerprints for comparison and correlation. Today, however, this technology has applications far beyond this simple goal, and to meet them effectively, some improvements that retain backward compatibility are worth investigating.
This paper looks at how the ST 2064 Standard can be applied to content ID and matching throughout a workflow or distribution infrastructure, for live verification and remediation of issues in addition to pure lipsync measurement. It further looks at the mechanisms of the fingerprinting and discusses potential improvements that may bring added benefit and performance capabilities.
Identification and matching of content can be applied to many use cases, including verification of the presence, correctness and placement of logos, graphics and other en-route visual modifications, and confirmation of as-played content against as-expected. These and related applications are examined in detail, both in simple "workbench" terms and in workflow / system examples.
The Standard uses simple but effective algorithms for fingerprint generation, well suited to the FPGA-based implementations typical at the time of its design. Today, this would be as likely done in software as in hardware, and in both domains much more processing power can be applied to generate more "powerful" fingerprints, offering greater accuracy. In addition, the Standard did not foresee the array of video resolutions in use today and does not explicitly provide guidance for their accommodation. These potential improvements are discussed both in terms of implementation in hardware and software and in terms of what algorithmic improvements might offer in capability and usefulness.