Author: | Bernardo Martínez Garrido |
---|---|
Copyright: | WESO 2015 |
License: | MIT |
Interpreters: | Python 2.6, 2.7, 3.2, 3.3, 3.4, Pypy, Pypy3 |
The CWR API library offers a model to represent the content of files following the CISAC CWR standard.
With this model, and various helper classes, it is not only possible to read and show those files, but also to operate with the data they contain.
This library has been developed based on CWR specification version 2.1 revision 3, from December 10th 2004.
The library can be found on PyPI, which makes installing it very easy:
$ pip install cwr-api
The CWR file standard has been created by CISAC as a way for publishers and societies to share discographic information.
CWR files containing updated information are created, sent by a sending party to a receiver, and then processed before an acknowledgement file is returned.
While the CWR standard is defined in a single document, CISAC has created a few others which may help to understand it or which, as in the case of the lookup tables, are actually required to work with the files.
The files which can be downloaded here are the ones used to develop the library. The latest versions of these files (which, at the time of creating the project, are the same as those on this page) can always be found on the CISAC website (www.cisac.org).
A single file contains the standard specification:
Additional information about rules to be used when writing into a file can be found on the following files:
There is a user manual to help when working with these files:
Some files contain information required to fill and validate the CWR files.
CWR Validation and Lookup Tables (CRF020)
CWR Sender ID and Codes (CWR06-1972)
CWR Error Messages (CWR08-2493)
TIS information.
Miscellany related files. Some of these are not required to use the standard, but can help to understand it, while others have a very specific use.
The CWR light is a variant which greatly reduces the number of fields in the records.
The following files are used to prepare a CWR-based communication between two parties.
There is an implementation spreadsheet showing which society and publisher communications have been established with the CWR standard.
CWR files are meant to be used as an information transmission system and reflect this on their structure.
The information is divided in two parts: the file name contains data uniquely identifying the file, and the contents of the file contain the actual information being sent.
The CWR standard gives special importance to the file’s name: certain metadata is stored in it so that it can be used as a unique identifier for the file.
The filename follows the pattern CWyynnnnsss_rrr.Vxx where each section means the following:
In the original CWR v2.1 specification the sequence number consists of only two digits. It was changed to four in revision 5.
Note that the filename specification is not always followed, and so by default it should be considered optional.
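As a minimal sketch, the pattern can be matched with a regular expression. The reading of each section (yy as year, nnnn as sequence number, sss as sender, rrr as receiver, xx as version) and the names `FileTag` and `parse_filename` are assumptions made for illustration, not part of the library. Since the filename is optional, a non-matching name returns `None` instead of raising an error.

```python
import re
from collections import namedtuple

# Hypothetical container; the field meanings are an assumed reading of
# the CWyynnnnsss_rrr.Vxx pattern.
FileTag = namedtuple("FileTag", "year sequence sender receiver version")

FILENAME_PATTERN = re.compile(
    r"CW(?P<year>[0-9]{2})(?P<sequence>[0-9]{4})(?P<sender>[A-Za-z0-9]{3})"
    r"_(?P<receiver>[A-Za-z0-9]{3})\.V(?P<version>[0-9]{2})"
)


def parse_filename(name):
    """Return a FileTag, or None when the name does not follow the pattern."""
    match = FILENAME_PATTERN.fullmatch(name)
    if match is None:
        return None
    return FileTag(
        year=int(match.group("year")),
        sequence=int(match.group("sequence")),
        sender=match.group("sender"),
        receiver=match.group("receiver"),
        version=match.group("version"),
    )
```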
The file is structured as a batch process, storing a consecutive series of transactions.
All these transactions are grouped into a single Transmission, which is then divided into several Groups, one for each type of transaction.
The start and end of both the Transmission and the Groups are marked by a header and a trailer record.
So the structure of the file’s contents can be defined as: [HDR, [GRH, GRT]*, TRL]
Where the tags are the CWR record header tags, meaning:
These four types of record are the Control Records of the file, used not only to separate the sections, but also to verify the data contained in them is correct.
This is done by comparing the information these records contain with the information read from the section they enclose.
There is only a single Transmission in the file, and it contains all the records.
Groups contain batches of transactions.
While there are several groups on the file, there can be only one for each type of transaction, and they indicate which type of transaction they are storing.
They are numbered consecutively, starting at 1. No two Groups may have the same number, and there can’t be any gaps between their numbers.
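The Group rules above can be sketched as a small check. The function name and the `(group_id, transaction_type)` tuple layout are hypothetical, chosen only for this illustration.

```python
def groups_are_valid(groups):
    """Check Group numbering and transaction type uniqueness.

    `groups` is a list of (group_id, transaction_type) tuples in file
    order; this layout is an assumption made for the example.
    """
    ids = [group_id for group_id, _ in groups]
    types = [transaction_type for _, transaction_type in groups]
    # IDs must be exactly 1, 2, 3, ... with no gaps and no repeats.
    consecutive = ids == list(range(1, len(ids) + 1))
    # Only one Group is allowed per transaction type.
    unique_types = len(set(types)) == len(types)
    return consecutive and unique_types
```

For example, `groups_are_valid([(1, "AGR"), (2, "NWR")])` holds, while a list skipping from Group 1 to Group 3 does not.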
A Transaction is a batch of records containing all the data for a single job.
For example, a Transaction may contain information for registering an Agreement, for indicating a registering conflict with a Work, or even for indicating an error on a Transaction.
A Transaction is always of a single type, which is specified by its header record, and is named after it. So if a Transaction starts with an Agreement record it is an Agreement Transaction.
The possible transactions in CWR v2.1 are:
In practice, a Transaction is just a set of related Records, and its type indicates which records can or should follow the header.
Going back to the previous example, an Agreement Transaction would indicate an Agreement, the Territories it applies to and the Interested Parties for each Territory.
CWR records are divided into fields, each following a clear pattern, which consists of defining the following properties:
The field name is used only to help humans identify the field, while the start index and size serve to extract the data from the record line.
Format | Code | Notes |
---|---|---|
Alphanumeric | A | ASCII characters in upper case. |
Boolean | B | A single character. Can be ‘Y’, for yes, or ‘N’, for no. Meaning the Boolean values True and False. |
Flag | F | The same as Boolean, but adding a third option: ‘U’ for unknown. This can’t be parsed into a Boolean value. |
Date | D | Eight numeric characters, following the pattern YYYYMMDD. |
Numeric | N | Usually an integer, sometimes a float value. In that case the field documentation indicates how many characters are for the decimal value. |
Time | T | Six numeric characters, following the pattern HHMMSS. It uses the 24-hour format, not two groups of 12 hours. |
List/Table Lookup | L | Only accepts values coming from a specific list or table. The table is indicated in the field description. |
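A minimal sketch of how a field could be cut out of a record line and decoded according to its format code. The function names are hypothetical, the start index is assumed to be 1-based, and alphanumeric values are assumed to be padded with trailing spaces. Date and Time are left out here because they carry extra constraints.

```python
import re


def slice_field(line, start, size):
    """Cut a field out of a record line; `start` is assumed to be 1-based."""
    return line[start - 1:start - 1 + size]


def decode_value(code, raw, decimals=0):
    """Decode a raw field according to its format code (A, L, B, F or N)."""
    if code in ("A", "L"):
        # Assumption: values are padded on the right with spaces.
        return raw.rstrip()
    if code == "B":
        if raw not in ("Y", "N"):
            raise ValueError("invalid Boolean field: %r" % raw)
        return raw == "Y"
    if code == "F":
        # 'U' (unknown) has no Boolean equivalent, so the flag stays a string.
        if raw not in ("Y", "N", "U"):
            raise ValueError("invalid Flag field: %r" % raw)
        return raw
    if code == "N":
        if not re.fullmatch(r"[0-9]+", raw):
            raise ValueError("invalid Numeric field: %r" % raw)
        number = int(raw)
        # `decimals` is the number of trailing characters holding decimals.
        return number / 10 ** decimals if decimals else number
    raise ValueError("unsupported format code: %r" % code)
```

Here `decode_value("N", "05000", decimals=2)` reads the last two characters as the decimal part.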
Fields may be optional, or there may not be any data to put in them. In those cases, the empty columns must be filled as follows.
Format | Code | When empty |
---|---|---|
Alphanumeric | A | Columns should be filled with the empty character. |
Boolean | B | ? |
Flag | F | Should be set as ‘unknown’ (‘U’). |
Date | D | Columns should be set as 0. |
Numeric | N | Columns should be set as 0. |
Time | T | Columns should be set as 0. |
List/Table Lookup | L | Columns should be filled with the empty character. |
Date and Time formats have additional constraints due to the patterns they follow.
Date follows the pattern YYYYMMDD, which has the following constraints:
Time follows the pattern HHMMSS, which has the following constraints:
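As a minimal sketch, both patterns can be checked with the standard `datetime` module, which rejects impossible values such as a 30th of February or an hour of 24. The function names are hypothetical. An all-zero Date is treated as empty, following the empty-value table; an all-zero Time cannot be told apart from midnight, so this sketch returns it as midnight.

```python
import re
from datetime import datetime


def decode_date(raw):
    """Decode a YYYYMMDD field; returns None for an empty (all-zero) date."""
    if not re.fullmatch(r"[0-9]{8}", raw):
        raise ValueError("a Date needs eight numeric characters: %r" % raw)
    if raw == "00000000":
        return None
    # strptime raises ValueError for out-of-range months and days.
    return datetime.strptime(raw, "%Y%m%d").date()


def decode_time(raw):
    """Decode an HHMMSS field in 24-hour format."""
    if not re.fullmatch(r"[0-9]{6}", raw):
        raise ValueError("a Time needs six numeric characters: %r" % raw)
    # strptime raises ValueError for hours over 23 or minutes over 59.
    return datetime.strptime(raw, "%H%M%S").time()
```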
ID | Constraint | Failure level |
---|---|---|
1 | File should be readable | ER |
2 | First record should be HDR | ER |
3 | Second record should be GRH | ER |
4 | Groups open with GRH and close with GRT | ER |
5 | Last record is TRL | ER |
6 | GRH should be followed by a Transaction header | ER |
7 | GRT should be followed by a GRH or TRL | ER |
8 | Only a single HDR and a single TRL exist | ER |
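The structural constraints in this table can be sketched as a check over the sequence of record type tags. This is only an illustration: the function name and the wording of the messages are hypothetical, and the readability constraint is not covered because it applies before any record is read.

```python
# Transaction header tags, taken from the Transaction type table.
TRANSACTION_TYPES = {"ACK", "AGR", "EXC", "NWR", "ISW", "REV"}


def check_structure(tags):
    """Return a list of problems found in a list of record type tags."""
    problems = []
    if not tags or tags[0] != "HDR":
        problems.append("first record is not HDR")
    if len(tags) < 2 or tags[1] != "GRH":
        problems.append("second record is not GRH")
    if not tags or tags[-1] != "TRL":
        problems.append("last record is not TRL")
    if tags.count("HDR") != 1 or tags.count("TRL") != 1:
        problems.append("HDR and TRL must each appear exactly once")

    group_open = False
    for index, tag in enumerate(tags):
        following = tags[index + 1] if index + 1 < len(tags) else None
        if tag == "GRH":
            if group_open:
                problems.append("GRH found before the previous GRT")
            group_open = True
            if following not in TRANSACTION_TYPES:
                problems.append("GRH is not followed by a Transaction header")
        elif tag == "GRT":
            if not group_open:
                problems.append("GRT found without a GRH")
            group_open = False
            if following not in ("GRH", "TRL"):
                problems.append("GRT is not followed by GRH or TRL")
    if group_open:
        problems.append("last Group is not closed by GRT")
    return problems
```

An empty list means none of the checked constraints failed. Every constraint in the table has failure level ER, so any reported problem would reject the entire file.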
After receiving and processing the CWR file, the recipient will create and return an acknowledgement file, containing most of the original file information, and adding Acknowledgement Transactions.
These transactions will include all additional information that may be needed, such as the reasons for rejecting a transaction, or the CAE/IPI numbers where they may be missing.
Information that is not relevant to the creator of the Acknowledgment file will not appear on it. For example, a society will generally not return SPU/SPT records for sub-publishers in territories it does not control.
Note that when validating the original CWR file the process won’t stop at the first error encountered, but will continue to report all errors, unless a severe error makes further processing inadvisable.
According to the CWR standard, a form must be filled in and sent back to the submitter along with the Acknowledgement file.
This contains the transmission participants:
And also a series of details:
Along with a series of Boolean flags:
Information on the Acknowledgement file is added with the use of Acknowledgement transactions.
These mark the Transactions on the original file, adding any needed information about them, such as whether they have been rejected.
It follows the structure: [ACK, MSG*, AGR | NWR | REV | EXC]
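One way to check this structure is a regular expression over the record type tags of a Transaction. The sketch below reads the notation as an ACK record, then zero or more MSG records, then exactly one of AGR, NWR, REV or EXC. That reading and the function name are assumptions.

```python
import re

# ACK, then any number of MSG, then exactly one of the four record types.
ACK_PATTERN = re.compile(r"ACK( MSG)*( (AGR|NWR|REV|EXC))")


def is_acknowledgement(tags):
    """Check a list of record type tags against [ACK, MSG*, AGR|NWR|REV|EXC]."""
    return ACK_PATTERN.fullmatch(" ".join(tags)) is not None
```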
As far as we know, the yearly volume is commonly quite low.
The current format allows sending up to ten thousand files between each submitter and recipient, which was increased from an original one hundred, but probably no submitter will send so many files.
CWR files are commonly sent by publishers each quarter, which amounts to up to four per year.
Societies on the other hand may not send a single CWR file in a year, but may reply to any and all received files with an acknowledgement file.
A single group may contain up to ten million transactions, and there can be a hundred thousand groups. But again, this limit seems hard to reach.
A big file will contain around a hundred thousand agreements, which means a few hundred thousand lines, while a smaller one will have fewer than ten thousand lines.
This section contains a more detailed view of how data is stored inside a CWR file.
The valid Record types are always indicated in the latest CWR standard specification, and indicated on CISAC’s Record Type table.
These are used on a Record prefix to identify its type.
This list is offered just to make it easier to identify each of them.
Record Type | Record name |
---|---|
GRH | Group Header |
GRT | Group Trailer |
HDR | Transmission Header |
TRL | Transmission Trailer |
Note that the Transaction type is actually the type of the header Record on the Transaction.
So for example an Acknowledgment Transaction starts with an Acknowledgment Record.
Record Type | Record name |
---|---|
ACK | Acknowledgment of Transaction |
AGR | Agreement supporting Work Registration |
EXC | Existing Work which is in conflict with a Work registration |
NWR | New Works Registration |
ISW | Notification of ISWC assigned to a Work |
REV | Revised Registration |
Record Type | Record name |
---|---|
ALT | Alternate Title |
ARI | Additional Related Information |
COM | Composite Component |
EWT | Entire Work Title for Excerpts |
IPA | Interested Party of Agreement |
IND | Instrumentation Detail |
INS | Instrumentation Summary |
MSG | Message |
NAT | Non-Roman Alphabet Title |
NCT | Non-Roman Alphabet Title for Components |
NET | Non-Roman Alphabet Entire Work Title for Excerpts |
NOW | Non-Roman Alphabet Other Writer Name |
NPN | Non-Roman Alphabet Publisher Name |
NPR | Performing Artist in Non-Roman alphabet |
NVT | Non-Roman Alphabet Original Title for Versions |
NWN | Non-Roman Alphabet Writer Name |
OPU | Other Publisher |
ORN | Work Origin |
OWR | Other Writer |
PER | Performing Artist |
PWR | Publisher for Writer |
REC | Recording Detail |
SPT | Publisher Territory of Control |
SPU | Publisher Controlled by Submitter |
SWR | Writer Controlled by Submitter |
SWT | Writer Territory of Control |
TER | Territory in Agreement |
VER | Original Work Title for Versions |
All records begin with an initial field that uniquely identifies them and indicates their record type.
It should be noted that the prefix structure on this page applies only to Transaction and Detail records. Control records lack the Transaction Sequence Number and the Record Sequence Number.
This field is the record prefix, which contains just three values:
Field | Type | Description |
---|---|---|
Record Type | Table Lookup | One from the Record Type or Transaction Type tables |
Transaction Sequence # | Numeric | Unique ID for each Transaction in a Group |
Record Sequence # | Numeric | Unique ID for each Detail Record in a transaction |
Transactions and Detail Records share both sequence numbers, but use them in a different way.
Transactions have the Record Sequence Number set to 0 always. Their Transaction Sequence Number is 0 for the first Transaction on a Group, and a consecutive value for the following Transactions on the same Group.
Detail Records use the Transaction Sequence Number of the Transaction they are part of, while the Record Sequence Number is that of the previous Record plus one.
Ambiguity: the specification file is not very clear about the numbering of Detail Records. I suppose the first Detail Record on a Transaction should have RSN 1 (the RSN of the Transaction, which is 0, plus 1).
This would be an example of sequence numbering:
Record Type | Transaction Sequence # | Record Sequence # |
---|---|---|
First transaction header | 0 | 0 |
First detail in transaction | 0 | 1 |
Second detail in transaction | 0 | 2 |
Third detail in transaction | 0 | 3 |
Second transaction header | 1 | 0 |
First detail in transaction | 1 | 1 |
Second detail in transaction | 1 | 2 |
Third transaction header | 2 | 0 |
First detail in transaction | 2 | 1 |
Second detail in transaction | 2 | 2 |
Third detail in transaction | 2 | 3 |
Fourth detail in transaction | 2 | 4 |
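The numbering in this table can be reproduced with two nested counters. This is a sketch of the rule as described here, including the assumption that the first Detail Record gets RSN 1. The function name and the input layout (one list of record tags per Transaction, header first) are hypothetical.

```python
def number_records(transactions):
    """Assign (tag, TSN, RSN) to every record of a Group.

    `transactions` is a list of lists of record tags, each starting with
    the Transaction header; this layout is an assumption for the example.
    """
    numbered = []
    for tsn, records in enumerate(transactions):
        # The header gets RSN 0, each following Detail Record the next value.
        for rsn, tag in enumerate(records):
            numbered.append((tag, tsn, rsn))
    return numbered
```

For two Transactions `[["NWR", "SPU", "SWR"], ["NWR", "SPU"]]`, the second header receives TSN 1 and RSN 0, and its detail record TSN 1 and RSN 1.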
The Transmission header indicates the beginning of the CWR data on the file, and contains information about its creation and sender.
It contains the following fields:
Field | Type | Required | Description |
---|---|---|---|
Record Type | Alphanumeric | Yes | It is always ‘HDR’ |
Sender Type | Alphanumeric | Yes | Indicates the role of the sender. Only ‘AA’, ‘PB’, ‘SO’ or ‘WR’ are accepted. |
Sender ID | Numeric | Yes | Code identifying the sender |
Sender Name | Alphanumeric | Yes | Name of the sender |
EDI Standard Version Number | Alphanumeric | Yes | Version of the header and trailer. ‘01.10’ for CWR 2.1 |
Creation Date | Date | Yes | The date that this file was created |
Creation Time | Time | Yes | The time of day that this file was created |
Transmission Date | Date | Yes | The date that this file was transmitted to all receiving entities |
Character Set | Table Lookup | No | To be used if this file contains data in a character set other than ASCII |
The Transmission Trailer closes a CWR file, and contains validation information.
This validation data is the number of groups, transactions and records which should have been processed.
It contains the following fields:
Field | Type | Required | Description |
---|---|---|---|
Record Type | Alphanumeric | Yes | It is always ‘TRL’ |
Group Count | Numeric | Yes | Number of groups in the Transmission |
Transaction Count | Numeric | Yes | Number of transactions in the Transmission |
Record Count | Numeric | Yes | Number of records in the Transmission |
The Group header indicates the beginning of a batch of CWR transactions on the file.
It contains the following fields:
Field | Type | Required | Description |
---|---|---|---|
Record Type | Alphanumeric | Yes | It is always ‘GRH’ |
Transaction Type | Table Lookup (Transaction Type table) | Yes | All the transactions in the group are of this type |
Group ID | Numeric | Yes | Sequential ID starting at 1 |
Version Number | Alphanumeric | Yes | CWR version of the transaction. By default it is ‘02.10’ for CWR 2.1 |
Batch request | Numeric | No | ID used by the submitter to internally identify this batch |
The Group Trailer closes a batch of transactions, and contains validation information.
This validation data is the number of transactions and records which should have been processed.
It contains the following fields:
Field | Type | Required | Description |
---|---|---|---|
Record Type | Alphanumeric | Yes | It is always ‘GRT’ |
Group ID | Numeric | Yes | The same as the header |
Transaction Count | Numeric | Yes | Number of transactions in the group |
Record Count | Numeric | Yes | Number of records in the group |
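A minimal sketch of how the trailer counts could be checked against the records actually read. Transactions are counted as the records whose Record Sequence Number is 0. Whether the Record Count includes the GRH and GRT records themselves is not stated here, so it is left as a parameter; the function names and the `(tag, tsn, rsn)` tuple layout are hypothetical.

```python
def count_group(body, count_control_records=True):
    """Count transactions and records in one Group.

    `body` holds (tag, tsn, rsn) tuples for the records between GRH and
    GRT. Counting GRH and GRT in the record total is an assumption that
    can be switched off.
    """
    # Transaction headers always have Record Sequence Number 0.
    transactions = sum(1 for _, _, rsn in body if rsn == 0)
    records = len(body) + (2 if count_control_records else 0)
    return transactions, records


def trailer_matches(body, transaction_count, record_count,
                    count_control_records=True):
    """Compare the GRT counts with the counts computed from the Group."""
    expected = count_group(body, count_control_records)
    return expected == (transaction_count, record_count)
```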
This record indicates the transaction status after its validation, along with any information needed to link this transaction with the original one.
Field | Type | Required | Description |
---|---|---|---|
Original Transaction Sequence # | Numeric | Yes | The sequence number of the original transaction |
Original Transaction Type | Table Lookup (Transaction Type Table) | Yes | The type of the original transaction |
Processing Date | Date | Yes | The date the file was received |
Transaction Status | Table Lookup (Transaction Status Table) | Yes | Current status for the Transaction |
Creation Title | Alphanumeric | No | If the original transaction refers to a work, its title should be here |
Submitter Creation # | Numeric | No | ID assigned by the submitter. Required if the original Transaction was accepted |
Recipient Creation # | Numeric | No | ID assigned by the recipient. Required if the original Transaction was accepted |
Indicates the results of validation and accompanies Acknowledgement records.
Field | Type | Required | Description |
---|---|---|---|
Record Prefix | Alphanumeric | Yes | The type is always ‘MSG’ |
Message Type | List Lookup | Yes | One of ‘F’/‘R’/‘T’/‘G’/‘E’ |
Original Record Sequence # | Numeric | Yes | The Record Sequence Number which caused this message |
Record Type | Alphanumeric | Yes | The Record Type which caused this message |
Message Level | List Lookup | Yes | One of ‘E’/‘G’/‘T’/‘R’/‘F’ |
Validation Number | Alphanumeric | Yes | Identifies the specific edit condition that generated this message |
Message Text | Alphanumeric | Yes | The text associated with this message |
This section contains a more detailed view of how a CWR file is validated.
Once received, the file undergoes a validation process.
Three levels of validation are applied:
The Transaction validation ensures the overall relationship between the records is correct. This checks mainly the order, numbering and counts of records.
Record validation checks a concrete relationship between records. For example, a TER record should follow an AGR or TER record.
Field level validation ensures each field contains the correct information. This checks things such as the size of the field, the pattern it must follow, or that the referenced IDs exist somewhere.
Each validation constraint has a failure level assigned.
Code | Failure name |
---|---|
ER | Entire file is rejected |
GR | Entire group is rejected |
TR | Entire transaction is rejected |
RR | Entire record is rejected |
FR | Field is rejected and set to the default value |
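When several constraints fail, the widest failure level decides what is rejected. The sketch below orders the codes from the narrowest to the widest scope. That ordering is an assumption drawn from the table above, and the names are hypothetical.

```python
# Failure codes ordered from the narrowest to the widest rejection scope.
# The ordering is an assumption based on the failure level table.
SEVERITY = ["FR", "RR", "TR", "GR", "ER"]


def most_severe(codes):
    """Return the failure code with the widest scope among `codes`."""
    return max(codes, key=SEVERITY.index)
```

For instance, `most_severe(["FR", "TR", "RR"])` picks `"TR"`, so the whole transaction would be rejected.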
ID | Constraint | Failure level |
---|---|---|
1 | File should be readable | ER |
2 | First record should be HDR | ER |
3 | Second record should be GRH | ER |
4 | Groups open with GRH and close with GRT | ER |
5 | Last record is TRL | ER |
6 | GRH should be followed by a Transaction header | ER |
7 | GRT should be followed by a GRH or TRL | ER |
8 | Only a single HDR and a single TRL exist | ER |
ID | Field | Constraint | Failure level |
---|---|---|---|
1 | Record Type | Should be one from the Record Type or Transaction Type tables | ER |
2,3,4,7,8 | Transaction Sequence # | See below | TR/ER |
5,6 | Record Sequence # | See below | ER |
This section contains a more detailed view of how the grammar for CWR files is created.
Depending on their scope, there are three kinds of rules:
In practice only terminal rules are different. These are stored in Python modules, and so are static, while the other rules are generated dynamically from configuration files.
Except for terminal rules, all rules are an aggregation of smaller rules, creating a small tree.
These trees are read in pre-order.
The following example is a generic rules tree:
Rules nodes will be substituted by the rules they contain.
Note that the rule in the second level has a loop. This means that the rules it contains may appear multiple times.
As trees are read in pre-order, this would read as:
“Rule composed of terminal rule 1, followed by terminal rule 2, followed by terminal rule 4 multiple times, followed by terminal rule 3, followed by terminal rule 1.”
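A pre-order reading like this one can be sketched with a small recursive function. The tree built in the usage example is only one possible tree that produces the reading above, and the `Rule` class and `flatten` function are hypothetical names, not the library's own.

```python
class Rule(object):
    """A node in a rules tree; a node without children is a terminal rule."""

    def __init__(self, name, children=(), repeated=False):
        self.name = name
        self.children = list(children)
        self.repeated = repeated


def flatten(rule):
    """Read a rules tree in pre-order, replacing rules by their contents."""
    if not rule.children:
        return [rule.name]
    names = []
    for child in rule.children:
        names.extend(flatten(child))
    if rule.repeated:
        # A loop node means its contents may appear multiple times.
        names = ["(" + " ".join(names) + ")*"]
    return names


# One possible tree matching the reading above (an assumption).
tree = Rule("root", [
    Rule("terminal_1"),
    Rule("inner", [
        Rule("terminal_2"),
        Rule("loop", [Rule("terminal_4")], repeated=True),
        Rule("terminal_3"),
    ]),
    Rule("terminal_1"),
])
```

Calling `flatten(tree)` yields the terminal rules in reading order, with the looped rule marked by `*`.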
A small DSL is being used to set up the grammar.
Files using this DSL are read and processed, and the data is then sent to the grammar factory to build the grammar.
An example of this DSL, defining the Agreement Record:
    transaction_record:
        id: agreement
        head: AGR
        rules:
        [
            sequence
            [
                field: submitter_agreement_n
                field: international_standard_code
                field: agreement_type
                field: agreement_start_date
                field: agreement_end_date
                field: retention_end_date
                field: prior_royalty_status
                field: prior_royalty_start_date
                field: post_term_collection_status
                field: post_term_collection_end_date
                field: date_of_signature
                field: number_of_works
                field: sales_manufacture_clause
                field: shares_change
                field: advance_given
                field: society_assigned_agreement_n
            ]
        ]
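To give an idea of what reading such a block involves, here is a minimal sketch that pulls the rule id, the head tag and the ordered field list out of a record block with regular expressions. It is not the library's actual parser, and it ignores nesting. The function name and the returned dictionary layout are hypothetical.

```python
import re


def read_record_block(text):
    """Extract id, head and field names from a single DSL record block.

    Simplified illustration: it assumes one block with a flat field list.
    """
    rule_id = re.search(r"^\s*id:\s*(\w+)", text, re.MULTILINE)
    head = re.search(r"^\s*head:\s*(\w+)", text, re.MULTILINE)
    fields = re.findall(r"^\s*field:\s*(\w+)", text, re.MULTILINE)
    return {
        "id": rule_id.group(1) if rule_id else None,
        "head": head.group(1) if head else None,
        "fields": fields,
    }
```

The field names found this way would then be looked up among the terminal rules to build the rule for the whole record.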
Rules are composed of several smaller rules. The terminal rules are the fields, defined on their own module.
This creates a tree of rules.
There are two groups of rules in a tree:
Note that rules can be terminal rules. All rule blocks should generate trees ending in terminal rules.
Show how the rules are defined as a tree.
The DSL consists of a series of blocks, each of them representing a grammar rule.
These rules represent a logical section of the file, and may be for a line, or for a series of them.
They have the following structure, which only shows compulsory fields:
    rule_group_1:
        id: rule_id_1
        rules:
        [
            internal_rules_list
            [
                rule_group_2: rule_id_2
                rule_group_1: rule_id_3
                rule_group_2: rule_id_4
            ]
            rule_group_2: rule_id_5
        ]
Each block has a set of required fields:
Field | Notes |
---|---|
Root rule group | The root of the block. In the example it is ‘transaction_record’. It indicates the global group to which it belongs. |
Rule id | Identifier for this rule |
Rules | The smaller rules which compose this rule |
Internal rules | A new tree of rules |
Rule group | A group of rules |