NAME
    rdf2tarql - Convert RDF examples to TARQL scripts

SYNOPSIS
      perl rdf2tarql.pl model.ttl > model.tarql

DESCRIPTION
    rdf2tarql converts an RDF example with embedded CSV column names into a
    TARQL script (query). TARQL <http://tarql.github.io/> is a high-speed
    streaming converter from CSV to RDF. We've used it to convert huge files
    (over 10M rows, 145 columns) using complex TARQL queries (480 lines: 110
    prefixes, 33 nodes, 250 triples, 110 binds).

  RDF Model
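    Typically the example is an rdfpuml model in which CSV column names are
    embedded in parentheses, eg (customer_id), inside URLs and attribute
    values. A column name can also be wrapped in a function, eg
    urlify(city).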

    Consider the following example about persons (customers):

        <person/(customer_id)> a :NaturalPerson;
          :id "(customer_id)";
          :firstName "(first_name)";
          :lastName "(last_name)";
          :gender "(gender)";
          :religion "(religion)";
          :hasAddress <person/(customer_id)/address>;
          :hasEvent  <person/(customer_id)/birth>;
          :hasEvent  <person/(customer_id)/education>.

        <person/(customer_id)/address> a :Address;
          :houseNumber "(house_number)";
          :street "(street)";
          :postalCode "(postal_code)";
          :city <country/(country)/city/urlify(city)>;
          :country <country/(country)>.

        <country/(country)/city/urlify(city)> a :City; :country <country/(country)>; :name "(city)".

        <country/(country)> a :Country; :code "(country)".

        <person/(customer_id)/birth> a :BirthEvent; :hasDate "(date_of_birth)"^^xsd:date.

        <person/(customer_id)/education> a :EducationEvent;
          :hasDate "(enrollment_date)"^^xsd:date;
          :university <university/urlify(university)>;
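          :degree <degree/urlify(education_degree)>.

        <university/urlify(university)> a :University; :name "(university)".

        <degree/urlify(education_degree)> a :AcademicDegree; :name "(education_degree)".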
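    The embedded columns in such a model can be picked out with a regular
    expression. The following minimal Python sketch is not rdf2tarql's own
    Perl code. It assumes that a column reference is always a bare name in
    parentheses, optionally preceded by a function name such as urlify. The
    helper name embedded_columns is hypothetical.

```python
import re

# Assumed syntax: a column is a bare name in parentheses, eg "(first_name)"
# or urlify(city); a function prefix does not matter for extraction.
COLUMN = re.compile(r"\((\w+)\)")


def embedded_columns(model: str) -> list:
    """Return distinct embedded column names in order of first appearance."""
    seen = []
    for name in COLUMN.findall(model):
        if name not in seen:
            seen.append(name)
    return seen


model = '''
<person/(customer_id)> a :NaturalPerson;
  :firstName "(first_name)";
  :hasAddress <person/(customer_id)/address>.
<person/(customer_id)/address> :city <country/(country)/city/urlify(city)>.
'''
print(embedded_columns(model))  # ['customer_id', 'first_name', 'country', 'city']
```

    Each distinct name must match a header in the input CSV, because TARQL
    exposes CSV columns as variables with those names.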

  Generated Construct
    The generated TARQL query consists of two parts. The first is a
    CONSTRUCT template that closely mirrors the example (model graph), with
    each URL and embedded column name replaced by a variable:

      construct {
        ?person_URL a :NaturalPerson;
          :id ?customer_id;
          :firstName ?first_name;
          :lastName ?last_name;
          :gender ?gender;
          :religion ?religion;
          :hasAddress ?person_address_URL;
          :hasEvent  ?person_birth_URL;
          :hasEvent  ?person_education_URL.

        ?person_address_URL a :Address;
          :houseNumber ?house_number;
          :street ?street;
          :postalCode ?postal_code;
          :city ?country_city_URL;
          :country ?country_URL.

        ?country_city_URL a :City; :country ?country_URL; :name ?city.

        ?country_URL a :Country; :code ?country.

        ?person_birth_URL a :BirthEvent; :hasDate ?DATE_OF_BIRTH.

        ?person_education_URL a :EducationEvent;
          :hasDate ?ENROLLMENT_DATE;
          :university ?university_URL;
          :degree ?degree_URL.

        ?university_URL a :University; :name ?university.
        ?degree_URL a :AcademicDegree; :name ?education_degree.
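    In the template, each literal "(column)" becomes the variable ?column,
    and a typed literal "(column)"^^datatype becomes the uppercase, cast
    variable (eg ?DATE_OF_BIRTH). The following Python sketch shows this
    rewrite for literal terms only. The function name literal_to_var and the
    exact matching rule are assumptions and are not taken from rdf2tarql.

```python
import re

# Assumed rule: a literal that is exactly "(column)", optionally typed with ^^datatype
LITERAL = re.compile(r'^"\((\w+)\)"(\^\^\S+)?$')


def literal_to_var(term: str) -> str:
    """Map a model literal to the CONSTRUCT variable that replaces it."""
    m = LITERAL.match(term)
    if m is None:
        return term  # not an embedded column: keep the term unchanged
    column, datatype = m.group(1), m.group(2)
    # Typed literals are cast in a BIND, so they use the uppercase variable
    return "?" + (column.upper() if datatype else column)


for t in ['"(first_name)"', '"(date_of_birth)"^^xsd:date', ':NaturalPerson']:
    print(t, "->", literal_to_var(t))
```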

  Generated Binds
    The second part is a WHERE clause of bindings, generated by:

    *   Using the CSV fields, which TARQL exposes as variables named after
        the column headers (eg ?customer_id).
    *   Computing URLs from patterns (eg ?person_address_URL).
    *   Implementing a urlify() function that replaces each run of
        characters other than letters and digits with a single _ and
        removes a leading or trailing _ (eg ?CITY, then ?country_city_URL).
    *   Implementing datatype casting using strdt() (eg ?DATE_OF_BIRTH).
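    You can check what urlify() produces for a value without running TARQL.
    The following Python sketch is meant to mirror the three nested
    replace() calls in the generated binds: each run matching
    [^\p{L}\p{N}]+ becomes _, and then one leading and one trailing _ are
    dropped. Python's re module does not support \p{...}, so the sketch
    tests Unicode general categories L* and N* with unicodedata instead. It
    illustrates the behavior and is not part of rdf2tarql.

```python
import unicodedata


def _letter_or_number(ch: str) -> bool:
    # Same classes as \p{L} and \p{N}: Unicode general categories L* and N*
    return unicodedata.category(ch)[0] in ("L", "N")


def urlify(value: str) -> str:
    out = []
    in_run = False  # inside a run of non-letter, non-number characters?
    for ch in value:
        if _letter_or_number(ch):
            out.append(ch)
            in_run = False
        elif not in_run:
            out.append("_")  # the whole run collapses to one "_"
            in_run = True
    s = "".join(out)
    # Equivalent of replace(...,'^_','') followed by replace(...,'_$','')
    if s.startswith("_"):
        s = s[1:]
    if s.endswith("_"):
        s = s[:-1]
    return s


print(urlify("New York, NY"))               # New_York_NY
print(urlify(' "St. Kliment Ohridski" '))   # St_Kliment_Ohridski
```

    The underscore itself is not a letter or digit, so existing underscores
    are also merged into runs (eg "a__b" becomes "a_b").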

    All these binds are generated automatically using a few simple
    conventions:

    *   URL variables are named by joining the constant parts of the URL
        pattern with _ and appending _URL (eg <person/(customer_id)/birth>
        becomes ?person_birth_URL).
    *   Transformed variables are rendered in uppercase (eg ?CITY,
        ?DATE_OF_BIRTH).
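    The following Python sketch shows the URL naming convention and the
    concat() that builds each URL. The token syntax, (column) or
    urlify(column), comes from the example model. The functions url_var_name
    and url_bind are hypothetical, and constant parts that contain double
    quotes are not escaped.

```python
import re

# "(column)" or "urlify(column)" inside a URL pattern, as in the example model
TOKEN = re.compile(r"(urlify)?\((\w+)\)")


def url_var_name(pattern: str) -> str:
    """Join the constant path segments with '_' and append '_URL'."""
    constant = TOKEN.sub("/", pattern)  # blank out the embedded columns
    return "_".join(p for p in constant.split("/") if p) + "_URL"


def url_bind(pattern: str) -> str:
    """Build the BIND that computes the URL from constants and variables."""
    args, pos = [], 0
    for m in TOKEN.finditer(pattern):
        if m.start() > pos:
            args.append('"%s"' % pattern[pos:m.start()])  # constant part
        column = m.group(2)
        # urlify(column) refers to the transformed (uppercase) variable
        args.append("?" + (column.upper() if m.group(1) else column))
        pos = m.end()
    if pos < len(pattern):
        args.append('"%s"' % pattern[pos:])
    return "bind(iri(concat(%s)) as ?%s)" % (",".join(args), url_var_name(pattern))


print(url_bind("person/(customer_id)/address"))
print(url_bind("country/(country)/city/urlify(city)"))
```

    The two calls print the same bind lines that the generated query uses
    for ?person_address_URL and ?country_city_URL.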

    Here is the result:

      } where {
        bind(iri(concat("person/",?customer_id)) as ?person_URL)
        bind(iri(concat("person/",?customer_id,"/address")) as ?person_address_URL)
        bind(iri(concat("person/",?customer_id,"/birth")) as ?person_birth_URL)
        bind(iri(concat("person/",?customer_id,"/education")) as ?person_education_URL)
        bind(replace(replace(replace(?city,'[^\\p{L}\\p{N}]+','_'),'^_',''),'_$','') as ?CITY)
        bind(iri(concat("country/",?country,"/city/",?CITY)) as ?country_city_URL)
        bind(iri(concat("country/",?country)) as ?country_URL)
        bind(strdt(?date_of_birth,xsd:date) as ?DATE_OF_BIRTH)
        bind(strdt(?enrollment_date,xsd:date) as ?ENROLLMENT_DATE)
        bind(replace(replace(replace(?university,'[^\\p{L}\\p{N}]+','_'),'^_',''),'_$','') as ?UNIVERSITY)
        bind(iri(concat("university/",?UNIVERSITY)) as ?university_URL)
        bind(replace(replace(replace(?education_degree,'[^\\p{L}\\p{N}]+','_'),'^_',''),'_$','') as ?EDUCATION_DEGREE)
        bind(iri(concat("degree/",?EDUCATION_DEGREE)) as ?degree_URL)
      }
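    Each transformed variable is bound by a fixed template for its function.
    The following Python sketch reproduces two of the lines above. The
    helper names urlify_bind and strdt_bind are hypothetical, and the
    templates are copied from the generated query rather than from the
    rdf2tarql source.

```python
# Templates copied from the generated query; raw strings keep the doubled
# backslashes of the SPARQL string literal exactly as written.
URLIFY_TEMPLATE = (r"bind(replace(replace(replace(?%s,'[^\\p{L}\\p{N}]+','_'),"
                   r"'^_',''),'_$','') as ?%s)")


def urlify_bind(column: str) -> str:
    """BIND for urlify(column): result goes into the uppercase variable."""
    return URLIFY_TEMPLATE % (column, column.upper())


def strdt_bind(column: str, datatype: str) -> str:
    """BIND that casts the raw CSV string to the given datatype."""
    return "bind(strdt(?%s,%s) as ?%s)" % (column, datatype, column.upper())


print(urlify_bind("city"))
print(strdt_bind("date_of_birth", "xsd:date"))
```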

  Prerequisites
    TARQL <https://github.com/tarql/tarql/releases>: tested with version
    1.2-SNAPSHOT, BUILD_DATE: 2017-12-07T13:33:10Z

    See test/customer for an example; it includes a Makefile.

  Limitations
    Don't use uppercase letters in CSV column names, since they may
    conflict with the generated uppercase variable names (eg ?CITY).
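    A quick check of the CSV header can catch such names before you run
    TARQL. The following Python sketch is hypothetical and conservative. It
    flags every column that contains an uppercase letter. It also flags any
    column that equals the uppercased form of another column, which is the
    name a transformed variable would get.

```python
def check_columns(columns: list) -> dict:
    """Flag CSV column names that may clash with generated variable names."""
    # Names a transformed variable would get (the uppercased column name)
    generated = {c.upper() for c in columns if c != c.upper()}
    return {
        "has_uppercase": [c for c in columns if c != c.lower()],
        "clashes": [c for c in columns if c in generated],
    }


print(check_columns(["city", "CITY", "first_name"]))
```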

    Only one function, urlify(), is supported; more should be added.

    Non-ASCII characters in IRIs get converted to ugly escapes.

SEE ALSO
    rdfpuml: a tool that generates PlantUML diagrams from RDF examples.

    rdf2rml: a tool that generates R2RML transformations from RDF examples.

    rdf2ontorefine: a tool that generates OntoRefine SPARQL updates from RDF
    examples.

AUTHOR
    Vladimir Alexiev, Ontotext Corp

    Last update: 9-Jun-2020

