Thursday, 5 January 2012

Introduction to Magento Dataflow


One of the major features of e-commerce websites is the ability to share data with offline sales management systems. Magento makes this data exchange flexible and fairly easy with its DataFlow module.
Magento DataFlow is a data exchange framework that uses four types of components: adapters, parsers, mappers and validators. At the current state of development validators are not implemented; they are reserved for future use.

Dataflow profile definition

The flow of a data exchange process is called a profile and is defined as an XML structure. Magento provides a simple wizard-like tool for generating some basic import/export profiles that operate on product or customer entities. An advanced profiles manager is also provided for users who can write the profile XML without the wizard and who need more custom dataflow operations, including ones that involve other entities.
The basic concept is that a data exchange process is a set of actions. Each action executes one part of the process, depending on its type: for example, a parser transforms one set of data into another, and an adapter collects data from a resource. Common data passed from action to action is stored in the profile's batch container.

Adapter definition

Adapters are responsible for plugging into an external data resource and either fetching the requested data from it or saving the given data into it. For this purpose all adapters implement the interface Mage_Dataflow_Model_Convert_Adapter_Interface, which contains two methods: load() and save(). The data exchange concept introduced in the Dataflow module uses adapters to perform three types of action:
  • to load data from a resource - using the load() method
  • to save data to a resource - using the save() method
  • to process one parsed row - when defined as the adapter/method pair of variables of a parser
For the first two actions the adapter's XML definition looks like this:

<action type="dataflow/convert_adapter_io" method="load">
   ...
</action>
The action tag has two attributes: type and method. Type tells us which adapter class is going to be used to perform the action; the class is referenced by its alias. Method tells us which method of this adapter class the action should call. By default two methods are available: load and save. Children of the action tag define variables, which are the parameters used during execution of the adapter's method. Variables are defined as in the example below:

<action type="dataflow/convert_adapter_io" method="load">
   <var name="type">file</var>
   <var name="path">var/import</var>
   <var name="filename"><![CDATA[products.csv]]></var>
   <var name="format"><![CDATA[csv]]></var>
</action>

Magento DataFlow standard adapters

The Magento DataFlow module includes a few default adapter classes, which you can find in the app/code/core/Mage/Dataflow/Model/Convert/Adapter folder. Not all of them implement the load() and save() methods yet.
For the common case of reading data from or saving data to a local or remote file you will use dataflow/convert_adapter_io (Mage_Dataflow_Model_Convert_Adapter_Io).
The following variables allow you to define a local or remote file as the data source:
  • type - defines type of io source we want to process. Valid values: file, ftp
  • path - defines relative path to the file
  • filename - defines data source file’s name
  • host - for ftp type it defines the ftp host
  • port - for ftp type it defines the ftp port; if not given, default value is 21
  • user - for ftp type it defines the ftp user; if not given, the default value is ‘anonymous’ and the password is then ‘anonymous@noserver.com’
  • password - for ftp type it defines the ftp user’s password
  • timeout - for ftp type it defines connection timeout; default value is 90
  • file_mode - for ftp type it defines file mode; default value is FTP_BINARY
  • ssl - for ftp type if it is not empty, then ftp ssl connection is used
  • passive - for ftp type it defines connection mode; default value is false
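
As an illustration of the ftp variables listed above, here is a minimal sketch of an action that reads a CSV file from a remote server. The host name, credentials, path and file name are placeholders invented for this example, not values from any real setup:

```xml
<!-- Sketch only: host, user, password, path and filename are hypothetical -->
<action type="dataflow/convert_adapter_io" method="load">
   <var name="type">ftp</var>
   <var name="host"><![CDATA[ftp.example.com]]></var>
   <var name="port">21</var>
   <var name="user"><![CDATA[exchange]]></var>
   <var name="password"><![CDATA[secret]]></var>
   <!-- any non-default value here switches the connection mode -->
   <var name="passive">true</var>
   <var name="path">import</var>
   <var name="filename"><![CDATA[products.csv]]></var>
</action>
```

Variables that are left out (timeout, file_mode, ssl) fall back to the defaults described in the list.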

Customer and Product adapters

For the most commonly exchanged entities, customer and product, Magento provides default adapters: customer/convert_adapter_customer (Mage_Customer_Model_Convert_Adapter_Customer) and catalog/convert_adapter_product (Mage_Catalog_Model_Convert_Adapter_Product). Both inherit from Mage_Eav_Model_Convert_Adapter_Entity.
To simply load all customer data for a selected store you can use the following XML:

<action type="customer/convert_adapter_customer" method="load">
   <var name="store">default</var>
</action>
Sometimes you may want to load only a defined group of customers from the database. The following filtering variables are available for this:
  • filter/firstname - to load only customers with firstname starting with value of this variable
  • filter/lastname - to load only customers with lastname starting with value of this variable
  • filter/email - to load only customers with email starting with value of this variable
  • filter/group - to load only customers from group with id equal to value of this variable
  • filter/adressType - to export only selected addressType; valid values are: both, default_billing, default_shipping
  • filter/telephone - to load only customers with telephone starting with value of this variable
  • filter/postcode - to load only customers with postcode starting with value of this variable
  • filter/country - to load only customers with country iso code equal to value of this variable
  • filter/region - to load only customers with region equal to value of this variable (for US just 2-letter state names)
  • filter/created_at/from - to load only customers created after a date defined as value of this variable
  • filter/created_at/to - to load only customers created before a date defined as value of this variable

For example: 

<action type="customer/convert_adapter_customer" method="load">
   <var name="store"><![CDATA[0]]></var>
   <var name="filter/firstname"><![CDATA[a]]></var>
   <var name="filter/lastname"><![CDATA[a]]></var>
   <var name="filter/email"><![CDATA[a]]></var>
   <var name="filter/group"><![CDATA[1]]></var>
   <var name="filter/adressType"><![CDATA[default_billing]]></var>
   <var name="filter/telephone"><![CDATA[1]]></var>
   <var name="filter/postcode"><![CDATA[7]]></var>
   <var name="filter/country"><![CDATA[BS]]></var>
   <var name="filter/region"><![CDATA[WA]]></var>
   <var name="filter/created_at/from"><![CDATA[09/22/09]]></var>
   <var name="filter/created_at/to"><![CDATA[09/24/09]]></var>
</action>
In the same way you can filter the products loaded from the database with the following variables:
  • filter/name - to load only products with name starting with value of this variable
  • filter/sku - to load only products with sku starting with value of this variable
  • filter/type - to load only products with type defined as value of this variable; valid values are: simple, configurable, grouped, bundle, virtual, downloadable
  • filter/attribute_set - to load only products with attribute set id equal to value of this variable
  • filter/price/from - to load only products with price starting from value of this variable
  • filter/price/to - to load only products with price up to value of this variable
  • filter/qty/from - to load only products with quantity starting from value of this variable
  • filter/qty/to - to load only products with quantity up to value of this variable
  • filter/visibility - to load only products with visibility id equal to value of this variable
  • filter/status - to load only products with status id equal to value of this variable

Example: 

<action type="catalog/convert_adapter_product" method="load">
   <var name="store"><![CDATA[0]]></var>
   <var name="filter/name"><![CDATA[a]]></var>
   <var name="filter/sku"><![CDATA[1]]></var>
   <var name="filter/type"><![CDATA[simple]]></var>
   <var name="filter/attribute_set"><![CDATA[29]]></var>
   <var name="filter/price/from"><![CDATA[1]]></var>
   <var name="filter/price/to"><![CDATA[2]]></var>
   <var name="filter/qty/from"><![CDATA[1]]></var>
   <var name="filter/qty/to"><![CDATA[2]]></var>
   <var name="filter/visibility"><![CDATA[2]]></var>
   <var name="filter/status"><![CDATA[1]]></var>
</action>

Parser definition

Parsers are responsible for transforming data from one format to another. The parser interface Mage_Dataflow_Model_Convert_Parser_Interface defines two methods required in each parser: parse() and unparse(). A parser definition can be as simple as:

<action type="dataflow/convert_parser_serialize" method="parse" />
As with an adapter, we define an action tag with two attributes: type, which tells which class we want to use, and method, which names the method of this class we want to execute. Variables can also be defined within the action tag body, as the examples in the following sections show.

Magento DataFlow standard parsers

Magento DataFlow includes a few standard parsers, which you can find in app/code/core/Mage/Dataflow/Model/Convert/Parser.
The simplest of the standard parsers is dataflow/convert_parser_serialize (Mage_Dataflow_Model_Convert_Parser_Serialize), which doesn’t require any variables. It does require, though, that one of the previous actions has set data in the profile’s container. The parse() method unserializes the data stored in the profile’s container and replaces it with the result. The unparse() method does the opposite: it serializes the data stored in the profile’s container and replaces it with the result.
One of the most often used standard parsers is dataflow/convert_parser_csv, which transforms data from a CSV file (with the parse() method) or to a CSV file (with the unparse() method). Example of a definition:

<action type="dataflow/convert_parser_csv" method="parse">
     <var name="delimiter"><![CDATA[,]]></var>
     <var name="enclose"><![CDATA["]]></var>
     <var name="fieldnames">true</var>
     <var name="store"><![CDATA[0]]></var>
     <var name="decimal_separator"><![CDATA[.]]></var>
     <var name="adapter">catalog/convert_adapter_product</var>
     <var name="method">parse</var>
</action>
 
If you want to call the parse method, this parser requires that an IO adapter is called before it (for example dataflow/convert_adapter_io to read a CSV file). If you want to store data in a CSV file you have to do both: call an action that sets data in the profile’s container before the parser executes, and call an IO adapter after the parser executes to store the data in a file.
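
The export order described above can be sketched as a sequence of actions. This is only an illustration built from the standard classes named in this article; the store id, output path and file name are placeholder values chosen for the example:

```xml
<!-- 1. Load product ids into the profile's container -->
<action type="catalog/convert_adapter_product" method="load">
   <var name="store"><![CDATA[0]]></var>
</action>

<!-- 2. Turn the loaded ids into arrays of product data -->
<action type="catalog/convert_parser_product" method="unparse">
   <var name="store"><![CDATA[0]]></var>
</action>

<!-- 3. Convert the arrays to CSV text -->
<action type="dataflow/convert_parser_csv" method="unparse">
   <var name="delimiter"><![CDATA[,]]></var>
   <var name="enclose"><![CDATA["]]></var>
   <var name="fieldnames">true</var>
</action>

<!-- 4. Write the CSV text to a file (path and filename are hypothetical) -->
<action type="dataflow/convert_adapter_io" method="save">
   <var name="type">file</var>
   <var name="path">var/export</var>
   <var name="filename"><![CDATA[products_export.csv]]></var>
</action>
```

Each action reads what the previous one left in the container, which is why the order matters.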
The following variables allow you to customize CSV file parsing:
  • delimiter - defines the delimiter used in the CSV file; defaults to the comma (,) character
  • enclose - defines which character is used to enclose data values; defaults to empty character
  • escape - defines the escape character for the CSV file; defaults to a backslash (\)
  • decimal_separator - defines the decimal separator sign
  • fieldnames - if set to true, the first row of the CSV file is assumed to contain field names; if set to false, the map variable is used
  • map - defines field names for files where the first row doesn’t contain them; to see how to define a map, take a look at the section of this article about mapping values
  • adapter - tells which adapter’s method should be called on each row
  • method - tells which method of the adapter should be called on each row; defaults to saveRow
All variables defined within the parser’s action body are passed on to the defined adapter, so if you need to pass something to the adapter, you can simply set the required variable within the parser’s action body.
The last of the standard parsers included in the DataFlow module is dataflow/convert_parser_xml_excel (Mage_Dataflow_Model_Convert_Parser_Xml_Excel), which converts data from and to an Excel XML file. Example of a definition:

<action type="dataflow/convert_parser_xml_excel" method="unparse">
     <var name="single_sheet"><![CDATA[products]]></var>
     <var name="fieldnames">true</var>
</action>
The usage requirements are the same as for dataflow/convert_parser_csv.
The following variables allow you to customize Excel XML file parsing:
  • fieldnames - if set to true, the first row of the sheet is assumed to contain field names; if set to false, the map variable is used
  • map - defines field names for files where the first row doesn’t contain them
  • single_sheet - tells whether one sheet or all sheets should be parsed; should contain the name of the sheet to be parsed
  • adapter - tells which adapter’s method should be called on each row
  • method - tells which method of the adapter should be called on each row; defaults to saveRow
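
For the import direction, a sketch of an Excel XML import could look like the following. It assumes the same pairing of an IO adapter with a parser as in the CSV case; the path, file name and sheet name are placeholders, and the method is set to the documented default saveRow only to make it explicit:

```xml
<!-- Read the Excel XML file (path and filename are hypothetical) -->
<action type="dataflow/convert_adapter_io" method="load">
   <var name="type">file</var>
   <var name="path">var/import</var>
   <var name="filename"><![CDATA[products.xml]]></var>
</action>

<!-- Parse one named sheet and hand each row to the product adapter -->
<action type="dataflow/convert_parser_xml_excel" method="parse">
   <var name="single_sheet"><![CDATA[products]]></var>
   <var name="fieldnames">true</var>
   <var name="adapter">catalog/convert_adapter_product</var>
   <var name="method">saveRow</var>
</action>
```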

Standard customer and product entity parsers

For the most commonly exchanged entities, customer and product, Magento also provides standard parsers: customer/convert_parser_customer (Mage_Customer_Model_Convert_Parser_Customer) and catalog/convert_parser_product (Mage_Catalog_Model_Convert_Parser_Product). Both inherit from Mage_Eav_Model_Convert_Parser_Abstract.
Since the load() methods of the standard adapters return an array containing only the entities’ id values, you have to call the parser’s unparse() method if you want more detailed data. Both parsers take this array and, for each entity, parse its data, ignore system fields, objects and non-attribute fields, and create an associative array from the rest. Additionally, the product parser adds the result of parsing the product’s stock item object, and the customer parser adds the result of parsing the shipping and billing addresses and the newsletter subscription information.
Both entity parsers have deprecated parse() methods, since that work is now mostly done by parser actions that call standard adapter methods within the parser’s context. Example of a product parser definition that parses only products from a selected store:

<action type="catalog/convert_parser_product" method="unparse">
   <var name="store"><![CDATA[1]]></var>
</action>
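
The customer parser is used the same way. As a sketch, the pair of actions below loads customers and then expands the loaded ids into full customer data; the store and country values are arbitrary examples:

```xml
<!-- Load ids of customers from one country (values are examples only) -->
<action type="customer/convert_adapter_customer" method="load">
   <var name="store"><![CDATA[0]]></var>
   <var name="filter/country"><![CDATA[US]]></var>
</action>

<!-- Expand the ids into arrays with address and newsletter data -->
<action type="customer/convert_parser_customer" method="unparse">
   <var name="store"><![CDATA[0]]></var>
</action>
```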

Mapping values

The DataFlow module also provides a mapper concept: a class with a map() method that is responsible for mapping processed fields from one name to another. A mapper definition looks, for example, like this:

<action type="dataflow/convert_mapper_column" method="map">
   <var name="map">
      <map name="category_ids"><![CDATA[categorie]]></map>
      <map name="sku"><![CDATA[reference]]></map>
      <map name="name"><![CDATA[titre]]></map>
      <map name="description"><![CDATA[description]]></map>
      <map name="price"><![CDATA[prix]]></map>
      <map name="special_price"><![CDATA[special_price]]></map>
      <map name="manufacturer"><![CDATA[marque]]></map>
   </var>
   <var name="_only_specified">true</var>
</action>
Again we have an action tag with two attributes: type, set to the mapper class alias, and method, which is called to do the mapping. The mapper dataflow/convert_mapper_column is a standard mapper that you can find in the Magento DataFlow module in the app/code/core/Mage/Dataflow/Model/Convert/Mapper/ folder. Its purpose is to map one array into another, renaming fields and optionally limiting which fields appear in the result. The name attribute of each map tag gives the field name in the source array, and the content of the tag gives the name that field gets in the new array. If the named field doesn’t exist in the source array, the value of the target array’s field is set to null. The variable _only_specified tells whether only the fields specified in the map definition should appear in the resulting array.
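
To make the renaming concrete, here is a small worked sketch. The row values in the comment are invented for illustration; the behaviour shown follows the description above:

```xml
<!-- Row before mapping (hypothetical values):
       sku = "A-100", name = "Lamp", price = "19.90", weight = "2"
     Row after mapping with the action below:
       reference = "A-100", titre = "Lamp", prix = "19.90", marque = null
     "weight" is dropped because _only_specified is true;
     "marque" is null because the source row has no "manufacturer" field. -->
<action type="dataflow/convert_mapper_column" method="map">
   <var name="map">
      <map name="sku"><![CDATA[reference]]></map>
      <map name="name"><![CDATA[titre]]></map>
      <map name="price"><![CDATA[prix]]></map>
      <map name="manufacturer"><![CDATA[marque]]></map>
   </var>
   <var name="_only_specified">true</var>
</action>
```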
