Wednesday, January 5, 2011

Loading XML Data into SQL Server

With XML becoming more and more the rage, DBAs are seeing many more requests to deal with XML data files. The question is: how do you load the data in the XML file into a table in SQL Server?

The XML File
For the purposes of this article, let's use a small XML file that is readily available on the Internet. W3Schools.com has such a sample file at http://www.w3schools.com/XML/cd_catalog.xml. As you might surmise by the name of the file, it contains brief information about a CD catalog. Please download and save this file to your computer. I am going to assume that this is being saved in the C:\SQL folder, and that you are keeping the file name the same (cd_catalog.xml). If you change any of this, make appropriate changes to the code below.

The structure of the XML file is as follows. Note that there are multiple <CD> tags, one for each CD in the catalog.

<CATALOG>
  <CD>
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Columbia</COMPANY>
    <PRICE>10.90</PRICE>
    <YEAR>1985</YEAR>
  </CD>
</CATALOG>

The Database
Let's just keep this short and simple. We will load this file into one table, with an identical structure as that of the XML file.

CREATE TABLE dbo.CD_Info (
    Title        varchar(100),
    Artist       varchar(100),
    Country      varchar(25),
    Company      varchar(100),
    Price        numeric(5,2),
    YearReleased smallint);

Load the file into a staging table

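We will use the OPENROWSET function to bulk load the file into a staging table. First, we will create a staging table (here, a table variable) with one column of the XML data type. We will then load the file into this column. With the SINGLE_BLOB option, OPENROWSET returns the whole file as one row with one varbinary(max) column, which SQL Server converts to XML when it is inserted. Note that OPENROWSET requires a table alias (rs below), and that the file path is resolved on the machine running SQL Server, not on the client.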

DECLARE @CD TABLE (XMLData XML);

INSERT INTO @CD
SELECT *
FROM OPENROWSET(BULK N'C:\SQL\cd_catalog.xml', SINGLE_BLOB) rs;
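
Before staging the file, it can help to look at what OPENROWSET actually returns. The following is a minimal sketch, assuming the file is at the same C:\SQL\cd_catalog.xml path used above. BulkColumn is the name SQL Server gives the single column returned with SINGLE_BLOB; the output column names RawBytes and AsXml are only illustrative.

```sql
-- Inspect the raw result of the bulk read without a staging table.
-- Assumes C:\SQL\cd_catalog.xml exists on the SQL Server machine.
SELECT RawBytes = DATALENGTH(rs.BulkColumn),   -- size of the file in bytes
       AsXml    = CONVERT(XML, rs.BulkColumn)  -- same content, converted to XML
FROM OPENROWSET(BULK N'C:\SQL\cd_catalog.xml', SINGLE_BLOB) AS rs;
```

If the file is not well-formed XML, the CONVERT raises an XML parsing error, which makes this a quick sanity check before running the rest of the load.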

Retrieve the data from the staging table

Next we will retrieve the data from the staging table, and "shred" the XML into its columns. One key thing to note about XML: regardless of the case sensitivity of your server/database, XML element names and the XQuery expressions that reference them are always case sensitive. In the sample file, all of the XML tags are in upper case, so the element names in our queries need to be in upper case also.
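
To see the case sensitivity in action, here is a small self-contained sketch that does not need the file. The variable @x and the column names are made up for the illustration; the XML is a cut-down version of the sample structure.

```sql
-- A tiny inline document with upper-case element names.
DECLARE @x XML = N'<CATALOG><CD><TITLE>Empire Burlesque</TITLE></CD></CATALOG>';

-- count() returns how many nodes the path matches.
SELECT UpperCaseMatches = @x.value('count(/CATALOG/CD)', 'int'),  -- matches the element
       LowerCaseMatches = @x.value('count(/catalog/cd)', 'int');  -- matches nothing
```

The lower-case path does not raise an error; it simply matches no nodes. A wrong-case path in the load would therefore insert zero rows rather than fail.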

This command will retrieve each element of each CD in the sample file:

SELECT Title        = x.data.value('TITLE[1]','varchar(100)'),
       Artist       = x.data.value('ARTIST[1]','varchar(100)'),
       Country      = x.data.value('COUNTRY[1]','varchar(25)'),
       Company      = x.data.value('COMPANY[1]','varchar(100)'),
       Price        = x.data.value('PRICE[1]','numeric(5,2)'),
       YearReleased = x.data.value('YEAR[1]','smallint')
FROM @CD t
CROSS APPLY t.XMLData.nodes('/CATALOG/CD') x(data);

Note the nodes() and value() methods in this command. These are methods of the XML data type. The nodes() method specifies the path to the data that you will be using, and it requires a table and column alias (x(data) above). You then apply the value() method to that alias to retrieve the specific data that you want. The value() method takes two arguments: an XQuery expression that must return a single value (hence the [1] after each element name), and the SQL data type to convert that value to.
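
The same two methods can be tried against an XML variable, with no file or staging table involved. This is a minimal sketch: the variable @x, the alias cd(n), and the second CD entry are made up for illustration and are not taken from the sample file.

```sql
-- Inline document: the first CD mirrors the sample structure,
-- the second is a hypothetical entry added to show multiple rows.
DECLARE @x XML = N'<CATALOG>
  <CD><TITLE>Empire Burlesque</TITLE><YEAR>1985</YEAR></CD>
  <CD><TITLE>Example Title</TITLE><YEAR>2000</YEAR></CD>
</CATALOG>';

-- nodes() returns one row per <CD>; value() pulls a scalar out of each row.
-- Dropping the [1] makes SQL Server reject the query, because value()
-- requires an expression that is guaranteed to return at most one item.
SELECT Title        = cd.n.value('TITLE[1]', 'varchar(100)'),
       YearReleased = cd.n.value('YEAR[1]', 'smallint')
FROM @x.nodes('/CATALOG/CD') AS cd(n);
```

This returns one row per CD element, with the year already converted to smallint.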

The CROSS APPLY runs the nodes() method against each row in the staging table and returns one output row for each <CD> element it finds. For more information on how the CROSS APPLY works, I refer you to these articles:
Understanding and Using Apply, Part 1: http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using Apply, Part 2: http://www.sqlservercentral.com/articles/APPLY/69954/
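
The effect of CROSS APPLY is easier to see when the staging table holds more than one document. The sketch below is self-contained; the table variable @Docs, the DocId column, and the one-letter titles are hypothetical.

```sql
-- Two staged documents: the first has two CDs, the second has one.
DECLARE @Docs TABLE (DocId int, XMLData XML);

INSERT INTO @Docs (DocId, XMLData)
VALUES (1, N'<CATALOG><CD><TITLE>A</TITLE></CD><CD><TITLE>B</TITLE></CD></CATALOG>'),
       (2, N'<CATALOG><CD><TITLE>C</TITLE></CD></CATALOG>');

-- nodes() is evaluated once per staged row, so the result has
-- one row per <CD> element across all documents (three here).
SELECT d.DocId,
       Title = x.data.value('TITLE[1]', 'varchar(100)')
FROM @Docs d
CROSS APPLY d.XMLData.nodes('/CATALOG/CD') x(data);
```

A staged row whose document contains no /CATALOG/CD nodes contributes no rows to the result, since CROSS APPLY drops rows for which the applied expression returns nothing.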

Inserting the data into the CD_Info table

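At this point, inserting the data is very simple: just put an INSERT statement in front of the above SELECT statement. Because @CD is a table variable, it only exists for the duration of the batch that declares it, so the complete script is repeated here.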

DECLARE @CD TABLE (XMLData XML);

INSERT INTO @CD
SELECT *
FROM OPENROWSET(BULK N'C:\SQL\cd_catalog.xml', SINGLE_BLOB) rs;

INSERT INTO dbo.CD_Info (Title, Artist, Country, Company, Price, YearReleased)
SELECT Title        = x.data.value('TITLE[1]','varchar(100)'),
       Artist       = x.data.value('ARTIST[1]','varchar(100)'),
       Country      = x.data.value('COUNTRY[1]','varchar(25)'),
       Company      = x.data.value('COMPANY[1]','varchar(100)'),
       Price        = x.data.value('PRICE[1]','numeric(5,2)'),
       YearReleased = x.data.value('YEAR[1]','smallint')
FROM @CD t
CROSS APPLY t.XMLData.nodes('/CATALOG/CD') x(data);

SELECT *
FROM dbo.CD_Info;
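
As a variation on the script above (not a replacement for it), a single file can also be loaded into an XML variable instead of a staging table, which removes the need for CROSS APPLY. This is a sketch under the same assumptions: the file is at C:\SQL\cd_catalog.xml and dbo.CD_Info already exists. The variable name @xml is arbitrary.

```sql
-- Load the whole file into one XML variable.
DECLARE @xml XML;

SELECT @xml = CONVERT(XML, rs.BulkColumn)  -- BulkColumn is the SINGLE_BLOB column
FROM OPENROWSET(BULK N'C:\SQL\cd_catalog.xml', SINGLE_BLOB) AS rs;

-- Shred the variable directly; nodes() can be used in FROM on its own.
INSERT INTO dbo.CD_Info (Title, Artist, Country, Company, Price, YearReleased)
SELECT x.data.value('TITLE[1]',   'varchar(100)'),
       x.data.value('ARTIST[1]',  'varchar(100)'),
       x.data.value('COUNTRY[1]', 'varchar(25)'),
       x.data.value('COMPANY[1]', 'varchar(100)'),
       x.data.value('PRICE[1]',   'numeric(5,2)'),
       x.data.value('YEAR[1]',    'smallint')
FROM @xml.nodes('/CATALOG/CD') AS x(data);
```

The staging-table form is still the more natural fit when several files are loaded into the same table and shredded together.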

References

OpenRowset: http://msdn.microsoft.com/en-us/library/ms190312.aspx
Value(), Nodes() and OpenXML(): http://msdn.microsoft.com/en-us/library/bb522645.aspx
