Saturday, July 28, 2018

What is schema in Data warehousing and types of schemas

Definition: A schema is a collection of tables, views, indexes and synonyms. Each of these is called a schema object. In simple terms, a schema is a collection of database objects.
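As a quick illustration, here is a minimal T-SQL sketch (the object names are made up for the example) of a schema and a few objects that belong to it:

CREATE SCHEMA sales;
GO
CREATE TABLE sales.Orders
(
    Order_Id   int           NOT NULL,
    Order_Date date          NOT NULL,
    Amount     decimal(10,2) NOT NULL
);
GO
CREATE VIEW sales.vw_DailyOrders
AS
SELECT Order_Date, SUM(Amount) AS Total_Amount
FROM sales.Orders
GROUP BY Order_Date;
GO
-- A synonym is also a schema object
CREATE SYNONYM dbo.Orders FOR sales.Orders;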

Schema plays a very important role when we design a data warehouse, because it decides whether the data warehouse we are building will serve its purpose. There are quite a few terminologies or types of schema; I am going to cover a few here which are very commonly used or referred to.

Before we get into that, it is important to note these are just design principles which are good to know but are not hard rules. Feel free to adapt as per your business requirements.

Commonly used schema types are:

• Star schema
• Snowflake schema
• Fact constellation (Galaxy) schema

Enumerating Datasets and Reports in Power Bi Tenant

Below is how you can enumerate the datasets and reports in a Power BI tenant.
  1. Install the Power BI Management cmdlets and log in to your Power BI tenant
    Download the Power BI Management cmdlets from here. Click on the deploy button as shown in the image below and log in using your Power BI admin account.
    Power BI admin Management cmdlets installation
  2. Enumerate all datasets that use a SQL Server database
    Use Get-PowerBIDataset and Get-PowerBIDatasource to enumerate the datasets. Here is how the PowerShell should look:
    $datasetIds = Get-PowerBIDataset -Scope Organization -ErrorAction SilentlyContinue |
        Foreach {
            $dsId = $_.Id
            Get-PowerBIDatasource -DatasetId $dsId -Scope Organization -ErrorAction SilentlyContinue |
                Where-Object { $_.DatasourceType -eq 'Sql' -and ($_.ConnectionDetails.Server -like 'sqldb01' -and $_.ConnectionDetails.Database -like 'DBNAME') } |
                Foreach { $dsId }
        }
  3. Enumerate all reports that use the identified datasets
    Using the dataset IDs found in step 2, enumerate all the reports which use those datasets:
    $reports = $datasetIds | Foreach { Get-PowerBIReport -Filter "datasetId eq '$_'" -Scope Organization }

Friday, July 27, 2018

APIs and PowerShell for Power BI Administrators

Power BI enables organisations to be more data driven by empowering every business user to build their own reports. But organisations must be able to govern this self-service BI boom, so IT administrators must be capable of handling licensing, capacity, security, governance and regulatory requirements.

To help achieve this, last week Microsoft released APIs, PowerShell cmdlets and a .NET SDK for Power BI admins.

A general scenario: consider you have a database which is being used by many users and is under heavy load. There would obviously be some reports that could be optimised to reduce the query load. As an admin, if you could trace which reports these are, you could then ask their owners to optimise them. This can be achieved by enumerating the resources under your tenant. Click here to read about how to enumerate and identify them.

This is just one example, but you could achieve a lot more. Below are some more related resources.

https://docs.microsoft.com/en-us/powershell/power-bi/overview?view=powerbi-ps
https://github.com/Microsoft/powerbi-powershell
https://github.com/Microsoft/PowerBI-CSharp
https://docs.microsoft.com/en-us/rest/api/power-bi/
https://docs.microsoft.com/en-us/power-bi/service-admin-reference

Thursday, July 26, 2018

Make all SharePoint Surveys truly anonymous real quick

If you have worked with surveys, you probably know that a survey provides an option, "Show user names in survey results", to make the survey anonymous; if not, you could refer to this. The problem is that anyone with rights to edit list settings can change this setting, and all user names would then become visible.

There is no single suggested approach to do this; there are lots of options available. One very nice article is available on this MSDN blog, and most of them suggest changing the name using an event receiver or a workflow. None of these served my purpose, as I needed to make all the surveys on the farm anonymous, including new ones. Also, with hundreds of surveys across the farm, these workflows or event receivers could hamper performance.
So I used a small trick to make sure no one can change the list setting, and SharePoint handles the rest automatically.
I made the tweaks in the application pages, which impacted all my surveys; you could also do the same for a particular site.
We have to make changes in two application pages, available in the LAYOUTS folder of the 15 hive. Combined, these changes have the effect of forcing the setting to No.
  1. ListGeneralSettings.aspx : this page opens when you click "List name, description and navigation" in the list settings.
  2. New.aspx : this page opens when you create a new survey list. The page URL is the same for both the creation popup and the advanced options page.

The changes needed in each file are detailed below.
ListGeneralSettings.aspx:
Search for "Survey Settings" and hide the radio button with id "RADShowUserYes" by adding the attribute visible="false". This will hide the Yes option in List Settings.

New.aspx: We will make two changes in this file.

First, make sure users cannot select Yes on the advanced options screen. Find the td tag below and comment it out:

<td class="ms-authoringcontrols" nowrap="nowrap"><input id="onetidShowUserNamesNo" type="radio" value="on"  name="showUsernames" title="<SharePoint:EncodedLiteral runat='server' text='<%$Resources:wss,lstsetng_showusernamesyes_title%>' EncodeMethod='HtmlEncode'/>" /><label for="onetidShowUserNamesYes"><SharePoint:EncodedLiteral runat="server" text="<%$Resources:wss,lstsetng_showusernamesyes_label%>" EncodeMethod='HtmlEncode'/></label></td>

Second, bypass the basic creation popup and navigate straight to the advanced options page, because by default SharePoint treats the setting as Yes. Find the following code in the file:
if (this.IsDialogMode){
    %><a class="ms-textSmall" href="javascript:;" onclick="NavigateToFullPage()"><SharePoint:EncodedLiteral runat="server" text="<%$Resources:wss,lstsetng_advancedoptions_text%>" EncodeMethod='HtmlEncode'/></a><%}

    and add another if condition inside this one. The final code should look like this:

    if (this.IsDialogMode)
    {
    if (HttpContext.Current.Request.Url.AbsoluteUri.Contains("ListTemplate=102"))
            {
                Page.ClientScript.RegisterStartupScript(this.GetType(), "CallMyFunction", "NavigateToFullPage()", true);
            }else { 
    %>
    <a class="ms-textSmall" href="javascript:;" onclick="NavigateToFullPage()"><SharePoint:EncodedLiteral runat="server" text="<%$Resources:wss,lstsetng_advancedoptions_text%>" EncodeMethod='HtmlEncode'/></a>
    <%
            }}

    If you have any existing survey lists, you could use PowerShell to change their list settings.

    How to Enable SharePoint Designer in Central Admin

    Below are steps for SharePoint Designer settings in Central Admin:

    1. Open the Central Administration website.
    2. Open SharePoint Designer Settings under General Application Settings.
    3. Select your web application from the drop-down at the top of the form.
    4. Select the appropriate options as described below (per your needs) and click OK.
    There are four check boxes as listed below:

    Enable SharePoint Designer : This Enables or disables SharePoint Designer for the Web Application and all of its site collections.

    Note: If unchecked, users still see SharePoint Designer buttons in the browser, but when they click them, they get a message stating that SharePoint Designer is not allowed on the site.

    Enable Detaching Pages from the Site Definition : This option enables or disables the ability to detach pages from the site definition. If unchecked, SharePoint Designer only allows you to edit pages in normal mode; the Edit File in Advanced Mode option is disabled. Running in Advanced mode lets a user unghost (customize) pages by modifying them away from the content originally in the site definition stored on the server's hard drive. The customized version of the page is stored in the SharePoint content database. Any changes made to the site definition files aren't reflected in detached pages. This can create maintainability problems and should be used with care.

    Note: This setting does not apply to pages that have already been detached, or to new blank ASPX or HTML pages created by the user.

    Enable Customizing Master Pages and Layout Pages : This enables or disables the ability to customize master pages and page layouts. If unchecked, SharePoint Designer does not display Master Pages or Page Layouts in the Navigation pane.

    Note: Page layouts also require publishing enabled on the server, so if you have enabled it here but the site is not a publishing site, you will still not see the Page Layouts option in the Navigation pane.

    Enable Managing of the Web Site URL Structure: This enables or disables the ability to view and edit files in the URL structure of a site. If unchecked, SharePoint Designer does not display the All Files option in the Navigation pane.


    Note: These four options are also available at site collection settings.

    Sunday, July 22, 2018

    Bridge Tables in a data warehouse

    As the name suggests, a bridge table is used to link tables when direct linking does not make sense. It is usually used in slightly complicated scenarios where we have a many-to-many relationship between two tables.
    Example: Consider multiple customers linked to a single bank account balance. Putting multiple customer keys in that fact would create obvious problems in the fact table.

    A bridge table is also sometimes used to manage variable hierarchies, e.g. to maintain one primary customer and one or more secondary customers.

    Bridge tables come to the rescue in such situations (see the sketch below), but just like outrigger dimensions, don't use them too much. Remember, every query has to pay a toll to navigate the bridge.
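    A minimal T-SQL sketch of the bank account example (table and column names are assumed): the bridge holds one row per account-customer pair, so the fact keeps a single account key.

    CREATE TABLE dbo.Dim_Customer
    (
        Customer_Key  int          NOT NULL,
        Customer_Name varchar(100) NOT NULL
    );

    CREATE TABLE dbo.Fact_Account_Balance
    (
        Account_Key  int           NOT NULL,   -- single key, no matter how many customers own the account
        Balance_Date int           NOT NULL,
        Balance      decimal(18,2) NOT NULL
    );

    -- The bridge resolves the many-to-many relationship between accounts and customers
    CREATE TABLE dbo.Bridge_Account_Customer
    (
        Account_Key      int          NOT NULL,
        Customer_Key     int          NOT NULL,
        Weighting_Factor decimal(5,4) NOT NULL   -- optional: allocate the balance across customers
    );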

    Although it is not a dimension, I personally call it a bridge dimension because it has many properties of a dimension.

    Click here to read about the other dimension types in data warehousing.

    Tuesday, July 17, 2018

    What is Outrigger Dimension and its purpose

    Definition: As per Kimball, when a dimension in the data warehouse is linked to another dimension table, the secondary table is called an outrigger dimension. For example, a customer dimension referring to the Date dimension for the date of account creation.

    Now, it might look like (incomplete) snowflaking, but it is not. There are a few differences, as detailed below:

    • Snowflaking makes the structure highly normalized, whereas an outrigger makes it only partially normalized.
    • Snowflaking complicates the ETL and reduces query performance.
    Outrigger dimensions are similar to a snowflake but are very limited in nature: they are just one or more levels removed from the fact table. They are mostly used when there is a business requirement to slice and dice on a particular column.

    Consider the following criteria before creating an outrigger dimension:
    • User analytics needed: the users want to slice and dice on the column.
    • Data granularity vs cardinality: a column with low granularity and high cardinality is a candidate.
    • Update frequency: the column is updated fairly frequently, but not more than once a month.
    • The attribute is already separated out in the source itself.
    • A customer and its location would be a good candidate for an outrigger dimension.
    Outrigger Dimension
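    Since the diagram isn't reproduced here, below is a minimal T-SQL sketch of the customer-to-date outrigger (table and column names are assumed, and the Dim_Date columns follow the generic date dimension described elsewhere in this series): the customer dimension simply carries a key pointing into the date dimension.

    CREATE TABLE dbo.Dim_Customer
    (
        Customer_Key              int          NOT NULL,
        Customer_Name             varchar(100) NOT NULL,
        Account_Creation_Date_Key int          NOT NULL   -- outrigger: points at Dim_Date.DATE_PKEY
    );

    -- Slice and dice customers by the year their account was created
    SELECT d.[YEAR], COUNT(*) AS Customers
    FROM dbo.Dim_Customer c
    JOIN dbo.Dim_Date d ON d.DATE_PKEY = c.Account_Creation_Date_Key
    GROUP BY d.[YEAR];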

    For a list of dimension types in a data warehouse, click here

    Granularity and Cardinality of a table

    What is granularity: Granularity is the measure of the grain of a table. Facts are usually designed at the lowest level of granularity, so that we can trace the lowest level of detail when needed.

    What is cardinality: Unlike granularity, cardinality is calculated at the column level. Cardinality refers to the number of unique values a column has. For example, in a customer table, Customer Name would have higher cardinality than Location, whereas Customer ID would have the highest possible cardinality: every value is unique, i.e. a uniqueness ratio of 1 or 100%.
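    A quick way to eyeball cardinality is to count distinct values per column. A rough sketch against a hypothetical customer table:

    -- Distinct counts per column of a hypothetical Dim_Customer table
    SELECT COUNT(*)                      AS Total_Rows,
           COUNT(DISTINCT Customer_ID)   AS Customer_ID_Cardinality,   -- equals Total_Rows (100% unique)
           COUNT(DISTINCT Customer_Name) AS Customer_Name_Cardinality,
           COUNT(DISTINCT Location)      AS Location_Cardinality       -- usually the lowest of the three
    FROM dbo.Dim_Customer;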

    Wednesday, July 11, 2018

    Generic DateTime dimension - Part 2

    In Part 1 of this post I talked about a generic dimension for Date, which is pretty simple and effective.

    Continuing from there, below are the calculations for each field:


    1. [DATE_PKEY]  AS (CONVERT([int],CONVERT([char](8),[DATE],(112))))
    2. [DAY_OF_MONTH]  AS (datepart(day,[DATE]))
    3. [DAY_SUFFIX]  AS (case when datepart(day,[DATE])=(13) OR datepart(day,[DATE])=(12) OR datepart(day,[DATE])=(11) then CONVERT([varchar],datepart(day,[DATE]))+'th' when right(datepart(day,[DATE]),(1))=(1) then CONVERT([varchar],datepart(day,[DATE]))+'st' when right(datepart(day,[DATE]),(1))=(2) then CONVERT([varchar],datepart(day,[DATE]))+'nd' when right(datepart(day,[DATE]),(1))=(3) then CONVERT([varchar],datepart(day,[DATE]))+'rd' else CONVERT([varchar],datepart(day,[DATE]))+'th' end)
    4. [DAY_NAME]  AS (datename(weekday,[DATE]))
    5. [DAY_OF_WEEK]  AS (datepart(weekday,[DATE]))
    6. [DAY_OF_QUARTER]  AS (datediff(day,dateadd(quarter,datediff(quarter,(0),[DATE]),(0)),[DATE])+(1))
    7. [DAY_OF_YEAR]  AS (datepart(dayofyear,[DATE]))
    8. [WEEK_OF_MONTH]  AS ((datepart(week,[DATE])-datepart(week,dateadd(month,datediff(month,(0),[DATE]),(0))))+(1))
    9. [WEEK_OF_QUARTER]  AS (datediff(week,dateadd(quarter,datediff(quarter,(0),[DATE]),(0)),[DATE])+(1))
    10. [WEEK_OF_YEAR]  AS (datepart(week,[DATE]))
    11. [MONTH]  AS (datepart(month,[DATE]))
    12. [MONTH_NAME]  AS (datename(month,[DATE]))
    13. [MONTH_OF_QUARTER]  AS (case when datepart(month,[DATE])=(10) OR datepart(month,[DATE])=(7) OR datepart(month,[DATE])=(4) OR datepart(month,[DATE])=(1) then (1) when datepart(month,[DATE])=(11) OR datepart(month,[DATE])=(8) OR datepart(month,[DATE])=(5) OR datepart(month,[DATE])=(2) then (2) when datepart(month,[DATE])=(12) OR datepart(month,[DATE])=(9) OR datepart(month,[DATE])=(6) OR datepart(month,[DATE])=(3) then (3)  end)
    14. [YEAR]  AS (datepart(year,[DATE]))
    15. [YEAR_NAME]  AS ('CY '+CONVERT([varchar],datepart(year,[DATE])))
    16. [MONTH_YEAR]  AS ((left(datename(month,[DATE]),(3))+'-')+CONVERT([varchar],datepart(year,[DATE])))
    17. [MMYYYY]  AS (right('0'+CONVERT([varchar],datepart(month,[DATE])),(2))+CONVERT([varchar],datepart(year,[DATE])))
    18. [FIRST_DAY_OF_MONTH]  AS (CONVERT([date],dateadd(day, -(datepart(day,[DATE])-(1)),[DATE])))
    19. [LAST_DAY_OF_MONTH]  AS (CONVERT([date],dateadd(day, -datepart(day,dateadd(month,(1),[DATE])),dateadd(month,(1),[DATE]))))
    20. [FIRST_DAY_OF_QUARTER]  AS (CONVERT([date],dateadd(quarter,datediff(quarter,(0),[DATE]),(0))))
    21. [LAST_DAY_OF_QUARTER]  AS (CONVERT([date],dateadd(quarter,datediff(quarter,(-1),[DATE]),(-1))))
    22. [FIRST_DAY_OF_YEAR]  AS (CONVERT([date],dateadd(year,datediff(year,(0),[DATE]),(0))))
    23. [LAST_DAY_OF_YEAR]  AS (CONVERT([date],dateadd(year,datediff(year,(0),[DATE])+(1),(-1))))
    24. [CALENDER_YR_QUARTER]  AS (datepart(quarter,[DATE]))
    25. [FINANCIAL_YR_QUARTER]  AS (case datepart(quarter,[DATE]) when (1) then (4) when (2) then (1) when (3) then (2) when (4) then (3)  end)
    26. [FINANCIAL_YEAR]  AS (case datepart(quarter,[DATE]) when (1) then (CONVERT([varchar],datepart(year,[DATE])-(1))+'-')+CONVERT([varchar],datepart(year,[DATE])) when (2) then (CONVERT([varchar],datepart(year,[DATE]))+'-')+CONVERT([varchar],datepart(year,[DATE])+(1)) when (3) then (CONVERT([varchar],datepart(year,[DATE]))+'-')+CONVERT([varchar],datepart(year,[DATE])+(1)) when (4) then (CONVERT([varchar],datepart(year,[DATE]))+'-')+CONVERT([varchar],datepart(year,[DATE])+(1))  end)

    Generic DateTime dimension in data warehouse

    If you have worked on a data warehouse you will surely have a dimension for Date. It is usually called Dim_Date, Dim_Time or Dim_DateTime. Don't get confused by the use of the word time: the table is usually created with a grain of one day unless there are some very specific reporting requirements.
    So what should the columns in this dimension be, and how should it be populated and maintained? From my experience in data warehousing, I have created a generic dimension for date which works in almost all scenarios and does not require any maintenance.
    Below is what the structure looks like.
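    The original post shows a screenshot of the table; since it isn't reproduced here, a partial T-SQL sketch of the idea (column types assumed, formulas as listed in Part 2):

    CREATE TABLE dbo.Dim_Date
    (
        [DATE]         date NOT NULL,
        [DATE_PKEY]    AS (CONVERT(int, CONVERT(char(8), [DATE], 112))),
        [DAY_OF_MONTH] AS (DATEPART(day, [DATE])),
        [DAY_NAME]     AS (DATENAME(weekday, [DATE])),
        [MONTH]        AS (DATEPART(month, [DATE])),
        [MONTH_NAME]   AS (DATENAME(month, [DATE])),
        [YEAR]         AS (DATEPART(year, [DATE]))
        -- ...the remaining computed columns follow the formulas listed in Part 2
    );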

    As you can notice, apart from the first column the rest of the columns are computed. So by using this approach we do not need to create any ETL for it, and hence there is no maintenance pain.
    At the moment there are 27 columns in the table above but you can surely add more as needed.

    All we need to do is populate the first column with a simple SQL query and rest of the columns will follow.

    The query below inserts the dates from year 2010 to year 2022 into the table. You can update the date range as per your needs.

    declare @start_date datetime, @end_date datetime, @cur_date datetime
    set @start_date = '2010-01-01'
    set @end_date = '2022-12-31'
    set @cur_date = @start_date
    while @cur_date <= @end_date
    begin
    insert into [Dim_Date](Date)
    select CONVERT(varchar, @cur_date) as Date
    set @cur_date = dateadd(dd, 1, @cur_date) 
    end

    Formulas of the computed fields are available here.

    What is grain of a table in data warehousing

    In simple terms, the grain is the lowest level at which you can identify the data in a table.
    For example, if you have a table which holds stock inventory and the table holds:

    1. One record for each day of stock : then the grain would be one day.
    2. One record for each hour : then the grain of the table would be one hour.
    3. One record for each week : then the grain of the table would be one week.



    How does grain impact the data warehouse?
    It is important that while designing the database we design to the lowest grain at which the business needs to analyse or report.
    With the example above, if your stock inventory table has a grain of one day you can easily calculate weekly stock, but the reverse is not true (a sketch follows below).
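    A rough T-SQL sketch of rolling a day-grain inventory fact up to weeks, using the Dim_Date columns from the posts above (the fact table and its columns are assumed):

    SELECT d.[YEAR],
           d.WEEK_OF_YEAR,
           s.Product_Key,
           AVG(s.Quantity_On_Hand) AS Avg_Weekly_Quantity  -- inventory balances are semi-additive,
                                                           -- so an average (or end-of-week snapshot) is typical
    FROM dbo.Fact_Stock_Inventory s
    JOIN dbo.Dim_Date d ON d.DATE_PKEY = s.Date_Key
    GROUP BY d.[YEAR], d.WEEK_OF_YEAR, s.Product_Key;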

    Similarly, the grain of related tables (facts and dimensions) should complement each other. For example, if Fact_Stock_Inventory holds data for each hour while Dim_DateTime has a grain of one day, joins between them will generate distorted results.

    Tuesday, July 10, 2018

    What is Conformed Dimension and its advantages

    People usually find it difficult to understand the conformed dimension, so I am trying to explain it with an example. To start with, please do not confuse it with a confirmed/static dimension just because of the name.

    Definition: A conformed dimension is defined as a dimension which has the same meaning for all the facts in a data warehouse, even if the facts belong to different data marts.
    It could be one single table or multiple copies of the same table across different marts. There is no hard rule that the copies must have the same table name; one copy could even be a subset of another copy.

    Let us now try to understand it with an example. Consider the online movie tickets data warehouse scenario.
    Let's assume we have divided our EDW into 2 data marts:

    • Marketing : the marketing team is more concerned about the number of tickets sold.
    • Finance : the finance team is more concerned about the profit.


    Both teams would need to know the stats for each cinema. Now, both teams might also have different terminologies: the marketing team might call it Cinema whereas the finance team might call it Screen.
    So in this scenario we could create 2 copies of the same table with different names, as shown below.
    Conformed dimension explained
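    A minimal T-SQL sketch of one way to do this (table names and the finance schema are assumed): keep one physical table and expose a renamed copy to the finance mart as a view, so both marts conform to the same rows.

    CREATE TABLE dbo.Dim_Cinema
    (
        Cinema_Key  int          NOT NULL,
        Cinema_Name varchar(100) NOT NULL,
        City        varchar(50)  NOT NULL
    );
    GO
    -- The finance mart sees the same dimension under its own name (and could use a subset of columns)
    CREATE VIEW finance.Dim_Screen
    AS
    SELECT Cinema_Key  AS Screen_Key,
           Cinema_Name AS Screen_Name,
           City
    FROM dbo.Dim_Cinema;
    GO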


    But why should we use it? Why don't we just create 2 separate tables if we need them? Below are a few disadvantages of not creating conformed dimensions:

    • There will be more ETL work to generate the logic for the different tables.
    • More ETL will surely mean some extra maintenance effort.
    • There may be difficulties in joining multiple data marts.

    Visit my post on Types of Dimensions to understand more about different types of dimensions.

    Monday, July 9, 2018

    List of dimensions in a Data Warehouse

    Below is the list of dimension types we should consider while designing any data warehouse. Click on a link below for a detailed definition and an example.

    • Conformed dimension
    • Junk dimension
    • Outrigger dimension
    • Bridge table (bridge dimension)
    • Date dimension


    This list will keep improving with time, so you can bookmark this page and feel free to subscribe to the blog by entering your email.

    What is Enterprise data warehouse or EDW

    Wikipedia says, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis, and is considered a core component of business intelligence.
    DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data in one single place that are used for creating analytical reports for workers throughout the enterprise.
    An EDW can be logically divided into data marts based on business areas like HR, Finance, Production, etc. It is important to note that a data mart is just a logical divide; you could still hold the data together physically.

    Below are the major components of any data warehouse:
    1.) Data warehouse tables, which include facts and dimensions. These are stored in a relational database.
    2.) OLAP cubes for reporting
    3.) ETL tools

    Sunday, July 8, 2018

    What is a dimension in data warehouse

    As per Wikipedia, A dimension is a structure that categorizes facts and measures in order to enable users to answer business questions.
    Commonly used dimensions are people, products, place and time.

    If we consider the data warehouse scenario of Online Ticket Booking, the first dimension that comes to mind is the list of cinemas.
    So, if at any point in time the business needs to find out how many tickets were sold for each cinema, they could simply join Dim_Cinemas with the tickets table, as detailed in the image below.
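    The query behind that picture would look something like the sketch below (table and column names are assumed):

    SELECT c.Cinema_Name,
           COUNT(*) AS Tickets_Sold
    FROM dbo.Fact_Tickets t
    JOIN dbo.Dim_Cinemas c ON c.Cinema_Key = t.Cinema_Key
    GROUP BY c.Cinema_Name;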


    This suggests that 2 tickets were sold in cinema C1 and 1 ticket was sold in cinema C2.

    There are many types of dimension in a data warehouse listed here.

    Friday, July 6, 2018

    Windows 7 is dangerous to use, Warns Microsoft

    Microsoft recently published a post (in German) warning customers to move from Windows 7 to Windows 10. The official newsroom post warns that the ageing platform suffers from various failings, including security issues and lack of hardware support.
    Microsoft further added that Windows 7 has an outdated security architecture, and people using it are more susceptible to cyber attacks.

    Is there a real threat, or is it just a scare tactic?

    • Let me say it clearly: Windows 7 is totally safe and users should not panic. This seems like a desperate attempt to push people to upgrade.
    • Provided the system is kept up to date with the latest updates, there is nothing to worry about.
    • Windows 7 support officially ends in 2020, so you have ample time to move at your own pace.

    Conclusion: Windows 10's security architecture is better than Windows 7's, but there is no hurry to upgrade. Windows 7 was one of the best versions of Windows, so you can comfortably take your time to wean yourself off it.

    What is Junk Dimension and its purpose

    Definition: By definition, it is a collection of random transactional codes and flags. It is usually created by joining multiple low-cardinality dimensions into one.
    As we merge different dimensions into one, it reduces the number of foreign keys in the fact table.

    If we consider the data warehouse scenario of Online Ticket Booking, the design would have the following 3 tables, with the fact holding 2 foreign keys, one for each dimension.

    • Fact_Tickets
    • Dim Transaction Type
    • Dim Cert Type

    Both dimensions have very low cardinality (i.e. the number of records in each dimension is very low):
    Transaction Type : the values could be Net-banking, Card, Wallet. The chances of this count increasing are very low.
    Movie Certificate : the values could be U, A, U/A. Again, the chances of these increasing or changing are very rare.

    The picture below describes the 2 possible designs, depending on whether we keep these attributes in separate dimension tables or in a single junk dimension.
    Junk dimension


    So, with the help of the junk dimension, we converted 2 foreign keys into one.
    Number of records: notice that the number of records in the junk dimension is the product of the number of records in each merged dimension. In the example above there are 3 records each for Transaction and Certificate, so the total count in the junk dimension is 9.
    If we were merging 3 tables with 3 records each, the number of records in the junk dimension would be 27 (3*3*3). A sketch of how the junk dimension can be built follows below.
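    A minimal T-SQL sketch of building the junk dimension as the cross join of the two attribute sets (table and column names are assumed):

    CREATE TABLE dbo.Dim_Junk_Ticket
    (
        Junk_Key         int IDENTITY(1,1) NOT NULL,
        Transaction_Type varchar(20) NOT NULL,
        Certificate      varchar(5)  NOT NULL
    );

    INSERT INTO dbo.Dim_Junk_Ticket (Transaction_Type, Certificate)
    SELECT t.Transaction_Type, c.Certificate
    FROM (VALUES ('Net-banking'), ('Card'), ('Wallet')) AS t(Transaction_Type)
    CROSS JOIN (VALUES ('U'), ('A'), ('U/A')) AS c(Certificate);
    -- 3 x 3 = 9 rows; Fact_Tickets now carries a single Junk_Key instead of two foreign keys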

    Ensure the dimensions we junk together are of very low cardinality. I would suggest that the number of records in the junk dimension stays below a thousand.

    Please feel free to post your queries in the comments section and follow this blog.

    Data-warehouse scenario - Online ticket Booking Business

    I am writing a series of articles on data warehousing design concepts. Throughout the series I will use the following scenario in all my examples.

    Scenario: Consider we are building a warehouse for a company which sells online/offline movie tickets, e.g. BookMyShow in India or Movie Tickets in the USA or UK.

    The major business areas would be:
    • Customer
    • Movie Details
    • Ratings
    • Tickets
    • Transactions
    • Venues
    • Cinemas
    • Promotional Codes
    • Will add more to this list as the series grows

    Wednesday, July 4, 2018

    Truncate Microsoft SQL Database without manually dropping Key constraints

    We have all faced times when we want to truncate all the tables of a database, but to truncate a table we need to drop all the foreign key constraints. It is painful to drop all the constraints, truncate, and then recreate the constraints.
    For a small database this might work by hand, but for a big database it is next to impossible, especially when your database structure is changing over time.
    Although there is no out-of-the-box way to do it, with a few lines of code we can certainly automate it.

    The following piece of code will do the trick. We can also wrap it in a stored procedure if it is needed regularly.

    We will need 3 tables to store the data temporarily. Let's first check whether they already exist and drop them if so:
    IF OBJECT_ID('dbo.DropFkey', 'U') IS NOT NULL
        DROP TABLE dbo.DropFkey;
    IF OBJECT_ID('dbo.DROPTruncate', 'U') IS NOT NULL
        DROP TABLE dbo.DROPTruncate;
        IF OBJECT_ID('dbo.CreateConstraints', 'U') IS NOT NULL
        DROP TABLE dbo.CreateConstraints;

    The next step is to create a list of all the foreign keys and store it in a temp table:
    ;WITH cte(consColumn,
             foreignKeyName,
             parentSchema,
             parentTableName,
             parentColName,
             refSchema,
             refTableName,
             refColName)
         AS (SELECT fkc.constraint_column_id AS consColumn,
                    fk.NAME AS foreignKeyName,
                    parentSchema.name AS parentSchema,
                    parentTable.NAME AS parentTableName,
                    parent_col.NAME AS parentColName,
                    refSchema.name AS refSchema,
                    refTable.NAME AS refTableName,
                    ref_col.NAME AS refColName
             FROM sys.foreign_keys fk
                  INNER JOIN sys.foreign_key_columns fkc ON fk.object_id = fkc.constraint_object_id
                  INNER JOIN sys.tables parentTable ON parentTable.object_id = fkc.parent_object_id
                  INNER JOIN sys.schemas parentSchema ON parentSchema.schema_id = parentTable.schema_id
                  INNER JOIN sys.columns parent_col ON fkc.parent_column_id = parent_col.column_id
                                                       AND parent_col.object_id = parentTable.object_id
                  INNER JOIN sys.tables refTable ON refTable.object_id = fkc.referenced_object_id
                  INNER JOIN sys.schemas refSchema ON refSchema.schema_id = refTable.schema_id
                  INNER JOIN sys.columns ref_col ON fkc.referenced_column_id = ref_col.column_id
                                                    AND ref_col.object_id = refTable.object_id
                                                    AND parentTable.type = 'U'
                                                    AND refTable.type = 'U')
         SELECT DISTINCT
                foreignKeyName,
                parentSchema,
                parentTableName,
                SUBSTRING(
                         (
                             SELECT ','+a.parentColName+''
                             FROM cte a
                             WHERE a.foreignKeyName = c.foreignKeyName
                             ORDER BY a.consColumn
                             FOR XML PATH('')
                         ), 2, 200000) AS parentColName,
                refSchema,
                refTableName,
                SUBSTRING(
                         (
                             SELECT ','+b.refColName+''
                             FROM cte b
                             WHERE b.foreignKeyName = c.foreignKeyName
                             ORDER BY b.consColumn
                             FOR XML PATH('')
                         ), 2, 200000) AS refColName
        INTO #FkeyDesc
         FROM cte c;

    Now that we have the list of all the foreign keys, we can drop them from the tables. We will use the list stored above to re-create the keys after truncation.
    SELECT DISTINCT Row_Number() OVER ( ORDER BY foreignKeyName ASC ) AS RN,
           'IF  EXISTS 
        (SELECT * FROM sys.foreign_keys 
    WHERE 
    object_id = OBJECT_ID(N''['+parentSchema+'].['+foreignKeyName+']'') 
    AND 
    parent_object_id 
    = OBJECT_ID(N''['+parentSchema+'].['+ParentTableName+']''))
    ALTER TABLE ['+parentSchema+'].['+ParentTableName+'] 
    DROP CONSTRAINT ['+foreignKeyName+']' AS foreignKey_drop_script
    INTO DropFkey
    FROM #FkeyDesc;
    DECLARE @Job NVARCHAR(MAX);
    DECLARE @RN INT;
    DECLARE @RN1 INT;
    SELECT TOP 1 @Job = foreignKey_drop_script,
                 @RN = RN
    FROM DropFkey
    ORDER BY RN;

    WHILE @RN IS NOT NULL
        BEGIN
            EXEC sp_executesql
                 @Job;
            SET @RN1 = @RN;
            SELECT TOP 1 @Job = foreignKey_drop_script,
                         @RN = RN
            FROM DropFkey
            WHERE RN > @Rn1
            ORDER BY RN;
       IF @@ROWCOUNT = 0
            BREAK
        END;


    With the keys dropped, we can simply truncate all the tables, with the exception of the tables which store the list of keys to be regenerated:
    SELECT DISTINCT ROW_NUMBER () OVER  (ORDER BY Name) AS RN,
    ' TRUNCATE TABLE '+ Name + '' AS TruncateScript
    INTO DROPTruncate
    FROM sys.tables
    WHERE type_desc = 'USER_TABLE'
    AND NAME NOT IN
    (
    'DropFkey',
    'DropTruncate',
    'CreateConstraints'
    --If needed, we can add more tables here and these tables will not be truncated.
    );

    DECLARE @RNTruncate INT;
    DECLARE @RNTruncate1 INT;
    DECLARE @JobTruncate NVARCHAR(MAX);
    SELECT TOP 1 @JobTruncate = TruncateScript,
                 @RNTruncate = RN
    FROM DROPTruncate
    ORDER BY RN;
    WHILE @RNTruncate IS NOT NULL
        BEGIN
            EXEC sp_executesql
                 @Jobtruncate;
            SET @RNTruncate1 = @RNTruncate;
            SELECT TOP 1 @JobTruncate = TruncateScript,
                         @RNTruncate = RN
            FROM DROPTruncate
            WHERE RN > @RNTruncate1
            ORDER BY RN;      
            IF @@ROWCOUNT = 0 BREAK;
        END;


    Now that all the tables are truncated, we can simply re-create the keys:
    SELECT DISTINCT
            ROW_NUMBER () OVER  (ORDER BY foreignKeyName) AS RN,
           'ALTER TABLE ['+parentSchema+'].['+ParentTableName+']  WITH CHECK 
    ADD  CONSTRAINT ['+foreignKeyName+'] FOREIGN KEY('+parentColName+')
    REFERENCES ['+refSchema+'].['+refTableName+']('+refcolname+')' AS Add_constraints_script
    INTO CreateConstraints
    FROM #FkeyDesc; 
    DECLARE @RNCREATE INT;
    DECLARE @RNCREATE1 INT;
    DECLARE @JobCREATE NVARCHAR(MAX);
    SELECT TOP 1 @JobCREATE = Add_constraints_script,
                 @RNCREATE = RN
    FROM CreateConstraints
    ORDER BY RN;
    WHILE @RNCREATE IS NOT NULL
        BEGIN
            EXEC sp_executesql
                 @JobCREATE;
            SET @RNCREATE1 = @RNCREATE;
            SELECT TOP 1 @JobCREATE = Add_constraints_script,
                         @RNCREATE = RN
            FROM CreateConstraints
            WHERE RN > @RNCREATE1
            ORDER BY RN;
            IF @@ROWCOUNT = 0 BREAK;
     END

    Performance: It seems like a gigantic piece of code, but I have tested it on a database with more than 100 tables and a few million records; execution time was approximately 1 minute.

    Feel free to post your queries in comments and follow this blog if you liked the post.

    Tuesday, July 3, 2018

    How to import large SQL files into a database

    I recently faced a situation where I was supplied with a .SQL file which was exported from a large table with several million records. The file was exported using the Generate Scripts option of SSMS and contained the insert statements. The size of the file was more than 2 GB.

    SSMS (SQL Server Management Studio) is not capable of handling such big files. It will throw one of the following errors:

    • Insufficient memory to continue the execution of the program.
    • or SSMS would simply reject the request to open the file.

    SQLCMD is the lifesaver here. We can use this SQL command-line utility to execute the T-SQL commands. It is pretty simple to use, as detailed below:

    1. Open the Command prompt with the administrator privileges.
    2. Execute the following command.

    
     sqlcmd -S SQLSERVER\InstanceName -i "F:\File.sql"


    The difference: SSMS uses the .NET Framework for execution, whereas the command-line utility uses the ODBC driver.
    If you liked the post, please share it.
    Subscribe by email to receive new articles in your inbox.