Lessons Learnt on the BDC

Here are a few lessons learnt from playing with the BDC on a project:

1. Even though the BDC can crawl associated entities in your database schema, the search results will not show the related entity instances for the item you searched for.

2. When mapping 'complex' database schemas, especially ones that contain m:n relationships between entities, it can be advisable to expose un-normalized relationships via stored procedures or views, but this approach presents the following roadblocks:

    1. Un-normalized views/data will often have composite keys to uniquely identify a single row. You will have to define separate identifiers for each component of your composite key.
    2. More often than not, when there is no associated entity for the parent entity, the columns representing that entity in the un-normalized row will be null, including part of the composite key. This is an issue for the BDC: if a key or part of a key is null, the BDC ignores that entity instance, so you need to pad your null keys (in SQL, use ISNULL or COALESCE).
    3. If you are using a stored procedure to provide your un-normalized view, you will be defining MinGeneratedKey and MaxGeneratedKey as stored procedure parameters, which are used in the WHERE clause of the query. Remember those padded keys: you need to pad them in your WHERE clause as well, because when SQL evaluates the WHERE clause the raw values are still null and fall outside your MinGeneratedKey-MaxGeneratedKey range. (The WHERE clause is evaluated before your SELECT, so padding the null keys in the SELECT list alone is not enough; see the sketch below.)
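To make roadblocks 2 and 3 concrete, here is a minimal stored procedure sketch. The Orders/OrderDetails schema and the @Min.../@Max... parameter names are hypothetical, standing in for your own tables and the MinGeneratedKey/MaxGeneratedKey parameters. The point is that the ISNULL padding must appear in both the SELECT list and the WHERE clause:

CREATE PROCEDURE dbo.GetOrderItems
    @MinOrderId int, @MaxOrderId int,
    @MinDetailId int, @MaxDetailId int
AS
-- Hypothetical un-normalized view: orders left-joined to their details.
-- OrderId + DetailId form the composite key; DetailId is NULL for orders
-- with no details, so it is padded with ISNULL in BOTH places.
SELECT
    o.OrderId,
    ISNULL(d.DetailId, 0) AS DetailId,  -- pad the null key component
    o.CustomerName,
    d.Quantity
FROM dbo.Orders AS o
LEFT JOIN dbo.OrderDetails AS d
    ON d.OrderId = o.OrderId
WHERE o.OrderId BETWEEN @MinOrderId AND @MaxOrderId
  -- pad here too: the WHERE clause is evaluated before the SELECT list,
  -- so unpadded null keys would fall outside the range and be dropped
  AND ISNULL(d.DetailId, 0) BETWEEN @MinDetailId AND @MaxDetailId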

3. Moving on to incremental crawls, the only reference to creating them in MSDN or the MOSS SDK is a slightly confusing note on this page. To implement incremental crawls on the BDC you need the following:

1. You need a column on your table/view/stored procedure that indicates the last modified time of that entity instance. Adding a timestamp column to your tables is the easiest approach, since it requires no change to any application logic to update the last modified time. Note that if you are combining tables to create an un-normalized view, you need to calculate the greater timestamp in SQL across the combined entity instances. Also note that if you are using a timestamp column, you need to cast it to DateTime in SQL for it to be of any use to the BDC.
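As a rough sketch of both notes (table and column names again hypothetical): assuming Orders and OrderDetails each carry a SQL Server timestamp (rowversion) column named ts, take the greater of the two per un-normalized row and cast the winner to datetime:

SELECT
    o.OrderId,
    ISNULL(d.DetailId, 0) AS DetailId,
    -- pick the greater rowversion of the combined rows, then cast to
    -- datetime so the BDC can treat it as a last-modified value
    CAST(CASE WHEN d.ts IS NULL OR o.ts >= d.ts THEN o.ts
              ELSE d.ts END AS datetime) AS LastModified
FROM dbo.Orders AS o
LEFT JOIN dbo.OrderDetails AS d
    ON d.OrderId = o.OrderId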

2. In your IdEnumerator method you need to declare a type descriptor for this column in your return parameter, like so:

<Parameter Direction="Return" Name="entityReturnParam">
    <TypeDescriptor TypeName="System.Data.IDataReader, System.Data, Version=2.0.3600.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" Name="entityDataReader" IsCollection="true">
        <TypeDescriptors>
            <TypeDescriptor TypeName="System.Data.IDataRecord, System.Data, Version=2.0.3600.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" Name="entityDataRecord">
                <TypeDescriptors>
                    <TypeDescriptor TypeName="System.Int32" IdentifierName="entityPK" Name="entityPK" />
                    <TypeDescriptor TypeName="System.DateTime" Name="timestamp" />
                </TypeDescriptors>
            </TypeDescriptor>
        </TypeDescriptors>
    </TypeDescriptor>
</Parameter>
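For context, the query behind such an IdEnumerator might look like the following sketch (names hypothetical, continuing the Orders example); it only needs to return the identifier plus the last-modified column that the two inner TypeDescriptors above declare:

-- Hypothetical RdbCommandText for the IdEnumerator: just the primary key
-- and the cast timestamp, matching entityPK and timestamp above.
SELECT
    o.OrderId              AS entityPK,
    CAST(o.ts AS datetime) AS [timestamp]
FROM dbo.Orders AS o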

3. You need to declare a property on your entity that refers to the timestamp column defined above; this property must be called __BdcLastModifiedTimestamp and should be of type string:

<Entity EstimatedInstanceCount="0" Name="entityName">
    <Properties>
        <Property Name="Title" Type="System.String">entityName</Property>
        <Property Name="__BdcLastModifiedTimestamp" Type="System.String">timestamp</Property>
    </Properties>
    <!-- identifiers, methods, etc. omitted -->
</Entity>

Cross-posted from tariqayad.com

Posted on Wednesday, May 23, 2007 4:21 AM | SharePoint


Comments on this post: Lessons Learnt on the BDC

# re: Lessons Learnt on the BDC
I created a custom web part to search the BDC.
I created entities using 2 tables (1:m).
I un-normalized the views/data to eliminate repetition of records.
I used FullTextQuery with TrimDuplicates = true to remove duplicate records and selected only fields from the first table, but the query still returned duplicate records.
Questions:
1. How can I select distinct records?
2. I also have a problem with Group By and Order By: they work only with existing properties, they don't work with metadata mapping.
Left by Hassan on Dec 17, 2007 7:11 PM

# re: Lessons Learnt on the BDC
If we add a timestamp column to act as LastModified, should we convert the timestamp to datetime in the IdEnumerator method and return the converted field in the SpecificFinder method?
I found that if we use a timestamp and convert it to datetime, every incremental crawl still acts as a full crawl. If you have any suggestions, please let me know. Thanks.
Left by Allen on Apr 21, 2008 9:25 PM

# re: Lessons Learnt on the BDC
Hi Allen,
Is it indexing all the rows, or only the rows that have changed?
What I found was that even with the timestamp, the crawl would still go through every row to find which rows had changed, and then index only those rows.

-Tariq

Left by Tariq on Apr 21, 2008 10:49 PM

# re: Lessons Learnt on the BDC
Can you describe how to use a timestamp column as the LastModifiedTimeStamp? I just converted it to datetime, but it returns values like 1900-mm-dd:hh..., and the incremental crawl still acts as a full crawl.
But when the BDC data has a real datetime column like UpdateTime or CreateTime, which I set as the LastModifiedTimeStamp, the incremental crawl is fine: it finishes quickly and the log is smaller.
What I did for the timestamp:
In the SpecificFinder / IdEnumerator, convert the timestamp to datetime.
Set this converted field as __BdcLastModifiedTimestamp.
Any suggestions? Thanks.
Left by Allen on Apr 22, 2008 2:40 AM

# re: Lessons Learnt on the BDC
Hi Tariq,

I might have found the reason:
The table I crawled has a lot of rows that do not match the conditions. During an incremental crawl, the engine still crawls all of those rows, so it costs a lot of time and space.
I think the incremental crawl will be more effective when most of the data is already indexed. What do you think?
Left by Allen on Apr 22, 2008 8:22 PM


