Tuesday, October 5, 2021

Working with Large SharePoint Online Lists

First and foremost: please don't do this if you can avoid it.  Just back away from the mouse and keyboard and rethink everything you're doing.  I personally hate that my own work has led to situations like this, but I recognize that sometimes these kinds of applications expand beyond your control.

So, for those of you who are trapped and have to come up w/ a solution, I've laid out some strategies for dealing with large Lists in SharePoint Online (let's say 5,001 - 500,000 items) and some of the issues you will encounter when using Power Apps to access them.

NOTE: As of 2021/10/5 my account has been throttled and my list imports into PowerApps from Excel limited to 15k rows.  I'm working w/ MS to understand if they have a better answer than the non-answers we have today.  The key to avoiding this: you can do everything I outline below, just don't open 3 VMs and start doing the same stuff under the same account in each.  :D

Overview

If you haven't encountered it, you should be aware that Lists in SharePoint Online have some sizing limitations that are not immediately obvious.  While you can still put millions of records (30 million at the time of this article) into these Lists, the Views within the SharePoint Online interface are limited to 5000 records.

Slightly more than 5000 records

When you are dealing with such large lists, there are a number of issues you can run into just using SharePoint itself.  Then add in that PowerApps limits the number of items you can grab to 500 by default, and can only go up to 2000 (a setting you can change in the app's advanced settings).  That's a lot less than 5000 (and certainly less than 238,175).

However, there are a few strategies you can use to work w/ PowerApps and large lists that are not immediately obvious.


Delegation

You're going to really struggle with this one, because there are lots of ways in which PowerApps & SharePoint will not warn you about queries that are not truly delegable.  Take for example the following query that SharePoint won't necessarily warn you about:

ClearCollect(myResults,
    Filter(
        mySPList, 
        Created<Today()-30
    )
)

This is designed to pull back all items that were Created more than 30 days ago.  For lists with <500 items you would never notice a problem.  If you adjust your PowerApp to include up to 2000 records (the max, which I'll assume you've done as we continue), then you could avoid any issues up to that many items.  However, the moment you pass these thresholds, the specter of Delegation will rise up and punish you for your hubris.

You see, it will run this query, but only against the first 2000 rows.  So if the 2001st item was created more than 30 days ago, it would not be returned, even if the first 2000 items were all less than 30 days old.  Read that again.  It did not return the one item that matched this query, because the query was not properly delegated.

However, if you knew the exact date the 2001st item was created (e.g. July 1st, 2021), then you could return it.  The following query would get you that 2001st item:

ClearCollect(myResults,
    Filter(
        mySPList, 
        Created=Date(2021,07,01)
    )
)

Now of course, this is a terrible use-case, because using the Created value for exact matching is nearly impossible.  You instead want to use a field where you absolutely can have an EXACT MATCH.  That means Date, Text, Number, or other similar fields.

The problem amounts to using any non-exact comparison (> instead of =, or any DateTime value instead of a Date).  So if we had the ID of an item (e.g. 500,000), we could return it regardless of how far down the list it is, but if we asked for items greater than a specific ID value (e.g. >2000), it would fail and return nothing.  Similarly, matching on a Date value works, while matching on a DateTime generally does not.
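To make that concrete, here's a quick sketch against the same mySPList (the Collection names are just placeholders of mine):

// Delegable: an exact match on ID returns the record even if it is the
// 500,000th row in the list.
ClearCollect(oneResult,
    Filter(mySPList, ID = 500000)
);

// Not reliably delegable: a range comparison can silently run against only
// the first 2000 rows and return nothing at all.
ClearCollect(rangeResults,
    Filter(mySPList, ID > 2000)
)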

Microsoft's reference on delegation within SharePoint is worth reading.  But take some of what they "claim" works with a (large) grain of salt.

NOTE: Some of what I just said "won't work" might work occasionally.  The issue is that when you start working with such large lists, you need to stop trusting that SharePoint Online and PowerApps will behave consistently.

Growing beyond 2000 records - Extending PowerApps Queries

Once we pass 2000 records in a data source, it is essential that we consider how to structure our queries for Delegation so we can properly populate our variables.  And again, since PowerApps doesn't always warn us, it is sometimes necessary to restructure our commands to get good results.

Let's say we have 5000 records in our source.  We want all items that have a status = Pending and where orderDate (a Date value in our List in SharePoint) is more than 30 days old.  If there are fewer than 2000 items that have status = Pending, you might assume you can do a compound query like this and get all of your records:

ClearCollect(myResults,
    Filter(
        mySPList, 
        status = "Pending" &&
        orderDate < Today() - 30
    )
)

Nope.  No joy.  But you might also have guessed that wouldn't work, given my initial example above.  But let's get sneaky.  What if I trust that PowerApps will properly work from the inside out and build my query like this:

ClearCollect(myResults,
    Filter(
        Filter(
            mySPList,
            status = "Pending"
        ), 
        orderDate < Today() - 30
    )
)

This is (supposedly) doing a Filter based upon status = Pending, then Filtering that result down to a date more than 30 days old.  And while this should work, it doesn't (reliably).  Same results as before.  So there's no way to make this work?  Well, this works just fine:

ClearCollect(myResults,
    Filter(
        mySPList, 
        status = "Pending"
    )
);
ClearCollect(myResults,
    Filter(
        myResults,
        orderDate < Today() - 30
    )
)

We specifically did a Filter() on items with a status = "Pending" and pulled them into our Collection.  Then we did a second Filter() on the orderDate, against the items we had already returned.  The key difference: that second Filter() runs against the local Collection rather than SharePoint, so delegation limits no longer apply to it.

Surprised?  We'd expect that PowerApps would have handled the first one, and when it failed, it made us think perhaps the second wouldn't work either.  But it works just fine.

This all comes down to the fact that PowerApps isn't quite as smart as we might imagine when evaluating complex formulae.  Therefore, it is essential that you sometimes manually break things up (like I have above) to properly work w/ a large list.

NOTE: Again, some of the delegation will work as documented by Microsoft, sometimes, with some fields.  It is just not always clear if/when it is not working as expected.  Although Microsoft does warn on delegation and does say inside-out filtering SHOULD work, YOU CANNOT RELY ON THAT.

A Note on Indexes

I used the field orderDate above as a key indicator for items I care about in my queries.  It is worth pointing out that if you are going to target a specific field for these kinds of operations, you probably should go in and set it as an Indexed column in SharePoint.  

While this isn't required, doing so will help optimize your list sooner rather than later.  SharePoint will eventually figure this out and build the Index for you over time, but you might consider just doing it in advance and avoiding any near-term headaches.



Growing beyond 5000 records - Breaking SharePoint Online Views

This section is specifically about viewing records within SharePoint Online, rather than within PowerApps.  We'll get back to PowerApps in a moment.

With >5000 items, not only does PowerApps start to break down, but SharePoint Online does as well.  The viewing limit in SharePoint Online is 5000 rows per list view.  There are ways around this, but for the most part, once you reach 5000 rows, you should start betting that users interacting directly with your list will have...issues.  

However, you can manage to cheat your way through to your destination if you have enough information.  Filters will not necessarily solve every problem, but you should start there to get some ideas on how to grab only the rows you care about (hopefully fewer than 5000).

On a very basic level, if you sort Newest vs. Oldest on the Created field, then you'll get the first 5000 in either direction.  So if you're working just past 5000, this can generally help you sneak through and still function.  But once you get past 10,000 you'll again miss records: Oldest -> Newest gives you the first 5000, Newest -> Oldest gives you the last 5000.  But what about the middle?

To extend beyond these limits, you should consider using a tool that will pull down the entire data set and let you work with it.  That likely means either Excel or Power BI.  While you can still play w/ the SharePoint Online interface to "sorta" make things work, these tools will work reliably and, with a few hacks, still allow you to use SharePoint Online's interface for what it is pretty good at.

Let's say you have a Power BI query that is pulling the first 50k rows from your list and then slicing the data to show a few items you will need to hand-edit in SharePoint.  If you go to your SharePoint list manually and Filter it by any ID you see (e.g. 12), then you're almost there.  Copy that URL from the browser and use it as your starting point.

Take the example URL below:

https://{...yourSiteAndListURL...}?FilterField1=ID&FilterValue1=12&FilterType1=Counter&viewid={GUID for your list's view}

If we take the FilterValue1 parameter above, we can replace the "12" with any ID we choose in our list.  Therefore, even though an end-user could not easily Filter or jump to ID=25,999, we could build out a link in Power BI that would take them directly to it:

https://{...yourSiteAndListURL...}?FilterField1=ID&FilterValue1=25999&FilterType1=Counter&viewid={GUID for your list's view}

This will work all the way up to the SharePoint maximums.
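If you ever want the same trick from inside PowerApps, here's a minimal sketch of building and opening such a link (baseListUrl, viewGuid, and targetID are hypothetical names you'd fill in with your own values):

// Sketch: assemble the same filter URL and open it in the browser.
// baseListUrl, viewGuid, and targetID are placeholder names, not real ones.
Set(targetUrl,
    baseListUrl & "?FilterField1=ID" &
    "&FilterValue1=" & Text(targetID) &
    "&FilterType1=Counter" &
    "&viewid=" & viewGuid
);
Launch(targetUrl)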

As I mentioned above, you also can use Excel.  


Exporting your entire list to Excel will at least let you search the list, so you can do a similar trick as above with Power BI and build a link that takes you straight to that item in SharePoint to edit it.  

Nifty!

NOTE: Microsoft again throttled this.  Your mileage may vary for larger lists in excess of 15k rows.

Populating >2000 records in a Collection

Getting back to PowerApps...

While you might think that you cannot put more than 2000 records in a Collection in PowerApps, it only takes a quick test to show this is not true.  Just import a table from Excel w/ 3000 rows and you will see that Collections can be much larger than 2000.  As an example, at this moment I am working with two ~250,000 row Collections I imported from Excel that I can still Sort/Search/Filter (although slowly) via a Dropdown.

NOTE: I wouldn't try to push this out to a broad user base of course, but for some admin work...

Now that you know you can have more records in a Collection, there remains a question: how do I get >2000 records out of SharePoint (or Dataverse, etc.) rather than doing a manual import?

I gave you a clue above in using multiple Filter() statements.  Combine this w/ Collect() instead of ClearCollect() and we can start to build out a Collection that is much larger.

Let's say we want to pull in data from a series of dates.  Each day has fewer than 2000 records, but over a 31-day span there are 20,000 total.  We can do something like:

ClearCollect(//Put Today's items in List
    myResults,
    Filter(
        mySPList,
        orderDate = Today()
    )
);
ClearCollect(//build list of dates to avoid Delegation issues
    dateList,
    ForAll(
        Sequence(30) As counter,
        {dateInHistory: Today() - counter.Value}
    )
);
ForAll(//add items by each matching date to list
    dateList As counter,
    Collect(
        myResults,
        Filter(
            mySPList,
            orderDate = counter.dateInHistory
        )
    )
)

NOTE: I am using the Sequence() function here to generate a 30-item list of dates, and the As operator to avoid issues w/ referencing rows properly.

Let's take a look at how this works.  First we build out a Collection of items ordered Today():

ClearCollect(//Put Today's items in List
    myResults,
    Filter(
        mySPList,
        orderDate = Today()
    )
);

This should have zero delegation issues for us.

Next we build out a list of Dates.  Why?  Because if we attempt to perform operations on Dates in the middle of our Collect statement, we may have Delegation issues.  It is always better to do these operations first when dealing w/ such large datasets.

ClearCollect(//build list of dates to avoid Delegation issues
    dateList,
    ForAll(
        Sequence(30) As counter,
        {dateInHistory: Today() - counter.Value}
    )
);

Now we have two Collections.  One (myResults) is our list we still need to add to, and the second (dateList) is our list of dates we want to add in.

Finally, we dump all the remaining bundles of data into our first Collection by iterating through our second Collection.

ForAll(//add items by each matching date to list
    dateList As counter,
    Collect(
        myResults,
        Filter(
            mySPList,
            orderDate = counter.dateInHistory
        )
    )
)

In the end, we have a Collection that started with up to 2000 rows, then added 30 more batches of up to 2000 rows each, for a max of 62,000 rows.  
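A quick sanity check after a run like this is to watch the row count, e.g. in a Label's Text property:

// Shows how many rows actually landed in the Collection.
CountRows(myResults)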

Just Getting A Large List Populated in SharePoint Online

NOTE: This is very much a work-in-progress for me as I don't have great recommendations.

There are a number of blogs about getting large amounts of data into SharePoint, but no solid and definitive answer.  I've tried a few methods and found them to be highly intermittent.  

Ultimately, this is the method I've leaned on for reliable results.

1) Add a rowID to your raw data.

If you add a column to an Excel Table and put an index in it (as simple as =A1+1 filled down), this gets you to a point where you have your own referenceable unique value.  You can then check whether your uploads worked and whether any rows were missed.  Don't rely upon row counts to save you.  The last thing you want is to come back in 3 hours and see that you have 80% of the records in place, but the last record you see is near the very end.  Finding the ones that got skipped will drive you insane.

2) Test a small subset for "special characters"

If you're putting in a bunch of data from "somewhere else", odds are you will encounter situations where the "strings" have characters you don't expect.  In fact, some of them might blow up SharePoint if you try to import them directly.  You should purge any non-standard ASCII characters before you do an import, or your 35,000 item list could become a 1,000,000 item list just because you have some dirty data.

You can do this w/ some simple Find/Replace work, or by exporting to text and using some tools to purge those characters, or even by converting your data to JSON and back again using the newer built-in tools (which should fix most special character issues magically), then bringing it back into Excel.
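If you'd rather do a purge inside PowerApps itself, here's a minimal sketch of one way to strip a single text value down to printable ASCII (my own approach, not one of the tools above; dirtyText is a hypothetical variable):

// Build the set of printable ASCII characters (codes 32-126).
Set(allowedChars,
    Concat(
        ForAll(Sequence(95) As n, {c: Char(n.Value + 31)}),
        c
    )
);
// Split the text into single characters, keep only the allowed ones,
// and rejoin them. Split() returns its pieces in a column named Result.
Set(cleanText,
    Concat(
        Filter(Split(dirtyText, ""), Result in allowedChars),
        Result
    )
)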

3) Have a good connection.

I cannot swear by this, but I seem to get more complete lists when uploading from our corporate network than from my home one.  I have a very reliable low-ping connection, but then again, I don't have an ExpressRoute set up directly to Microsoft from my home.

If you're going to push hundreds of thousands of records up to Microsoft's cloud, then you want to avoid problems like I described in step 1) above.

4) Start with Excel Tables.

There's a nifty feature hidden in the Excel ribbon under Table Design:  Export.

If you drop your data into Excel and turn it into a Table (select all data | Insert | Table), then click on Table Design (far right of the ribbon, w/ any part of the Table selected), you'll see the Export function available to you.  The first choice is the one you want: Export Table to SharePoint List.


From there, you drop in the URL of your SharePoint Site (just the base site URL vs. an existing List URL).  It will prompt you for a Name and Description of your Table and use these to create a new List for you in SharePoint.

Once you start this process, it will take a while.  Oddly, the speed of your network connection up to Microsoft does not seem to matter much (anything reasonable performs similarly).  I watched it process about 1,600-1,700 records / minute from our corporate network, and only slightly slower from home.  

However, the main problem is that this process can stop almost anywhere along the way.  It won't show you an error; in fact it will say it worked and completed successfully, even if it stopped after 35,000 records with 300,000 still to go.  That's why Steps 1) and 2) above are important.

NOTE: Microsoft has some recommendations in various areas per data import to limit requests to 15-20k records.  I'm reviewing some methods to chop up CSVs into 15k blocks before converting them to Tables for processing.

5) Fix any problems.

This is the fuzziest section of my plan, as I'm still adapting it.  Depending upon the results of your import, I might suggest different approaches.  If you only got a ridiculously small % of your list, just delete the List and start over (and if you can, empty the Recycle Bin on your SharePoint site afterward).  If you're closer to the end, or somewhere in the middle, then you have a couple of options.  Both primarily come down to using PowerApps or PowerAutomate to roll through the two lists and compare them to see which records are missing.

NOTE: If you try to delete a list that is "too large", SharePoint won't delete it.  You'll first need to get your list down below some threshold X before you can delete it.  In my case X was ~80k records, and I purged it via a PowerApp that (slowly) removed records ~20k at a time.  Sure, I could have used PowerShell, but I'm trying to restrict myself to more end-user-facing tools.
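A minimal sketch of what that chunked purge can look like (my own wiring, hung off a button you'd press repeatedly; FirstN and the 2000 figure are arbitrary choices that stay inside the query limits):

// Pull one chunk of rows into memory, then remove each of them from the
// list. Run repeatedly until the list is small enough to delete outright.
ClearCollect(doomedRows, FirstN(mySPList, 2000));
ForAll(doomedRows As thisRow,
    Remove(mySPList, thisRow)
)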

The PowerApps approach is MUCH SLOWER than the Excel Export above, but it can also be validated/stored as you go if you capture the results of your Patch() statements, like:

Clear(writtenRecords);
ForAll(
    missingRecords As thisItem,
    Collect(
        writtenRecords,
        Patch(
            mySPList,
            Defaults(mySPList), // create a new record for each missing row
            thisItem
        )
    )
)

or alternatively

ClearCollect(
    writtenRecords,
    ForAll(
        missingRecords As thisItem,
        Patch(
            mySPList,
            Defaults(mySPList),
            thisItem
        )
    )
)

Note: I prefer the former method here because I can use CountRows(writtenRecords) elsewhere in the UI to watch the count climb as it rolls through.

At the end of this, writtenRecords should hold all of the rows that were successfully written out to SharePoint and you can do any updates, error-checking, etc. using it.

To jump-start my own effort, I started with an export of the "nearly complete" SharePoint list to Excel, then imported both the original data and the partially complete list into PowerApps as local Collections.  Then I used that local data to do the comparisons.  
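As a rough sketch of that comparison (assuming the two local Collections are named originalRows and uploadedRows, and both carry the rowID column from Step 1; the names are mine), this builds the missingRecords Collection used in the Patch() examples above:

// Both Collections are local, so delegation limits don't apply here.
// A row is "missing" if no uploaded row shares its rowID.
ClearCollect(
    missingRecords,
    Filter(
        originalRows As orig,
        IsBlank(LookUp(uploadedRows, rowID = orig.rowID))
    )
)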

In Conclusion

First, see Microsoft's notes on managing large Lists in SharePoint Online.  These Lists aren't going to act as nicely as you might be used to, so it's best to start reading up on how to manage them.

Many of the tools designed to import data into SharePoint are not great at doing this w/ large lists in SharePoint Online.  There remain questions about what a "best practice" would look like from Microsoft (probably because their "best practice" would be NOT to do this).

Once you get the data into a Collection within PowerApps, you can really work with it.  So in many cases, it comes down to jumping through a few hoops to copy the data out of SharePoint and into memory, making your changes to the list in memory, then writing it all back out to SharePoint Online once you're done.
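In miniature, reusing names from the examples above (the status values here are hypothetical), that pattern looks something like:

// 1) Read chunks into memory (see the Collect/Filter patterns above).
// 2) Modify the local Collection in memory.
UpdateIf(myResults, status = "Pending", {status: "Processed"});
// 3) Write the changes back, using a delegable exact match on ID to find
//    each base record.
ForAll(myResults As thisItem,
    Patch(mySPList, LookUp(mySPList, ID = thisItem.ID), thisItem)
)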

Finally, Microsoft has a lot of documentation around Delegation that works...sometimes.  I recognize that when we're working w/ massive lists, we have to be more cautious about who we trust to fix our own messes.  In this situation, the "gotchas" are hidden all over the place and not well documented.  Therefore, ignoring some of what Microsoft says is POSSIBLE and instead using the methods above, which are RELIABLE, means you can still work with 200, 2,000, 20,000, or 200,000 records, and any issues you encounter will be predictable and consistent.

3 comments:

  1. Thank you, I really enjoyed this article. Would love to find you on LinkedIn -- is there a way?

  2. Thank you for this article. It's great. Please continue sharing if there is any update.

  3. I also find my article on "Importing Excel data at run-time (15,000 row import workaround)" valuable when working w/ highly dynamic real-time data, where you also don't want to wait on the extremely SLOOOOOW process of querying Excel directly. A quick up-front import can get you tens of thousands of records jammed into a Collection to do "something" with, and it will only take a few seconds to parse out even a pretty wide table (I've tested out to A->CZ but started hitting limits).


Because some d-bag is throwing 'bot posts at my blog, I've turned on full Moderation.  If your comment doesn't show up immediately, that's why.