When working with large datasets, or when querying APIs or Power Automate flows, the Concurrent() function is helpful for retrieving multiple items at the same time. However, Concurrent() cannot be used inside a ForAll(), and when processing groups of items you may need to chunk them up to avoid overloading a data source (can you really query 1,000 rows simultaneously from Excel or your APIs without breaking something?).
This post walks through how to process chunks of data simultaneously while also giving a nod to the limitations of certain back-office platforms.
In my own scenario, this process queries either an API (if it is working) or an Excel export from the same ERP. As we have ~45,000 records in one of those Excel files and roughly ~200,000 in another, retrieving a single user's values from one of those files can take up to 30 seconds per query. So splitting these lookups up to run concurrently is essential.
TLDR Setup
In your Power App you'll use one button to start the process, another to actually run the Concurrent() statements, and a Timer to trigger the Concurrent button again if there are any remaining items in your list.
This avoids infinite loop errors.
For the process you will:
- Start Processing button:
  - Grab all the items you want to process into a pending Collection (pendingCol)
  - Create any counters or warnings you'll want to display (processing popup, etc.)
  - Select the other button
- Run 10 queries Concurrently button (change 10 to whatever you think your system can safely handle for concurrent queries):
  - Grab the first 10 items from the full list and put them into a temp Collection (tempCol) for processing
  - Create a results Collection (resultsCol) where you will store everything that returns
  - Use a Concurrent() statement that puts the result of each simultaneous run into its own local variable for this loop:
    - UpdateContext({results1: pAuto.Run(...)}),
    - UpdateContext({results2: pAuto.Run(...)}),
    - ...
  - At the end, process anything you've returned as needed (ParseJSON(), etc.) and save it to your results Collection (resultsCol)
  - Start your Timer: UpdateContext({startTimer: true})
- Timer OnTimerStart (the Duration generally doesn't matter, but set it fairly low, e.g. 1000 ms):
  - Remove the first X items from your pending Collection: Remove(pendingCol, FirstN(pendingCol, 10))
  - Check whether any items remain: If(CountRows(pendingCol) = 0, ...)
  - If there are remaining items:
    - Stop the Timer: UpdateContext({startTimer: false})
    - Select the Run 10 queries Concurrently button again
Detailed Walkthrough
Let's say you have a list of emails or person IDs, and you want to gather data on each of them from some external system that is slower than you might like. This could be Excel, some ERP platform, etc. You want to grab a bunch of data at the same time with concurrent sessions to try to speed things up. We give our end-user a multi-line Text input field that they can drop a series of items into, and we're going to process these until they're all done.
Lots of people start with Excel lists, so for large groups that's a good assumption.
If I start with a group of, say, 23 IDs, then I'd need to loop through my processing button 3 times (twice for a full 10 people, then one last time for the remaining 3).
My initial button will initialize things, then call the processing button, which will start the Timer when it completes. The Timer will re-call the processing button, 10 items at a time, until we're done with our list.
Initial Button
Taking a list like I have shown above and splitting it into a Collection is simple enough:
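Something along these lines should do it. This is only a sketch: btnProcess is my placeholder name for the processing button, working is the Boolean that drives the processing overlay described later, and note that Split() returns a one-column table whose column is named Value in current versions of Power Apps (Result in older apps):

```
// OnSelect of the initial button
// Strip carriage returns, then split the pasted list (one ID per line) into a Collection
ClearCollect(
    tmpCol,
    ForAll(
        Split(Substitute(personNumInput.Text, Char(13), ""), Char(10)),
        {personNumber: Trim(ThisRecord.Value)}
    )
);
// Track the total count, show the processing overlay, and reset the results Collection
UpdateContext({itemCount: CountRows(tmpCol), working: true});
Clear(resultsCol);
// Kick off the first batch
Select(btnProcess)
```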
This takes a list of items dropped into our control (personNumInput) and creates a new Collection (tmpCol) that holds everything we're planning to process.
We've also created a total count of everything we're planning to process (itemCount) and cleared out the final Collection that we'll use to capture all of our results as each loop completes (resultsCol).
Note: I am keying on a value called personNumber for this example. Of course, you might have a list with emails, order #s, etc. for your own scenario. Replace the scenario-specific names as applicable.
Processing Button
This part will be specific to your use-case, but in general you'll probably want to:
- Create another temporary Collection of the next 10 (or whatever) items you're going to process/parse from your source (tmpNextTenCol)
- Do your Concurrent Loops
- Then update your Collection above
- Optionally pull out any relevant JSON values if you're doing that
- Optionally write that data out for permanent storage in a more accessible location (e.g. SharePoint).
- Add the temporary Collection details to your final Collection (resultsCol) so your end-user can see all the results at the end
- Start the Timer
Here's an example of how this might look in the OnSelect code:
Creating a temp Collection for your data
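Here's a rough sketch of what that could look like. In this sketch, employeeDetails and benefits will hold the raw JSON blobs, while firstName and benefitPlan are made-up example fields I'll fill in later:

```
// OnSelect of the processing button (first step)
// Grab the next batch of up to 10 and pre-create the columns we'll fill in
ClearCollect(
    tmpNextTenCol,
    AddColumns(
        FirstN(tmpCol, 10),
        "employeeDetails", "",   // raw JSON blob returned by the flow
        "benefits", "",          // raw JSON blob returned by the flow
        "firstName", "",         // example value pulled from the JSON later
        "benefitPlan", ""        // example value pulled from the JSON later
    )
)
```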
The field names here would of course be specific to your own use-case: just whatever fields you're going to want to ultimately populate or work with. In this case, my Power Automate flow returns two JSON blobs (employee details and benefits) that I will store in distinct fields, while also pulling some data from those blobs into other fields.
Concurrently processing your data
This will be very similar for your own scenario. Just change the numbers to reflect how many items you are processing. You'll have to create individual variables for every item you are going to concurrently process.
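Here's a sketch of the Concurrent() block, assuming the flow is called pAuto and takes the person number as its only input. The If() guards are my addition so a final batch of fewer than 10 items doesn't error:

```
Concurrent(
    UpdateContext({results1: pAuto.Run(Index(tmpNextTenCol, 1).personNumber)}),
    If(CountRows(tmpNextTenCol) >= 2,
        UpdateContext({results2: pAuto.Run(Index(tmpNextTenCol, 2).personNumber)})
    ),
    // ...one UpdateContext per slot, up to however many you run at once...
    If(CountRows(tmpNextTenCol) >= 10,
        UpdateContext({results10: pAuto.Run(Index(tmpNextTenCol, 10).personNumber)})
    )
)
```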
Again, update the flow name to whatever your Power Automate flow is called, along with whatever variable you're sending into it for each query (personNumber in my case). I've cut out the middle of the code above, but you'll of course need a section for every item you want to process concurrently.
Update your second temporary Collection (tmpNextTenCol)
This is somewhat optional, but it is good practice to stage what you believe you're going to write before you write it to your ultimate destination. This is an example of how we'd take the returned values from our ten concurrent items and update them:
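A sketch of that step; the output property names (personnumber, employeedetails, benefits) are assumptions and depend entirely on how your flow's response is defined:

```
// Match each result back to its row by personNumber and store the raw JSON
Patch(
    tmpNextTenCol,
    LookUp(tmpNextTenCol, personNumber = results1.personnumber),
    {employeeDetails: results1.employeedetails, benefits: results1.benefits}
);
If(CountRows(tmpNextTenCol) >= 2,
    Patch(
        tmpNextTenCol,
        LookUp(tmpNextTenCol, personNumber = results2.personnumber),
        {employeeDetails: results2.employeedetails, benefits: results2.benefits}
    )
);
// ...repeat for results3 through results10...
```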
Again, swap in your own variables and values for whatever you're sending and returning. In my case, I am using the personNumber value to match and identify the employees whose data will be updated.
(Optional) Pulling out some JSON field(s)
In my scenario, I'm going to pull a couple of values out of those JSON variables and store them in my Collection to ultimately write directly to SharePoint (to simplify searching/sorting). Here's an example of that code as well:
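For example, assuming the blobs contain properties like firstName and planName (swap in your own property names):

```
// Pull a couple of plain-text values out of the stored JSON blobs
UpdateIf(
    tmpNextTenCol,
    true,
    {
        firstName: Text(ParseJSON(employeeDetails).firstName),
        benefitPlan: Text(ParseJSON(benefits).planName)
    }
)
```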
Again, replace the field names with values that are meaningful to your own process. I'm assuming all values coming back in the JSON are text in my example, but yours may differ.
(Optional) Save your data
As you're processing, you might want to go ahead and save this information to a more accessible location (e.g. SharePoint) for caching or storage as a part of your process.
Here's an example of that:
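This is a rough sketch, using a hypothetical SharePoint list named 'Employee Cache' with made-up column names:

```
// Write each row of this batch to SharePoint as a new item
ForAll(
    tmpNextTenCol As row,
    Patch(
        'Employee Cache',
        Defaults('Employee Cache'),
        {
            Title: row.personNumber,
            EmployeeDetailsJSON: row.employeeDetails,
            BenefitsJSON: row.benefits,
            FirstName: row.firstName,
            BenefitPlan: row.benefitPlan
        }
    )
)
```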
Again, replace the list and field names with ones that are relevant to your process. This essentially stores the rows as we process each batch of ten (or fewer) items within each loop.
Update your final Collection
This updates the final list so you can show progress along the way, or display all of the data to your end-user at the end.
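That can be as simple as appending the batch to the final Collection:

```
// Append this batch's processed rows to the running results
Collect(resultsCol, tmpNextTenCol)
```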
See, not everything is complicated. :)
Start your Timer
Last step is to start your timer so it can check to see if there are any more rows to process and then start another loop as needed:
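Just flip the variable that the Timer's Start property is bound to:

```
// Let the Timer take over and queue up the next batch
UpdateContext({startTimer: true})
```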
Again, simple.
Timer Logic
This is the last part and much simpler than the prior section. We will:
- Change the Duration to 1000
- Change the Start property to startTimer (the variable we set to true at the end of our button code)
- Change the OnTimerStart to the code below:
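Here's a sketch of that code, using the Collections from the walkthrough (btnProcess is my placeholder name for the processing button, and working is the Boolean driving the processing overlay):

```
// Drop the batch we just processed from the pending list
Remove(tmpCol, FirstN(tmpCol, 10));
If(
    CountRows(tmpCol) = 0,
    // Nothing left: stop the Timer and hide the processing overlay
    UpdateContext({startTimer: false, working: false}),
    // More to do: stop the Timer, then kick off the next batch of 10
    UpdateContext({startTimer: false});
    Select(btnProcess)
)
```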
That's it. After testing, set the Visible property to false for the Timer and the processing Button and you're good to go.
Limitations of the Concurrent() function
There are multiple reasons we have to do this kind of workaround, but the primary one is that we cannot update a single Collection within multiple Concurrent code segments. For example, this will not work:
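Roughly, the failing pattern looks like this:

```
// This will NOT work: both branches try to Collect into the same Collection
Concurrent(
    Collect(resultsCol, pAuto.Run(Index(tmpNextTenCol, 1).personNumber)),
    Collect(resultsCol, pAuto.Run(Index(tmpNextTenCol, 2).personNumber))
)
```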
You cannot run Collect (or ClearCollect) on the same Collection in multiple Concurrent steps. You also cannot loop this inside a ForAll(Concurrent()) without also running afoul of similar kinds of restrictions on either ForAll() or Concurrent(). So setting these up as individual local variables via UpdateContext({}) is one of the best ways to get the job done (even if it is cumbersome).
Why the Timer?
Again, Power Apps stops you from doing recursive kinds of loops where buttons call one another and could arguably create an infinite loop. Even if your code would of course never allow that to happen, it simply won't allow one button to call another button and then have that button call the original. And no, you also can't fool it by chaining it further and including a 3rd button (I tried). However, apparently the very smart AI-written code Microsoft is shoveling out didn't think to block Timers. So enjoy!
Final Thoughts
In my scenario, I have this combined with a Power Automate flow that is also set to do parallel processing against two Excel files (worst case) or two APIs (best case) simultaneously. There are a number of ways you can do parallel processing to speed things up; however, it can become a little tedious and overly manual to set up.
Make sure to include some feedback for your end-users while you're processing things. I used a Boolean called working to indicate whether we're processing items, and I show an overlay so the user can follow the progress; it disappears when the processing is completed.
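For example (the overlay group and progress label are hypothetical controls):

```
// Visible property of the processing overlay group
working

// Text property of a progress label inside the overlay
"Processed " & CountRows(resultsCol) & " of " & itemCount & " items..."
```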
You can also play with the concurrency settings within Power Automate, say by sending it a list of IDs and having the flow run them all together, to try to make this happen. However, I will say that I've had some very weird things happen when enabling that within Power Automate. It works, mostly, but when it doesn't, it is very bad. In my scenario above, each call to Power Automate runs in its own instance, so no data corruption **should** happen (I bolded that word for a reason).
Do note that of course you could do >10 simultaneous calls if you just keep copying/pasting/editing those segments of code. However, there does exist some upper bound for how many sessions you can have running simultaneously both within PowerApps and/or for your data sources. I can't say what that is, but I've definitely tested 10 sessions with some very large Excel files, so that feels fairly safe. Whether I'm asking for 1 value or 10 simultaneous values in unique sessions, they take the same amount of time and haven't yet caused Microsoft to throttle me or blow up my account.