Monday, October 28, 2024

Distinct() on multiple columns

Distinct() is a handy function for finding unique values within a Collection.  However, it can only do this on a single column.  That limits the value unless you start to get creative on your lists.

Combining multiple columns into a single string that you can later Split() (if needed) can allow you to do a multi-column Distinct fairly easily and use the results readily as well.

Overview

Let's say you've got some kind of export from another system where you're getting a list of individual people who have access to certain Company's, Business Units, and Cost Centers.  The hierarchy in many ERP systems essentially look like:

  • Company
    • Business Unit 
      • Cost Center
When exports come out of these systems, they are fairly flat files that might look something like this:


While you could of course do some elaborate pre-work to put this data into cross-referenced tables, you don't really have to do that.  You can work with these tables (within reason) immediately and instead of doing crazy table structures, just work w/ this one table and do some manipulation to make it more like a traditional RDBMS experience.

One of the kinds of things you might want to do is to see some of these details as they relate to certain users, or certain companies, or whatever combo you want to create here to reflect your view.  If we want to see companies by users, business units by company, etc., then we need to do a Distinct() on this to remove duplicates.  However, we also need to filter the list to do this.  In some cases a complex Filter() pattern can be used, however, sometimes it is better/faster to do a Distinct() that covers multiple columns.  So how do we do that?

Combine Your Columns With a Delimiter

If we start with a Collection containing the data/columns above named myOriginalData, then we can do the following:

ClearCollect(
    myExtendedData,
    AddColumns(
        myOriginalData,
        userAndCompany,
        ThisRecord.email & "|" & ThisRecord.company
    )
)

What this gives us now is a Collection named myExtendedData that includes all of the columns from above, but also includes a merged email and company fields with a pipe character between them similar to:


Now, if we want to show a list of companies each user has access to (and perhaps do something about that, like deny, approve, etc.), then we can now do a Distinct on the new column.

ClearCollect(
    uniqueCompaniesByUser,
    Distinct(
        myExtendedData,
        userAndCompany
    )
)

This gives us a Collection of the unique values inside the single column userAndCompany that actually is a combination of two columns.



Using the Distinct Collection and Pulling the Combined Fields Apart Easily

Let's say you want to break this apart.  Whether you do this right at the beginning or later in your app, the approach is the same.  You now have a non-unique identifier from your original list or a unique one on your new list.  Depending on what you want to do, there are a few functions that will help you:

  • RenameColumns()
  • AddColumns()
  • Index(), First(), or Last(),
  • Split()
  • LookUp()

So let's make our new collection hold data similarly to our original list.  Consider the following code:

ClearCollect(
    uniqueCompaniesByUser,
    AddColumns(
        RenameColumns(
            Distinct(
                myExtendedData,
                userAndCompany
            ),
            Value,
            userAndCompany
        ),
        email,
        First(
            Split(
                userAndCompany,
                "|"
            )
        ).Value,
        company,
        Last(
            Split(
                userAndCompany,
                "|"
            )
        ).Value
    )
)

Look at this from the inner-most (most indented) code segments first.  The Distinct function works as before and gives us the list from before, Then the RenameColumns() function is changes the column name Value back to what we had originally in the first Collection to userAndCompany.  

From there, we add two more columns email and company (similar to the original list) and populate them w/ the values from the combo column.

The fun part here is that if you're looking at this closely, the Split() command returns again a single column list similar to the Distinct function.  We're just taking the First or Last (but you could also use the Index function to grab any of them).  This can be applied in many ways to easily reference items inside a linear list of objects all mushed together into a single field.  Just like a row from a CSV.

Let's try an alternate approach that might be more familiar using LookUp():

ClearCollect(
    uniqueCompaniesByUser,
    AddColumns(
        RenameColumns(
            Distinct(
                myExtendedData,
                userAndCompany
            ),
            Value,
            userAndCompany
        ),
        email,
        LookUp(
            myExtendedData As nextItem,
            userAndCompany = nextItem.userAndCompany
        ).email,
        company,
        LookUp(
            myExtendedData As nextItem,
            userAndCompany = nextItem.userAndCompany
        ).company
    )
)

In this scenario, I used the LookUp() function to go back to the original list and retrieve them.  LookUp always returns the first/top result, so even if there are multiple matches on the userAndCompany column then it will still work fine.

Matching More Fields

The exact same approach can work if we wanted to create some kind of Collection where we are doing unique calculations that involve any number of fields (within reason).  

ClearCollect(
    myExtendedData,
    AddColumns(
        myOriginalData,
        userCompanyAndBusinessUnit,
        ThisRecord.email & "|" & ThisRecord.company & "|" & ThisRecord.businessUnit
    )
)

The key here is that then you might need to use the Index() command to access fields beyond the first/last when using the Split() function.

Final Thoughts

Whichever approach you use things will work, but the Index/Split approach is something worth learning for other scenarios.  LookUp is much more widely used and known, but sometimes merging multiple values into a single field in a more linear fashion (vs. say JSON()) will be more familiar with people who are working with CSV exports from other legacy systems.

Don't be afraid to create copies of existing Collections if you want to use subsets by rows or columns.  As well, you might occasionally need to fill a base Collection first w/ blanks for certain calculated columns, and then use an UpdateIf() afterward to update the blank fields to the real values. 

No comments:

Post a Comment

Because some d-bag is throwing 'bot posts at my blog I've turned on full Moderation. If your comment doesn't show up immediately then that's why.

DIAF Visualpath team