How to compare xls data with Cosmos Data

Cosmin Vladutu
4 min readApr 26, 2024

Intro

For the third time I needed to do this kind of “processing”, last time yesterday, a colleague of mine asked me to publish this so it will help us both in the future.

Problem

The Business department gave me an Excel with some data, and we needed to make a comparison between our data (what we had in our internal system) and their data (what it was in the “back office”).

For the example's sake let’s say the data was an ID, Email, FirstName and LastName and to make it more fun let’s say in our Cosmos database the object would be pretty fat with a lot of things (props, lists and so on).
To continue with our example, let’s say the result should look like this: Id, our Email, their email and if they are the same (a true or false or n/a if it was not found).

Disclaimer: The solution shouldn’t involve a console app, WPF app, or whatever app that does the job.

Solution

Phase 1

Have fun by writing the query for our cosmos in which we select only what we need, ID and Email. After long hours of working if you don’t get it right, be lazy. Get an object from your database, obfuscate/anonymize the data, and give it to ChatGPT and ask him for the query. Your query and his will probably look different, but who cares, if it works?

Phase 2

You have now the data in your browser and their data in Excel. You need it in the same format. My option was to install the Cosmos migration tool and use it to extract the data (as JSON).
The source is your Cosmos Db, and the target should be a local file, in there you’ll also add the query from Phase 1, and in a few seconds, you should get your data.

Cosmos migration tool

Phase 3

Get the data in the same format. For this you can you Excel to import the data from JSON to a table

Excel — Import data

Phase 4

You should now have your data in a sheet, so, copy-paste their data into another sheet and let’s have fun with Excel. Do have them sortable, first, make for both data structures some tables. This is not necessary for our use case, but it looks better.
To make it more fun, let’s say their data means 6k rows and your data means 18k rows.

On a new row (and column) use “match” function to get the row number from their datasheet (also known as index of the row)

=MATCH([@[Id]],theirData!B:B,0)

B column is the ID column from their datasheet.

After you get the row number in here, the next step is to get the email from their datasheet here. To do this you can use the “index” function.

=INDEX(theirData!C:C,P5264)

As you can imagine, the C column for their data is the email column and P5264 is the cell with the result of the match (current match, aka, we are “processing” row number 5264 from our list).

Now since we have both our email and their email and we created a nice table in the beginning we can do a simple equal to check if they are equal or not.

=[@[Email]]=[@[TheirEmail]]

To make it more clear, “TheirEmail” is the result column of the index function.

The last thing since we have such a big table is to hide the columns that we don’t need and keep only the columns asked by the business department for the comparison

TLDR for the used functions

=MATCH([@[Id]],theirData!B:B,0) means give me the row number (index) from theirData sheet, where the value of the current row number of my Id (from my sheet) column equals to the value from theirData sheet, and search in that sheet on the B column.

=INDEX(theirData!C:C,P5264) means give me the value from theirData sheet from column C where the row number is the value from my current sheet from column P, row number 5264.

Conclusion

Other ways to do this might be possible. Probably it would have been faster to write some code to do this, but what I wanted was to learn something new, remember how the Cosmos Data Migration Tool worked, and learn how different processing can be done with Excel and different sheets.

If you have other solutions please share.

--

--

Cosmin Vladutu

Software Engineer | Azure & .NET Full Stack Developer | Leader