This fast tip demonstrates how to use the Excel Power Query Modulo feature to clean up a report and make it more user friendly.
This fast tip demonstrates how to use the Excel Power Query Modulo feature to clean up a report and make it more user friendly.
4 Comments
Wow, that is amazing functionality. I’d go so far as to say this is massive. Although apparently Gen Y would call this hectic.
What is really interesting is that the query editor appears to be very similar to how you get data into and massage in PowerBI.
Another use would be say to do with metadata – for example generating metadata from PDFs (via the commandline supertooltik – XPDF and pdfinfo), this will typically generate 5-10 pieces of metadata for each PDF, and you then have to massage it to get it into a workable format. I could see that using Power Query could streamline this.
Massive / Hectic
could you help me with installing pdfinfo on a windows machine? I know it’s off-topic from the Excel Power Query, but we recently were discussing how to extract metadata from pdfs. I’m baffled with the installation process.
Ok sure:
My view is that there are two awesome and open source PDF tools for this kind of task being:
PDFTK – https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/ – looking at this now, it appears that its a GUI product, we have an old version that is commandline, however I won’t branch out into using PDFTK here – suffice to say that it is also a very useful and powerful tool.
XPDF – http://www.foolabs.com/xpdf/
Download the relevant (Windows) zip files from: http://www.foolabs.com/xpdf/download.html, and navigate to where you downloaded the zip file – on my Windows10 machine the default download path is C:usersMatthewDownloads
Opening the zip file, I see that it contains a folder called xpdfbin-win-3.04.
Assuming you have full write access on your computer, what I like to do is to have a folder under C:Temp called tools ie the path is C:TempTools, I would then create a new subfolder of tools called XPDF. To keep it simple, lets instead assume that you make a folder called XPDF under C: ie C:XPDF
Open the zip file and extract the contents and navigate to C:XPDF. Upon completion open C:XPDF and you’ll see a subfolder called xpdfbin-win-3.04. That’s too messy for me, so go into that subfolder, select all and cut. Return back to C:XPDF and paste.
Now C:XPDFcontains 10 items being:
bin32
bin64
doc
xpdfbin-win-3.04
as well as 6 files without file extensions.
the ones that are important to us are bin32 bin64 and doc. If you are running 32bit Windows then you want bin32, otherwise bin64. Although my machine is 64bit Windows, I’m going to use bin32 (it should work fine).
Go into bin32 folder and you’ll see 9 files that start with pdf and have file extensions of .exe. Each of these files can do things [Pro tip – pdftotext is excellent, as is pdfinfo, we don’t give the others as much love and attention as we probably should].
These are all commandline tools and so are inert in Windows GUI world. That’s ok.
Go back up one directory and then go into doc (ie C:XPDFdoc) and you’ll see that there are 12 files, of which 11 are text files. These are the help / usage / documentation files.
Go ahead and open up pdfinfo.txt. Amongst lots and lots of [at first glance probably cryptic] instructions, what it is simply saying is that the usage of this is:
pdfinfo [options] [PDF-file]
To keep it simple what it is saying is you need to call upon pdfinfo from the commandline, then you tell it a PDF filename. Easy peasey.
Ok, go back to Windows Explorer and C:XPDF, and create a sub-directory called A. ie C:XPDFA. Go and grab a simple PDF that isn’t encyrpted. Lets pretend that your PDF file is called B.pdf, and copy and paste B.pdf into C:XPDFA ie the A folder contains a single file B.pdf.
Go up one directory to C:XPDF and click in the address bar and type CMD.exe.
you’ll see that the command prompt launches and is patiently sitting at the location of C:XPDF.
Now its going to get a little bit complex. What we want to do is to call up pdfinfo.exe and get it to look at the file B.pdf. The conundrum is that pdfinfo.exe is in a directory bin32pdfinfo.exe and B.pdf is in a directory AB.pdf.
So, our usage is:
bin32pdfinfo.exe AB.pdf
I then get returned the metadata on-screen from B.pdf. This is nice, however I need it in a text file. So change the syntax above to:
bin32pdfinfo.exe AB.pdf >B-metadata.txt
Run that and if you have a look at C:XPDF you’ll see a new file called B-metadata.txt. Open it up and you can see that it has the metadata.
This is all well and good, however its quite tricky to do this for large volumes of files. Not impossible just tricky.
Note – if you weren’t able to launch cmd.exe from Windows explorer, launch it from the start menu – when I do this it goes to C:usersmatthew, what you then want is to type CD….XPDF which is saying change directory, go up one level, go up another level then go into XPDF.
To take it to the next level, you can write a script to do all this so that you don’t need to do it manually. Roughly the steps would be:
create a directory listing of the PDF files, ideally they should all be in the same directory. If the file names or directories have spaces in their names ie docs for affidavitabc 001.pdf and def 002.pdf, then you need to wrap the directory and file names in double quotes.
Hope this helps.
Phew – that was hectic.
Thank you so much for your help. This is incredibly nice of you!