Saturday, May 13th, 2017

darkoshi: (Default)
About 6 years ago at work, I set up an online group on one of the corporate websites intended for that purpose. It was a group for our developers to share information and post questions & answers. Many coworkers joined the group, and it got a good bit of use in the first years (about 180 posts/threads). But the activity eventually lessened (for various reasons, I suppose), and the last post was 2 years ago.

Recently all group owners were notified that the website was being shut down soon, in favor of some other new site on different technology. We were told that if we wanted to save our group's content, no tools were being provided for doing so, but that we could copy and paste the content into Word documents.

I harrumphed at the thought. Opening each and every post, and copying/pasting it into a Word doc? You've got to be kidding. As the group hadn't even been used in 2 years, and much of the info there was no longer pertinent, there didn't seem much point in trying to save the content.

But yesterday I took some screenshots of the pages which listed the post titles, for memory's sake, or nostalgia, or because maybe that could somehow be useful.

Today a coworker emailed me a question. It reminded me of one of those posts, which explained how to find the foreign key relationships of a table in SQL Explorer. So I went back and read that post. It helped me answer the question.

Then I wondered if I could find an easier way to save the group data after all. I discovered that each thread had an option for saving to a PDF file - and to get that PDF, you only had to append ".pdf" to the URL of the thread's page.

If I could get a list of all those URLs, then I could save off the PDFs. The post listing shows 20 titles & URLs per page, so I saved off about 10 HTML pages of it. Then I used File Locator Pro (an awesome tool; I highly recommend it) to parse out the URLs along with the titles. I used a regex search query, and saved off the matches, using this method: export just the content found by a regex expression.
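For anyone who would rather script that step than use a search tool, here is a rough Python sketch of the same idea: one regex with two capture groups pulls the URL and the title out of each link. The HTML snippet, the `/thread/<number>` URL shape, and the names `LINK_RE` and `extract_threads` are all made up for illustration; the real site's markup isn't shown here, so the pattern would need adjusting.

```python
import re

# Hypothetical link markup; the real listing pages may look quite different.
LINK_RE = re.compile(
    r'<a\s+href="(?P<url>[^"]*/thread/\d+)"[^>]*>(?P<title>[^<]+)</a>'
)

def extract_threads(html):
    """Return (url, title) pairs for every thread link found in the HTML."""
    return [
        (m.group("url"), m.group("title").strip())
        for m in LINK_RE.finditer(html)
    ]

# Made-up sample of a saved listing page.
sample = '''
<li><a href="https://example.invalid/groups/dev/thread/1042" class="t">Finding foreign keys</a></li>
<li><a href="https://example.invalid/groups/dev/thread/1057">Build server tips </a></li>
<li><a href="https://example.invalid/help">Help</a></li>
'''

threads = extract_threads(sample)
for url, title in threads:
    print(url, "|", title)
```

The third link in the sample is skipped because it doesn't look like a thread URL, which is the point of keeping the pattern specific.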

Then I determined how to save off the PDFs from the URLs. After logging into the website in my browser, entering the command "start firefox [URL]" in a command window would open the URL in a new tab of the browser. So I divided the URLs into groups of 10, and used a batch file to open the URLs, ten at a time. (I didn't want to do all 180 at once, as I had a feeling that would crash the browser and/or get me into some kind of trouble, as in "who's this person fetching a zillion pages from our webserver all at once?")
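The batch files themselves could be generated with a few lines of Python instead of by hand. This is only a sketch under assumptions: the thread URLs are invented, the helper names (`pdf_url`, `batches`, `batch_file_text`) are hypothetical, and the quotes around each URL are an addition to protect characters like `&` in a command window.

```python
def pdf_url(thread_url):
    """Append ".pdf" to a thread URL to get its PDF version."""
    return thread_url.rstrip("/") + ".pdf"

def batches(items, size=10):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def batch_file_text(urls):
    """Build the contents of a batch file that opens each URL in Firefox."""
    return "\n".join(f'start firefox "{u}"' for u in urls) + "\n"

# Invented URLs standing in for the real thread list.
urls = [f"https://example.invalid/groups/dev/thread/{n}" for n in range(1, 26)]
groups = batches([pdf_url(u) for u in urls], size=10)

# One batch file's worth of text per group of ten.
scripts = [batch_file_text(group) for group in groups]
print(scripts[0])
```

Each entry in `scripts` would be written to its own .bat file and run one at a time, saving the open tabs before starting the next.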

Then I used a Firefox plugin, Mozilla Archive Format, to save all open tabs to a MAFF file. A MAFF file is a zip file containing a folder for each tab. Each folder has an index.html (or in my case index.pdf) file, along with an RDF file which has metadata including the page's original filename.

So, once I had saved off MAFF files for all the URLs (about 18 MAFF files), I unzipped them all, extracted the PDFs, and used another batch file to rename them back to the original numeric filenames (which puts the posts in order by date), with the post titles included as part of the filenames.
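The unzip-and-rename step can also be sketched in Python, since a MAFF file opens with the standard `zipfile` module. The big assumption here is that each folder's index.rdf records the page address in a `MAF:originalurl` element; that matches my understanding of the format, but check it against a real file. The URLs, the title, and the helper names are invented, and the demo archive is built in memory just so the example runs on its own.

```python
import io
import re
import zipfile

# Assumed metadata element; verify against a real index.rdf.
ORIGINAL_URL_RE = re.compile(r'<MAF:originalurl\s+RDF:resource="([^"]+)"')

def safe(title):
    """Replace characters that Windows does not allow in filenames."""
    return re.sub(r'[\\/:*?"<>|]+', "_", title).strip()

def pdfs_from_maff(maff, titles_by_url):
    """Yield (new_filename, pdf_bytes) for each tab folder in a MAFF archive."""
    with zipfile.ZipFile(maff) as zf:
        names = zf.namelist()
        for rdf_name in sorted(n for n in names if n.endswith("/index.rdf")):
            folder = rdf_name.rsplit("/", 1)[0]
            pdf_name = folder + "/index.pdf"
            if pdf_name not in names:
                continue  # this tab was not saved as a PDF
            match = ORIGINAL_URL_RE.search(zf.read(rdf_name).decode("utf-8"))
            if match is None:
                continue  # no original URL recorded
            url = match.group(1)
            if url.endswith(".pdf"):
                url = url[:-4]
            number = url.rsplit("/", 1)[-1]  # numeric part of the thread URL
            title = titles_by_url.get(url, "untitled")
            yield f"{number} - {safe(title)}.pdf", zf.read(pdf_name)

# Build a tiny stand-in archive in memory (one tab, fake PDF bytes).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("tab1/index.pdf", b"%PDF-demo")
    zf.writestr(
        "tab1/index.rdf",
        '<MAF:originalurl RDF:resource="https://example.invalid/thread/1042.pdf"/>',
    )
buf.seek(0)

titles = {"https://example.invalid/thread/1042": "Finding foreign keys: SQL"}
renamed = list(pdfs_from_maff(buf, titles))
print(renamed[0][0])
```

With real files, each yielded name and byte string would be written to an output folder; the number-first naming keeps the posts sorted in their original order.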

For creating the batch files, I used Notepad++'s column editing to edit a bunch of lines at once, and macros to apply the same changes to each line.

And voila, I now have the group's entire content exported as PDF files which can be browsed or searched. And it only took me a few hours to do, most of which was figuring out how to do it as opposed to actually doing it.

I'm not sure what I'm going to do with the files now, but at least I have them.

Figuring out how to do things like that makes me feel clever.

google weirdness

Saturday, May 13th, 2017 02:25 am
This search query ("I'm the fire of your") returns 5 results:

This search query ("I'm the fire of your desire") returns 10 results:

*and* the 2nd one says at the bottom, "In order to show you the most relevant results, we have omitted some entries very similar to the 10 already displayed. If you like, you can repeat the search with the omitted results included."

It doesn't make sense. Yeah, I got the lyrics wrong, but still, it doesn't make sense that adding more words to an exact-match query (within quotes) makes it return more results.

Hmmm. Google must automatically filter out pages with the word "desire" unless you specifically use it in your query. Or maybe it filters it when you put it together with another word like fire. Maybe if I were logged into a Google account when doing my queries, and had my settings set to not filter anything, it would act differently.
darkoshi: (Default)
The day before yesterday, there was a tickle at the bottom of my leg, and I found a flea. I grabbed it, and unsure of what to do, held it in a stream of water at the bathroom faucet for a long time. Then I let the water wash it down the drain, and ran the water for a while longer. I was afraid it might still be alive, and might jump back out.

Afterwards I remembered that in the past, when we had cats, I used to submerge any caught fleas in a cup of water mixed with dishwashing liquid to drown them.

Yesterday evening, it happened again, in the same bathroom. I don't remember ever finding fleas in the house before*, so I believe it was the same flea. This time, I took it to the kitchen, dumped some water and dishwashing liquid into a plastic container, and held it submerged in that for a while before letting go.

(Feeling the flea struggling in my grasp. Trying to drown the poor thing a second time, after it survived the first traumatic attempt. I'm such a bad person. Life is so cruel. I don't like killing, but fleas are one of the few things that I long ago decided should always be killed, because of the severe misery they can inflict on other beings, and because there's no way to peacefully coexist with them.)

This morning, the flea was motionless at the bottom of the container. But now I don't trust dumping it down the drain again. And I don't want to dump it in the yard. So I considered dumping it outside the fence, to be on the safe side.

But first... how long does it take a flea to drown? The answers given on this page are rather scary:

Can dish soap really be used to kill ticks and fleas?

Now I've decided to leave the flea soaking for at least another day.

*Our dogs have been at Qiao's house the past week rather than here. But I've been going over to feed the neighbor's dogs this last week, while they were away on a trip. One of their dogs had a bad flea problem in the past from what they told me, so I think the flea must have jumped on me while I was in their yard. Although I didn't notice their dogs scratching much while I was there.