In this post we’ll start the process of porting a simple command line C++ application to the Nokia Qt framework. We’ll take a quick look at the code in question, then talk about how we’ll port it.
The application in question is a simple tool for cleaning up the export of a print publishing CMS for eventual use in a Web-based CMS. By simple I mean we open a folder and loop through it, diving into sub-folders as needed, to:
- Remove all images tagged with a _bw_
- Open XML ’story’ files and remove extra line breaks from headline elements, and clean up content HTML.
The application started as an Apple Xcode project, and has now moved into NetBeans 6.8 on Linux. In the end I hope to learn a good deal about the Qt framework, as well as proper C++ application development.
How and why are we doing this?
Command line apps are only so useful, so the next natural step was to port this one to a GUI. I had not done this before, so it would be an educational experience. Perhaps you find yourself in a similar situation, stuck with a command line tool when a nice GUI would be preferred. Of course, doing so can be intimidating at first, as in most cases the libraries needed to accomplish the task can be quite daunting.
To that end we’ll use Qt (pronounced “cute”) because it’s open source, incredibly powerful, cross-platform, easy to learn, and well documented.
The Code
So what have we got to work with? I did a fair bit of experimentation during the first pass; what we see here is the sanitized version with most comments and junk taken out. That said, it should work on *NIX-based systems. Windows is iffy and untested.
/**
 * Clean Up Prestige Export Files
 *
 * Removes all _bw_ images
 * TODO: Clean up image names (remove non-acceptable characters)
 * Removes all <br/> and \r \n from headlines
 * Transforms body <br/> items into valid p tags
 *
 * @author Matthew Grdinic
 * @organization Journal Interactive
 * @date 5/19/10
 * @version 0.2.0008
 *
 * TODO:
 *
 */

#include <iostream>
#include <string>
#include <dirent.h>
#include <stdio.h>
#include <fstream>
#include <iterator>

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

#define DEBUG false

using namespace std;

int main (int argc, char * const argv[]) {

    if(argc != 2)
    {
        fprintf(stderr,"Usage: %s [base xmlout dirname]\n",argv[0]);
        return(1);
    }

    // text constants used by the XML clean up step
    const string br = "<br/>";
    const string rl = "\r";
    const string nl = "\n";
    const string rn = "\r\n";

    const string openp = "<p>";
    const string closep = "</p>";
    const string doublebr = "<br/><br/>";

    std::cout << endl << "======================";
    std::cout << endl << "WELCOME TO THE CLEANER";
    std::cout << endl << "======================";

    // path data
    string basePath = argv[1];
    string imagePath = basePath + "/images";
    string storyPath = basePath + "/stories";

    int counter = 0;

    // images
    std::cout << endl << "Starting image clean...\n";

    DIR *d;
    struct dirent *dir;
    string name;
    string location;
    string path;
    string replaceString;
    string::size_type p;
    d = opendir(imagePath.c_str());

    if(d){

        string::size_type found;

        while((dir = readdir(d)) != NULL){
            name = dir->d_name;
            found = name.rfind("_bw_");
            if(found != string::npos){
                path = imagePath;
                location = imagePath + "/" + name;
                cout << location << endl;
                // remove() returns 0 on success
                int t = remove(location.c_str());
                if(t == 0){
                    counter++;
                }
            }
        }

        closedir(d);

        cout << counter << " image file(s) were removed." << endl << endl;

    }

    // xml files
    std::cout << "Starting file clean...\n";

    string fileData;

    d = opendir(storyPath.c_str());

    if(d){

        // low memory, no parallelization
        while((dir = readdir(d)) != NULL){
            name = dir->d_name;
            location = storyPath + "/" + name;

            if(name != "." && name != ".." && name != ".DS_Store"){

                // ==> buffer: read the file line by line, re-joining with \r\n
                string fileData2;
                string buff;
                ifstream file;
                file.open(location.c_str());

                while(getline(file, fileData2)){
                    buff += fileData2 + "\r\n";
                }
                file.close();
                fileData = buff;

                // == remove extra headline data
                string::size_type hl1Start = fileData.find("<hl1>");
                string::size_type hl1End = fileData.find("</hl1>");

                if(hl1Start != string::npos && hl1End != string::npos && hl1Start < hl1End){

                    replaceString = fileData.substr(hl1Start, (hl1End - hl1Start));

                    // erase each token by its own length so no neighbouring
                    // characters are removed along with it
                    p = replaceString.find(br);
                    while(p != string::npos){
                        replaceString.erase(p, br.size());
                        p = replaceString.find(br);
                    }

                    p = replaceString.find(rl);
                    while(p != string::npos){
                        replaceString.erase(p, rl.size());
                        p = replaceString.find(rl);
                    }

                    p = replaceString.find(nl);
                    while(p != string::npos){
                        replaceString.erase(p, nl.size());
                        p = replaceString.find(nl);
                    }

                    // join data
                    fileData.replace(hl1Start, (hl1End - hl1Start), replaceString);

                } // hl1 exists

                string::size_type bodyStart = fileData.find("<body.content>");
                string::size_type bodyEnd = fileData.find("</body.content>");

                string::size_type innerp;

                if(bodyStart != string::npos && bodyEnd != string::npos && bodyStart < bodyEnd){

                    replaceString = fileData.substr(bodyStart, (bodyEnd - bodyStart));

                    p = replaceString.find(doublebr);
                    counter = 0;
                    int matchingP = 0;
                    while(p != string::npos){

                        replaceString.replace(p, doublebr.size(), openp);
                        matchingP = 1;

                        // next \r\n [Mac Uses nl, Windows rn]
                        innerp = replaceString.find(rn, p);

                        if(innerp != string::npos){
                            replaceString.replace(innerp, rn.size(), closep);
                            matchingP = 0;
                        }

                        p = replaceString.find(doublebr, p);

                        if(matchingP == 1){
#if DEBUG == true
                            cout << "Hanging p" << endl;
#endif
                            replaceString.append(closep);
                        }

                        counter++;
                    }

                    // join data
                    fileData.replace(bodyStart, (bodyEnd - bodyStart), replaceString);

                } // bodyEnd exists

                // save file
#if DEBUG == true
                ofstream outfile((location + "-debug.xml").c_str(), ofstream::binary);
#else
                ofstream outfile((location).c_str(), ofstream::binary);
#endif

                outfile.write(fileData.c_str(), fileData.size());

            }

        } // while()

        closedir(d);

    }

    return 0;
}
The Current Program
The program is broken into several logical sections. The first part takes the user-supplied command line argument, a folder, and stores it in a simple variable: basePath. This path is then used to build the other locations in our sub-folder structure. Here we also define a few text constants used in the XML file cleanup step.
With a valid path, we loop through the images folder and delete any files with a _bw_ in the name. The _bw_ is for Black and White, and is a known quantity in this work-flow.
I should add at this point that the folder structure we are traversing looks like this:
xmlout/
  images/
    image_files.jpg
  null/
  stories/
    story.xml
    story.xml
The next step is slightly longer: here we need to clean up the XML files. The details aren’t important for the porting process, as we’ll probably end up with very similar code in the end anyway. The long and short of it is that, in several places, the XML output contains unclosed <p> tags, extra line breaks and formatting, and other general messiness that cause an XML import of these files to fail on the ‘target’ system.
The one thing I do want to pay attention to is the file read process. It’s in this area that we’ll probably find the most divergence from the command line app. In particular, I want to see how Qt makes file reading easier, and I also want to explore parallelization at this step. I can envision reading in a batch of files, say 50 at a time, and then spawning new threads to handle the cleanup tasks.
Of course, we then write the cleaned-up files back to disk, something that again could probably be done in concert with a parallel processing model.
More generally, though, with a port I want to explore a more object-oriented way of performing these steps, as well as adding more error and sanity checks.
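To give a feel for the headline part of that cleanup, here is a minimal sketch of it pulled out into two small functions. The names eraseAll and cleanHeadline are hypothetical and exist only for this example; like the program above, the sketch handles only the first <hl1> element it finds.

```cpp
#include <string>

// Removes every occurrence of `token` from `text`.
// (Hypothetical helper, not part of the original tool.)
void eraseAll(std::string& text, const std::string& token) {
    if (token.empty()) {
        return; // an empty token would otherwise loop forever
    }
    std::string::size_type pos = 0;
    while ((pos = text.find(token, pos)) != std::string::npos) {
        text.erase(pos, token.size());
    }
}

// Strips <br/>, \r and \n from the first <hl1>...</hl1> element in `xml`
// and returns the result. Input without a complete element is returned
// unchanged. (Hypothetical helper, not part of the original tool.)
std::string cleanHeadline(std::string xml) {
    const std::string open = "<hl1>";
    const std::string close = "</hl1>";

    const std::string::size_type start = xml.find(open);
    if (start == std::string::npos) {
        return xml;
    }
    const std::string::size_type end = xml.find(close, start);
    if (end == std::string::npos) {
        return xml;
    }

    std::string headline = xml.substr(start, end - start);
    eraseAll(headline, "<br/>");
    eraseAll(headline, "\r");
    eraseAll(headline, "\n");

    xml.replace(start, end - start, headline);
    return xml;
}
```

Because each token is erased by its own length, a stray \r or \n never takes a neighbouring character with it. A function that takes a string and returns a string is also easy to test without touching the file system.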
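To make the batching idea a little more concrete, here is a rough sketch in standard C++ rather than Qt. It assumes the file contents have already been read into strings, and cleanInBatches is a made-up name for this example. A real Qt port would more likely lean on Qt’s own threading facilities, which I haven’t explored yet.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Runs `clean` over `documents` in batches of at most `batchSize`,
// starting one asynchronous task per document in the current batch.
// Results are returned in the same order as the input.
// (Hypothetical helper, not part of the original tool.)
std::vector<std::string> cleanInBatches(
        const std::vector<std::string>& documents,
        const std::function<std::string(const std::string&)>& clean,
        std::size_t batchSize = 50) {
    if (batchSize == 0) {
        batchSize = 1; // guard against an endless loop
    }

    std::vector<std::string> cleaned;
    cleaned.reserve(documents.size());

    for (std::size_t first = 0; first < documents.size(); first += batchSize) {
        const std::size_t last = std::min(first + batchSize, documents.size());

        // Launch the whole batch...
        std::vector<std::future<std::string>> pending;
        for (std::size_t i = first; i < last; ++i) {
            pending.push_back(std::async(std::launch::async,
                [&clean, &documents, i] { return clean(documents[i]); }));
        }

        // ...then wait for it to finish before starting the next one.
        for (auto& result : pending) {
            cleaned.push_back(result.get());
        }
    }
    return cleaned;
}
```

One task per file keeps the sketch short, but it is probably not what you would ship: a fixed pool of worker threads sized to the machine would avoid creating a new thread for every file. On some toolchains this also needs to be built with thread support enabled (for example -pthread with GCC).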
Coming In Part 2
Stay tuned for part 2, where we’ll take a look at the ported code I’ve created thus far. We’ll talk about general Qt development and what goals I have for the project in general.
Code Download
A note of caution: this code has not been tested on Windows and may not work there. Of course, this is one area where a proper Qt port will help!