Python script for MAXQDA timestamps

My dissertation focuses on the sense-making practices of college students when they examine representations of their achievement. To get a handle on this phenomenon, I opted for a multiple-method approach, so I’ve been collecting both quantitative and qualitative data. So far I have a couple dozen interviews under my belt, and I have both the audio and the transcripts of those interviews to work with.

To analyze the qualitative data I’m using MAXQDA. One of the main reasons I chose it is its ability to sync transcripts to the original audio. Here’s an example:

Screenshot 2015-03-31 16.34.05

The image above is a screenshot of what this looks like. I can read what speaker 1 (S1) said, and click on the little clock icon to hear the audio. To me this feature was a must-have because it enables me to contextualize the text quickly and easily.

There’s a catch, however. (There’s always a catch!)

For this to work, the timestamps in the transcript text files have to follow the (standard?) hh:mm:ss-ms format, and have “#” on either side. So a comment made 6 minutes into an interview would read #00:06:00-0#. Much to my dismay, when I opened my transcripts I noticed that the timestamps were simply in mm:ss, so the 6-minute mark read 06:00.

Needless to say, I wasn’t about to edit each transcript by hand. Doing so could lead to errors, and it would be time-consuming (even if I used find and replace). Instead I opted to write a script in python to do all of the replacements for me. It took a bit of doing, but I was able to get the script to work. A special shout-out goes to Jeff Stern and Adam Levick, who helped me figure the script out!

If this is something you’re interested in trying yourself, read on!

Prep: What you’ll need

  1. A Mac (sorry PCs, I’m sure it works in roughly the same way, but this is a Mac-centric blog post)
  2. TextWrangler (or another text editor)
  3. MAXQDA (or other qualitative software that can link text to audio via a specific timestamp format)
  4. Transcripts, saved as text (.txt) files
  5. Patience

Step 1: Tell python which files it will be working with

First things first. Open your text editor and add the following lines of code, which simply tell python which file to open (“infile”), and which new file to save (“outfile”):

infile = open('my_raw_text.txt')
outfile = open('(MAXQDA) my_raw_text.txt', 'w')

Note that I didn’t specify a file path. This only works because I saved the python script in the same place that I saved my transcripts. I found this to be the easiest solution, but if you like keeping your transcripts in one folder and python scripts in another, then make sure to specify the file path in the code above.
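If your transcripts do live in a different folder, one way to spell out the paths is with pathlib. This is only a minimal sketch: the folder name is hypothetical, so swap in wherever your files actually are.

```python
from pathlib import Path

# Hypothetical folder; point this at wherever your transcripts actually live.
transcript_dir = Path.home() / "Documents" / "transcripts"

infile_path = transcript_dir / "my_raw_text.txt"
outfile_path = transcript_dir / "(MAXQDA) my_raw_text.txt"

# These paths can be passed straight to open(), just like the bare file names:
# infile = open(infile_path)
# outfile = open(outfile_path, 'w')
```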

Step 2: Specify the rules python will use for find and replace

Next you’ll need to define what it is python will be doing. To do that you’ll need to know all of your replacement rules. In my case this meant 1) fixing the beginning of the timestamp (adding # and hh), and 2) fixing the end of the timestamp (adding ms and #). You’ll notice that I do this second step twice, once for each speaker. (In retrospect this may be redundant, but it worked so I’m not complaining!)

I chose to name the rules “replacements.” This tells python: when you see the word ‘replacements,’ it’s referring to this set of relationships. For example:

00: will become #00:00:
01: will become #00:01:
02: will become #00:02:

…etc


replacements = [('00:','#00:00:'), ('01:','#00:01:'), ('02:','#00:02:'),
                ('1 S1','1-0# S1'), ('2 S1','2-0# S1'), ('3 S1','3-0# S1'),
                ('1 S2','1-0# S2'), ('2 S2','2-0# S2'), ('3 S2','3-0# S2'),
                ('0 S2','0-0# S2')]

I just pasted a few of the rules. Scroll to the end to see the full script with all of them.
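If typing out every rule by hand feels tedious, the same list can probably be generated instead. The sketch below is not what I originally wrote. It assumes your transcripts only use the speaker labels S1 and S2, and the names minute_rules, end_rules and speakers are made up for illustration.

```python
# Minute rules: '00:' -> '#00:00:', '01:' -> '#00:01:', ... up to '59:'.
# '00:' comes first, which matters because the rules are applied in order.
minute_rules = [('{:02d}:'.format(m), '#00:{:02d}:'.format(m))
                for m in range(60)]

# End-of-timestamp rules: a digit followed by a speaker label gets '-0#' added.
speakers = ['S1', 'S2']  # assumption: only these two labels appear
end_rules = [('{} {}'.format(d, s), '{}-0# {}'.format(d, s))
             for s in speakers for d in range(10)]

replacements = minute_rules + end_rules
```

This should produce the same 80 pairs as the handwritten list, just with the digit rules in a slightly different order.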

Step 3: Tell python what to do with the relationship you just defined

Here’s where the magic happens. The final step is to tell python what to actually do. Line 1 tells python where to look (your file), reading it one line at a time. Line 2 tells python to take each pair, t1 and t2, from replacements (which you defined in Step 2). Line 3 defines what will happen (replace t1 with t2), and line 4 tells python to write the result to your new file, outfile, which was defined in Step 1.

for line in infile:
    for t1, t2 in replacements:
        line = line.replace(t1, t2)
    outfile.write(line)
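To see what the loop does to a single line, here’s a small worked example. The sample sentence is made up, and the rule list is cut down to just the three rules that matter for it.

```python
# A shortened rule set, just enough for the sample line below.
replacements = [('00:', '#00:00:'), ('06:', '#00:06:'),
                ('0 S1', '0-0# S1')]

line = '06:00 S1 So tell me about your first semester.'  # made-up sample line
for t1, t2 in replacements:
    line = line.replace(t1, t2)

print(line)  # prints: #00:06:00-0# S1 So tell me about your first semester.
```

Two things are worth keeping in mind. The rules run in order, so '00:' has to come before the other minute rules; otherwise it would also match the '00:' that those rules insert. And replace() changes every match on the line, so a time mentioned in the speech itself (say, “around 10:30 S1 told me”) would get rewritten too.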

Step 4: Tell python to close your source file, and your new file

Now that all the work has been done you simply tell python to close the two files it used.

infile.close()
outfile.close()
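As an alternative to closing the files yourself, python’s with statement closes them for you, even if something goes wrong halfway through. The sketch below wraps Steps 1, 3 and 4 in a function; the name convert is made up, and the rule list is shortened.

```python
replacements = [('06:', '#00:06:'), ('0 S1', '0-0# S1')]  # shortened rule set

def convert(in_path, out_path):
    # Both files are closed automatically when the with block ends.
    with open(in_path) as infile, open(out_path, 'w') as outfile:
        for line in infile:
            for t1, t2 in replacements:
                line = line.replace(t1, t2)
            outfile.write(line)

# Example call, using the file names from Step 1:
# convert('my_raw_text.txt', '(MAXQDA) my_raw_text.txt')
```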

Step 5: Run the script!

You’re ready to go. If you’re using TextWrangler you can save the file with the extension .py so that TextWrangler knows that you’ve written a script. (I would recommend doing so early so that your syntax gets highlighted.) Once you’ve saved your file you can select run from the last drop-down menu:

Screenshot 2015-03-31 17.50.20

 

Note that if you’re successful it’ll seem like nothing has happened. Rest assured something has! Just go to the folder where your python and transcript files are, and you should see a new text file with the name you used for “outfile.”

Summary: The full script

That’s it! Using python for simple but tedious find and replace tasks isn’t that painful after all. Here’s the entire script, including all of the find and replace relationships. Note that I only went up to 59 minutes. If your interviews are longer than that, you’ll need to add the extra times to the replacements part of the code.


# Open the raw transcript and the new file that MAXQDA will read.
infile = open('my_raw_text.txt')
outfile = open('(MAXQDA) my_raw_text.txt', 'w')

# Find and replace rules, applied in order: (text to find, replacement).
# Start of the timestamp: add '#' and the hours.
replacements = [('00:','#00:00:'), ('01:','#00:01:'), ('02:','#00:02:'),
                ('03:','#00:03:'), ('04:','#00:04:'), ('05:','#00:05:'),
                ('06:','#00:06:'), ('07:','#00:07:'), ('08:','#00:08:'),
                ('09:','#00:09:'), ('10:','#00:10:'), ('11:','#00:11:'),
                ('12:','#00:12:'), ('13:','#00:13:'), ('14:','#00:14:'),
                ('15:','#00:15:'), ('16:','#00:16:'), ('17:','#00:17:'),
                ('18:','#00:18:'), ('19:','#00:19:'), ('20:','#00:20:'),
                ('21:','#00:21:'), ('22:','#00:22:'), ('23:','#00:23:'),
                ('24:','#00:24:'), ('25:','#00:25:'), ('26:','#00:26:'),
                ('27:','#00:27:'), ('28:','#00:28:'), ('29:','#00:29:'),
                ('30:','#00:30:'), ('31:','#00:31:'), ('32:','#00:32:'),
                ('33:','#00:33:'), ('34:','#00:34:'), ('35:','#00:35:'),
                ('36:','#00:36:'), ('37:','#00:37:'), ('38:','#00:38:'),
                ('39:','#00:39:'), ('40:','#00:40:'), ('41:','#00:41:'),
                ('42:','#00:42:'), ('43:','#00:43:'), ('44:','#00:44:'),
                ('45:','#00:45:'), ('46:','#00:46:'), ('47:','#00:47:'),
                ('48:','#00:48:'), ('49:','#00:49:'), ('50:','#00:50:'),
                ('51:','#00:51:'), ('52:','#00:52:'), ('53:','#00:53:'),
                ('54:','#00:54:'), ('55:','#00:55:'), ('56:','#00:56:'),
                ('57:','#00:57:'), ('58:','#00:58:'), ('59:','#00:59:'),

                # End of the timestamp for speaker 1: add ms and '#'.
                ('1 S1','1-0# S1'), ('2 S1','2-0# S1'), ('3 S1','3-0# S1'),
                ('4 S1','4-0# S1'), ('5 S1','5-0# S1'), ('6 S1','6-0# S1'),
                ('7 S1','7-0# S1'), ('8 S1','8-0# S1'), ('9 S1','9-0# S1'),
                ('0 S1','0-0# S1'),

                # End of the timestamp for speaker 2: add ms and '#'.
                ('1 S2','1-0# S2'), ('2 S2','2-0# S2'), ('3 S2','3-0# S2'),
                ('4 S2','4-0# S2'), ('5 S2','5-0# S2'), ('6 S2','6-0# S2'),
                ('7 S2','7-0# S2'), ('8 S2','8-0# S2'), ('9 S2','9-0# S2'),
                ('0 S2','0-0# S2')]

# Apply every rule to each line, then write the result to the new file.
for line in infile:
    for t1, t2 in replacements:
        line = line.replace(t1, t2)
    outfile.write(line)

infile.close()
outfile.close()
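For interviews that run past 59 minutes, a regular expression might save you from adding more rules. This is a swapped-in technique, not the script above, and it rests on two assumptions I haven’t checked against real transcripts: that every timestamp is mm:ss followed by a space and a speaker label like S1 or S2, and that long interviews keep counting minutes past 59 (for example 75:30). The names TIMESTAMP, to_maxqda and convert_line are made up.

```python
import re

# Assumption: a timestamp looks like mm:ss, a space, then a speaker label
# such as S1 or S2 (e.g. "06:00 S1"). Adjust the pattern if yours differ.
TIMESTAMP = re.compile(r'\b(\d{1,3}):(\d{2}) (S\d+)\b')

def to_maxqda(match):
    minutes, seconds, speaker = match.groups()
    hours, mins = divmod(int(minutes), 60)  # 75 minutes -> 1 hour, 15 minutes
    return '#{:02d}:{:02d}:{}-0# {}'.format(hours, mins, seconds, speaker)

def convert_line(line):
    # Rewrite every timestamp on the line; anything else is left alone.
    return TIMESTAMP.sub(to_maxqda, line)

print(convert_line('06:00 S1 So tell me about your first semester.'))
print(convert_line('75:30 S2 I think it went well.'))
```

Because the pattern only matches a time that is followed by a speaker label, a time mentioned in the middle of someone’s answer should be left untouched.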