Convert PowerPoint Speaker Notes to Audio Using Amazon Polly

This week I had a chance to work on an interesting request from another department in our company. They wanted to generate voiceover audio for a number of training PowerPoint decks so that they could make them into training videos. The script/content for this audio was embedded in the speaker notes of the slide deck.

Since I have been prototyping with Amazon Polly for several other solutions, I suggested making use of Polly rather than having someone record their voice.

While this could easily have been done by getting someone (not me!) to copy and paste the notes into the Amazon Polly console and manually download the audio, I wanted to create something a little less manual.

I decided to write a simple script in Python to perform this task. There are three main activities for each file:

Extract the speaker notes for each slide.
Insert some tags into the text to add some pauses during the audio rendering.
Render each slide’s notes to audio via Amazon Polly and save to disk.

Extract speaker notes from slides

To get the speaker notes out of the .PPTX file, I used a library called python-pptx. See the python-pptx documentation for instructions on how to install and use the library.

The function below takes a string containing the path to the PowerPoint file to be processed, and returns a list of tuples containing (slide number, speaker notes text). Note that the slide number is zero-based.

from pptx import Presentation

def getNotes(file):
    # Use the Presentation() function to 
    # create a Presentation object for the
    # specified PPTX file.
    ppt=Presentation(file)

    notes = []

    # Iterate over the slides in the presentation
    for page, slide in enumerate(ppt.slides):
        # Extract the speaker notes for the given slide
        textNote = slide.notes_slide.notes_text_frame.text
        
        # Add some SSML tags to the text
        textNote = addTags(textNote)

        notes.append((page,textNote)) 
        
    return notes
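
One thing to watch for: a slide with no speaker notes may not have a notes text frame at all, in which case the loop above would fail. The following is a minimal sketch (not part of my original script) of a more defensive extraction loop. The name extract_notes is hypothetical; it relies only on the has_notes_slide, notes_slide, and notes_text_frame attributes that python-pptx exposes on a slide, and it leaves the SSML tagging as a separate step.

```python
def extract_notes(slides):
    """Return a list of (slide index, notes text) tuples.

    `slides` is any iterable of python-pptx slide objects,
    for example Presentation(path).slides. Slides without
    speaker notes are skipped instead of raising an error.
    """
    notes = []
    for page, slide in enumerate(slides):
        # has_notes_slide is False when no notes page exists;
        # checking it first avoids creating an empty one.
        if not slide.has_notes_slide:
            continue

        # notes_text_frame is None when the notes page has
        # no notes placeholder.
        frame = slide.notes_slide.notes_text_frame
        if frame is None or not frame.text.strip():
            continue

        notes.append((page, frame.text))
    return notes
```

With python-pptx installed, this could be called as extract_notes(Presentation(file).slides), and each note passed through addTags afterwards.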

Add some tags for pauses

In order to reduce some of the manual clean up required to produce “nice” audio, I found I had to add some SSML tags to the extracted notes (the SSML tags supported by Amazon Polly are listed here).

The tags I am adding are:

<speak>: outermost element around SSML content
<prosody>: used (in this case) to control the “speed” of the rendered audio
<break/>: inserts a short pause in the rendered audio. I insert these after commas and colons to make the rendered audio sound a little more natural.
<s/>: inserts a slightly longer pause. I insert these in place of line breaks.

I also do a little bit of cleanup here to get rid of some “ugly” characters that came out of the PowerPoint notes (like “\x0b”).

Here is the function which accomplishes this (it is very basic string manipulation):

def addTags(textNote):
    # Add <speak> tag and speed control
    textNote = '<speak><prosody rate="medium">' + textNote + '</prosody></speak>'
    # Get rid of some character codes
    textNote = textNote.replace("\x0b", "")
    # Replace "EYE" with "E.Y.E."
    textNote = textNote.replace("EYE", "E.Y.E.")
    # Replace "\n" with "<s/>"
    textNote = textNote.replace("\n", "<s/>") 
    # Add pauses after commas
    textNote = textNote.replace(",", ",<break/>")
    # Add pauses after colons
    textNote = textNote.replace(":", ":<break/>")
    
    return textNote
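
Because SSML is XML, notes containing characters such as & or < can cause the request to be rejected. As a minimal sketch (an assumption on my part, not something the original script did), the text can be escaped with the standard library before the tags are added. The function name add_tags_escaped and its rate parameter are hypothetical, and I have left out the deck-specific "EYE" replacement.

```python
from xml.sax.saxutils import escape


def add_tags_escaped(text_note, rate="medium"):
    """Wrap notes text in SSML, escaping XML special characters first."""
    # Drop the vertical-tab characters PowerPoint leaves in the notes
    body = text_note.replace("\x0b", "")
    # Escape &, < and > so they cannot be mistaken for markup
    body = escape(body)
    # Same pause tags as addTags: line breaks, commas, colons
    body = body.replace("\n", "<s/>")
    body = body.replace(",", ",<break/>")
    body = body.replace(":", ":<break/>")
    # Wrap last, so the replacements never touch the outer tags
    return '<speak><prosody rate="' + rate + '">' + body + '</prosody></speak>'
```

Escaping has to happen before the pause tags go in, otherwise the tags themselves would be escaped too.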

Render audio using Amazon Polly

To access the Amazon Polly service from Python, I use the boto3 library. See the boto3 documentation for how to install it, as well as the basic instructions for using it to access your AWS account.

First you create a client to access the AWS service (in this case Polly). From there you use the synthesize_speech method of the client to render the text to audio. There are a number of parameters you can pass in to control the rendering, including:

the speech engine to use (standard or neural)
the language
the voice to use (Amazon Polly supports a large number of different voices)
the desired output format
the text to be rendered.

See the boto3 documentation and the Amazon Polly documentation to see the available parameters and the supported input values.

The response object from the synthesize_speech method contains an audio stream, which I read from and then write out to disk.

import boto3

def renderAudio(file_root, slide_number, input_text):
    # Instantiate a boto3 client for Amazon Polly
    client = boto3.Session(aws_access_key_id='********************',
                           aws_secret_access_key='****************************************',
                           region_name='ca-central-1').client('polly')
    
    # Render the text to audio
    response = client.synthesize_speech(
        Engine = 'neural',
        LanguageCode = 'en-US',
        TextType = 'ssml',
        VoiceId='Matthew', 
        OutputFormat='mp3', 
        Text = input_text)
    
    # Save it to disk
    file = open(file_root + '_slide_' + str(slide_number) + '_audio.mp3', 'wb')
    file.write(response['AudioStream'].read())
    file.close()
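
The function above creates a new client for every slide and has the access keys written into the source. A variation I would consider, sketched here under a few assumptions, is to create the client once and pass it in, and to let a with-block close the output file. The name render_audio and its parameters are hypothetical; the synthesize_speech arguments are the same ones used above.

```python
def render_audio(client, out_path, ssml_text, voice_id="Matthew"):
    """Render SSML to an MP3 file using an existing Polly client."""
    response = client.synthesize_speech(
        Engine="neural",
        LanguageCode="en-US",
        TextType="ssml",
        VoiceId=voice_id,
        OutputFormat="mp3",
        Text=ssml_text,
    )
    # The with-block closes the file even if the write fails
    with open(out_path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
    return out_path
```

The client can then be created once with boto3.client('polly', region_name='ca-central-1'), which picks up credentials from the environment or the shared AWS configuration rather than from the script itself.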

Putting it all together

That’s about it. Given these functions, all you have to do now is something like this (note that this will try to write the output audio to the same folder where it finds your PPTX file):

input_file = "<path to your PPTX file>"
 
# Strip the ".pptx" extension to get the root of the output file names
file_root = input_file[:-5]
notes = getNotes(input_file)
print("notes from " + file_root + "\n")
for note in notes:
    renderAudio(file_root, note[0]+1, note[1])
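
Slicing off the last five characters only works when the file name ends in exactly ".pptx". As a small sketch of an alternative (the helper name audio_path is hypothetical), pathlib can build the same output file name without assuming the length of the extension:

```python
from pathlib import Path


def audio_path(pptx_path, slide_number):
    """Build '<deck>_slide_<n>_audio.mp3' next to the PPTX file."""
    deck = Path(pptx_path)
    # deck.stem drops the final extension whatever its length,
    # unlike slicing off a fixed five characters.
    return deck.with_name(f"{deck.stem}_slide_{slide_number}_audio.mp3")
```

For example, audio_path("decks/intro.pptx", 3) gives a path ending in intro_slide_3_audio.mp3 inside the decks folder.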

NOTE: I have not added any validation or error handling to this code, so copy/paste at your own risk!

Be thankful you cannot understand their pain

This may not be the most elegant or coherent thing I have ever written, but it is 4 in the morning, and I cannot sleep because this bothers me so much.

Over the past week, we have all heard much about the graves of some 215 children found at the site of the Kamloops Indian Residential School. While I do not think this comes as a surprise to anyone who has been paying any attention, I think that facing the reality of this tragedy and knowing that it is likely just the tip of the iceberg should be a source of immense pain and outrage not just for Indigenous communities, but for each and every one of us.

I would like to be able to say that I understand, or can even imagine, what survivors and affected communities are feeling, but in truth I cannot. I cannot even begin to comprehend.

I have a five-year-old granddaughter, with whom I have been very close. Unfortunately, for reasons I will not get into, I have not been allowed to see her for the past 6 months (and do not honestly know if I will ever see her again) and this has been extremely difficult for me.

But I know where she is. I know she is safe. I know she is taken care of. And I know she is with people who love her.

I look at her face, and I cannot imagine knowing that she has been taken away. Knowing she is alone and afraid. Not knowing where she is, or who is caring for her, or even if they are caring for her. Not knowing when or even if I will ever see her again. Knowing or suspecting that she is being abused. Knowing that her very identity is being stripped from her.

Every one of these children was someone’s child, someone’s grandchild, and some community’s future. Every single one, and thousands more. This is not abstract. This is real, and it is horrendous.

This breaks my heart. From what I have read and heard in the media it breaks everyone’s hearts.

But that is not enough – not by a long shot. Where is the outrage?

I applaud the Indigenous community’s focus on Truth and Reconciliation, and greatly respect their strength and wisdom in following that path.

But for the rest of us, where is the outrage at the things done in our name? Where is the absolute outrage that our government, the Government of Canada, not just allowed this to happen but actively participated? That the government elected by Canadians, that represents Canadians, was complicit in these atrocities?

We as a species and as a society can and must be better than this!

I would like to end with 3 calls to action:

Listen. Listen mindfully to the stories of survivors, and to the communities. A few minutes of mindful listening can contribute greatly to understanding and healing.
I ask that everyone who reads this take the time today to look at the faces of your children, of your grandchildren, and be damn grateful that you cannot comprehend the pain of these children, these parents, these grandparents, and these communities.
And I ask, how are we and our government(s) going to make this right?

Data doesn’t lie, does it?

“There are three kinds of lies: lies, damned lies, and statistics.”

I recently came across an interesting article (well, actually I listened to a podcast about it, then looked it up) about how people feel about different data visualizations.

There were a couple of outcomes that I found very interesting. First is that people are much more engaged with visualizations which speak to them in a direct and personal way (I know, not surprising, but still interesting). Whether it is through demographics, geography, or the story the visualization tells, people are drawn to visualizations that feel specific to them. One important implication of this is that the common practice of starting from an overview of the data and allowing the user to drill in to what is most relevant to them does not engage as well as a visualization which is personal to them (but then allows them to further explore the data if they want).

A second, more surprising (and a little terrifying) result was found when the sources of the visualizations were revealed (initially the sources were hidden to reduce bias). When the sources were revealed and the subjects were given the opportunity to change their rankings, the majority chose not to. Superficially, this seems like a great result – the value of a visualization is perceived as being independent of its origin.

Unfortunately, the reason the subjects gave for not changing their rankings was more surprising:

“We found that many people suggested that information has an objective quality that is immutable regardless of where that data may be showcased…”

So the majority of the subjects indicated essentially that “numbers don’t lie,” a disturbing conclusion to say the least. This obviously discounts the fact that a visualization is typically tailored to show exactly what its creator wants it to, and will often (intentionally or not) misrepresent the data in some way.

I recommend reading the full blog post, and/or listening to the podcast, in case I misrepresented anything (unintentionally, of course!)

View at Medium.com

What Gets Me Out of Bed in the Morning?

(obligatory cat picture)

Late last fall in a meeting with our CEO, she asked me what should have been a very simple question: “What gets you out of bed and into work in the morning?” Now, at the time I had been sick for several days (maybe weeks, I can’t remember) and my mind really wasn’t in a great place, and I didn’t have a good answer for her.

I didn’t really even have a bad answer.

Over Christmas, I became even more ill. By the new year, a major tragedy hit my family, something far worse than I ever imagined having to face. I then ended up needing surgery. All this to say, I was not feeling much better about how to answer this question.

Now, I am sure we have all been there at various times in our careers, times where we just weren’t quite sure why we work so hard. I have hit that point several times. Sometimes it is a signal that it is time to change (like when I left physics to work in the “real world”). Other times, it just means you have to remember the deeper purpose behind what you do.

As I often do when life threatens to become too much for me, I fall back on meditation as a way to cope. And as usually happens, through meditation I begin to see connections and patterns in my life.

Over the last month or so, I have been giving this question of purpose and motivation a lot of thought. In my career I have had the opportunity to work on a lot of interesting and just plain cool things, from astrophysics, to satellite operations and astrodynamics, to major military projects, to enterprise software start-ups. I am now at a point in my career, however, where working on things that matter is very important to me, more so than what is cool or just interesting.

So why does what I am doing now matter?

In conjunction with all of this introspection, at work the executive team (including me) was just finishing up on an extensive exercise to define (or at least articulate) the Vision, Mission, and Values of The Learning Bar. Very much the same “why are we here?” question I have been trying to answer personally.

As you will see if you follow the above link, our vision, mission, and values are all about helping children, specifically “giving all children the opportunity to thrive”. We do this through our values of inclusion, innovation, trustworthiness, social engagement, and leadership. We as an executive team worked very hard (and occasionally argued passionately!) to agree on these words, as these words represent who we are and why we are here.

As I meditated on life in general, it became more and more clear to me why I do what I do. It is so easy to get lost in the day-to-day details of your job, and lose sight of the why. And whether you are an individual or an organization, it all has to start with why (yes, I know that is someone else’s phrase). When things are difficult, it becomes even more important to remember why.

In addition to being on the executive team and contributing in some small way to TLB’s strategy, as CTO I am of course very involved in the company’s technology (hence the title!). The problem with being on the technology side of a company like ours is that it is easy to feel somewhat removed from our end users, and even more so from the children those end users are helping. But it is important to remember the connection between what we do and the children who are helped.

And this led me to my answer to the original question: “What gets you out of bed and into work in the morning?” It was not until late last week (actually, driving home from Fredericton on Friday) that I was able to clearly articulate the answer in my mind, though I think it had been sort of congealing for some time. And here it is:

Any day where I do even one thing, whether it is strategy, execution, technological decision, or a casual conversation with a co-worker, that helps even one child improve their education and their life, it was worth getting out of bed that morning. And I am pretty sure that is true almost every day.

Of course, this is just for work, I have other reasons to get out of bed – first and foremost my family, but also just the fun of learning new things. But this is what gets me to the office.

So, what gets you out of bed and into work every day?

The most popular programming languages are rapidly changing

Interesting post on Quartz: The most popular programming languages are rapidly changing

While it is an interesting post, a number of questions came to mind on reading it:

While StackOverflow is indeed a dominant system for developers seeking and sharing information, I wonder if its user base is really representative of the entire software/technology industry.
The dominance of JavaScript is hardly surprising, since if you want to do anything in the browser, you really have no choice (and choice is always a bad thing, right?).
Is SQL a “programming language”? I never thought so. You cannot “build a system” in SQL – SQL may be a major part of a large number of systems, but you need a programming language to make use of it.
NodeJS and AngularJS are not programming languages. NodeJS is a runtime for JavaScript, and AngularJS is a framework for it. Making these two their own categories makes no sense, any more than it would make sense to have separate categories for Python and Django and Tornado. It might make sense to have separate categories for server-side versus client-side JavaScript, but not specific frameworks.
Merging JavaScript, NodeJS and AngularJS together would give a clearer indication of the use of JavaScript, rather than showing a decline in JavaScript usage over the last three years.

5 Steps to Faster Mobile Web App Development

New Brunswick start-up Agora Mobile has developed a revolutionary platform for the visual development of mobile web applications.

As we move closer to launch, we are beginning a private beta targeting developers (and other forward-thinking sorts). To kick off this beta, we are beginning a series of webinars which introduce the platform and concepts. The first webinar is this Thursday (June 26).

NFA on Gun Control: Bad Taste, Bad Timing, and Bad Logic

I actually wrote this on the evening of Thursday, June 5, 2014 after reading the press release by the National Firearms Association. However, I refrained from posting it, as I felt that the timing was inappropriate.

After reading this article, I felt that I could now post it.

Like others, it seems, I was not particularly impressed with the NFA’s decision to make a political statement regarding gun control at the height of the recent crisis in Moncton. Many felt that the press release issued by the NFA demonstrated tremendously bad taste, bad timing, and bad judgement.

However, we do have free speech in Canada (unless you are a government scientist), so the NFA is free to say what they want to on the subject.

Free speech is a good thing. I like free speech. Especially because it also permits me to point out how horrendously, absurdly bad is the logic of both the NFA’s statement and their associated position.

The fundamental argument by the NFA (beyond “laws interfere with our fun”) is that even with all of Canada’s gun control efforts, someone with a gun has killed three RCMP officers. Thus, all gun control laws should be abandoned. The basic shape of this argument is this:

We do X to prevent Y
Sometimes, in spite of doing X, Y still happens
Therefore, we should stop doing X because it is a waste of time

Let’s try this argument in a few other situations, and see how it works…

We put locks on our doors, and install security systems in order to prevent our homes and businesses from being robbed. Sometimes, even with locks and security systems, we do get robbed. Therefore we should stop using locks and security systems.

Hmmmmm. That doesn’t seem quite right. Let’s try another one…

We put in place traffic laws in order to prevent accidents and death. Sometimes, in spite of these laws, traffic accidents and deaths still occur. Therefore we should not bother with traffic laws.

Well, that doesn’t seem quite right either. How about one from personal health…

We eat healthy in order to prevent (among other things) heart disease. Sometimes, people who eat healthy still have heart attacks and die. Therefore, we should not bother eating healthy.

Still doesn’t sound right. Could it be that the problem is that the structure of the argument is fundamentally flawed?

I had planned to go into the absurdity of the fact that people view gun ownership as some sort of “fundamental human right”, or the idea that the “right to bear arms” really means “the right to bear any kind of weapon (even those not invented yet) at any time in any situation without any rules or constraints”, or the silliness of believing that owners of dangerous weapons should be subject to lower licensing and registration requirements than car owners or ham radio operators.

Instead, I will just leave it at pointing out the bad timing, bad taste and bad logic of the NFA’s press release.

Some thoughts on Apple Swift and Mobile Programming

Check out my post entitled Apple Swift – A step in the right direction (or is it?) over in the Vizwik blog

(spoiler alert: I don’t hate it, I just think it solves the wrong problem!)

6 Technologies From My First Job

I was sitting around on New Year’s Eve playing Zork, and I got to reminiscing about technologies I have used which either no longer exist or have fallen out of use. Thinking back to my first summer job where I actually got paid to program (actually, I was paid to do physics, but programming was a big part of it), here are six tools I used…

KIM-1

First, we used a KIM-1 microcomputer. This 6502-powered beast had a whole 1024 bytes of memory, and no persistent storage. We used this to control a Perturbed Angular Correlation Gamma Ray Spectroscopy experiment.

After the experiment ran for a while (collecting data in scalar registers), the KIM-1 would dump these registers out to a more “permanent” storage – in this case paper tape. This was great stuff to work with, frequently breaking, sometimes absorbing moisture and swelling.

The experiment would generally run for a couple of days, after which we would have to process the data – which meant uploading it to the mainframe. For the upload, I used a very old (even then) teletype machine, connected to a screaming 300 baud acoustic coupler.

Using this, we uploaded the data to the university mainframe, where I got to analyze it in one of my favourite languages of all time, APL!

Computing was different then!

A New Phone – Galaxy S4, but not really by choice

I finally upgraded my phone last week, having given up my previous phone when I switched employers at the end of June. My previous phone was a Windows phone (an LG Optimus Quantum), which I really liked, but it was 2 and a half years old, showing its age, and stuck on Windows Phone 7.8.

I struggled for quite a while trying to decide what phone to get. A big challenge is that I do not really use a phone as a phone very much. Almost all of my communication is email, SMS, Facebook, Twitter, etc., all of which I could do as well or better on a small tablet (except SMS). Still, I do need a phone sometimes, just not very often.

My first choice was to get a new Windows 8 phone, because I love the whole Windows Phone user experience. Unfortunately, there are a number of obstacles to getting a Windows Phone:

All of the Windows phones on the market here in Canada are almost a year old, which is pretty old in this market. The only new activity is with the Lumia line, which is unfortunately only available from Rogers in Canada (and I absolutely, positively will NOT do business with Rogers).
The carrier I deal with primarily is Bell, and Bell’s interest in Windows phone has always been marginal at best. There is one device listed on their web site, and none available locally at their retail outlets.
Microsoft’s whole story on Windows Phone scares the crap out of me. I have no confidence in their commitment to the platform, and no confidence that if I buy a Windows Phone 8 device now that I won’t be orphaned in 6 months.

So, Windows Phone was pretty much a non-starter this time around.

So, I started looking at Android devices (I am not quite crazy enough to drink the Apple koolaid yet!). I was primarily considering three devices:

Galaxy Note 2
Galaxy Note 8
Galaxy S4

As most who know me know really well, I love devices that I can write on. Hence my interest in the Galaxy Note products.

I was really excited in late June and early July when I read that the LTE version of the Note 8 was coming to Canada, and that it supports phone calls. Yes, I know, it would be a big-ass phone, but for the amount I use it as a phone, it would be fine (with a bluetooth headset, or in hands-free mode in the car). Unfortunately, the version released in Canada does not support phone calls (we are screwed once again – not sure if this is Samsung’s decision, or Canada’s carriers, or the CRTC, but it really pisses me off!) So my dream of having a single device covering all of my needs was dashed.

I also gave serious consideration to the Note 2. While it does support handwriting, it is a little too small to really be useful for document review, note-taking, etc. In addition, the Note 2 is approaching obsolescence with the Note 3 rumoured to be due out in a few months. Again, not crazy about the idea of being stranded on last-generation hardware. Finally, it is a little big as a phone. In a way, it is a “worst of all worlds” device, being too small to be a good tablet and too big to be a good phone.

So in the end, I went ahead with the S4 (despite the fact that Canada got screwed on the processor). I have had it for a few days now, and while the user experience does not come close to Windows Phone, it is adequate. The camera is a huge leap from my previous phone, especially the low-light performance. I am just discovering the apps that I like (beyond the basics that I found right away). One thing that is annoying (though I knew about it before buying) is the amount of storage taken up by the OS + Samsung bloatware. On a 16 GB device, to have over half of it taken up by the OS and vendor components that cannot be uninstalled is just sick. I immediately picked up a memory card, and that alleviates the problem somewhat, but it is still annoying.

I may post a more thorough review once I know better how I feel about the device.
