Aiton English

Learning Languages for Life

Category: Python

Python NLP Task: Scrape website to get words in bold

This is a student vocabulary list. The student copied a large amount of text and marked in bold the new expressions he is interested in. I only want the words in bold. Please use Python and the Beautiful Soup library to remove every word that is not in bold.

Text below:

What does this really mean in the near term, and what do we think we’ll see in a few years?

To separate the hype from the substance

I want to dive right in to the terminology for our listeners

Let me start with a couple of framing elements

We’re just getting started.

Think about that now from an enterprise business perspective

I do want to make sure we sort of simplify a little bit at least for me and for some of the listeners

At the most basic level

Jack, let me turn it over to you before we go deeper into some of these other areas.

So there’s a lot of capital that’s going to be required to realize this.

What new participants do you see emerging in the connectivity ecosystem?

We think that there’s both an offensive and defensive perspective;

from a defensive perspective they have massive customer bases and assets, ….. on the offensive side, they also have a chance to explore a plethora of new business models and develop new solutions that drive more top-line growth for their business and adjacent businesses

If we can overcome this fiber challenge, Dave, what are the new business models that we may begin to see evolving?

They’ll be able to choose to be part of the connected economy, and hopefully at price points that they can afford.

I would encourage all industries to be thinking about how they can best capitalize and take advantage of the opportunities that these higher speeds and increased access are going to enable

Consumers are without a doubt going to benefit from it

new players and technology, as well as new industries, are beginning to rewrite the rules.

5G geopolitics

When it comes to your business and Russia, what do you think about this idea?

In the race to dominate the next generation of cellular networks, both the United States and China know there’s much more at stake than ultrafast internet.

United States believes that whoever controls 5G, the fifth generation of wireless communication, will have a global advantage for decades to come.

The fear is that China is almost there

David, tell us about what happened in Germany earlier this month.

Michael, it was a really remarkable scene at the Munich Security Conference

Mike Pence

It is my honor to join you for the 55th Annual Munich Security Conference.

Under President Donald Trump, the United States will seize every opportunity to achieve peace.

But we will approach every challenge with our eyes wide open.

We will deal with the world as it is, not as we wish it to be.

The United States has also been very clear with our security partners on the threat posed by Huawei and other Chinese telecom companies.

America is calling on all our security partners to be vigilant and to reject any enterprise that would compromise the integrity of our communications technology or our national security systems.

We must protect our critical telecom infrastructure. We cannot ensure the defense of the West if our allies grow dependent on the East.

Do not let Chinese companies and the Chinese government into your communications systems, because you will forever poison the security of your countries and perhaps your relationship with the United States.

5G overview,

This is the future and it will be powered by 5G.

·     self-driving cars,

·     smart cities,

·     fully connected homes,

·     robots.

These are the networks that will connect the internet of things,

the billions of different devices we’re now attaching to the internet to the central networks and to the cloud.

5G is the next generation of wireless service. And it may be closer than many people think.

·      It’ll be the way that our autonomous vehicles run.

·      It’ll be the way our machinery runs.

·      It’ll be the way our gas pipelines, our water systems run.

All of them connected into these networks.

Whoever dominates these fifth-generation networks will have an economic intelligence and military edge for decades to come.

Because in future conflicts, the war starts not with nuclear weapons, not with artillery. It starts with unplugging a country— their electricity. And it starts, of course, with their communication networks.

The 5G networks are getting ready to get rolled out.

It’s billions and billions of dollars of investment.

And the decisions on those investments will be made in the next 6 to 18 months.

This is the new arms race. In the old Cold War, people counted missiles. In the new era, you’re going to count who controls which networks.

it became one of the fastest growing tech companies in the world.

Well, what they’re largely offering, Michael, is a lower price.

First, there are developing countries that just don’t have very much money to go build these networks. So if the Chinese come along with incredibly good terms, that’s pretty appealing, right?

you better think twice about letting the Chinese build the core of the networks that connect the political and military leadership to the rest of the world

We don’t really want all of our messages going directly to Beijing.

Well, you know, if I strike a really good trade deal with Xi Jinping, the Chinese president, maybe we’ll just release Ms. Meng.

China and the United States are engaged in the last stages of this enormously complex set of trade talks that President Trump has escalated.

think there’s one strong argument in favor of letting Huawei compete in some of these Western countries, and maybe even compete in the United States. It’s that if they want to have their equipment and their software inside the United States, they have to show it to American authorities.
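For the bold-words task described above the text, here is a minimal sketch of one possible approach. It assumes the bold expressions are marked with `<b>` or `<strong>` tags; bold applied through CSS styles would need a different selector. The bold markup did not survive in the text as shown here, so `SAMPLE_HTML` and the function name `bold_expressions` are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the <b>/<strong> tags below are invented for
# illustration, because the bold markup is not visible in the copied text.
SAMPLE_HTML = """
<p>To separate the <b>hype</b> from the <strong>substance</strong></p>
<p>I want to <b>dive right in</b> to the terminology for our listeners</p>
"""


def bold_expressions(html):
    """Return the text of every <b> or <strong> element, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    bold_tags = soup.find_all(["b", "strong"])
    # get_text(" ", strip=True) trims whitespace around each text fragment
    return [tag.get_text(" ", strip=True) for tag in bold_tags]


print(bold_expressions(SAMPLE_HTML))
# ['hype', 'substance', 'dive right in']
```

Note that a bold tag nested inside another bold tag would be reported twice with this sketch.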

Python NLP Task: Clean HTML and measure speed

Problem

I recently created a task for a student learning Russian, but when I copied the text from a webpage I got the HTML as well as the text.


Here is the “raw” text with HTML:

<p>Джейме — старший сын лорда <a href="https://gameofthrones.fandom.com/ru/wiki/Тайвин_Ланнистер">Тайвина
Ланнистера</a>, главы дома <a href="https://gameofthrones.fandom.com/ru/wiki/Ланнистеры">Ланнистеров</a>,
(богатый) семьи <a href="https://gameofthrones.fandom.com/ru/wiki/Семь_Королевств">Семи
Королевств</a>. В детстве Джейме не любил
читать, чтение (даваться)
ему ___ трудом, и ему приходилось упражняться
(час), с (тот) пор он не очень-то любит это
занятие.
</p>


Task 1


 

Your task is to use two different libraries to clean the raw text of HTML. Library 1 = regex (import re); Library 2 = Beautiful Soup (bs4). I.e. you will have two different programs that do the same thing: clean the HTML from a raw text.

The “clean” text should look like this:

Джейме — старший сын лорда Тайвина
Ланнистера, главы дома Ланнистеров,
(богатый) семьи Семи
Королевств. В детстве Джейме не любил
читать, чтение (даваться)
ему ___ трудом, и ему приходилось упражняться
(час), с (тот) пор он не очень-то любит это
занятие.
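Here is a minimal sketch of the two cleaners Task 1 asks for, one with `re` and one with Beautiful Soup. `RAW` is a shortened excerpt of the raw text, with the attribute quotes written as straight quotes. The function names `clean_with_regex` and `clean_with_bs4` are my own choices, not part of the task.

```python
import re
from bs4 import BeautifulSoup

# Shortened excerpt of the raw text (assumption: straight quotes in href).
RAW = (
    '<p>Джейме — старший сын лорда '
    '<a href="https://gameofthrones.fandom.com/ru/wiki/Тайвин_Ланнистер">Тайвина\n'
    'Ланнистера</a>, главы дома '
    '<a href="https://gameofthrones.fandom.com/ru/wiki/Ланнистеры">Ланнистеров</a>,\n'
    '</p>'
)


def clean_with_regex(raw):
    """Delete anything that looks like a tag: '<', then non-'>' characters, then '>'."""
    return re.sub(r"<[^>]+>", "", raw).strip()


def clean_with_bs4(raw):
    """Parse the markup and keep only the text nodes."""
    return BeautifulSoup(raw, "html.parser").get_text().strip()


print(clean_with_regex(RAW))
print(clean_with_bs4(RAW))
```

The regex version is fine for a simple snippet like this one, but it does not decode entities such as `&amp;` and it can break on a `>` inside an attribute value, so the parser-based version is usually the safer choice for arbitrary pages.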

Task 2

The final task is to measure speed. In programming there is almost always more than one way to achieve a goal, and often the quicker one is better. Use Google to find out how to measure the time each of the two code solutions takes.
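One common way to measure this is the standard-library `timeit` module. The sketch below redefines two small cleaners so that it runs on its own. The helper `time_cleaner`, the sample string `RAW` (with an abbreviated, made-up `href`), and the repetition count are assumptions for illustration. Actual timings depend on the machine, so no expected numbers are shown.

```python
import re
import timeit
from bs4 import BeautifulSoup

# Small illustrative input; the href value is a placeholder.
RAW = '<p>Джейме — старший сын лорда <a href="https://example.com/wiki">Тайвина</a></p>'

TAG_RE = re.compile(r"<[^>]+>")  # compiled once, reused on every call


def clean_with_regex(raw):
    return TAG_RE.sub("", raw)


def clean_with_bs4(raw):
    return BeautifulSoup(raw, "html.parser").get_text()


def time_cleaner(func, raw, number=200):
    """Return the average number of seconds one call to func(raw) takes."""
    total = timeit.timeit(lambda: func(raw), number=number)
    return total / number


for cleaner in (clean_with_regex, clean_with_bs4):
    seconds = time_cleaner(cleaner, RAW)
    print(f"{cleaner.__name__}: {seconds * 1e6:.1f} microseconds per call")
```

Repeating each call many times and averaging gives a steadier figure than timing a single run.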


Python NLP Task: How many words?

My student wants to know how many words she will need to learn for the Chinese intermediate exam. Use Python to give her an answer. Each word is on its own line.

拿上
你猜
听不懂
读书
走路
来接我
骑车
报纸
护士
服务员
售票员
暖和
散步
考试
成绩
开会
地址
放假
沙发
草地
超市里
变化
到处
附近
图书馆
准时到了
经常
最近

邻居
花了钱

应该

饼干
常常
盒子里面
两种糖
必须

记得
除了


葡萄
教室
打扫

干净
认真
聊天

另外
出去的时候
别忘了
窗户

西边
方便

校园
打排球
肚子都圆了

介绍
今天多云
可能会
奇怪
停电了
讨厌
游戏
打算
暑假
参观
长城
担心
安全
钢琴弹
真不错
开始
她看起来
不但。。而且
健康
年轻
她看起来不但健康,而且年轻。
或者
空调
春天
夏天
关于
一会儿
讨论

快点儿
打扰一下
地铁站
离这儿远吗
公里
在路东
准备
如果
没什么其他事儿
离开
毕业
见面

好像
声音
手表
决定
刚才
生气
厉害
熟悉
表演
大声地笑了起来
国外
电子邮件
联系
和我联系
发烧了
虽然
没有出汗
一直很不舒服
参加
比赛
紧张
心情
练习
努力
一点儿也不难
我太马虎了
做错了
重要
容易
复习
玩具
在排队
节日
爬山
有礼貌
警察
写作业
秋天
凉快
初中的知识
已经
学会了
洗手间
二楼
楼下
下课的时候
突然
明白
蝴蝶

放哪儿
手机
手机响了
句子
照相
太辣了
打网球
体育
体育馆
虫子
天气越来越热了
将来
我希望
刷牙

危险
眼镜
戴眼镜
兴趣
兴趣
对画画儿感兴趣
坚持

锻炼
身体
锻炼
坚持
饮料
非常
号码
公园
迷路了
清楚
看不清楚
安静
别害怕
新闻
然后
烤鸭
进去
放寒假
旅游

地图
在三层
打针
习惯
搬到
洗脸
刷牙
电视声音
懂礼貌
骑自行车
破了
合适
变 瘦 了
上班
累坏了
盘子
办法
受不了
注意
注意安全
记住
其他
其他人都
同意
信封
都行
冰箱
身体健康
祝您身体健康
幸福
幸福快乐
大概
离开
马虎
毕业
饭馆
尝尝
电梯
做游戏
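A minimal sketch for counting the list above. The list contains blank lines and a few repeated entries (兴趣, for example, appears twice), so the sketch reports both the number of non-empty lines and the number of distinct entries. `SAMPLE` is a tiny stand-in for the full list, and `count_entries` is a name chosen for illustration.

```python
def count_entries(text):
    """Return (non-empty line count, distinct entry count) for a one-per-line list."""
    entries = [line.strip() for line in text.splitlines() if line.strip()]
    return len(entries), len(set(entries))


# Tiny stand-in for the full word list, including a blank line and a repeat.
SAMPLE = """拿上
你猜

兴趣
兴趣
"""

total, unique = count_entries(SAMPLE)
print(total, unique)  # 4 3
```

To count the real list, paste it into a text file and pass the file's contents to `count_entries`.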


Python NLP Task: lines in a text

Convert the text below so that it looks like the photo below, i.e. join the “broken” lines into normal paragraphs.

Blazing Trails

Thanks to the Appen TTS engine, GuildLink was the first company in the world to
provide audio CMIs and is still one of the only ones doing it. “There’s an increase
in demand for accurate and current medicines information and being able to
provide the information in multiple ways has helped alleviate that,” says Paonne.
“Appen’s text-to-speech conversion has been pivotal in helping us provide more
information in more ways and helping patients understand what they’re taking.”

GuildLink’s pioneering service has led to partnerships with several Australian
government-sponsored websites that use the information in the company’s
database. Those include the Therapeutic Goods Administration site, and other
distributors of medicine information.

But the company doesn’t want to stop there. “To further improve accessibility,
we’d like to look into translating the CMIs into multiple languages in the future,”
says Paonne. “It’s so important that consumers have high-quality, accurate
information they can understand. Our philosophy is the more information you
give people, the better.”
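A minimal sketch of one way to rejoin the broken lines above. It assumes a blank line marks a paragraph boundary and every other line break is just a hard wrap. The function name `unwrap_paragraphs` and the short `SAMPLE` are for illustration only.

```python
import re


def unwrap_paragraphs(text):
    """Join hard-wrapped lines with spaces; blank lines still separate paragraphs."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = [
        " ".join(line.strip() for line in paragraph.splitlines())
        for paragraph in paragraphs
    ]
    return "\n\n".join(joined)


SAMPLE = """Blazing Trails

Those include the Therapeutic Goods Administration site, and other
distributors of medicine information.
"""

print(unwrap_paragraphs(SAMPLE))
```

Words hyphenated across a line break are not handled here; they would come out with a space after the hyphen.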

© 2019 Aiton English
