regex to match HTML <p ...> tag starting with a lowercase letter - Stack Overflow
I am editing an epub file in Sigil and would like to match HTML <p ...> tags when the first character after the tag's closing > is lowercase. I saw some answers on this site that match p tags with attributes, but not p tags without attributes. I don't know how regular expressions work, so I'm trying to figure out what change I need to make to match both.
Examples:
<p class="calibre1">All the while I was...</p>
<p>All the while I was...</p>
<p class="calibre1">all the while I was...</p>
<p>all the while I was...</p>
The regex should match the last 2 tags in the example above.
The regex I have, /<\/?([^p](\s.+?)?|..+?)>[a-z]/, matches only the 3rd tag, not the 4th.
Important: Sigil has no HTML parser, so I have to stick to using the simple search engine which accepts regular expressions.
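To see why the pattern misses the fourth line, here is a quick check in JavaScript (an assumption: it is run outside Sigil, in Node or a browser console, whose regex engine treats these particular constructs the same way). The second alternative, ..+?, needs at least two characters between < and >, while a bare <p> has only one, and the first alternative, [^p], rejects the p outright, so nothing can match <p>all...:

```javascript
// The four example lines from the question.
const samples = [
  '<p class="calibre1">All the while I was...</p>',
  '<p>All the while I was...</p>',
  '<p class="calibre1">all the while I was...</p>',
  '<p>all the while I was...</p>',
];

// The asker's pattern, unchanged.
const askerRegex = /<\/?([^p](\s.+?)?|..+?)>[a-z]/;

// `..+?` needs at least two characters between "<" and ">", and `[^p]`
// refuses a leading "p", so a bare "<p>" (one character) can never match.
const askerResults = samples.map((s) => askerRegex.test(s));
console.log(askerResults); // [ false, false, true, false ]
```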
Asked Nov 15, 2024 at 19:38 by Michael; edited Nov 18, 2024 at 8:20 by Patrick Janser.
1 Answer
The following regex looks like a good starting place:
<p[^>]*?>[a-z]
From there I'm not sure what you want to capture, but it will work. And yes, of course you should use an HTML parser for this, but for something this simple I don't see why a regex is a problem (provided you know the input; it won't work on arbitrary HTML).
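A minimal JavaScript sketch that runs the answer's pattern over the question's four examples (run outside Sigil; the constructs used behave the same in common regex engines, but that is worth confirming in Sigil's own search). strictRegex is a hypothetical tightening, not part of the answer: it requires either > or whitespace right after <p, so tags such as <pre> or <param> are not caught.

```javascript
// The four example lines from the question.
const examples = [
  '<p class="calibre1">All the while I was...</p>',
  '<p>All the while I was...</p>',
  '<p class="calibre1">all the while I was...</p>',
  '<p>all the while I was...</p>',
];

// The answer's pattern: "<p", any non-">" characters (lazily), ">", then a-z.
const answerRegex = /<p[^>]*?>[a-z]/;

// Hypothetical tighter variant: after "<p" require ">" directly, or
// whitespace before any attributes, so <pre>, <param>, ... are skipped.
const strictRegex = /<p(?:\s[^>]*)?>[a-z]/;

const answerResults = examples.map((s) => answerRegex.test(s));
const strictResults = examples.map((s) => strictRegex.test(s));
console.log(answerResults); // [ false, false, true, true ]
console.log(strictResults); // [ false, false, true, true ]

// Where the two differ: the answer's pattern also fires on <pre>.
console.log(answerRegex.test("<pre>code</pre>"), strictRegex.test("<pre>code</pre>")); // true false
```

Both patterns flag only the last two examples; they differ only on tags whose names merely begin with p.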
… document.querySelectorAll('p'). Then, on each of them, look at the innerText property and test it against /^\p{Ll}/u to see whether it starts with a lowercase letter in any language (using the Unicode flag). There are several reasons for this:
A. HTML entities: &eacute; is é, which is a lowercase letter.
B. Your paragraph could start with an inner tag, as in <p><strong>this</strong> starts with a lowercase...</p>.
C. Spaces, tabs, new lines or HTML comments may come before the first letter.
– Patrick Janser, commented Nov 15, 2024 at 23:49

… <p([^>]*)>((?:\s*|<!--.*?-->|<\s*\w+[^>]*>)*)(\p{Ll})
with the regex options "Dot All" and "Unicode Property", and replace with
<p\1>\2\U\3\E
to convert the lowercase letter directly to its uppercase version. \U uppercases capturing group 3, which is the lowercase letter, and \E stops the uppercase modifier.
– Patrick Janser, commented Nov 18, 2024 at 11:46
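Both comments can be sketched in JavaScript, assuming a browser-like environment for the DOM part (Sigil itself has none). startsWithLowercase, lowercaseParagraphs and capitalizeParagraphs are hypothetical names. The s and u flags roughly correspond to Sigil's "Dot All" and "Unicode Property" options, and since JavaScript replacement strings have no \U...\E, a replacement callback uppercases capture group 3 instead.

```javascript
// Sketch 1: the DOM check from the first comment. It needs a browser-like
// environment (anything with querySelectorAll/innerText).
const STARTS_LOWERCASE = /^\p{Ll}/u; // \p{Ll} = Unicode "Lowercase_Letter"

function startsWithLowercase(text) {
  return STARTS_LOWERCASE.test(text);
}

// `root` is e.g. `document` in a browser console.
function lowercaseParagraphs(root) {
  return Array.from(root.querySelectorAll("p")).filter((p) =>
    startsWithLowercase(p.innerText)
  );
}

// Sketch 2: the search/replace from the second comment.
// Group 1 = attributes, group 2 = leading whitespace/comments/inline tags,
// group 3 = the first lowercase letter.
const PARA_LOWERCASE =
  /<p([^>]*)>((?:\s*|<!--.*?-->|<\s*\w+[^>]*>)*)(\p{Ll})/gsu;

function capitalizeParagraphs(html) {
  // Stand-in for "<p\1>\2\U\3\E": rebuild the match, uppercasing group 3.
  return html.replace(
    PARA_LOWERCASE,
    (_match, attrs, prefix, letter) => `<p${attrs}>${prefix}${letter.toUpperCase()}`
  );
}

console.log(startsWithLowercase("élan"), startsWithLowercase("Élan")); // true false
console.log(capitalizeParagraphs("<p>all the while I was...</p>"));
// <p>All the while I was...</p>
console.log(capitalizeParagraphs("<p> <!-- note --> <em>élan</em></p>"));
// <p> <!-- note --> <em>Élan</em></p>
```

Paragraphs that already start with an uppercase letter are left untouched, because group 2 can only skip whitespace, comments and tags, never text.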