#1 : 20/05-24 15:55 Doug
Posts: 3
|
i broke my naming structure
I'm struggling with regex...
I need to move segments of various lengths to the end of the filenames, just before the extension (for subtitles); i.e.:
video (2010).eng [etc etc].srt
video (2010).eng.sdh [etc etc].srt
video (2010).eng.forced [etc etc].srt
to...
video (2010) [etc etc].eng.srt
video (2010) [etc etc].eng.sdh.srt
video (2010) [etc etc].eng.forced.srt
I'm pretty sure this is possible with regular expressions, but I'm just not getting it... Any help would be appreciated! |
#2 : 20/05-24 19:46 Miguel
Posts: 148
|
Reply to #1:
Hi Doug,
Not a very elegant solution, but it works with your examples.
METHOD REPLACE:
Replace: ^(\D+\d+\D)(.\w*.\w+) (\[.*])
Replace with: $1 $3$2 (space between $1 and $3)
https://i.ibb.co/YX9jcz9/Captura-20-05-2024-23s.png
Miguel
EDIT: "Use regular expressions" must be checked. I know "Gone with the wind" was made in 1939. :( |
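For anyone who wants to try the pattern from #2 outside Advanced Renamer, here is a minimal Python sketch. The pattern and replacement are copied from the post, with $1/$2/$3 rewritten as \1/\2/\3 for Python; the function name and the sample list are made up for illustration, and Python's re module is not the engine ARen uses, so treat it as a rough check only.

```python
import re

# Pattern and replacement from post #2, transcribed to Python syntax.
PATTERN = re.compile(r"^(\D+\d+\D)(.\w*.\w+) (\[.*])")
REPLACEMENT = r"\1 \3\2"


def move_language_tag(filename: str) -> str:
    """Move the '.eng' / '.eng.sdh' segment behind the bracketed block.

    Hypothetical helper name; names that do not match are returned unchanged.
    """
    return PATTERN.sub(REPLACEMENT, filename)


if __name__ == "__main__":
    samples = [
        "video (2010).eng [etc etc].srt",
        "video (2010).eng.sdh [etc etc].srt",
        "video (2010).eng.forced [etc etc].srt",
    ]
    for name in samples:
        print(name, "->", move_language_tag(name))
```

On Doug's three examples this yields the three target names from #1.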
#3 : 20/05-24 23:08 Doug
Posts: 3
|
Reply to #2:
Perfect! You sir, are a gentleman and a scholar! Thank You!! |
#4 : 21/05-24 04:19 Delta Foxtrot
Posts: 324
|
Reply to #3:
Hi Miguel and Doug,
My motto is, if it works and doesn't cause catastrophic backtracking or computer coma it is elegant. My other motto (actually I saw it in a movie last night): Life is like a skating rink. Everybody falls down eventually. :)
That said, I do see one thing that might be more generalized. If there are digits in the title it short-circuits that expression. Obviously this didn't happen with Doug's filenames, so the regex IS perfect. But it's useful to think about edge cases (I think...).
At first I thought, possibly the best way to improve is to avoid the title and date altogether. Taking your regex and basically just cutting out the first part:
TTR: (\.\w*?\.?\w+) (\[.*])
RW: " $2$1" (one space before $2, no double-quotes)
In this case I just looked for the first period, and it works on a small sample, but then I realized that a literal period in the title would break it. There are 84 titles in my database of 10,000 blu-rays & DVDs that contain a period. Not 717, like digits in the title, but still not optimal. Finding a \) close-parenthesis brought it down to 64, but the \)\. combination got me to zero matches. Thinking that there could be a space between the ")" and "." on some, I tried that combination and also got no matches in the titles.
So my final answer, no lifeline in the regex fun category for 100:
TTR: "\) ?\K\.(\w*?\.?\w+) (\[.*])" or "\)\K\.(\w*?\.?\w+) (\[.*])" without optional space
RW: " $2.$1" (No quotes of course)
So: look for something unmistakable, forget (\K) what we don't really want, then get what we really need and manipulate it.
I did fiddle the first capture group here a little from your second group. It makes very little difference in this application but in a database with 50,000 rows it might save some processor cycles. Or it might not. I think it avoids excess reading of the first section of the subtitle part if there's a second period. That's IF the lazy modifier (*? instead of *) even works in ARen.
And making the periods literal may help, I think I had a reason for doing that at the time but... That's about as good as I can get given the current regex engine. Not sure I could do better even with lookaround, subroutines or conditionals. Since I tried to eliminate backtracking by the regex engine as much as possible this *could* be more efficient than any of those possible methods. PCREtest could tell us but I don't really care that much - it's good enough for me! :)
My favorite, though, was this (which I just stumbled on by accident; there was a New Name method in the ARen script I brought up to start thinking about the problem, so of course I started playing with it first):
New Name method: <Substr:1:.> [<Rsubstr:1:[>.<Substr:.: >
I'm not sure why I love this so much, but it's different, right? Proof that there's more than one way to break a piano.
Comparison of methods: https://drive.google.com/file/d/1FLR2CLpdCTnzUckMGioiuE-yGJTBHlTE/view?usp=sharing
Best regards, DF |
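The "\)\K\." idea from #4 can be approximated in Python for a quick check. One caveat up front: Python's built-in re module has no \K, so this sketch swaps in a fixed-width lookbehind, (?<=\)), which is a different technique with the same effect here. It covers only the variant without the optional space, and the names ANCHORED and move_tag_after_paren are hypothetical.

```python
import re

# Stand-in for "\)\K\.(\w*?\.?\w+) (\[.*])" from post #4.
# (?<=\)) asserts that a ")" sits just before the period without
# including it in the match, which is what \K achieves in PCRE.
ANCHORED = re.compile(r"(?<=\))\.(\w*?\.?\w+) (\[.*])")
REPLACEMENT = r" \2.\1"  # RW from the post: " $2.$1"


def move_tag_after_paren(filename: str) -> str:
    """Move the language tag behind the bracket block, anchored on ')'."""
    return ANCHORED.sub(REPLACEMENT, filename)


if __name__ == "__main__":
    samples = [
        "video (2010).eng.sdh [etc etc].srt",
        "Se7en (1995).eng [etc etc].srt",  # digits in the title
        "video.eng [etc etc].srt",         # no ")" at all: left unchanged
    ]
    for name in samples:
        print(name, "->", move_tag_after_paren(name))
```

Because the match is anchored on ")." rather than on the start of the name, digits in the title no longer matter, and names without a date are simply left alone.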
#5 : 21/05-24 19:09 Miguel
Posts: 148
|
Reply to #4:
Hello DF.
You're right. I don't know why I didn't think about that possibility. From now on I must weigh all possibilities before posting the answer. Thank you.
I think this new version of my regex fixes the problem.
REPLACE: (.\D+\d+\w.).(\w.*).(\[.*])
REPLACE WITH: $1 $3.$2
Deleting ^ and some minor fixes seem to have solved the problem you raise. I hope I haven't caused any trouble for Doug. I know it still isn't elegant, but I'm on my learning curve with regex. Give me time.
I'm surprised by the possibilities of <Substr> and <Rsubstr>. They are incredible.
Miguel |
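A small Python sketch of the revised pattern from #5, tried against a title that contains a digit. The pattern and replacement are as posted ($n rewritten as \n); the function name and the "Se7en" sample are illustrative assumptions, and Python's re engine may not behave identically to ARen's.

```python
import re

# Revised pattern from post #5. Without the leading ^ the engine can
# start the match later in the name, so a digit early in the title is
# simply left in the unmatched prefix.
PATTERN_V2 = re.compile(r"(.\D+\d+\w.).(\w.*).(\[.*])")
REPLACEMENT_V2 = r"\1 \3.\2"


def move_language_tag_v2(filename: str) -> str:
    """Apply the post #5 replacement; hypothetical helper name."""
    return PATTERN_V2.sub(REPLACEMENT_V2, filename)


if __name__ == "__main__":
    for name in (
        "video (2010).eng [etc etc].srt",
        "video (2010).eng.sdh [etc etc].srt",
        "Se7en (1995).eng [etc etc].srt",
    ):
        print(name, "->", move_language_tag_v2(name))
```

Note that the separators around the second group are unescaped periods, so they match any character; the replacement then writes a literal period in their place.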
#6 : 23/05-24 17:46 Doug
Posts: 3
|
These solutions have been fantastic! I was able to correct all of my files with minimal input using the 'replace' method for files with a date (and therefore a ")") and the 'new name' script for everything without.
You guys are wizards and regex hurts my brain. Thanks again! -D |
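The two-pass workflow described in #6 (regex for names with a ")" and the New Name script for the rest) could be sketched in Python roughly as follows. Everything here is an assumption for illustration: the function names are invented, the dated branch uses a lookbehind because Python's re has no \K, and the undated branch is only one reading of what the tags in #4 are meant to produce (text before the first period, then the bracket block, then the language tag), not ARen's documented tag semantics.

```python
import os
import re

# Dated names: lookbehind stand-in for the "\)\K\." pattern from post #4.
DATED = re.compile(r"(?<=\))\.(\w*?\.?\w+) (\[.*])")


def fix_dated(stem: str) -> str:
    return DATED.sub(r" \2.\1", stem)


def fix_undated(stem: str) -> str:
    # Assumed reading of "<Substr:1:.> [<Rsubstr:1:[>.<Substr:.: >".
    if "." not in stem or "[" not in stem:
        return stem                           # nothing to rearrange
    head, _, rest = stem.partition(".")       # text before the first period
    tag, _, _ = rest.partition(" ")           # language tag up to the next space
    _, _, tail = stem.rpartition("[")         # text after the last "["
    return f"{head} [{tail}.{tag}"


def fix_subtitle_name(filename: str) -> str:
    """Pick the method per file, as Doug describes; extension is kept aside."""
    stem, ext = os.path.splitext(filename)
    fixed = fix_dated(stem) if ")" in stem else fix_undated(stem)
    return fixed + ext


if __name__ == "__main__":
    print(fix_subtitle_name("video (2010).eng.sdh [etc etc].srt"))
    print(fix_subtitle_name("video.eng.forced [etc etc].srt"))
```

The undated branch keys on the first period in the name, so it would misfire on titles that contain a period, which fits Doug's choice to use it only where the regex had nothing to anchor on.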
#7 : 25/05-24 20:41 Delta Foxtrot
Posts: 324
|
Reply to #6:
>> regex hurts my brain. Amen brother! It's definitely an acquired pain. But it hurts so good! :) Best, DF |