InvalidPathException

Hello,

I am using SCM Manager version 2.38.1. Only Subversion is enabled on my system.

I just finished importing my dump files into SCM Manager and saw the following error logged.

2022-08-14 12:26:07.872 [CentralWorkQueue-3] [          ] ERROR sonia.scm.work.UnitOfWork - task sonia.scm.search.LuceneSimpleIndexTask@3093b9ff failed after 27.71 s
java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: trunk/D?k?mantasyon/Sat?? sonras? Operasyon.doc
        at java.base/sun.nio.fs.UnixPath.encode(UnixPath.java:145)
        at java.base/sun.nio.fs.UnixPath.<init>(UnixPath.java:69)
        at java.base/sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:279)
        at java.base/java.nio.file.Path.of(Path.java:147)
        at java.base/java.nio.file.Paths.get(Paths.java:69)
        at com.cloudogu.spotter.internal.FilenameBasedLanguageDetectionStrategy.detect(FilenameBasedLanguageDetectionStrategy.java:40)
        at com.cloudogu.spotter.internal.BestEffortMatchingStrategy.detect(BestEffortMatchingStrategy.java:54)
        at com.cloudogu.spotter.ContentTypeDetector.detect(ContentTypeDetector.java:71)
        at sonia.scm.io.DefaultContentTypeResolver.lambda$resolve$1(DefaultContentTypeResolver.java:70)
        at java.base/java.util.Optional.orElseGet(Optional.java:369)
        at sonia.scm.io.DefaultContentTypeResolver.resolve(DefaultContentTypeResolver.java:70)
        at sonia.scm.io.DefaultContentTypeResolver.resolve(DefaultContentTypeResolver.java:37)
        at com.cloudogu.scm.search.FileContentFactory.create(FileContentFactory.java:48)
        at com.cloudogu.scm.search.Indexer.store(Indexer.java:64)
        at com.cloudogu.scm.search.IndexSyncWorker.updateIndex(IndexSyncWorker.java:99)
        at com.cloudogu.scm.search.IndexSyncWorker.reIndex(IndexSyncWorker.java:112)
        at com.cloudogu.scm.search.IndexSyncWorker.ensureIndexIsUpToDate(IndexSyncWorker.java:72)
        at com.cloudogu.scm.search.IndexSyncer.ensureIndexIsUpToDate(IndexSyncer.java:79)
        at com.cloudogu.scm.search.IndexSyncer.ensureIndexIsUpToDate(IndexSyncer.java:59)
        at com.cloudogu.scm.search.IndexerTask.update(IndexerTask.java:57)
        at sonia.scm.search.LuceneIndexTask.run(LuceneIndexTask.java:68)
        at sonia.scm.work.UnitOfWork.run(UnitOfWork.java:120)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)

These directories and files have Turkish-specific characters in their names.
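The exception message suggests that the JVM could not encode the file name in the charset it uses for file system paths. As a minimal sketch of that idea, the following checks whether a name is encodable in a given charset. The class name, the method name, and the sample file name are hypothetical and only for illustration; the sample name is not taken from the repository. Unicode escapes are used so that the source compiles the same way regardless of the compiler's default encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PathEncodingCheck {

    // Returns true if every character of the name can be encoded in the given charset.
    static boolean isEncodable(String name, Charset charset) {
        return charset.newEncoder().canEncode(name);
    }

    public static void main(String[] args) {
        // Hypothetical name containing the Turkish letters U+0131 and U+015F.
        String name = "trunk/Sat\u0131\u015f sonras\u0131.doc";

        // An ASCII-only charset cannot represent these letters, UTF-8 can.
        System.out.println("US-ASCII: " + isEncodable(name, StandardCharsets.US_ASCII));
        System.out.println("UTF-8:    " + isEncodable(name, StandardCharsets.UTF_8));
    }
}
```

If the server's JVM falls back to an ASCII-only charset for paths, a call such as `Paths.get` on a name like this would be expected to fail in the way the stack trace shows.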

I located the repository in the file system, but I cannot find these files there. They must be stored somewhere inside the Subversion repository format.

The browser can display the folders and navigate them (go inside and list the files).

Files, however, are displayed but cannot be downloaded.

There is no problem fetching the files over Subversion (TortoiseSVN). Directory and file names are correct, and the data in the files is correct.

As far as I can tell, the problem is only related to the SCM Manager interface.

If needed, I can provide a sample Subversion dump containing just a few sample directories and files.

Thanks & Regards,
Ertan

Hey @ertank,

thanks for your bug report. We will try to reproduce this error and find a solution for the character issues. Providing us with an example repo would be helpful.

Regards, Eduard

Hi @eheimbuch,

I prepared a sample Subversion repo. There is no sensitive information in it.
It can be downloaded from here

Thanks & Regards,
Ertan

I tried to reproduce your error using your example repository. So far it works as expected for me…

What is the default encoding on your machine?

Hi,
The client machine, where I use SCM in the browser, runs Windows 11 with English default encoding.
The server machine, which runs SCM Manager, has the settings below:

root@omv:~# locale
LANG=
LANGUAGE=en_US:en
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
root@omv:~#

Thanks & Regards,
Ertan

Hey Ertan,

could you please try to change your encoding like this:

LANG=en_US.UTF-8
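To verify which encoding the JVM actually picks up from the environment, a small diagnostic like the following could be run under the same user and environment as the server. This is only a sketch: the class and method names are hypothetical, and `sun.jnu.encoding` is an implementation-specific property of OpenJDK-based JVMs rather than a documented API, so it may be absent elsewhere.

```java
import java.nio.charset.Charset;

public class ShowJvmEncoding {

    // Collects the encoding-related settings that the running JVM reports.
    static String describe() {
        return "default charset  = " + Charset.defaultCharset() + "\n"
             + "file.encoding    = " + System.getProperty("file.encoding") + "\n"
             // Implementation-specific property; may be null on some JVMs.
             + "sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding") + "\n"
             + "LANG             = " + System.getenv("LANG") + "\n"
             + "LC_ALL           = " + System.getenv("LC_ALL");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Comparing the output before and after setting `LANG` should show whether the change reached the JVM. Note that the process has to be restarted for a changed environment to take effect.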

I was able to change the encoding on this system. Modifying it caused no problems for the software installed and running there.

Changing the encoding and rebooting solved the problem.

Thank you.