Using OpenDocumentParser on Tika 1.19



aravinth thangasami
Hi all,

While migrating to Tika 1.19, I tried to parse an ODT document with the Tika OpenDocumentParser API.
It threw an IOException, but the same file parses successfully when I use AutoDetectParser.


I think the behaviour changed after TIKA-2675: the parser now seems to accept the input only if it is passed as a ZIP file.


How can I use OpenDocumentParser directly on OpenOffice documents?




Thanks 
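Later in this thread it is reported that Tika 1.19 parses these files when no Parser is registered in the ParseContext. Under that observation, a minimal sketch of calling OpenDocumentParser directly could look like the following. Assumptions: Tika 1.19 with tika-parsers on the classpath; the class name `OdtDirectParse` and the helper `minimalOdt` (which builds a tiny in-memory ODT so the example needs no input file) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.odf.OpenDocumentParser;
import org.apache.tika.sax.BodyContentHandler;

public class OdtDirectParse {

    // Calls OpenDocumentParser directly with an empty ParseContext
    // (no Parser.class entry is registered).
    static String extractText(byte[] odt) throws Exception {
        OpenDocumentParser parser = new OpenDocumentParser();
        BodyContentHandler handler = new BodyContentHandler(-1); // -1 disables the write limit
        Metadata metadata = new Metadata();
        try (InputStream in = TikaInputStream.get(odt)) {
            parser.parse(in, handler, metadata, new ParseContext());
        }
        return handler.toString();
    }

    // Hypothetical helper: builds a minimal ODT package (mimetype + content.xml)
    // in memory. The paragraph is inserted without XML escaping, so pass plain text.
    static byte[] minimalOdt(String paragraph) throws IOException {
        String content = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<office:document-content"
                + " xmlns:office=\"urn:oasis:names:tc:opendocument:xmlns:office:1.0\""
                + " xmlns:text=\"urn:oasis:names:tc:opendocument:xmlns:text:1.0\">"
                + "<office:body><office:text><text:p>" + paragraph + "</text:p>"
                + "</office:text></office:body></office:document-content>";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("mimetype"));
            zip.write("application/vnd.oasis.opendocument.text".getBytes(StandardCharsets.US_ASCII));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("content.xml"));
            zip.write(content.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractText(minimalOdt("Hello ODT")));
    }
}
```

For a real document, the bytes could come from `Files.readAllBytes(Paths.get("test.odt"))` instead of `minimalOdt`.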


Re: Using OpenDocumentParser on Tika 1.19

aravinth thangasami
The issue occurs when I set OpenDocumentParser as the Parser in the ParseContext. In older versions, the file was parsed even if I set the context.

Thanks

On Mon, Sep 24, 2018 at 7:57 PM, aravinth thangasami <[hidden email]> wrote:

Re: Using OpenDocumentParser on Tika 1.19

Tim Allison
Can you share your full code for how you're calling Tika?  Does this
happen on every ODT or only a few?
On Mon, Sep 24, 2018 at 10:52 AM aravinth thangasami
<[hidden email]> wrote:


Re: Using OpenDocumentParser on Tika 1.19

aravinth thangasami
Hi, 

It happens with all my ODT files.

On Tika 1.19, the file is parsed successfully if I don't set the parser in the ParseContext, i.e. if I leave out this line:

        context.set(Parser.class, parser);

In older versions, setting the parser in the ParseContext had no impact. The test code I used is:

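        import java.io.*;
        import org.apache.tika.io.TikaInputStream;
        import org.apache.tika.metadata.Metadata;
        import org.apache.tika.parser.ParseContext;
        import org.apache.tika.parser.Parser;
        import org.apache.tika.parser.odf.OpenDocumentParser;
        import org.apache.tika.sax.WriteOutContentHandler;
        Parser parser = new OpenDocumentParser();
        ParseContext context = new ParseContext();
        context.set(Parser.class, parser); // this line triggers the IOException on 1.19
        Metadata metadata = new Metadata();
        try (InputStream in = TikaInputStream.get(new FileInputStream("test.odt"));
             Writer out = new OutputStreamWriter(new FileOutputStream("out.txt"), "UTF-8")) {
            parser.parse(in, new WriteOutContentHandler(out, 1024 * 1024 * 1024), metadata, context);
        }
        System.out.println(metadata);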


Thanks 
Aravinth
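If embedded resources should still be processed, one possible workaround is to keep OpenDocumentParser for the top-level document but register AutoDetectParser in the ParseContext. This rests on an assumption that is not confirmed in this thread: that Tika's embedded-document extraction hands resources inside the ODT package (pictures, thumbnails) to whatever Parser is registered in the context, so registering OpenDocumentParser makes it try to open non-ZIP entries as ODF packages. The class name `OdtWithEmbedded` and the helper `minimalOdt` are hypothetical; Tika 1.19 with tika-parsers on the classpath is assumed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.odf.OpenDocumentParser;
import org.apache.tika.sax.BodyContentHandler;

public class OdtWithEmbedded {

    static String extractText(byte[] odt) throws Exception {
        Parser topLevel = new OpenDocumentParser(); // parses the ODT package itself
        ParseContext context = new ParseContext();
        // Assumption: embedded resources are routed to the Parser registered here.
        // AutoDetectParser picks a parser per detected type instead of forcing ODF.
        context.set(Parser.class, new AutoDetectParser());
        BodyContentHandler handler = new BodyContentHandler(-1); // -1 disables the write limit
        try (InputStream in = TikaInputStream.get(odt)) {
            topLevel.parse(in, handler, new Metadata(), context);
        }
        return handler.toString();
    }

    // Hypothetical helper: builds a minimal ODT package (mimetype + content.xml)
    // in memory. The paragraph is inserted without XML escaping, so pass plain text.
    static byte[] minimalOdt(String paragraph) throws IOException {
        String content = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<office:document-content"
                + " xmlns:office=\"urn:oasis:names:tc:opendocument:xmlns:office:1.0\""
                + " xmlns:text=\"urn:oasis:names:tc:opendocument:xmlns:text:1.0\">"
                + "<office:body><office:text><text:p>" + paragraph + "</text:p>"
                + "</office:text></office:body></office:document-content>";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("mimetype"));
            zip.write("application/vnd.oasis.opendocument.text".getBytes(StandardCharsets.US_ASCII));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("content.xml"));
            zip.write(content.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractText(minimalOdt("Hello ODT")));
    }
}
```

Leaving Parser.class out of the context entirely also avoids the exception, as reported above, presumably at the cost of skipping embedded resources.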


On Mon, Sep 24, 2018 at 8:24 PM, Tim Allison <[hidden email]> wrote: