Tika option to keep XML tags


Feng Ye-2

Hi Experts,

I found that XML tags are removed when using Tika to process XML files. As the tags contain useful metadata (such as author, etc.), is there an option to keep them? Your timely reply will be appreciated!


Thanks!

feng
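If the goal is simply to recover values such as the author from the original tags, one workaround that does not involve Tika at all is to read the file with the JDK's built-in SAX parser. The sketch below assumes the metadata sits in plain elements such as a hypothetical `<author>`; the class `XmlTagMetadata`, its `extract` method and the sample element names are made up for illustration and are not part of Tika's API. It does not change what Tika itself emits.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not part of Tika): pulls the text of selected XML
// elements, e.g. <author>, using only the JDK's SAX parser.
public class XmlTagMetadata {

    public static Map<String, String> extract(String xml, Set<String> wanted) {
        Map<String, String> found = new LinkedHashMap<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // ensures localName is populated
            SAXParser parser = factory.newSAXParser();
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new DefaultHandler() {
                        // Wanted element we are currently inside (assumes they are not nested).
                        private String current;
                        private final StringBuilder text = new StringBuilder();

                        @Override
                        public void startElement(String uri, String localName,
                                                 String qName, Attributes atts) {
                            if (wanted.contains(localName)) {
                                current = localName;
                                text.setLength(0);
                            }
                        }

                        @Override
                        public void characters(char[] ch, int start, int length) {
                            if (current != null) {
                                text.append(ch, start, length);
                            }
                        }

                        @Override
                        public void endElement(String uri, String localName, String qName) {
                            if (localName.equals(current)) {
                                // A repeated element keeps only its last value.
                                found.put(current, text.toString().trim());
                                current = null;
                            }
                        }
                    });
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return found;
    }

    public static void main(String[] args) {
        String xml = "<doc><author>Jane Doe</author><title>Report</title>"
                + "<body>Some text</body></doc>";
        System.out.println(extract(xml, Set.of("author", "title")));
    }
}
```

Running `main` prints `{author=Jane Doe, title=Report}`. SAX is used here because it streams the file instead of building a tree, which suits large inputs where only a few tags matter.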


Re: Tika option to keep XML tags

Nick Sincaglia
I believe this was corrected recently in Tika 1.9:
https://issues.apache.org/jira/browse/TIKA-2761

Are you using the latest version of Tika? Which tags in particular have you noticed are missing?

Nick

> On Nov 27, 2018, at 2:57 PM, Feng Ye <[hidden email]> wrote:
>
> Hi Experts,
> I found that XML tags are removed when using Tika to process the xml files. As tags contain useful metadata info (such as author etc), is there an option to keep the tags? Your timely reply will be appreciated!
>
> Thanks!
> feng



Re: Tika option to keep XML tags

Tim Allison
To confirm, you're processing XML files, and you'd like to see the element names and attribute values? Are these XML files of a general/common type, or are they "any old" XML files you happen to find? Is this a specific subset/subclass of XML (like KML), and you'd like us to include specific tags?
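To see concretely what "element names and attribute values" means for a given file, a short DOM walk with the JDK's own parser (again, not Tika) can list every element as a path followed by its attributes. The class `XmlStructureDump` and the sample document are hypothetical, chosen only to illustrate the kind of structural information that a plain-text extraction would not show.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper (not part of Tika): lists each element as a path,
// followed by its attributes, using the JDK's DOM parser.
public class XmlStructureDump {

    public static List<String> describe(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> lines = new ArrayList<>();
            walk(doc.getDocumentElement(), "", lines);
            return lines;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Depth-first walk: one output line per element, e.g. "/doc/meta @name=author".
    private static void walk(Element el, String parentPath, List<String> out) {
        String path = parentPath + "/" + el.getTagName();
        StringBuilder line = new StringBuilder(path);
        NamedNodeMap attrs = el.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            line.append(" @").append(attr.getNodeName()).append('=').append(attr.getNodeValue());
        }
        out.add(line.toString());
        // Recurse into child elements only; text and comment nodes are skipped.
        NodeList children = el.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) child, path, out);
            }
        }
    }

    public static void main(String[] args) {
        String xml = "<doc lang=\"en\"><meta name=\"author\">Jane Doe</meta><p>Body</p></doc>";
        describe(xml).forEach(System.out::println);
    }
}
```

Running `main` prints `/doc @lang=en`, `/doc/meta @name=author` and `/doc/p`, one per line.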
On Tue, Nov 27, 2018 at 4:06 PM Nick Sincaglia <[hidden email]> wrote:

>
> I believe this was corrected recently in Tika 1.9
> https://issues.apache.org/jira/browse/TIKA-2761
>
> Are you using the latest version of Tika? What tags in particular have you noticed are missing?
>
> Nick