Fixed invalid number mappings for datetime strings #157
osxcollector/osxcollector.py
Outdated
      """
      # If the key hints this is a timestamp, try to use some popular formats
    - if key and any([-1 != key.lower().find(hint) for hint in ['time', 'utc', 'date', 'accessed']]):
    + if key and any([-1 != key.lower().find(hint) for hint in ['time', 'utc', 'date', 'accessed']]) and not 'times' in key.lower():
Any ideas on whether we could refactor the first part of the if statement to also use the in keyword, instead of the any + find + in combo?
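One way the refactor asked about here could look, as a minimal sketch only. The names `TIMESTAMP_HINTS` and `looks_like_timestamp_key` are hypothetical and do not come from the PR; the hint list and the `'times'` exclusion are copied from the diff above.

```python
# Hypothetical helper; the hint list mirrors the one in the diff.
TIMESTAMP_HINTS = ('time', 'utc', 'date', 'accessed')


def looks_like_timestamp_key(key):
    """Return True if the key name hints at a timestamp value."""
    if not key:
        return False
    lowered = key.lower()
    # `hint in lowered` replaces the `-1 != lowered.find(hint)` idiom,
    # and a generator expression avoids building a throwaway list.
    has_hint = any(hint in lowered for hint in TIMESTAMP_HINTS)
    return has_hint and 'times' not in lowered
```

Note that this keeps the behavior of the condition in the diff, so a key such as `timestamp` is also excluded, because it contains the substring `times`.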
osxcollector/osxcollector.py
Outdated
    if ts:
        return _datetime_to_string(ts)
    if not ts:
        ts = datetime.fromtimestamp(0)
Please add a comment about what is going on here. It may not be clear why we are assigning the default datetime here when no ts was calculated before.
…t conforming to timestamp heuristics
9c2e5c1 to 2c1e80e
Currently, the heuristic for deciding whether a key maps to a timestamp checks that (A) the key contains certain keywords and (B) its value falls within a certain epoch timeframe. Timestamps that fail B should therefore be given a default timestamp value. These fields must not, however, be grouped together with the non-timestamp keys that meet A, which form a huge subset. One solution would be to whitelist all known timestamp keys for timestamp rendering and give the default timestamp to those whose values fail to render. However, this would be overkill, since osxcollector output contains hundreds if not thousands of timestamp-based keys. In my opinion, the most cost-effective solution right now is to simply check for known timestamp keys that fail B. From what I've seen, this list is quite small, and for now we can check it in the code by means of list filtering, as done here. As an extension, the list could instead be moved into a JSON file containing all keys with known mapping issues.
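To make the A/B distinction concrete, here is a rough sketch of the decision described above. It is not the code in this PR: the function name, the epoch window, and the entry in `KNOWN_FAILING_TIMESTAMP_KEYS` are all assumptions chosen for illustration.

```python
from datetime import datetime, timezone

TIMESTAMP_HINTS = ('time', 'utc', 'date', 'accessed')
# Hypothetical: keys known to hold timestamps whose values fail check B.
KNOWN_FAILING_TIMESTAMP_KEYS = {'example_last_used_date'}
# Assumed epoch window for check B; the real bounds may differ.
MIN_EPOCH = datetime(2004, 1, 1, tzinfo=timezone.utc).timestamp()
MAX_EPOCH = datetime(2030, 1, 1, tzinfo=timezone.utc).timestamp()
DEFAULT_TS = datetime.fromtimestamp(0, tz=timezone.utc)
FMT = '%Y-%m-%d %H:%M:%S'


def render_value(key, value):
    """Return a timestamp string for timestamp keys, else the raw value."""
    lowered = (key or '').lower()
    # Check A: the key name must hint at a timestamp.
    if not any(hint in lowered for hint in TIMESTAMP_HINTS):
        return value
    # Check B: the value must fall inside the accepted epoch window.
    if isinstance(value, (int, float)) and MIN_EPOCH <= value <= MAX_EPOCH:
        return datetime.fromtimestamp(value, tz=timezone.utc).strftime(FMT)
    # Known timestamp key that failed B: use the default timestamp so the
    # field is still rendered as a string.
    if lowered in KNOWN_FAILING_TIMESTAMP_KEYS:
        return DEFAULT_TS.strftime(FMT)
    # Met A but is not a known timestamp key: leave the value untouched.
    return value
```

Moving `KNOWN_FAILING_TIMESTAMP_KEYS` into a JSON file, as suggested above, would only change how the set is loaded; the decision logic would stay the same.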
This fixes GH-156, where timestamp keys whose values could not be correctly rendered to timestamp strings were ultimately mapped to their original Number types instead of string types. As described in the issue, the problem was that when the output was later ingested by analytics frameworks like Elasticsearch, type conflicts could arise, since some values were interpreted as longs and others as strings. The fix is to return a very early datetime by default if rendering fails.
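A small sketch of the fallback, with the kind of explanatory comment requested in the review. `datetime_to_string` stands in for the project's `_datetime_to_string`, and `parse_unix_seconds` and `render_timestamp` are hypothetical helpers, not functions from osxcollector. The sketch uses a UTC-aware epoch so that the output does not depend on the local timezone, which differs from the plain `datetime.fromtimestamp(0)` in the diff.

```python
from datetime import datetime, timezone


def datetime_to_string(ts):
    # Stand-in for the project's _datetime_to_string helper.
    return ts.strftime('%Y-%m-%d %H:%M:%S')


def parse_unix_seconds(value):
    # Hypothetical parser: accept only values in a plausible seconds range.
    if isinstance(value, (int, float)) and 0 < value < 2**31:
        return datetime.fromtimestamp(value, tz=timezone.utc)
    return None


def render_timestamp(value, parsers=(parse_unix_seconds,)):
    """Always return a string, even when no parser can read the value."""
    ts = None
    for parse in parsers:
        ts = parse(value)
        if ts:
            break
    if not ts:
        # No format matched. Fall back to the epoch so that this field is
        # always emitted as a string; mixing raw numbers and strings for
        # the same key is what caused the type conflicts downstream.
        ts = datetime.fromtimestamp(0, tz=timezone.utc)
    return datetime_to_string(ts)
```

With this shape, every value rendered for a timestamp key is a string, and unparseable values can be recognized by the epoch date.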