pdr
fastread(fp: Union[str, Path], debug: bool = False, search_paths: Union[Collection[str], str] = (), **kwargs) -> Data
Read a file with PDR, with the assumption that the label is either
attached to fp or that fp is itself a detached label file, and ignoring
the usual double-check for fp's actual existence in the filesystem.
Intended for cases when you want access to a product's metadata very
quickly and you know exactly where its label is.
Source code in pdr/__init__.py
read(fp: Union[str, Path], debug: bool = False, label_fn: Optional[Union[Path, str]] = None, search_paths: Union[Collection[str], str] = (), skip_existence_check: bool = False, **kwargs) -> Data
Read a data product with PDR. fp can be any file associated with the
product, preferably a detached label file if it exists. Returns a Data
object that provides an interface to the data and metadata in all available
files associated with the product.
Source code in pdr/__init__.py
_scaling
find_special_constants(data: PDRLike, obj: np.ndarray, name: str) -> dict[str, Number]
attempts to find special constants in an ndarray associated with a PDS3 object by referencing the label and "standard" special constant values.
Source code in pdr/_scaling.py
fit_to_scale(arr: np.ndarray, scale: Union[Integral, Real], offset: Union[Integral, Real]) -> np.ndarray
Return a version of arr cast to the minimum dtype that will hold its
range of values after multiplying by scale and adding offset.
Supports:
float32, float64, uint8, int8, uint16, int16, uint32, int32, uint64, int64.
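The dtype selection described above can be illustrated without NumPy. This is a minimal sketch, not the pdr implementation: it assumes the conventional scaling `value * scale + offset`, handles only integer results, and uses a hypothetical helper name, `smallest_int_dtype`.

```python
# Hypothetical helper for illustration; not part of pdr.
# Candidate integer types, narrowest first, with their value ranges.
INT_RANGES = [
    ("uint8", 0, 2**8 - 1),
    ("int8", -(2**7), 2**7 - 1),
    ("uint16", 0, 2**16 - 1),
    ("int16", -(2**15), 2**15 - 1),
    ("uint32", 0, 2**32 - 1),
    ("int32", -(2**31), 2**31 - 1),
    ("uint64", 0, 2**64 - 1),
    ("int64", -(2**63), 2**63 - 1),
]


def smallest_int_dtype(values, scale=1, offset=0):
    """Name the narrowest integer dtype that holds every scaled value."""
    scaled = [v * scale + offset for v in values]
    low, high = min(scaled), max(scaled)
    for name, type_min, type_max in INT_RANGES:
        if type_min <= low and high <= type_max:
            return name
    raise ValueError("scaled range does not fit in a 64-bit integer")


print(smallest_int_dtype([0, 100, 200]))           # uint8
print(smallest_int_dtype([0, 100, 200], scale=2))  # uint16
print(smallest_int_dtype([0, 100], offset=-50))    # int8
```

Checking the scaled range before casting avoids silent overflow when a narrow input type is multiplied past its own limits.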
Source code in pdr/_scaling.py
mask_specials(obj, specials)
Source code in pdr/_scaling.py
scale_array(meta: PDRLike, obj: np.ndarray, object_name: str, inplace: bool = False, float_dtype: Optional['np.dtype'] = None)
Source code in pdr/_scaling.py
scale_pds4_tools_struct(struct: object) -> np.ndarray
see pds4_tools.reader.read_arrays.new_array
Source code in pdr/_scaling.py
bit_handling
utilities for parsing BIT_COLUMN objects in tables.
convert_byte_column_to_bits(byte_column: pd.Series, byte_order: ByteOrder) -> pd.Series
Converts byte strings in a Series into binary strings (e.g. b"" -> "10"). All elements of the Series must be byte strings, and all of them must have the same length.
Source code in pdr/bit_handling.py
convert_to_full_bit_string(table: pd.DataFrame, fmtdef: pd.DataFrame) -> pd.DataFrame
Converts the elements of a DataFrame's bit string columns from bytes to binary strings (e.g. '00100011').
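For a single value, the bytes-to-binary-string conversion might look like the sketch below. It is illustrative only: `bytes_to_bit_string` is a hypothetical name, it works on one bytes object rather than a DataFrame column, and it assumes most-significant-byte-first order.

```python
def bytes_to_bit_string(raw: bytes) -> str:
    """Render each byte as eight zero-padded binary digits."""
    return "".join(format(byte, "08b") for byte in raw)


print(bytes_to_bit_string(b"#"))         # '#' is 0x23 -> 00100011
print(bytes_to_bit_string(b"\x00\xff"))  # 0000000011111111
```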
Source code in pdr/bit_handling.py
expand_bit_strings(table: pd.DataFrame, fmtdef: pd.DataFrame) -> pd.DataFrame
Top-level handler function for the bit column workflow. Converts a binary table's bit string columns (if any) from raw bytes to lists of strings (e.g. ['0010', '0011']).
Source code in pdr/bit_handling.py
factor_to_dtype(field_length: int, byte_order: ByteOrder) -> np.dtype
Determine the smallest (in terms of length) structured dtype composed of
unsigned integer dtypes that can parse binary blob of a particular length
and byteorder into a list of bytes. Optimizing the dtype length here
reduces the number of times we have to call bin() in
convert_byte_column_to_bits(), which is one of the biggest performance
bottlenecks in this module.
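One way to picture the factoring step is a greedy split of the field length into the widest unsigned integers available. The sketch below is an assumption about the general approach, not a copy of pdr's code: `factor_length` is a hypothetical name, and it returns NumPy-style type strings instead of building an actual structured dtype.

```python
def factor_length(field_length: int, byte_order: str = ">") -> list[str]:
    """Greedily cover field_length bytes with 8-, 4-, 2-, and 1-byte unsigned ints."""
    fields, remaining = [], field_length
    for width in (8, 4, 2, 1):
        count, remaining = divmod(remaining, width)
        fields.extend([f"{byte_order}u{width}"] * count)
    return fields


print(factor_length(7))        # ['>u4', '>u2', '>u1']
print(factor_length(16, "<"))  # ['<u8', '<u8']
```

Fewer, wider fields mean fewer integers to convert to binary strings per row, which is the saving the docstring describes.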
Source code in pdr/bit_handling.py
get_bit_start_and_size(obj: dict, definition: MultiDict, identifiers: DataIdentifiers) -> dict
Parse the BIT_COLUMN information from a MultiDict that represents a COLUMN
definition into lists of bit string start positions and sizes that can
later be used to parse byte strings into bit strings, then add that
information to a parsed column definition. A subcomponent of the
queries.read_format_block() workflow.
Source code in pdr/bit_handling.py
set_bit_string_data_type(obj: dict, identifiers: Mapping[str, Any]) -> dict
Infer a bit string column's data type and add it to obj (a parsed column
definition). A subcomponent of the queries.read_format_block() workflow.
Source code in pdr/bit_handling.py
splice_bit_string(table: pd.DataFrame, fmtdef: pd.DataFrame) -> pd.DataFrame
Split the elements of a table's bit string columns into lists of binary strings according to the bit boundaries specified in the label. This function expects to be called after convert_to_full_bit_string(), because the columns must already have been converted into binary strings.
Source code in pdr/bit_handling.py
split_bits(bit_string: Sequence, start_bit_list: Sequence[int], bit_size_list: Sequence[int]) -> list
Split a sequence into a list of subsequences based on start and size specifications. Intended here to be used on binary strings.
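A minimal sketch of this kind of split follows. It assumes zero-based start positions, which may differ from how the label states them, and `split_by_start_and_size` is a hypothetical name.

```python
def split_by_start_and_size(seq, starts, sizes):
    """Slice seq into pieces beginning at each start and spanning each size."""
    return [seq[start:start + size] for start, size in zip(starts, sizes)]


print(split_by_start_and_size("00100011", [0, 4], [4, 4]))  # ['0010', '0011']
```

Because it only slices, the same function works on any sequence type, not just binary strings.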
Source code in pdr/bit_handling.py
browsify
functions for producing browse versions of products
_browsify_array(obj: np.ndarray, outbase: str, purge: bool = False, image_clip: Union[float, tuple[float, float], None] = None, mask_color: Optional[tuple[int, int, int]] = (0, 255, 255), band_ix: Optional[int] = None, save: bool = True, override_rgba: bool = False, image_format: str = 'jpg', slice_axis: int = 0, rgb_channels: Optional[tuple[int, int, int]] = None, **_) -> 'Union[Image.Image, list[Optional[Image.Image]]]'
Attempt to render (and optionally save) an ndarray as one or more images.
Source code in pdr/browsify.py
_browsify_recarray(obj: np.recarray, outbase: str, **_)
Some tabular data with column groups ends up as numpy recarray, which is challenging to turn into a useful .csv file in some cases. This tries to save it as a CSV file, and if it fails, punts and pickles it.
Source code in pdr/browsify.py
_format_as_rgb(obj, rgb_channels)
Source code in pdr/browsify.py
_format_as_single_band(band_ix, obj)
for multiband arrays that are not presumably rgb(a), or if we have been instructed to by the override_rgba argument, only export a single band.
Source code in pdr/browsify.py
_format_multiband_image(obj, band_ix, override_rgba, slice_axis, rgb_channels)
helper function for _browsify_array -- truncate, stack, or burst multiband images and send for further processing.
Source code in pdr/browsify.py
_render_array(obj: np.ndarray, outbase: str, purge: bool, image_clip: Union[float, tuple[float, float]], mask_color: Union[int, tuple[int, int, int]], save: bool, image_format: str, nice_clip: bool) -> 'Optional[Image.Image]'
Handler function for array-rendering pipeline, used by browsify() on
most ndarrays and by show() always. Render an ndarray as a PIL Image,
optionally clipping and masking it. If save is True, save it to disk;
if False, return it.
Source code in pdr/browsify.py
browsify(obj: Any, outbase: Union[str, Path], **dump_kwargs) -> None
Attempts to dump a browse version of a data object, writing it into a file type that can be opened with desktop software: .jpg for most arrays, .csv for tables, .txt for most other things. If it can't find a reasonable translation, it attempts to dump it as .pkl (a serialized binary 'blob').
Source code in pdr/browsify.py
colorfill_maskedarray(masked_array: np.ma.MaskedArray, color: Union[int, tuple[int, int, int]] = (0, 255, 255)) -> np.ndarray
masked_array: 2-D masked array or a 3-D masked array with last axis of length 3. For likely uses, this should probably be 8-bit unsigned integer.
color: optionally-specified RGB color (default cyan).
Returns a 2-D or 3-D array with masked values filled with color.
Source code in pdr/browsify.py
eightbit(array: np.array, clip: Union[float, tuple[float, float]] = 0, inplace: bool = False, nice_clip: bool = False) -> np.ndarray
Return an eight-bit version of an array, optionally clipped at min/max percentiles. If inplace is True, normalization may transform the original array, with attendant memory savings and destructiveness.
Source code in pdr/browsify.py
find_masked_bounds(image: np.ma.MaskedArray, cheat_low: int, cheat_high: int) -> tuple[Optional[Number], Optional[Number]]
relatively memory-efficient way to perform bound calculations for normalize_range on a masked array.
Source code in pdr/browsify.py
find_unmasked_bounds(image: np.ndarray, cheat_low: int, cheat_high: int) -> tuple[Number, Number]
straightforward way to find unmasked array bounds for normalize_range
Source code in pdr/browsify.py
normalize_range(image: np.ndarray, bounds: Sequence[int] = (0, 1), clip: Union[float, tuple[float, float]] = 0, inplace: bool = False, nice_clip: bool = False) -> np.ndarray
Simple linear min-max scaler that optionally percentile-clips the input at clip = (low_percentile, 100 - high_percentile). If inplace is True, may transform the original array, with attendant memory savings and destructive effects.
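The scaling idea can be sketched in pure Python. This is an illustration under stated assumptions, not pdr's implementation: it works on a flat list rather than an ndarray, uses a simple nearest-rank percentile, and the names `percentile` and `normalize` are hypothetical.

```python
def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already-sorted list (0 <= pct <= 100)."""
    index = round(pct / 100 * (len(sorted_values) - 1))
    return sorted_values[index]


def normalize(values, bounds=(0, 1), clip=(0, 0)):
    """Clip values at the given percentiles, then rescale linearly to bounds."""
    ordered = sorted(values)
    low = percentile(ordered, clip[0])
    high = percentile(ordered, 100 - clip[1])
    clipped = [min(max(v, low), high) for v in values]
    span = high - low
    if span == 0:
        # A constant input has no range to stretch; map it to the lower bound.
        return [bounds[0] for _ in values]
    scale = (bounds[1] - bounds[0]) / span
    return [(v - low) * scale + bounds[0] for v in clipped]


print(normalize([0, 5, 10]))                                    # [0.0, 0.5, 1.0]
print(normalize([0, 1, 2, 3, 100], bounds=(0, 255), clip=(0, 25)))
```

In the second call the outlier 100 is clipped to the 75th-percentile value before scaling, so it no longer compresses the rest of the range.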
Source code in pdr/browsify.py
datatypes
definitions of sample types / data types / dtypes / ctypes, file formats and extensions, associated special constants, and so on.
IMPLICIT_PDS3_CONSTANTS = MappingProxyType({'uint8': {'NULL': 0, 'ISIS_SAT_HIGH': 255}, 'int8': {}, 'int16': {'N/A': -32768, 'UNK': 32767, 'ISIS_LOW_INST_SAT': -32766, 'ISIS_LOW_REPR_SAT': -32767, 'ISIS_HIGH_INST_SAT': -32765, 'ISIS_HIGH_REPR_SAT': -32764}, 'uint16': {'NULL': 0, 'N/A': 65533, 'UNK': 65534, 'ISIS_LOW_INST_SAT': 2, 'ISIS_LOW_REPR_SAT': 1, 'ISIS_HIGH_INST_SAT': 65534, 'ISIS_HIGH_REPR_SAT': 65535}, 'int32': {'N/A': -214743648, 'UNK': 2147483647}, 'int64': {'N/A': -214743648, 'UNK': 2147483647}, 'uint32': {'N/A': 4294967293, 'UNK': 4294967294, 'ISIS_NULL': read_hex('FF7FFFFB', '>I'), 'ISIS_LOW_INST_SAT': read_hex('FF7FFFFD', '>I'), 'ISIS_LOW_REPR_SAT': read_hex('FF7FFFFC', '>I'), 'ISIS_HIGH_INST_SAT': read_hex('FF7FFFFE', '>I'), 'ISIS_HIGH_REPR_SAT': read_hex('FF7FFFFF', '>I')}, 'float32': {'NULL': -3.4028226550889045e+38, 'N/A': -1e+32, 'UNK': 1e+32, 'ISIS_LOW_INST_SAT': read_hex('FF7FFFFD', '>f'), 'ISIS_LOW_REPR_SAT': read_hex('FF7FFFFC', '>f'), 'ISIS_HIGH_INST_SAT': read_hex('FF7FFFFE', '>f'), 'ISIS_HIGH_REPR_SAT': read_hex('FF7FFFFF', '>f')}, 'float64': {'NULL': -3.4028226550889045e+38}})
module-attribute
This constant defines common "implicit" (not specified in the label) PDS3 special constants. Its keys are NumPy dtype names (for example 'uint8' or 'float32'). Some of these constants are derived from ISIS (although sometimes used in products that were not generated by ISIS!); others are suggested in the PDS3 Standards.
Note that the Standards specifically permit other special constants to exist, undefined in the label, and determined only by the operating environment of the data provider, so there can be no guarantee that other special constants do not exist in any particular product.
The "implicit" use of ISIS constants may in fact be illegal, but appears common. Also note that some ISIS values collide with Standards-specified N/A / UNK / NULL values. Again, we have no way to automatically distinguish them, and interpret them as the Standards values when we find them unless a label specifically states otherwise.
References:
- PDS3 Standards Reference v3.8, p.172 (https://pds.nasa.gov/datastandards/pds3/standards/sr/StdRef_20090227_v3.8.pdf)
- GDAL PDS3 driver (https://github.com/OSGeo/gdal/blob/master/frmts/pds/pdsdataset.cpp). TODO: -32768 is noted in this driver as NULL but defined in the Standards as an N/A value; this should be clarified.
- ISIS special pixel values (https://isis.astrogeology.usgs.gov/Object/Developer/_special_pixel_8h_source.html)
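Several of the ISIS values above are written as read_hex(hex_string, format). The sketch below shows one plausible reading of such a call using the standard struct module. It is an assumption about what read_hex does, not its actual source, and `read_hex_sketch` is a hypothetical name.

```python
import struct


def read_hex_sketch(hex_string: str, fmt: str = ">I"):
    """Decode a hex string into bytes and unpack it with a struct format code."""
    return struct.unpack(fmt, bytes.fromhex(hex_string))[0]


# The same four bytes yield very different values as an integer and as a float.
as_uint = read_hex_sketch("FF7FFFFB", ">I")
as_float = read_hex_sketch("FF7FFFFB", ">f")
print(as_uint)   # 4286578683
print(as_float)  # a large negative float32 value
```

This is why the same hex patterns appear under both 'uint32' and 'float32': the bit pattern is fixed and only its interpretation changes.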
PDS3_CONSTANT_NAMES = tuple(PDS3_ISIS_CONSTANT_NAMES + PDS3_CONSTANT_NAMES)
module-attribute
"basic" PDS3 special constant parameter names
PDS3_ISIS_CONSTANT_NAMES = tuple([f'{category}{direction}{entity}{prop}' for category, direction, entity, prop in (product(('CORE_', 'BAND_SUFFIX_', 'SAMPLE_SUFFIX_', 'LINE_SUFFIX_', ''), ('HIGH_', 'LOW_', ''), ('INST_', 'REPR_', ''), ('NULL', 'SATURATION', 'SAT')))])
module-attribute
some (all?) of these special constants are derived from ISIS properties; these are names they take on when they are made explicit in a PDS3 label
determine_byte_order(sample_type: str) -> ByteOrder
defines generic byte order for PDS3 physical data types
Source code in pdr/datatypes.py
integer_code(byteorder: ByteOrder, signed: bool, sample_bytes: int, for_numpy: bool = False) -> str
Translation from integer width, signedness, and byteorder to struct or numpy dtype string.
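A translation of this kind might be sketched as follows. It assumes byte order is passed as "<" or ">", supports only 1-, 2-, 4-, and 8-byte integers, and uses the hypothetical name `integer_format`; pdr's own function may differ in detail.

```python
import struct

# struct letters for signed integers by width; the uppercase form is unsigned.
STRUCT_LETTERS = {1: "b", 2: "h", 4: "i", 8: "q"}


def integer_format(byteorder: str, signed: bool, sample_bytes: int,
                   for_numpy: bool = False) -> str:
    """Build a struct or numpy format string for an integer type."""
    if sample_bytes not in STRUCT_LETTERS:
        raise ValueError(f"unsupported integer width: {sample_bytes}")
    if for_numpy:
        # numpy spells the type as kind plus width in bytes, e.g. 'u2'.
        kind = "i" if signed else "u"
        return f"{byteorder}{kind}{sample_bytes}"
    letter = STRUCT_LETTERS[sample_bytes]
    return f"{byteorder}{letter if signed else letter.upper()}"


code = integer_format(">", False, 2)
print(code, struct.calcsize(code))                   # >H 2
print(integer_format("<", True, 4, for_numpy=True))  # <i4
```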
Source code in pdr/datatypes.py
sample_types(sample_type: str, sample_bytes: int, for_numpy: bool = False) -> str
Defines a translation from PDS3 physical data types to Python struct or numpy dtype format strings, using both the type and byte width specified (because the mapping to type alone is not consistent across PDS3).
Source code in pdr/datatypes.py
errors
AlreadyLoadedError
Bases: Exception
We already loaded this object and haven't been instructed to reload it.
Source code in pdr/errors.py
DuplicateKeyWarning
Bases: UserWarning
This product has duplicate object names; we're renaming them.
Source code in pdr/errors.py
formats
This module implements a wide variety of special-case behaviors for
nonconforming or malformatted data products. It implements these behaviors as
functions in distinct submodules organized by 'dataset' (mission, instrument,
etc.); the checkers submodule contains dispatch functions that preempt
generic behaviors and redirect them to functions from one of the dataset
submodules. See the documentation for checkers for details on this behavior.
ImageProps
Bases: TypedDict
Standard image properties dict used in image-processing workflows.
Source code in pdr/pdrtypes.py
check_special_bit_column_case(identifiers: Mapping[str, Any]) -> tuple[bool, Optional[str]]
Special case checker used by bit_handling.set_bit_string_data_type()
to preempt generic data type inference.
Source code in pdr/formats/checkers.py
check_special_bit_format(obj: dict, definition: MultiDict, identifiers: DataIdentifiers) -> tuple[bool, Optional[dict]]
Special case checker used by add_bit_column_info() to fix problems in obj
and/or definition caused by mistakes in an external format file. Intended
for cases where check_special_block() doesn't touch the relevant metadata,
and errors are hit before check_special_structure() can be useful.
Source code in pdr/formats/checkers.py
check_special_bit_start_case(identifiers, list_of_pvl_objects_for_bit_columns, start_bit_list) -> tuple[bool, Optional[list[int]]]
Special case checker used by get_bit_start_and_size() to fix incorrectly-defined bit offsets.
Source code in pdr/formats/checkers.py
check_special_block(name: str, data: PDRLike, identifiers: Mapping) -> tuple[bool, Optional[MultiDict]]
specialize() target for queries.get_block(). Intended for cases in
which label pointers don't correspond to label block names AND/OR if a
value within the block needs to be changed before going to other functions.
Source code in pdr/formats/checkers.py
check_special_compressed_file_reader(identifiers: DataIdentifiers, fn: str)
Distribute to correct specialized image loader, otherwise return
False/None. Preempt loaders.datawrap.ReadImage's dispatch to read_image()
Source code in pdr/formats/checkers.py
check_special_fits_start_byte(identifiers: DataIdentifiers, name: str, hdulist: HDUList) -> tuple[bool, Optional[int]]
Preempts generic PDS3 data object -> FITS start byte mapping. Wraps
get_fits_start_byte().
Source code in pdr/formats/checkers.py
check_special_fn(data: PDRLike, object_name: str, identifiers: DataIdentifiers) -> tuple[bool, Optional[str]]
Preempts generic filename specification. Called inline by
Data._object_to_filename().
Source code in pdr/formats/checkers.py
check_special_label(fn: Union[str, Path])
Used primarily to check for labels with known characters invalid in utf-8.
We then read the label with a more correct or lenient encoding. Preempt
loaders.datawrap.ReadLabel's dispatch to read_label(). Also used in
read_pvl().
Source code in pdr/formats/checkers.py
check_special_objects(identifiers: DataIdentifiers)
Check to add objects not correctly ID'd as objects in a label, or remove
objects ID'd in a label. Called inline by _find_objects().
Source code in pdr/formats/checkers.py
check_special_offset(name: str, data: PDRLike, identifiers: DataIdentifiers, fn: str) -> tuple[bool, Optional[int]]
Preempt generic inference of an object's byte offset within a file. Wraps
loaders.queries.data_start_byte().
Source code in pdr/formats/checkers.py
check_special_pds4_cases(structure, filename, object_name)
Load objects from PDS4 files with known issues that do not currently work with pds4_tools. Mostly utilized by datasets not verified by the PDS but that have PDS4 labels (ISRO, ESA, CNSA etc).
Source code in pdr/formats/checkers.py
check_special_position(identifiers: DataIdentifiers, block: MultiDict, target: PhysicalTarget, name: str, fn: str, start_byte: int) -> tuple[bool, Optional[int]]
Preempt generic detection of a table's row or byte offset within a file.
Wraps table_position(). Used for table-specific cases that are partially
but not wholly handled by data_start_byte(), so should not be defined
in check_special_offset().
Source code in pdr/formats/checkers.py
check_special_qube_band_storage(identifiers: DataIdentifiers)
Defines band storage types for QUBE products whose labels do not correctly
specify them. Wraps get_qube_band_storage_type().
Source code in pdr/formats/checkers.py
check_special_sample_type(identifiers: DataIdentifiers, base_samp_info: dict) -> tuple[bool, Optional[str]]
Preempt generic mapping of PDS3 data types to numpy dtype strings. Wraps
image_sample_type(); called inline by insert_sample_types_into_df().
Source code in pdr/formats/checkers.py
check_special_structure(name: str, block: MultiDict, fn: str, data: PDRLike, identifiers: DataIdentifiers) -> tuple[bool, Optional[tuple[pd.DataFrame, Optional[np.dtype]]]]
Preempt generic ARRAY/TABLE/SPREADSHEET format definition parsing. Wraps
parse_array_structure() and parse_table_structure().
Source code in pdr/formats/checkers.py
check_special_table_reader(identifiers: DataIdentifiers, name: str, fn: str, fmtdef_dt: tuple[pd.DataFrame, np.dtype], block: MultiDict, start_byte: int)
Preempt loaders.datawrap.ReadTable's dispatch to read_table().
Source code in pdr/formats/checkers.py
check_trivial_case(pointer: str, identifiers: DataIdentifiers, fn: str) -> bool
Supplement generic definition of 'trivial' pointers. Intended primarily to
preempt attempts to load known-unsupported data objects associated with
otherwise-supported products. Called inline by pointer_to_loader().
Source code in pdr/formats/checkers.py
is_trivial(pointer: str) -> bool
Returns True if this is the name of a data object we want to handle trivially, in the sense that we never ever want to load it directly.
Source code in pdr/loaders/utility.py
special_image_constants(identifiers: DataIdentifiers) -> dict[str, int]
Defines 'secret' special constants for a dataset or product type. Called
inline by Data.find_special_constants().
Source code in pdr/formats/checkers.py
specialblock(data: PDRLike, name: str)
Special-purpose wrapper for check_special_block() intended for use outside of the query workflow.
Source code in pdr/formats/checkers.py
formats.cassini
cda_table_filename(data)
HITS:
* cassini_cda
    * cda_area
    * cda_stat
    * cda_events
    * cda_spectra
    * cda_settings
    * cda_counter
    * cda_signals
Source code in pdr/formats/cassini.py
coiss_1006_offset(data, name, identifiers)
Start bytes (given in RECORD_BYTEs) are off by 1 for products from volume coiss_1006. ("Range (SCLK): 1359362956 - 1363539029") Easy to validate: if the TELEMETRY_TABLE's NULL_PADDING column is not 0, then start_byte is off for all that product's pointers except IMAGE_HEADER
HITS:
* cassini_iss
    * calib_evj (partial)
Source code in pdr/formats/cassini.py
get_offset(filename, identifiers)
HITS:
* cassini_hp
    * ddr
    * img_table
    * strip
    * solar
    * sun
    * time
    * vis
Source code in pdr/formats/cassini.py
get_position(identifiers, block, target, name, filename, start_byte)
HITS:
* cassini_hp
    * dark
    * ddr
    * misc_img_text
    * img_table
    * strip
    * solar
    * sun
    * time
    * vis_extra
    * vis
Source code in pdr/formats/cassini.py
get_special_qube_band_storage()
HITS:
* cassini_uvis
    * fuv
    * euv
Source code in pdr/formats/cassini.py
get_structure(block, name, filename, data, identifiers)
The data type given here double-defines the 32-byte prefix/offset. By skipping parse_table_structure(), we never add the prefix bytes, so it works as is.
HITS:
* cassini_hp
    * hasi_acc
    * hasi_ppi
    * hasi_pwa
    * hasi_tem
    * hasi_dpu
    * hasi_prof
Source code in pdr/formats/cassini.py
iss_cal_trivial_loader(pointer)
A subset of the ISS calibration images (those with "FILE_RECORDS = 1025") appear to not actually have LINE_PREFIX_TABLEs or TELEMETRY_TABLEs
HITS:
* cassini_iss
    * calib (partial)
Source code in pdr/formats/cassini.py
iss_calib_da_special_block(data, name)
The labels for some Cassini ISS calibration images with a .DA filename extension incorrectly use LINE_PREFIX_BYTES. A subset of calibration images with a .IMG filename extension are formatted like the .DA products, and also incorrectly reference LINE_PREFIX_BYTES
HITS:
* cassini_iss
    * calib_da
    * calib (partial)
Source code in pdr/formats/cassini.py
iss_edr_special_block(data, name)
Some of the ISS EDR and calibration products give their ^STRUCTURE and ^LINE_PREFIX_STRUCTURE filenames in the format: "../../label/prefix3.fmt"
HITS:
* cassini_iss
    * edr_sat
    * edr_evj
    * calib (partial)
Source code in pdr/formats/cassini.py
iss_telemetry_bit_col_format(obj, definition)
The format file for Cassini ISS telemetry tables incorrectly uses BIT_DATA_TYPE instead of DATA_TYPE when defining its top-level COLUMN (causing a key error in add_bit_column_info()). It also says the data type is BINARY instead of (presumably) MSB_BIT_STRING.
HITS:
* cassini_iss
    * calib
    * calib_atm
    * edr_evj
    * edr_sat
Source code in pdr/formats/cassini.py
line_prefix_sample_type(base_samp_info)
Each time byte order is specified for these products it is LSB. However, for columns whose values can be verified, it is always actually MSB. This special case forces all such types to MSB, and assumes BIT_STRING refers to MSB_BIT_STRING. "N/A" samples are treated as CHARACTER / void.
HITS:
* cassini_iss
    * calib
    * calib_atm
    * edr_evj
    * edr_sat
Source code in pdr/formats/cassini.py
looks_like_ascii(data, pointer)
Source code in pdr/formats/cassini.py
rpws_ancil_position(identifiers, block, target, name, start_byte)
Most of the labels have the wrong ROWS value. This special case uses the FILE_RECORDS instead.
HITS: * cassini_rpws * ancil_tol
Source code in pdr/formats/cassini.py
spreadsheet_loader(filename, fmtdef_dt, data_set_id)
HITS: * cassini_mimi * edr_lemms (partial) * rdr_chems_avg * rdr_chems_fullres * rdr_inca * rdr_lemms_avg * rdr_lemms_fullres * rdr_ancil * cassini_radar * asum * cassini_rpws * refdr_wbr * refdr_wfr
Source code in pdr/formats/cassini.py
trivial_loader(pointer)
HITS * cassini_iss * calib * edr_evj * edr_sat
Source code in pdr/formats/cassini.py
xdr_redirect_to_image_block(data)
HITS: * cassini_hp * img_xdr
Source code in pdr/formats/cassini.py
formats.checkers
This module contains functions that preempt generic metadata- or data-parsing behaviors. They are intended to manage idiosyncrasies common to all products of a particular type (or even all products in a whole dataset), including but not limited to:
- Malformatted labels
- Incorrect metadata
- Malformatted data
- Technically correct but extremely unusual data formatting
To put this another way, they facilitate single-dispatch polymorphism on the semantic level of data product type.
Most functions in this file are intended to be applied by func.specialize
as wrappers to other functions, typically a query function in loaders.queries
or the loader_function attribute of a loaders.queries.Loader subclass.
However, this is not strict; they may also wrap functions in other
modules, and functions may call them inline rather than use them as wrappers.
Every function in this module should be named check_special_{something},
where 'something' clearly designates the metadata-parsing or data-loading
behavior it may sometimes preempt.
Most functions in this module should return a tuple whose first element is a
bool and whose second element is the "special" value. If the first element
is True, it means that there is a relevant special case, so the caller
should use the "special" value instead of engaging in its normal behavior; if
it is False, there is no relevant special case and the caller should continue
with its normal behavior. The second element of the tuple should always be
None if the first element is False.
If the function is intended to wrap a generic function, the second element of
this tuple, when not None, must always share the return type of that generic
function. Also, if it is intended for the func.softquery() workflow, it
should follow that workflow's argument naming and type annotation conventions.
Exceptions to these naming and signature conventions can be made for checkers designed specifically to be called inline of a specific handler function.
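The return convention described above can be sketched as follows. This is a minimal illustration, not code from pdr: `check_special_row_count`, `generic_row_count`, `row_count`, and the `"EXAMPLE_DATASET"` identifier are hypothetical names invented for the example.

```python
from typing import Any, Mapping, Optional


def check_special_row_count(
    identifiers: Mapping[str, Any], block: Mapping[str, Any]
) -> tuple[bool, Optional[int]]:
    """Hypothetical checker following the (is_special, value) convention."""
    if identifiers.get("DATA_SET_ID") == "EXAMPLE_DATASET":
        # Pretend this dataset has a known-bad ROWS value.
        return True, block["FILE_RECORDS"]
    # No special case applies, so the second element must be None.
    return False, None


def generic_row_count(block: Mapping[str, Any]) -> int:
    """Hypothetical generic behavior: trust the label's ROWS value."""
    return block["ROWS"]


def row_count(identifiers: Mapping[str, Any], block: Mapping[str, Any]) -> int:
    """Caller: use the special value if there is one, else fall through."""
    is_special, special_value = check_special_row_count(identifiers, block)
    if is_special:
        return special_value
    return generic_row_count(block)


block = {"ROWS": 10, "FILE_RECORDS": 12}
special = row_count({"DATA_SET_ID": "EXAMPLE_DATASET"}, block)
ordinary = row_count({"DATA_SET_ID": "SOMETHING_ELSE"}, block)
```

Note that the special value shares the return type of the generic function (`int` here), which is what lets the caller substitute one for the other.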
check_special_bit_column_case(identifiers: Mapping[str, Any]) -> tuple[bool, Optional[str]]
Special case checker used by bit_handling.set_bit_string_data_type()
to preempt generic data type inference.
Source code in pdr/formats/checkers.py
check_special_bit_format(obj: dict, definition: MultiDict, identifiers: DataIdentifiers) -> tuple[bool, Optional[dict]]
Special case checker used by add_bit_column_info() to fix problems in obj
and/or definition caused by mistakes in an external format file. Intended
for cases where check_special_block() doesn't touch the relevant metadata,
and errors are hit before check_special_structure() can be useful.
Source code in pdr/formats/checkers.py
check_special_bit_start_case(identifiers, list_of_pvl_objects_for_bit_columns, start_bit_list) -> tuple[bool, Optional[list[int]]]
Special case checker used by get_bit_start_and_size() to fix incorrectly-defined bit offsets.
Source code in pdr/formats/checkers.py
check_special_block(name: str, data: PDRLike, identifiers: Mapping) -> tuple[bool, Optional[MultiDict]]
specialize() target for queries.get_block(). Intended for cases in
which label pointers don't correspond to label block names AND/OR if a
value within the block needs to be changed before going to other functions.
Source code in pdr/formats/checkers.py
check_special_compressed_file_reader(identifiers: DataIdentifiers, fn: str)
Distribute to correct specialized image loader, otherwise return
False/None. Preempt loaders.datawrap.ReadImage's dispatch to read_image()
Source code in pdr/formats/checkers.py
check_special_fits_start_byte(identifiers: DataIdentifiers, name: str, hdulist: HDUList) -> tuple[bool, Optional[int]]
Preempts generic PDS3 data object -> FITS start byte mapping. Wraps
get_fits_start_byte().
Source code in pdr/formats/checkers.py
check_special_fn(data: PDRLike, object_name: str, identifiers: DataIdentifiers) -> tuple[bool, Optional[str]]
Preempts generic filename specification. Called inline by
Data._object_to_filename().
Source code in pdr/formats/checkers.py
check_special_label(fn: Union[str, Path])
Used primarily to check for labels with known characters invalid in utf-8.
We then read the label with a more correct or lenient encoding. Preempt
loaders.datawrap.ReadLabel's dispatch to read_label(). Also used in
read_pvl().
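The general idea of falling back to a more lenient encoding can be sketched as below. `decode_label_bytes` is a hypothetical helper, and the choice of Latin-1 as the fallback is an assumption made for illustration; the encodings pdr actually uses for specific datasets are not stated here.

```python
def decode_label_bytes(raw: bytes) -> str:
    """Decode label bytes as UTF-8, falling back to Latin-1 (an assumption)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this cannot fail.
        return raw.decode("latin-1")


clean = decode_label_bytes(b"TARGET_NAME = MARS")
# 0xB0 is a degree sign in Latin-1 but is not valid on its own in UTF-8.
lenient = decode_label_bytes(b"ANGLE = 45\xb0")
```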
Source code in pdr/formats/checkers.py
check_special_objects(identifiers: DataIdentifiers)
Check to add objects not correctly ID'd as objects in a label, or remove
objects ID'd in a label. Called inline by _find_objects().
Source code in pdr/formats/checkers.py
check_special_offset(name: str, data: PDRLike, identifiers: DataIdentifiers, fn: str) -> tuple[bool, Optional[int]]
Preempt generic inference of an object's byte offset within a file. Wraps
loaders.queries.data_start_byte().
Source code in pdr/formats/checkers.py
check_special_pds4_cases(structure, filename, object_name)
Load objects from PDS4 files with known issues that do not currently work with pds4_tools. Mostly utilized by datasets not verified by the PDS but that have PDS4 labels (ISRO, ESA, CNSA, etc.).
Source code in pdr/formats/checkers.py
check_special_position(identifiers: DataIdentifiers, block: MultiDict, target: PhysicalTarget, name: str, fn: str, start_byte: int) -> tuple[bool, Optional[int]]
Preempt generic detection of a table's row or byte offset within a file.
Wraps table_position(). Used for table-specific cases that are partially
but not wholly handled by data_start_byte(), so should not be defined
in check_special_offset().
Source code in pdr/formats/checkers.py
check_special_qube_band_storage(identifiers: DataIdentifiers)
Defines band storage types for QUBE products whose labels do not correctly
specify them. Wraps get_qube_band_storage_type().
Source code in pdr/formats/checkers.py
check_special_sample_type(identifiers: DataIdentifiers, base_samp_info: dict) -> tuple[bool, Optional[str]]
Preempt generic mapping of PDS3 data types to numpy dtype strings. Wraps
image_sample_type(); called inline by insert_sample_types_into_df().
Source code in pdr/formats/checkers.py
check_special_structure(name: str, block: MultiDict, fn: str, data: PDRLike, identifiers: DataIdentifiers) -> tuple[bool, Optional[tuple[pd.DataFrame, Optional[np.dtype]]]]
Preempt generic ARRAY/TABLE/SPREADSHEET format definition parsing. Wraps
parse_array_structure() and parse_table_structure().
Source code in pdr/formats/checkers.py
check_special_table_reader(identifiers: DataIdentifiers, name: str, fn: str, fmtdef_dt: tuple[pd.DataFrame, np.dtype], block: MultiDict, start_byte: int)
Preempt loaders.datawrap.ReadTable's dispatch to read_table().
Source code in pdr/formats/checkers.py
check_trivial_case(pointer: str, identifiers: DataIdentifiers, fn: str) -> bool
Supplement generic definition of 'trivial' pointers. Intended primarily to
preempt attempts to load known-unsupported data objects associated with
otherwise-supported products. Called inline by pointer_to_loader().
Source code in pdr/formats/checkers.py
special_image_constants(identifiers: DataIdentifiers) -> dict[str, int]
Defines 'secret' special constants for a dataset or product type. Called
inline by Data.find_special_constants().
Source code in pdr/formats/checkers.py
specialblock(data: PDRLike, name: str)
Special-purpose wrapper for check_special_block() intended for use outside of the query workflow.
Source code in pdr/formats/checkers.py
formats.clementine
get_fn(data, object_name)
HITS * clem_GEO * bsr_rdr_data
Source code in pdr/formats/clementine.py
get_offset(data, pointer)
HITS * clem_GEO * bsr_rdr_data
Source code in pdr/formats/clementine.py
get_structure(block, name, filename, data, identifiers)
HITS: * clem_GEO * bsr_rdr_data
Source code in pdr/formats/clementine.py
formats.dawn
DoesNotExistError
Bases: Exception
Source code in pdr/formats/dawn.py
dawn_history_hdu_exception()
Filter out the spurious HISTORY pointer.
HITS * dawn * fc_edr_fit * fc_rdr_fit
Source code in pdr/formats/dawn.py
formats.diviner
diviner_l4_table_loader(fmtdef_dt, filename)
These tables can contain the value "NaN" and are space-padded, so pd.read_csv sometimes casts columns to object, leaving some of their values as strings and others as floats. This throws warnings and makes the columns obnoxious to work with (for example, users may be unable to add two columns together without a data cleaning step).
HITS * diviner * l4
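One way to avoid mixed string/float columns is to strip the padding and convert every field to float explicitly. The sketch below uses only the standard library rather than pd.read_csv; `parse_padded_row` and the sample rows are hypothetical and are not taken from the Diviner loader.

```python
import math


def parse_padded_row(line: str) -> list[float]:
    """Strip space padding from each field and convert it to float."""
    # float() parses "NaN" as a floating-point NaN, so every value in a
    # column ends up with the same type.
    return [float(field.strip()) for field in line.split(",")]


rows = ["  1.50,   NaN,  3.00", "  2.25,  4.00,   NaN"]
table = [parse_padded_row(row) for row in rows]

# With uniform float values, element-wise arithmetic needs no cleaning step.
column_sum = [a + b for a, b in zip(table[0], table[1])]
```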
Source code in pdr/formats/diviner.py
formats.epoxi
cart_model_get_position(identifiers, block, target, name, start_byte)
The cartesian shape model's RECORD_BYTES and all three of the tables' ROW_BYTES should be 79 but the label lists them as 80.
HITS * epoxi * shape
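The effect of an off-by-one row length can be seen in a small sketch. The data are invented (rows of 4 bytes rather than 79), and `split_rows` is a hypothetical helper, not a pdr function.

```python
def split_rows(raw: bytes, row_bytes: int) -> list[bytes]:
    """Cut a fixed-width table into rows of row_bytes each."""
    return [raw[i:i + row_bytes] for i in range(0, len(raw), row_bytes)]


# Three rows that are really 4 bytes long (3 characters plus a newline).
raw = b"aaa\nbbb\nccc\n"

correct = split_rows(raw, 4)
# A label that overcounts by one byte makes every later row drift.
drifted = split_rows(raw, 5)
```

With the overcounted length, the first row swallows a byte of the second, and the misalignment grows with each subsequent row.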
Source code in pdr/formats/epoxi.py
hriv_deconv_mask_start_byte(name, hdulist)
The EPOXI HRIV deconvolved radiance files have incorrect start byte specifications for the MASK HDU.
HITS * epoxi * hriv_deconvolved
Source code in pdr/formats/epoxi.py
formats.galileo
epd_special_block(data, name)
All 'E1' EPD SUMM products incorrectly say ROW_BYTES = 90; changing them to the RECORD_BYTES values.
HITS * gal_particles * epd_summ (partial)
Source code in pdr/formats/galileo.py
epd_structure(block, name, filename, data, identifiers)
E1PAD_7.TAB has an extra, unaccounted-for byte at the start of each row.
HITS * gal_particles * epd_samp (partial)
Source code in pdr/formats/galileo.py
galileo_table_loader()
Source code in pdr/formats/galileo.py
mdis_fits_start_byte(name: str, hdulist: HDUList) -> int
The MDIS cal labels do not include accurate offsets for data objects. (There's also an additional HDU they don't label as a PDS object at all!)
HITS * messenger_grnd_cal * mdis
Source code in pdr/formats/galileo.py
nims_edr_sample_type(base_samp_info)
Each time byte order is specified for these products it is LSB, so this assumes BIT_STRING refers to LSB_BIT_STRING. N/A samples are read as CHARACTER
HITS * gal_nims * pre_jup
Source code in pdr/formats/galileo.py
nims_sample_spectral_qube_trivial_loader()
HITS * gal_nims * cube
Source code in pdr/formats/galileo.py
probe_structure(block, name, filename, data, identifiers)
Several NMS products have an incorrect BYTES value in one column. One ASI product has incorrect BYTES values in multiple columns
HITS * gal_probe * asi * nms
Source code in pdr/formats/galileo.py
pws_special_block(data, name)
The PWS SUMM products sometimes undercount ROW_BYTES by 2
HITS * gal_plasma * pws_summ * vg_pws * jup_summ * sat_summ * sys_summ_vg1 * sys_summ_vg2 * sys_ancillary * ur_rdr_bin * ur_rdr_asc * ur_summ_bin * ur_summ_asc * newp_summ_bin * nep_summ_asc
Source code in pdr/formats/galileo.py
pws_table_loader(filename, fmtdef_dt)
HITS * gal_plasma * pws_ddr
Source code in pdr/formats/galileo.py
ssi_cubes_header_loader()
The Ida and Gaspra cubes have HEADER pointers but no defined HEADER objects
HITS * gal_ssi * sb_cube
Source code in pdr/formats/galileo.py
ssi_prefix_block(data, name)
These are binary tables, but the format file has one column with "DATA_TYPE = ASCII_REAL". This special case changes it to CHARACTER because the column's DESCRIPTION calls it a "Real number represented as an ascii string in the form 123.12"
HITS * gal_ssi * redr_late
Source code in pdr/formats/galileo.py
ssi_redr_bit_col_format(definition)
Some of the bit columns defined in the Galileo SSI telemetry and line prefix table format files have multiple items, but their ITEM_BITS are mislabeled as BITS.
HITS: * gal_ssi * redr_early * redr_mid * redr_late * sl9_jupiter_impact * go_ssi
Source code in pdr/formats/galileo.py
ssi_redr_prefix_fn(data)
For the early-mission (volumes go_0002-go_0006) SSI REDR line prefix tables. Calling pdr.read() on the .lbl file instead of the .img outputs a different table; it tries to populate with data from the label. TODO: Keep an eye out for more underspecified line prefix tables with this issue, in case it is more common than just a few special cases.
HITS * gal_ssi * redr_early
Source code in pdr/formats/galileo.py
ssi_redr_structure(block, name, filename, data, identifiers)
Similar to the ssi_redr_bit_col_format() special case above. Columns with multiple ITEMS in the telemetry and line prefix table format files define BYTES but leave out ITEM_BYTES.
HITS * gal_ssi * redr_early * redr_mid * redr_late * sl9_jupiter_impact * go_ssi
Source code in pdr/formats/galileo.py
formats.ground
ebrocc_geom_get_position(identifiers, block, target, name, start_byte)
ROW_BYTES = 45 in the labels, but it should be 47
HITS * ground_based * ring_occ_1989_geometry
Source code in pdr/formats/ground.py
mssso_cal_start_byte(name, hdulist)
A small subset of MSSSO CASPIR calibration images have the wrong start byte for the IMAGE pointer in their PDS3 labels
HITS * sl9_jupiter_impact * mssso_cal
Source code in pdr/formats/ground.py
trivial_header_loader()
The HEADER pointer is just the SPREADSHEET table's header row, and it does not open because "BYTES = UNK"
HITS * apollo * BUG
Source code in pdr/formats/ground.py
wff_atm_special_block(data, name)
One WFF/ATM DEM image opens fine (BBMESA2X2), the other two (SCHOONER2X2 and SEDAN2X2) have their LINES and LINE_SAMPLES values backwards.
HITS * wff_atm * dem_img
Source code in pdr/formats/ground.py
formats.ihw
add_newlines_table_loader(fmtdef_dt, block, filename, start_byte)
Some Halley V1.0 tables (MSN, PPN, and IRSN datasets) are missing newline characters between rows. (Also applies to some ICE ephemeris tables)
HITS * ihw * ms_radar * ms_vis * ice * ephem_tbl
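A minimal sketch of restoring row breaks in a table whose fixed-width rows run together. `add_newlines` is hypothetical and assumes the row width is known from the label; the actual loader may work differently.

```python
def add_newlines(raw: str, row_width: int) -> str:
    """Insert a newline after every row_width characters."""
    rows = [raw[i:i + row_width] for i in range(0, len(raw), row_width)]
    return "\n".join(rows) + "\n"


# Three 6-character rows with no separators between them (made-up data).
run_together = "1 2.503 4.755 6.25"
fixed = add_newlines(run_together, 6)
```

After the fix, the text can be handed to any ordinary line-oriented table parser.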
Source code in pdr/formats/ihw.py
curve_table_loader(filename, fmtdef_dt)
The labels do not always count column bytes correctly.
HITS * ihw_isrn * curve
Source code in pdr/formats/ihw.py
get_special_block(data, name)
A handful of MSN Radar tables have column names that were not reading correctly and were ending up as "NaN", which also caused an AttributeError when running ix check.
HITS * ihw * ms_radar
Source code in pdr/formats/ihw.py
get_structure(block, name, filename, data, identifiers)
SSN products with a SPECTRUM pointer were opening with an incorrect column name.
HITS * ihw * spec_hal_cal
Source code in pdr/formats/ihw.py
formats.juno
bit_start_find_and_fix(list_of_pvl_objects_for_bit_columns, start_bit_list)
HITS * juno_jiram * LOG_IMG_RDR * LOG_SPE_RDR * LOG_IMG_EDR * LOG_SPE_EDR * mgs_tes * ATM * BOL * OBS * RAD_tab * pvo * pos_sedr
Source code in pdr/formats/juno.py
jiram_rdr_sample_type()
JIRAM RDRs, both images and tables, are labeled as MSB but are actually LSB.
HITS * juno_jiram * IMG_RDR * SPE_RDR
Source code in pdr/formats/juno.py
uvs_edr_start_byte(name, hdul)
Sometimes, the start byte is incorrectly recorded in the PDS3 labels (It is always wrong in the PDS4 labels. We do not have a "check" for that yet, so I recommend using the PDS3 labels). Here we use the FITS index defined by the mission for each object to look up the correct start_byte in the HDU fileinfo.
This won't work if HDUs are missing etc, but I have not encountered that.
HITS * juno_uvs * EDR
Source code in pdr/formats/juno.py
uvs_rdr_start_byte(name, hdul)
Sometimes, the start byte is incorrectly recorded in the PDS3 labels (It is always wrong in the PDS4 labels. We do not have a "check" for that yet, so I recommend using the PDS3 labels). Here we use the FITS index defined by the mission for each object to look up the correct start_byte in the HDU fileinfo.
This won't work if HDUs are missing etc, but I have not encountered that.
HITS * juno_uvs * RDR
Source code in pdr/formats/juno.py
waves_burst_fix_table_names(data, name)
WAVES burst files that include frequency offset tables have mismatched pointer/object names.
HITS * juno_waves * CDR_BURST
Source code in pdr/formats/juno.py
formats.lro
DoesNotExistError
Bases: Exception
Source code in pdr/formats/lro.py
crater_bit_col_sample_type(base_samp_info)
HITS * lro_crater * edr_sec * edr_hk
Source code in pdr/formats/lro.py
get_crater_offset()
LRO CRaTER EDR products have a header table with 64 bytes per row. The start byte of the second table is given in rows (and gives the wrong row), but that table has a different number of bytes per row.
HITS * lro_crater * edr_sec * edr_hk
Source code in pdr/formats/lro.py
lamp_edr_hdu_exceptions(name, hdulist)
Sometimes all the LAMP EDR table pointers exist; sometimes they aren't actually there.
HITS * lro_lamp * edr
Source code in pdr/formats/lro.py
lamp_rdr_hdu_start_byte(name, hdulist)
This special case raises an error if a pointer's data doesn't actually exist, and returns the correct start byte if it does.
HITS * lro_lamp * rdr
Source code in pdr/formats/lro.py
lamp_rdr_histogram_image_loader(data)
Products can have multiple unique pointers that are defined by a single image object (CAL_HISTOGRAM_DATA_IMAGE).
Source code in pdr/formats/lro.py
mini_rf_image_loader(data, name)
One of the mosaic labels has the wrong values for lines/line_samples.
HITS * lro_mini_rf * mosaic
Source code in pdr/formats/lro.py
mini_rf_spreadsheet_loader(filename, fmtdef_dt)
Mini-RF housekeeping CSVs have variable-width columns but the labels treat them as fixed-width.
HITS * lro_mini_rf * housekeeping
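The difference between the two readings can be sketched with the standard library. The rows and the byte offsets below are invented for illustration and are not taken from a Mini-RF label.

```python
import csv
import io

# Made-up housekeeping rows whose fields vary in width from row to row.
text = "1,20.5,OK\n10,3.25,WARN\n"

# Fixed-width slicing with offsets that only fit the first row.
offsets = [(0, 1), (2, 6), (7, 9)]
fixed_width = [
    [line[start:stop] for start, stop in offsets]
    for line in text.splitlines()
]

# Splitting on the delimiter handles every row regardless of field width.
delimited = list(csv.reader(io.StringIO(text)))
```

The fixed-width reading mangles the second row, while the delimiter-based reading recovers all three fields from both rows.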
Source code in pdr/formats/lro.py
rss_get_position(identifiers, block, target, name, start_byte)
The RSS WEA products' WEAREC_TABLE undercounts ROW_BYTES by 1
HITS * lro_rss * wea
Source code in pdr/formats/lro.py
wea_table_loader(filename, fmtdef_dt)
Some, but not all, wea files have more bytes than the labels define per row.
HITS * lro_rss * wea
Source code in pdr/formats/lro.py
formats.lroc
lroc_edr_sample_type()
LROC EDRs specify signed integers but appear to be unsigned.
HITS * lroc * NAC_EDR * WAC_EDR
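The practical consequence of a signed/unsigned mix-up can be shown with the standard-library `struct` module. The 8-bit sample width and the byte value here are assumptions chosen for illustration, not details taken from the LROC labels.

```python
import struct

raw = bytes([0xFF])  # made-up sample with every bit set

# "b" reads a signed 8-bit integer; "B" reads the unsigned equivalent.
as_signed = struct.unpack("b", raw)[0]
as_unsigned = struct.unpack("B", raw)[0]
```

Read as signed, the brightest possible pixel becomes -1; read as unsigned, it is 255.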
Source code in pdr/formats/lroc.py
formats.mariner
get_special_block(data, name)
Mariner 9 IRIS tables have 316 ROW_PREFIX_BYTES followed by 1 column with 1500 ITEMS. The column's START_BYTE = 317, but it should be 1.
HITS * mariner * iris
Source code in pdr/formats/mariner.py
formats.mer
rss_spreadsheet_loader(filename, fmtdef_dt)
The RSS UHFD labels have the wrong ROWS value for most products.
HITS * mer_rss * uhfd
Source code in pdr/formats/mer.py
formats.mex
aspera_ima_ddr_structure(block, name, filename, data, identifiers)
The ASPERA IMA DDR table opens correctly as written in its label, but the BYTES values for columns 3 and 4 are wrong.
HITS * mex_aspera * ima_ddr
Source code in pdr/formats/mex.py
aspera_table_loader(filename, fmtdef_dt)
The ASPERA IMA EDRs are ascii csv tables containing 2 data types: SENSOR and MODE. The VALUES column is repeated and has 96 items total. In the MODE rows only the first VALUES item contains data, and should be followed by 95 'missing' items. In reality these rows have 96 empty/missing items because of an extra comma. This special case cuts off the extra column during the pd.read_csv() call.
HITS * mex_aspera * ima
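The extra-comma problem can be sketched on a scaled-down table. This example uses the standard-library `csv` module instead of pd.read_csv, with 3 VALUES items standing in for 96; the rows are invented.

```python
import csv
import io

N_VALUES = 3  # stands in for the 96 VALUES items in the real tables

# Made-up rows: a SENSOR row with full data, and a MODE row with one
# trailing comma too many, which produces an extra empty field.
text = "SENSOR,1,2,3\nMODE,7,,,\n"

expected_fields = 1 + N_VALUES
raw_rows = list(csv.reader(io.StringIO(text)))
# Cut every row to the expected width, dropping the spurious last field.
rows = [row[:expected_fields] for row in raw_rows]
```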
Source code in pdr/formats/mex.py
marsis_get_position(identifiers, block, target, name, start_byte)
HITS * mex_marsis * TEC_EDR
Source code in pdr/formats/mex.py
mrs_ddr_atmo_position(identifiers, block, target, name, start_byte)
The MRS derived atmosphere profiles were opening with data cut off at the ends of the tables. Recalculating the table length with ROW_BYTES = 278 instead of 276 fixes it.
HITS * mex_mrs * occ_atmo
Source code in pdr/formats/mex.py
mrs_get_position(identifiers, block, target, name, start_byte)
MRS ICL level 1b DOPPLER_TABLEs and ODF level 2 RANGING_TABLEs undercount ROW_BYTES by 1.
HITS * mex_mrs * lvl_1b_icl (partial) * lvl_2_odf (partial)
Source code in pdr/formats/mex.py
mrs_l1b_odf_rmp_redirect(data)
RMP tables are a subset of MRS level 1b ODFs that were not opening because their pointer and object names do not match.
HITS: * mex_mrs * lvl_1b_odf (partial)
Source code in pdr/formats/mex.py
mrs_l1b_odf_table_loader(filename, fmtdef_dt)
MRS level 1b ODF labels have variable and sometimes incorrect ROW_BYTES values.
HITS * mex_mrs * lvl_1b_odf
Source code in pdr/formats/mex.py
pfs_edr_special_block(data, name)
The PFS EDRs have a few errors in their labels prior to orbit 8945, after which they are corrected.
HITS * mex_marsis * raw_lwc * raw_swc * cal_lwc * cal_swc * hk_early_mission * orb001_lwc * orb001_swc
Source code in pdr/formats/mex.py
vmc_rdr_hdu_selection(name, hdulist)
The VMC RDRs have 1 IMAGE pointer and 2 IMAGE objects. From the volume's readme: "The first layer includes the calibrated values, and the second layer includes the raw values." It is unclear whether or not the 'second layer' is a copy of the EDR image or if intermediate calibration steps have been applied to it. Assuming the single band image is akin to the EDRs, this special case returns the multiband calibrated image.
HITS * mex_vmc * rdr
Source code in pdr/formats/mex.py
formats.mgn
geom_table_loader(filename, fmtdef_dt)
The Magellan radar system geometry tables include null bytes between rows.
HITS * gal_nims * impact * mgn_image * midr_tables
Source code in pdr/formats/mgn.py
get_fn(data)
HITS * mgn_post_mission * fmap * fmap_browse
Source code in pdr/formats/mgn.py
occultation_loader(identifiers, fmtdef_dt, block, filename)
Checks end of each row for newline character. If missing, removes extraneous newline from middle of the row and adjusts for the extra byte. Adapted from _interpret_as_ascii()
HITS * mgn_occult * ddr
Source code in pdr/formats/mgn.py
orbit_table_in_img_loader()
HITS * mgn_post_mission * fmap * fmap_browse
Source code in pdr/formats/mgn.py
formats.mgs
get_ecs_structure(block, name, filename, data, identifiers)
HITS * mgs_rss_raw * ecs
Source code in pdr/formats/mgs.py
get_odf_structure(block, name, filename, data, identifiers)
Source code in pdr/formats/mgs.py
mola_pedr_special_block(data, name, identifiers)
Fix for FILE_RECORDS = "UNK" and ROWS = "UNK" in the MOLA PEDR labels. This special case calculates ROWS using the count_from_bottom_of_file() logic in reverse.
HITS * mgs_mola * pedr * mgs_sampler * pedr
Source code in pdr/formats/mgs.py
formats.mro
ancil_table_loader(filename, fmtdef_dt)
In the CRISM ancillary OBS tables, missing values are variations of "N/A", which causes mixed dtype warnings when the first row contains N/A's.
HITS * crism * extras_obs
Source code in pdr/formats/mro.py
crism_mrdr_ancill_position(identifiers, block, target, name, start_byte)
ROW_BYTES = 14 in the labels, but it should be 16 (the RECORD_BYTES)
HITS * crism * ancil_mrdr
Source code in pdr/formats/mro.py
get_structure(block, name, filename, data, identifiers)
The first column in the MCS (EDR/RDR/DDR) format files is just named "1", which is read as an int. This caused problems in read_table during the table.drop call.
HITS * mro * mcs_edr * mcs_rdr
Source code in pdr/formats/mro.py
mcs_ddr_oldformat_trivial()
These files are outdated and have formatting issues that make the current table reader (mcs_ddr_table_loader below) not work.
The tables can sometimes be loaded by mcs_ddr_table_loader if you subtract two from the start_byte, but this has not been exhaustively tested.
HITS: * mro * mcs_ddr_v1
Source code in pdr/formats/mro.py
mcs_ddr_table_loader(block, filename, start_byte)
The newer (V6.0 and above) DDR files can be opened into a dataframe with some massaging. The dataset records have a metadata block (described by MCS_DDR1.FMT) followed by 105 lines of data (each described by MCS_DDR2.FMT, the 105 is "repetitions" in the label). This continues until the end of the file.
For the purposes of outputting a single table, the metadata block info is added to each row of 105 data rows that follow it. So per record block, 105 lines are added to the dataframe. This is because the metadata and data rows have different columns, so they can't be in the same table as alternating rows as in the .tab file structure.
The MCS DDR V1.0, which is now out of date at the node, doesn't quite work with this code. If the start_byte is set to 2888 it seems to work on a few cases, but this has not fully been tested via ix.
HITS: * mro * mcs_ddr
Source code in pdr/formats/mro.py
formats.msl_apxs
table_loader(pointer)
we don't support these right now, or maybe ever
HITS * msl_apxs * APXS_SCIENCE_EDR
Source code in pdr/formats/msl_apxs.py
trivial_header_loader()
The HEADER pointer is just the SPREADSHEET table's header row, and it does not open because "BYTES = UNK"
HITS * msl_apxs * APXS_OXIDE_RDR * APXS_SPECTRUM_RDR
Source code in pdr/formats/msl_apxs.py
formats.msl_ccam
image_reply_table_loader()
HITS * msl_ccam * CCAM_RMI_EDR
Source code in pdr/formats/msl_ccam.py
formats.msl_cmn
fix_mangled_name(data)
HITS * msl_cmn * HOUSEKEEPING
Source code in pdr/formats/msl_cmn.py
get_offset(object_name)
The labels incorrectly specify object length rather than start byte.
HITS * msl_cmn * DIFFRACTION_ALL_RDR * ENERGY_SINGLE_RDR * MINERAL_TABLES * CCD_FRAME * DIFFRACTION_SINGLE * DIFFRACTION_SPLIT * DIFFRACTION_ALL * ENERGY_ALL * ENERGY_SINGLE * ENERGY_SPLIT * HOUSEKEEPING * TRANSMIT_RAW
Source code in pdr/formats/msl_cmn.py
spreadsheet_loader(filename)
HITS * msl_cmn * DIFFRACTION_ALL_RDR * ENERGY_SINGLE_RDR * MINERAL_TABLES * msl_sam * l0_qms * l1a_qms * l1b_qms
Source code in pdr/formats/msl_cmn.py
trivial_header_loader()
HITS * msl_cmn * DIFFRACTION_ALL_RDR * ENERGY_SINGLE_RDR * MINERAL_TABLES * msl_sam * l0_hk * l0_qms * l0_gc * l0_tls * l1a_hk * l1a_qms * l1a_gc * l1a_tls * l1b_qms * l1b_gc * l2_qms * l2_gc * l2_tls
Source code in pdr/formats/msl_cmn.py
formats.msl_places
spreadsheet_loader(filename, fmtdef_dt)
HITS * msl_places * localizations
Source code in pdr/formats/msl_places.py
formats.msl_rems
edr_offset(data, name)
HITS: * msl_rems * edr_HSDEF * edr_HSREG
Source code in pdr/formats/msl_rems.py
edr_table_loader(filename, fmtdef_dt, block, start_byte)
The ROW_SUFFIX_BYTES are either miscounted by a few bytes, or we don't handle them correctly. There appears to be a related issue with the tables' start bytes as well. This special case bypasses both issues.
HITS * msl_rems * edr_SP
Source code in pdr/formats/msl_rems.py
rdr_table_loader(filename, fmtdef_dt)
Missing values are variations of "UNK" and "NULL", which cause mixed dtype warnings when using the default pd.read_csv() parameters.
HITS * msl_rems * rdr_rmd * rdr_rnv * rdr_rtl
Source code in pdr/formats/msl_rems.py
formats.nh
get_fn(data)
The PEPSSI DDRs have an extra space at the start of the SPREADSHEET pointer's filename that causes 'file not found' errors.
HITS * nh_derived * atmos_comp * nh_pepssi * flux_resampled
Source code in pdr/formats/nh.py
formats.odyssey
grs_e_kernel_loader(name, fn)
The GRS Experimenter's Notebook products have two "FILE" objects with one "TIME_SERIES" pointer each. The first object/pointer is for the time series table, the other is for a .TXT notes file. Because the text file's pointer has "SERIES" in it, pointer_to_loader() sends it to ReadTable().
This special case reads it with read_text() instead.
HITS * mars_odyssey * edr_e_kernel
Source code in pdr/formats/odyssey.py
grs_e_kernel_structure()
Handles the same files as grs_e_kernel_loader() above, and is needed to avoid an error thrown before that special case can be called. Because the second TIME_SERIES pointer is not actually a table, parse_table_structure() fails when trying to make a fmtdef.
HITS * mars_odyssey * edr_e_kernel
Source code in pdr/formats/odyssey.py
map_table_loader(filename, fmtdef_dt)
A few products open fine from their labels, but most do not. Seems like a byte counting issue in the labels.
HITS * mars_odyssey * maps
Source code in pdr/formats/odyssey.py
formats.phoenix
afm_rdr_structure(block, name, filename, data, identifiers)
AFM RDR header tables: Several columns' NAME fields start with lowercase letters, which is_an_assignment_line() in /parselabel/pds3.py evaluates as NOT an assignment statement.
HITS * phoenix * afm_rdr
Source code in pdr/formats/phoenix.py
afm_table_loader(filename, fmtdef_dt, name)
AFM RDR tables: Several labels miscount bytes somewhere in the tables
HITS * phoenix * afm_rdr
Source code in pdr/formats/phoenix.py
elec_em6_structure(block, name, filename, data, identifiers)
ELEC EDR em6/TBL tables: All the START_BYTEs in TBL_0_STATE_DATA.FMT are off by 36 bytes.
HITS * phoenix * elec_edr (partial)
Source code in pdr/formats/phoenix.py
led_edr_structure(block, name, filename, data, identifiers)
TEGA_LED.FMT: the CONTAINER's REPETITIONS should be 1000, not 1010
HITS * phoenix * lededr
Source code in pdr/formats/phoenix.py
phxao_header_position(identifiers, block, target, name, start_byte)
PHXAO tables: Some table headers have lost trailing whitespace assumed to be present by the label. Treat as newline-delimited instead; the record count is correct.
HITS * phoenix * atm_phxao
Source code in pdr/formats/phoenix.py
phxao_table_offset(filename, identifiers)
PHXAO tables: Some table headers have lost trailing whitespace assumed to be present by the label. Recalculate the table offset assuming that the table itself is still fixed-width.
HITS * phoenix * atm_phxao
Source code in pdr/formats/phoenix.py
sc_rdr_structure(block, name, filename, data, identifiers)
TEGA_SCRDR.FMT: most of the START_BYTEs are off by 4 because column 2 ("TEGA_TIME") is actually 8 bytes, not 4
HITS * phoenix * scrdr
Source code in pdr/formats/phoenix.py
wcl_edr_special_block(data, name)
WCL EDR ema/emb/emc tables: the START_BYTE for columns 13 and 14 are off by 1 and 2 bytes respectively. (The em8/em9/emf tables are fine.)
HITS * phoenix * wcl_edr (partial)
Source code in pdr/formats/phoenix.py
wcl_rdr_offset(data, name)
WCL RDR CP/CV tables: in the labels, each pointer's start byte is
missing '
Source code in pdr/formats/phoenix.py
formats.pvo
oims_12s_loader(data, name)
OIMS 12 second averages: all labels say 'ROWS = 42' regardless of the data's actual length
HITS * pvo * oims_12s
Source code in pdr/formats/pvo.py
orpa_low_res_loader(data, name)
ORPA low resolution: labels for earlier orbits have the correct ROW_BYTES, but there is a typo introduced later that says 'ROW_BYTES = 241' instead of 243
HITS * pvo * orpa_lowres
Source code in pdr/formats/pvo.py
formats.rosetta
fix_pad_length_structure(block, name, filename, data, identifiers)
The MIDAS FSC tables and several CONSERT ptypes have ROW_PREFIX_BYTES, ROW_SUFFIX_BYTES, and a COLUMN with multiple ITEMS. compute_offsets() calculates the wrong end_byte and pad_length values from the BYTES and ROW_BYTES values in their labels.
HITS * rosetta_consert * l2_land * l2_orbit * l3_land * l3_land_fss * l3_orbit * l3_orbit_fss * l4_land * l4_orbit * l4_orbit_grnd * rosetta_dust * RDR_midas_fsc
Source code in pdr/formats/rosetta.py
midas_rdr_sps_structure(block, name, filename, data, identifiers)
SPS TIME_SERIES tables are made up of a repeated container with 4 columns
followed by a non-repeated checksum column. In compute_offsets() the
block_names list ends up out of order, so SB_OFFSET is not calculated
correctly for columns in the repeated CONTAINER.
TODO: This seems like a more general issue with how compute_offsets() handles a repeated container followed by a single column
HITS * rosetta_dust * RDR_midas_sps
Source code in pdr/formats/rosetta.py
rosetta_table_loader(filename, fmtdef_dt)
HITS * rosetta_rpc * RPCMIP
Source code in pdr/formats/rosetta.py
formats.saturn_rpx
rpx_img_hdu_start_byte(name, hdulist)
The multiple *_IMAGE pointers in these files all point at the same FITS HDU (each pointer illegally represents one band of the image).
HITS * saturn_rpx * hst_raw_img * hst_raw_mask * hst_cal_img * hst_cal_mask * hst_eng_data * hst_eng_mask
Source code in pdr/formats/saturn_rpx.py
formats.themis
check_gzip_fn(data, object_name)
Some THEMIS QUBEs are stored in gzipped formats. The labels do not always bother to mention this.
HITS * themis * BTR * ABR * PBT_v1 * PBT_v2 * ALB_v2 * ir_GEO_v2 * vis_GEO_v2 * ir_EDR * vis_EDR * vis_RDR
Source code in pdr/formats/themis.py
get_qube_offset(data)
some THEMIS QUBEs mis-specify file records.
HITS * themis * ir_GEO_v2 * vis_GEO_v2
Source code in pdr/formats/themis.py
get_visgeo_qube_offset(data)
Source code in pdr/formats/themis.py
trivial_themis_geo_loader(pointer)
HITS * themis * ir_GEO_v2 * vis_GEO_v2
Source code in pdr/formats/themis.py
formats.ulysses
gas_table_loader(filename, fmtdef_dt)
GASDATA.FMT has the wrong START_BYTE for columns in the container. After manually changing the labels during testing, START_BYTE was still not incrementing correctly with each repetition of the container. This fixes both issues with 1 special case.
HITS * ulysses * gas
Source code in pdr/formats/ulysses.py
get_sample_type(base_samp_info)
The bit column's data_type is BIT_STRING, which throws errors. Guessing this should be MSB_BIT_STRING. The tables look correct when compared to their ASCII versions.
HITS * ulysses * epac_pha_bin
Source code in pdr/formats/ulysses.py
get_special_block(data, name, identifiers)
START_BYTE is wrong for repeated columns within the container. ITEM_BYTES is also off by 1.
HITS * ulysses * epac_all_chan * epac_omni_ele * epac_omni_pro * epac_pha_asc * epac_pha_bin * epac_prtl * epac_pstl
Source code in pdr/formats/ulysses.py
formats.vega
fix_array_structure(name, block, fn, data, identifiers)
HITS
- giotto
- pia
- vega
- puma_mode
Source code in pdr/formats/vega.py
get_structure(block, name, filename, data, identifiers)
"Encounter data" tables miscount the last column's START_BYTE by 1
HITS * vega * ducma
Source code in pdr/formats/vega.py
formats.viking
seis_table_loader(filepath, fmtdef_dt)
The Viking 2 seismometer tables have mangled labels. The raw data tables are variable length CSVs, and labels for the summary tables count column bytes wrong. Half the labels define columns that do not match the data.
HITS * viking * seis_raw * seis_summary
Source code in pdr/formats/viking.py
formats.voyager
get_fn(data)
Some of the PPS Jitter tables' SERIES pointers have the wrong filename.
HITS: * vg_ring_profiles * pps_jitter
Source code in pdr/formats/voyager.py
get_structure(block, name, filename, data, identifiers)
The VGR_PLS_HR_2017.FMT for PLS 1-hour averages undercounts the last column by 1 byte.
HITS * vg_pls * sys_1hr_avg (partial)
Source code in pdr/formats/voyager.py
lecp_table_loader(filename, fmtdef_dt)
VG1 LECP Jupiter SUMM Sector tables reference a format file with incorrect START_BYTEs for columns within a CONTAINER. Columns are consistently separated by whitespace. The VG2 Uranus 12.8 minute step table (ascii version) was missing values from some rows, not sure why. Reusing this special case fixes it.
HITS * vg_lecp * j_summ_sector_vg1 * u_rdr_step_12.8 (partial)
Source code in pdr/formats/voyager.py
lecp_vg1_sat_table_loader(filename, fmtdef_dt)
VG1 Saturn RDR step products have an extra header row partway through their tables. This special case skips those rows by treating them as comments. PDS volume affected: VG1-S-LECP-3-RDR-STEP-6MIN-V1.0
HITS * vg_lecp * s_rdr_step (partial)
Source code in pdr/formats/voyager.py
mag_special_block(data, name)
ROW_BYTES are listed as 144 in the labels for Uranus and Neptune MAG RDRs. Their tables look the same, but the Neptune products open wrong. Setting ROW_BYTES to 145 fixes it.
HITS * vg_mag * rdr_nep
Source code in pdr/formats/voyager.py
pls_avg_special_block(data, name)
Because VGR_PLS_HR_2017.FMT undercounts by 1 byte, the products that reference it also undercount their ROW_BYTES by 1.
HITS * vg_pls * sys_1hr_avg
Source code in pdr/formats/voyager.py
pls_fine_special_block(data, name)
Most of the PLS FINE RES labels undercount the ROW_BYTES. The most recent product (2007-241_2018-309) is formatted differently and opens correctly.
HITS * vg_pls * sys_fine_res
Source code in pdr/formats/voyager.py
pls_ionbr_special_block(data, name)
SUMRY.LBL references the wrong format file
HITS * vg_pls * ur_ionbr (partial)
Source code in pdr/formats/voyager.py
pra_special_block(data, name, identifiers)
PRA Lowband RDRs: The Jupiter labels use the wrong START_BYTE for columns in containers. The Saturn/Uranus/Neptune labels define columns with multiple ITEMS, but ITEM_BYTES is missing and the BYTES value is wrong.
HITS * vg_pra * lowband_jup * lowband_other
Source code in pdr/formats/voyager.py
func
call_kwargfiltered(func: Callable, *args, **kwargs) -> Any
call a function, filtering out any keyword arguments it doesn't actually accept. intended to help unify signatures to call functions in a dispatched or sequenced fashion. NOTE: will not fix attempts to pass positional-only arguments by name.
Source code in pdr/func.py
filterkwargs(func: Callable, kwargdict: Mapping[str, Any]) -> dict[str, Any]
return a copy of kwargdict, discarding all keys that are not argument names of func.
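The filtering idea shared by filterkwargs and call_kwargfiltered can be sketched with the standard library's inspect module. This is a minimal illustration rather than the module's actual code; the names `filter_kwargs_sketch`, `call_filtered_sketch`, and `area` are hypothetical.

```python
import inspect
from typing import Any, Callable, Mapping


def filter_kwargs_sketch(func: Callable, kwargdict: Mapping[str, Any]) -> dict:
    """Keep only the items of kwargdict whose keys are parameter names of func."""
    argnames = set(inspect.signature(func).parameters)
    return {k: v for k, v in kwargdict.items() if k in argnames}


def call_filtered_sketch(func: Callable, *args, **kwargs) -> Any:
    """Call func, silently dropping keyword arguments it does not declare."""
    return func(*args, **filter_kwargs_sketch(func, kwargs))


def area(width, height):
    # hypothetical target function that knows nothing about "color"
    return width * height


# "color" is discarded before the call, so area() does not raise TypeError
result = call_filtered_sketch(area, width=3, height=4, color="red")
```

Filtering by declared parameter name is what lets one shared dictionary of values feed several functions with different signatures.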
Source code in pdr/func.py
get_all_argnames(*funcs: Callable, nonoptional=False) -> set[str]
return all parameter names found in the signatures of funcs. if nonoptional is True, don't include parameters marked as optional according to the conventions of this module, meaning that any of the following are true:
1. string representation of their annotation begins with "Optional"
2. string representation of their annotation ends with "| None"
or begins with "None |"
3. they are named _ or __
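The three conventions listed above can be expressed as a small predicate over inspect.Parameter objects. The sketch below is an assumption-laden illustration, not the module's implementation: `looks_optional`, `required_names`, and `example` are hypothetical names, and the annotations are written as strings so that their string representation is exactly the text in the signature.

```python
import inspect


def looks_optional(param: inspect.Parameter) -> bool:
    """Apply the annotation and naming conventions described in the text."""
    annotation = str(param.annotation)
    return (
        annotation.startswith("Optional")
        or annotation.endswith("| None")
        or annotation.startswith("None |")
        or param.name in ("_", "__")
    )


def required_names(func) -> set:
    """Names of parameters that are not optional under those conventions."""
    params = inspect.signature(func).parameters.values()
    return {p.name for p in params if not looks_optional(p)}


# String annotations: inspect.signature leaves them unevaluated by default.
def example(a: int, b: "Optional[int]" = None, c: "int | None" = None, _=None):
    return a


names = required_names(example)
```

Under these rules only `a` counts as required, even though `b`, `c`, and `_` would also be accepted by the function.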
Source code in pdr/func.py
get_argnames(func: Callable) -> set[str]
return names of all parameters of a function
Source code in pdr/func.py
get_non_optional_argnames(func: Callable) -> set[str]
determine names of arguments a function must receive by filtering out arguments explicitly annotated as Optional or named "_" or "__". Note that "nonoptional" here describes a convention of this module, not a Python typing requirement.
Source code in pdr/func.py
not_optional(param: Parameter) -> bool
is this Parameter flagged as not required according to the conventions of this module?
Source code in pdr/func.py
paramsort(params: Collection[Parameter]) -> list[Parameter]
sorts signature parameters into legal order
Source code in pdr/func.py
sig_union(*funcs: Callable) -> Signature
examine multiple functions and produce a Signature object describing the union of the parameters of all functions -- i.e., the expected signature of a function that routes all its arguments to the appropriate elements of funcs and calls them in a dispatched, sequenced, or parallel fashion, rather than composed.
Source code in pdr/func.py
sigparams(func: Callable) -> set[Parameter]
examine a function and extract a set of inspect.Parameter objects from its signature
Source code in pdr/func.py
softquery(func: Callable, querydict: Mapping[str, Callable], kwargdict: dict[str, Any]) -> dict[str, Any]
implements a pipeline that accumulates 'information' -- more literally a dictionary of named parameters (kwargdict). querydict describes the sequence of functions to call and the parameter names they will populate in kwargdict. a function in querydict may use information gathered by preceding functions or passed explicitly to softquery in kwargdict, so long as the keys of kwargdict / querydict correspond to the parameter names of that function.
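The accumulation pattern described here might look roughly like the following. This is a simplified sketch, not softquery itself: it omits the leading func argument, and `soft_pipeline`, `row_bytes`, and `table_bytes` are hypothetical names chosen for illustration.

```python
import inspect
from typing import Callable, Mapping


def soft_pipeline(querydict: Mapping[str, Callable], kwargdict: dict) -> dict:
    """Run each query in order, storing its result under its key so that
    later queries can request it by parameter name."""
    info = dict(kwargdict)  # work on a copy of the caller's dictionary
    for name, query in querydict.items():
        wanted = inspect.signature(query).parameters
        info[name] = query(**{k: v for k, v in info.items() if k in wanted})
    return info


def row_bytes(n_columns, column_bytes):
    return n_columns * column_bytes


def table_bytes(rows, row_bytes):
    # consumes the value produced by the row_bytes query
    return rows * row_bytes


info = soft_pipeline(
    {"row_bytes": row_bytes, "table_bytes": table_bytes},
    {"n_columns": 4, "column_bytes": 2, "rows": 10},
)
```

Because each query receives only the keys it names as parameters, the order of entries in querydict determines what information is available at each step.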
Source code in pdr/func.py
specialize(func: Callable, check: Callable[..., tuple[bool, Any]], error: Optional[Callable[[Exception], str]] = None, tracker: TrivialTracker = TrivialTracker()) -> Callable
function decorator that permits dispatch of calls to func to an arbitrary set of special-case functions defined in check. replaces the pre-1.0 pdr special case checks.
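One way to picture this dispatch is a wrapper that asks check whether a special case applies and falls back to func otherwise. The sketch assumes only the `tuple[bool, Any]` return shape given in the signature; it leaves out the error and tracker parameters, and every name in it (`specialize_sketch`, `default_row_bytes`, `check_row_bytes`, the `BAD_LABEL` product type) is hypothetical.

```python
from functools import wraps
from typing import Callable


def specialize_sketch(func: Callable, check: Callable[..., tuple]) -> Callable:
    """Return a version of func that defers to check() whenever check()
    reports that a special case applies."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        is_special, special_result = check(*args, **kwargs)
        return special_result if is_special else func(*args, **kwargs)
    return wrapper


def default_row_bytes(label: dict) -> int:
    return label["ROW_BYTES"]


def check_row_bytes(label: dict):
    # hypothetical special case: one product type understates ROW_BYTES by 1
    if label.get("PRODUCT_TYPE") == "BAD_LABEL":
        return True, label["ROW_BYTES"] + 1
    return False, None


get_row_bytes = specialize_sketch(default_row_bytes, check_row_bytes)
```

Calling `get_row_bytes` on an ordinary label runs the default path, while a label flagged as the special product type gets the corrected value.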
Source code in pdr/func.py
loaders
loaders._helpers
Simple utility functions for assorted loaders and queries.
HETERODOX_ENDING = re.compile('\\r\\n?')
module-attribute
Pattern for heterodox but not deeply bizarre line endings.
_cle = curry(re.sub, HETERODOX_ENDING, '\n')
module-attribute
partially evaluated replacer of heterodox with orthodox line endings.
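A short illustration of what this pattern and replacer do, using functools.partial as a stand-in for curry (an assumption made to keep the example within the standard library); `replace_endings` is a hypothetical name.

```python
import re
from functools import partial

# Same pattern as above: a carriage return, optionally followed by a newline.
HETERODOX_ENDING = re.compile(r"\r\n?")

# functools.partial stands in for curry here.
replace_endings = partial(re.sub, HETERODOX_ENDING, "\n")

cleaned = replace_endings("a\r\nb\rc\n")
```

Both the `\r\n` and the bare `\r` are rewritten to `\n`, and the existing `\n` is left alone, so `cleaned` is `"a\nb\nc\n"`.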
_check_delimiter_stream(identifiers: DataIdentifiers, name: str, target: PhysicalTarget, block: MultiDict) -> bool
Does it look like this object is a delimiter-separated table without an explicitly-defined row length?
Source code in pdr/loaders/_helpers.py
canonicalize_line_endings(text: Any) -> Any
Attempt to replace common 'heterodox' line endings in a string or
list/tuple of strings with canonical endings (\n). Does not attempt to perform sophisticated delimiter sniffing, and will reliably handle only \r\n and \r endings, not more unusual terminators such as EM / 0x19.
Ignores (returns unchanged) non-strings and non-string elements of lists/tuples.
Source code in pdr/loaders/_helpers.py
canonicalized(func: Callable) -> Callable
Creates a version of func that canonicalizes line endings of any string
(or top-level string elements of a list/tuple) returned by func.
Source code in pdr/loaders/_helpers.py
check_explicit_delimiter(block: MultiDict) -> str
Check if an ASCII TABLE/SPREADSHEET definition explicitly gives a field delimiter. If it doesn't, tentatively assume it's comma-separated.
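As a rough sketch of this fallback, the lookup below maps a delimiter name to a character and defaults to a comma. The `FIELD_DELIMITER` key, the set of delimiter names, and the use of a plain dict in place of a MultiDict are all assumptions for illustration; `explicit_delimiter_sketch` is a hypothetical name.

```python
# Delimiter names as they might appear in a label (assumed for illustration).
DELIMITERS = {"COMMA": ",", "SEMICOLON": ";", "TAB": "\t", "VERTICAL_BAR": "|"}


def explicit_delimiter_sketch(block: dict) -> str:
    """Return the delimiter the block names, tentatively defaulting to a comma."""
    name = str(block.get("FIELD_DELIMITER", "COMMA")).strip().upper()
    return DELIMITERS.get(name, ",")
```

A block with no delimiter entry, or with an unrecognized one, is treated as comma-separated in this sketch.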
Source code in pdr/loaders/_helpers.py
count_from_bottom_of_file(fn: Union[str, list, Path], rows: int, row_bytes: int) -> int
Fallback start-byte-finding function for cases in which a label gives the length of a table in terms of number of rows and row length, but does not specify where in the file the table starts. In these cases, the table usually goes to the end of the file, but may be preceded by a header or whatever, which means that we can often guess its start byte by subtracting the table size in bytes from the physical size of the file. This is not guaranteed to work!
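The arithmetic behind this guess is simple enough to sketch directly. The function below is an illustration under the stated assumption that the table runs to the end of the file; `start_byte_from_bottom` is a hypothetical name and handles only a single path, not a list of files.

```python
import os
import tempfile


def start_byte_from_bottom(path: str, rows: int, row_bytes: int) -> int:
    """Guess a table's start byte, assuming it runs to the end of the file."""
    return os.path.getsize(path) - rows * row_bytes


# Demo: 100 bytes of header followed by 5 rows of 20 bytes each.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"#" * 100 + b"x" * (5 * 20))
guessed_start = start_byte_from_bottom(tmp.name, rows=5, row_bytes=20)
os.remove(tmp.name)
```

If anything follows the table in the file, the guess will be too large by the size of that trailing content, which is why the approach is described as not guaranteed to work.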
Source code in pdr/loaders/_helpers.py
looks_like_ascii(block: MultiDict, name: str) -> bool
Is this probably an ASCII table?
Source code in pdr/loaders/_helpers.py
quantity_start_byte(quantity_dict: dict[str, Union[str, int]], record_bytes: Optional[int]) -> Optional[int]
Attempt to infer an object's start byte from a dict parsed from a PVL quantity object associated with a PVL pointer parameter, along with, if known, the size of a product's records (relevant only if the quantity units are not bytes). Returns None if we can't infer it (usually meaning that the label gives the start position in records but doesn't say how big the records are).
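A minimal sketch of this inference, assuming the quantity dict carries `"value"` and `"units"` keys (hypothetical key names) and relying on the PDS3 convention that pointer offsets are 1-indexed. `start_byte_sketch` is a hypothetical name, not the library function.

```python
from typing import Optional


def start_byte_sketch(quantity: dict, record_bytes: Optional[int]) -> Optional[int]:
    """Convert a 1-indexed pointer quantity to a 0-indexed byte offset."""
    if quantity["units"] == "BYTES":
        return quantity["value"] - 1
    if record_bytes is not None:
        # otherwise the value is taken to be a 1-indexed record number
        return record_bytes * (quantity["value"] - 1)
    # position given in records, but the record size is unknown
    return None
```

For example, a pointer to record 3 in a product with 1024-byte records resolves to byte offset 2048, while the same pointer with no known record size cannot be resolved.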
Source code in pdr/loaders/_helpers.py
loaders.astrowrap
loaders.datawrap
Classes to wrap and manage complex data-loading workflows.
Loader
compact wrapper for loader functions, intended principally but not solely for library-internal use. provides a common interface, adds compactness, delays imports, etc.
Source code in pdr/loaders/datawrap.py
ReadArray
Bases: Loader
wrapper for read_array
Source code in pdr/loaders/datawrap.py
ReadCompressedImage
Bases: Loader
wrapper for handle_compressed_image
Source code in pdr/loaders/datawrap.py
ReadFits
Bases: Loader
wrapper for handle_fits_file
Source code in pdr/loaders/datawrap.py
ReadHeader
Bases: Loader
wrapper for read_header
Source code in pdr/loaders/datawrap.py
ReadImage
Bases: Loader
wrapper for read_image
Source code in pdr/loaders/datawrap.py
ReadLabel
Bases: Loader
wrapper for read_label
Source code in pdr/loaders/datawrap.py
ReadTable
Bases: Loader
wrapper for read_table
Source code in pdr/loaders/datawrap.py
ReadText
Bases: Loader
wrapper for read_text
Source code in pdr/loaders/datawrap.py
TBD
Bases: Loader
wrapper for tbd
Source code in pdr/loaders/datawrap.py
Trivial
Bases: Loader
wrapper for trivial
Source code in pdr/loaders/datawrap.py
_format_exc_report(exc: Exception) -> dict
format an exception report for inclusion in another dict
Source code in pdr/loaders/datawrap.py
loaders.dispatch
Functions to select appropriate Loader subclasses for data objects.
OBJECTS_TO_IGNORE = ('DATA_SET_MAP_PROJECT.*', '.*_DESC$', '.*DESCRIPTION(_[0-9]*)?$')
module-attribute
PDS3 objects we do not automatically load, even when loading greedily. These are reference files, usually throwaway ones, that are usually not archived in the same place as the data products and add little, if any, context to individual products (they are the same across an entire 'product type'). This means that in almost all cases, attempting to greedily load them has no purpose but to throw irrelevant warnings at the user.
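To see which object names these patterns cover, they can be tried against a few examples. How the library applies the patterns is not stated here, so matching with re.match from the start of the name is an assumption, and `is_ignored` is a hypothetical helper.

```python
import re

OBJECTS_TO_IGNORE = ("DATA_SET_MAP_PROJECT.*", ".*_DESC$", ".*DESCRIPTION(_[0-9]*)?$")


def is_ignored(object_name: str) -> bool:
    """True if any pattern matches from the start of the object name."""
    return any(re.match(pattern, object_name) for pattern in OBJECTS_TO_IGNORE)


checks = {
    name: is_ignored(name)
    for name in ("DATA_SET_MAP_PROJECTION", "IMAGE_DESC", "DESCRIPTION_2", "IMAGE")
}
```

Under that assumption the first three names are skipped and `IMAGE` is not.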
file_extension_to_loader(fn: str) -> Loader
Attempt to select the correct Loader subclass for an object based solely on its file extension. Used primarily for objects only specified by a PDS3 FILE_NAME pointer or similar.
Source code in pdr/loaders/dispatch.py
image_lib_dispatch(pointer: str, data: Data) -> Optional[Loader]
check file extensions to see if we want to toss a file to an external library rather than using our internal raster handling. current cases are: pillow for tiff, gif, or jp2; astropy for fits
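The extension check might be sketched as below. The exact extension sets are assumptions (the text names only the format families), the function returns a library name rather than a Loader, and `external_library_for` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional

# Assumed extension sets; the text only names tiff, gif, jp2, and fits.
PILLOW_EXTENSIONS = {".tif", ".tiff", ".gif", ".jp2"}
FITS_EXTENSIONS = {".fits", ".fit"}


def external_library_for(filename: str) -> Optional[str]:
    """Name the external library to use for this file, or None to fall back
    to internal raster handling."""
    suffix = Path(filename).suffix.lower()
    if suffix in PILLOW_EXTENSIONS:
        return "pillow"
    if suffix in FITS_EXTENSIONS:
        return "astropy"
    return None
```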
Source code in pdr/loaders/dispatch.py
pointer_to_loader(pointer: str, data: Data) -> Loader
Attempt to select an appropriate Loader subclass based on a PDS3 object name (and sometimes the file extension).
The apparently-redundant sequence of conditionals is not in fact redundant; it is based on our knowledge of the most frequently used but sometimes redundant object names in the PDS3 corpus.
Source code in pdr/loaders/dispatch.py
special_pointer_dispatch(pointer: str, identifiers)
Some pointers are misleadingly named and the wrong loader is selected in pointer_to_loader. To avoid making the pointer_to_loader logic too complex, we check for those special cases here and return the correct loader.
Source code in pdr/loaders/dispatch.py
loaders.handlers
Pointy-end functions used by Loaders that primarily work by calling external
libraries that provide high-level support for specific file formats, including
pillow and astropy.io.fits.
_check_prescaled_desktop(fn: Union[str, Path])
Check whether a desktop-format image -- i.e., one we loaded with pillow -- might need scaling / masking / etc. Currently we treat this as true for JP2 and GeoTIFF and False otherwise. There might be other heuristics.
Source code in pdr/loaders/handlers.py
add_bit_column_info(obj: dict, definition: MultiDict, identifiers: DataIdentifiers) -> dict
Parse the bit column description (if any) from a dict created from a
COLUMN PVL object and add that parsed description to obj (most likely
that definition plus block info). Used in queries.read_format_block().
Source code in pdr/loaders/handlers.py
handle_compressed_image(fn: Union[str, Path], frame: Optional[int] = None) -> np.ndarray
Open an image in a standard 'desktop' format (GIF, standard TIFF, GeoTIFF, classic JPEG, JPEG2000, PNG, etc.) using pillow. "Compressed" is slightly misleading, because this will work fine on uncompressed GeoTIFF etc.
Source code in pdr/loaders/handlers.py
handle_fits_file(fn: str, name: str, hdu_id: Union[str, int, tuple[int, int]], hdulist: Optional[HDUList] = None, hdu_id_is_index: bool = False) -> dict[str, Union[MultiDict, pd.DataFrame, np.ndarray]]
Create an object or objects from an HDU of a FITS file using
astropy.io.fits.
hdu_id may be the index of an HDU or the start byte of the HDU's header
or data section; hdu_id_is_index=True means that it's the HDU's index.
If it's a start byte, and it's the start byte of the HDU's header section,
return just the header; otherwise return the data and the header. If it's
an index, always return the data and the header (currently this is only
used for primary FITS files, which by construction never have headers
labeled as independent objects).
Source code in pdr/loaders/handlers.py
46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 | |
handle_fits_header(hdulist: HDUList, hdu_ix: int, skip_bad_cards: bool = False) -> MultiDict
Load the header of a specified HDU as a MultiDict, engaging in various sorts of gymnastics to stymie the attempts of astropy.io.fits to keep us safe from illegally-formatted headers.
Source code in pdr/loaders/handlers.py
223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 | |
hdu_byte_index(obj: Union[str, Path, HDUList]) -> dict
produce a dict describing the locations of HDUs and their headers within a FITS file.
Source code in pdr/loaders/handlers.py
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 | |
reindex_dupe_names(hdu: BinTableHDU)
Astropy cannot construct the .data attribute of a BinTableHDU if the table has duplicate column names. This changes any duplicate column names in place following the same convention we use for PDS binary tables (appending incrementing integers).
Source code in pdr/loaders/handlers.py
164 165 166 167 168 169 170 171 172 173 174 175 176 | |
unpack_fits_headers(filename: Union[str, Path], hdulist: Optional[HDUList] = None) -> tuple[MultiDict, list[str], dict[str, int]]
Unpack all headers in a FITS file into a MultiDict and flattened list of
keys suitable for constructing a pdr.Metadata object, along with a
mapping between HDU names and indices. Used when opening a FITS file in
"primary" mode (i.e., directly from its own headers, without a supporting
PDS3 or PDS4 label).
Source code in pdr/loaders/handlers.py
292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 | |
loaders.image
Functions for the nitty-gritty array-shaping parts of image loading.
convert_if_vax(image: np.ndarray, props: dict) -> np.ndarray
If an array is in 32-bit VAX real format, convert it to 32-bit float.
Source code in pdr/loaders/image.py
76 77 78 79 80 | |
extract_axplanes(image: np.ndarray, props: ImageProps) -> tuple[np.ndarray, dict[str, np.ndarray]]
extract ISIS-style side/bottom/top/backplanes from an array
Source code in pdr/loaders/image.py
178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 | |
extract_bil_linefix(image: np.ndarray, props: ImageProps) -> tuple[np.ndarray, Optional[np.ndarray], Optional[np.ndarray]]
If they exist, extract line prefixes and/or suffixes from a raveled BIL (LINE_INTERLEAVED) image. Return the image shorn of pre/suffixes, the prefixes (if any), and the suffixes (if any).
Source code in pdr/loaders/image.py
110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 | |
extract_single_band_linefix(image: np.ndarray, props: ImageProps) -> tuple[np.ndarray, Optional[np.ndarray], Optional[np.ndarray]]
If they exist, extract line prefixes and/or suffixes from a single-band image (i.e., a 2D ndarray). Return the image shorn of pre/suffixes, the prefixes (if any), and the suffixes (if any).
Source code in pdr/loaders/image.py
55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 | |
make_format_specifications(props: ImageProps) -> tuple[str, np.dtype]
Given an image properties dict, construct a struct format string and a numpy dtype that could be used to interpret the described image using, respectively, struct or numpy.
Source code in pdr/loaders/image.py
42 43 44 45 46 47 48 49 50 51 52 | |
process_multiband_image(f: BufferedIOBase, props: ImageProps) -> tuple[np.ndarray, dict[str, np.ndarray], Optional[np.ndarray], Optional[np.ndarray]]
Load the elements of a multiband image from an open file stream, reshape
the resulting array as appropriate for the image's band storage type,
perform any cleanup / segmentation operations implied by the props dict,
and return it, along with any side/bottom/topplanes or line pre/suffixes.
Source code in pdr/loaders/image.py
131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 | |
process_single_band_image(f: BufferedIOBase, props: ImageProps) -> tuple[np.ndarray, dict[str, np.ndarray], Optional[np.ndarray], Optional[np.ndarray]]
Load a single-band image from an open file stream,
perform any cleanup / segmentation operations implied by the props dict,
and return it, along with any side/bottom/topplanes or line pre/suffixes.
Source code in pdr/loaders/image.py
83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 | |
read_image(name: str, gen_props: ImageProps, fn: str, start_byte: int) -> np.ndarray
Read an IMAGE object and return it as a numpy array.
Source code in pdr/loaders/image.py
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 | |
loaders.queries
Functions used as part of Loader subclasses' softquery()-backed metadata-processing workflows.
DEFAULT_DATA_QUERIES = MappingProxyType({'identifiers': get_identifiers, 'block': specialize(get_block, check_special_block), 'fn': get_file_mapping, 'target': get_target, 'start_byte': specialize(data_start_byte, check_special_offset), 'debug': get_debug, 'return_default': get_return_default})
module-attribute
Queries common to most Loaders.
START_BYTE_QUERIES = MappingProxyType({'identifiers': get_identifiers, 'block': specialize(get_block, check_special_block), 'fn': get_file_mapping, 'target': get_target, 'start_byte': specialize(data_start_byte, check_special_offset)})
module-attribute
Queries for simply finding an object's start byte and containing file. Used for the standalone_start_byte() 'a la carte' function below, designed to support implicit object association.
_extract_table_records(block)
Attempt to get the number of 'records', which can mean either row count or records defined by byte length in a way that does not necessarily correspond to number of rows, from a TABLE/SPREADSHEET definition.
Source code in pdr/loaders/queries.py
428 429 430 431 432 433 434 435 436 437 438 | |
_fill_empty_byte_rows(fmtdef: pd.DataFrame) -> pd.DataFrame
Fill any missing byte rows in a format definition. This is typically used to fill
Source code in pdr/loaders/queries.py
538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 | |
_fix_up_line_prefix_table_block(data: PDRLike, name: str, parent_block: MultiDict)
Deal with assorted quirks of underspecified line prefix table definitions that will stymie the primary table format interpretation workflow.
Source code in pdr/loaders/queries.py
283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 | |
_probably_ascii(block: MultiDict, fmtdef: pd.DataFrame, name: str) -> bool
Attempt to determine whether a TABLE is ASCII from its label block and format definition.
Source code in pdr/loaders/queries.py
558 559 560 561 562 563 564 565 566 | |
_table_length(block, identifiers, n_records)
Source code in pdr/loaders/queries.py
466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 | |
_table_row_position(n_records, target: PhysicalTarget) -> tuple[Optional[int], int]
Get physical start row and number of rows for a delimited ASCII table with no explicitly-defined row byte length.
A return value of None for length implies that the table occupies the
entirety of the file including and after start.
Source code in pdr/loaders/queries.py
441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 | |
base_sample_info(block: MultiDict) -> dict
Determine basic sample-level type info for an image object.
Source code in pdr/loaders/queries.py
211 212 213 214 215 216 | |
check_array_for_subobject(block: MultiDict) -> bool
Does an ARRAY definition contain a definition for a subobject? If it (illegally) contains more than one, raise a ValueError.
Source code in pdr/loaders/queries.py
255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 | |
check_fix_validity(props: ImageProps) -> None
"Integrity checker for 'conventional' line pre/suffix definitions.
Source code in pdr/loaders/queries.py
145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 | |
check_if_qube(name: str, block: MultiDict, band_storage_type: BandStorageType) -> tuple[bool, Optional[dict]]
If this is a metadata block associated with a qube-type object, parse its properties using the various special rules necessary to read ISIS2 parameters.
Source code in pdr/loaders/queries.py
169 170 171 172 173 174 175 176 177 178 179 180 181 182 | |
data_start_byte(identifiers: DataIdentifiers, block: Mapping, target, fn) -> int
Determine the first byte of the data in a file from its pointer.
Source code in pdr/loaders/queries.py
395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 | |
extract_axplane_metadata(block: MultiDict, props: dict) -> dict
extract metadata for ISIS-style side/back/bottomplanes
Source code in pdr/loaders/queries.py
85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 | |
extract_linefix_metadata(block: MultiDict, props: dict) -> dict
extract metadata for line prefix/suffix 'tables'
Source code in pdr/loaders/queries.py
123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 | |
generic_image_properties(block: MultiDict, sample_type: str) -> ImageProps
Construct a dict of image properties later used in the image-loading workflow.
Source code in pdr/loaders/queries.py
219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 | |
generic_qube_properties(block: MultiDict, band_storage_type: BandStorageType) -> ImageProps
Parse metadata from an ISIS2-style QUBE definition
Source code in pdr/loaders/queries.py
44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 | |
get_array_num_items(block: MultiDict) -> int
How many total array elements does an ARRAY definition imply?
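A minimal sketch of the idea, not the code in pdr/loaders/queries.py: it assumes the ARRAY block has already been parsed so that AXIS_ITEMS is either a single integer or a sequence of integers, and it uses a plain dict in place of a MultiDict. The name `array_num_items_sketch` is hypothetical.

```python
from math import prod
from typing import Mapping


def array_num_items_sketch(block: Mapping) -> int:
    """Multiply the per-axis item counts of an ARRAY definition."""
    items = block["AXIS_ITEMS"]
    # A one-axis ARRAY gives a bare integer; a multi-axis one gives a sequence.
    if isinstance(items, int):
        return items
    return prod(items)


print(array_num_items_sketch({"AXES": 2, "AXIS_ITEMS": (100, 3)}))
```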
Source code in pdr/loaders/queries.py
273 274 275 276 277 278 279 280 | |
get_block(data: PDRLike, name: str) -> Optional[MultiDict]
query wrapper for pdr.Data.metablock_(). also checks for interleaved
objects.
Source code in pdr/loaders/queries.py
310 311 312 313 314 315 316 317 318 319 320 | |
get_debug(data: PDRLike) -> bool
Are we in debug mode?
Source code in pdr/loaders/queries.py
533 534 535 | |
get_file_mapping(data: PDRLike, name: str) -> Union[str, Path, list[Union[str, Path]]]
query wrapper for pdr.Data.file_mapping.__getitem__()
Source code in pdr/loaders/queries.py
323 324 325 326 327 | |
get_histogram_fields(block: MultiDict) -> list[dict]
Simplified version of read_format_block() for HISTOGRAM objects, whose
format specifications are much terser than TABLE/SPREADSHEET/ARRAY.
Source code in pdr/loaders/queries.py
798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 | |
get_identifiers(data) -> dict[str, Any]
Query wrapper for pdr.Data.__getattr__("identifiers")
Source code in pdr/loaders/queries.py
878 879 880 | |
get_image_properties(gen_props: ImageProps) -> ImageProps
Second-step cleaning/formatting function for an image properties dict,
typically derived from generic_image_properties(),
generic_qube_properties(), or a special case.
Source code in pdr/loaders/queries.py
185 186 187 188 189 190 191 192 193 194 195 196 197 198 | |
get_none() -> None
Don't get anything
Source code in pdr/loaders/queries.py
883 884 885 | |
get_qube_band_storage_type(block: MultiDict) -> Optional[BandStorageType]
Attempt to get band storage type from a QUBE definition.
Source code in pdr/loaders/queries.py
250 251 252 | |
get_return_default(data: PDRLike, name: str) -> MultiDict
Wrapper for data.metaget_ used to return default values for failed loads
in non-debug mode.
Source code in pdr/loaders/queries.py
525 526 527 528 529 530 | |
get_target(data: PDRLike, name: str) -> PhysicalTarget
Attempt to get the 'target' of a PDS3 pointer or other physical data
location marker for name. This typically becomes the target argument
of data_start_byte() and/or table_position(). Also redirects for
interleaved objects.
Source code in pdr/loaders/queries.py
330 331 332 333 334 335 336 337 338 339 340 341 342 | |
gt0f(seq: Collection[Number]) -> tuple[Number]
greater-than-0 filter
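For illustration only, a filter of this kind could look like the following. This is a hypothetical stand-in (`gt0f_sketch`), not the function's actual source.

```python
from numbers import Number
from typing import Collection


def gt0f_sketch(seq: Collection[Number]) -> tuple:
    """Hypothetical stand-in: keep only the elements greater than zero."""
    return tuple(x for x in seq if x > 0)


print(gt0f_sketch([0, 3, -1, 2.5]))
```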
Source code in pdr/loaders/queries.py
140 141 142 | |
im_sample_type(base_samp_info: dict) -> str
Determine appropriate numpy dtype string for an IMAGE object
Source code in pdr/loaders/queries.py
201 202 203 204 205 206 207 208 | |
inject_format_files(block: list[tuple[str, Any]], name: str, fn: str, data: PDRLike) -> list[tuple[str, Any]]
Load format files referenced by a TABLE/SPREADSHEET/CONTAINER/COLLECTION definition (or recursively referenced by a referenced format file), parse them, and insert them into the referencing definition.
Source code in pdr/loaders/queries.py
816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 | |
load_format_file(data: PDRLike, format_file: str, name: str, fn: str) -> MultiDict
Attempt to find and read a PVL format file (usually referenced by
^STRUCTURE pointers in an object definition). Normal PVL-reading workflows
(including just pdr.read()) work fine on these files, but this function
includes additional code to attempt to find the format file.
Source code in pdr/loaders/queries.py
846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 | |
parse_array_structure(name: str, block: MultiDict, fn: str, data: PDRLike, identifiers: DataIdentifiers) -> tuple[Optional[pd.DataFrame], Optional[Union[str, np.dtype]]]
parse_table_structure() modified for the special needs of ARRAYs.
Source code in pdr/loaders/queries.py
642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 | |
parse_table_structure(name: str, block: MultiDict, fn: str, data: PDRLike, identifiers: DataIdentifiers) -> tuple[pd.DataFrame, Optional[np.dtype]]
Parse a TABLE or SPREADSHEET's format specification as a pd.DataFrame
(see read_table_structure()). If that specification contains byte-position
information for columns, further parse them into explicit offsets. If the
table is binary, also create a numpy dtype object (usually a compound
dtype). These typically become inputs for np.fromfile (for binary tables)
or for one of several ASCII parsers.
Source code in pdr/loaders/queries.py
569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 | |
read_format_block(block: MultiDict, object_name: str, fn: str, data: PDRLike, identifiers: DataIdentifiers, within_container: bool = False) -> tuple[list[dict], bool]
Parse a TABLE, ARRAY, SPREADSHEET, CONTAINER, or COLLECTION definition, recursing into ARRAY, CONTAINER, or COLLECTION subcomponents of that definition and loading external STRUCTURE specifications as needed.
This function's fields return value becomes the rows of the fmtdef
object used extensively in the table/array-reading workflow.
Source code in pdr/loaders/queries.py
695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 | |
read_table_structure(block: MultiDict, name: str, fn: str, data: PDRLike, identifiers: DataIdentifiers) -> pd.DataFrame
Try to turn a TABLE/SPREADSHEET/ARRAY/HISTOGRAM definition into a
format definition DataFrame whose rows represent the columns of the
defined object and whose columns represent various properties of those
columns (data type, byte offset, etc.). Due to the complexity of the PDS3
Standards for these objects, this can include a wide variety of behaviors,
including recursively unpacking subobjects, loading external format files,
and adding "placeholder" entries for 'padding' (e.g. extra whitespace,
separator characters, and row prefixes/suffixes). This is most often
called by parse_table_structure() or parse_array_structure(), but some
special cases use it on its own.
Source code in pdr/loaders/queries.py
609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 | |
table_position(identifiers: DataIdentifiers, block: MultiDict, target: PhysicalTarget, name: str, start_byte: int) -> dict[str, Union[bool, int, None]]
Determine the starting position of a TABLE/SPREADSHEET object from its definition and other previously-determined information.
In the returned dict, if as_rows is True, the table is a
delimiter-separated ASCII table with no explicitly-defined row length, and both
"start" and "length" should be interpreted as rows; otherwise, both "start"
and "length" should be interpreted as bytes. If length is None, the table
occupies the entirety of the file including and after "start".
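The start/length/as_rows convention described above can be sketched on an in-memory byte string. The helper name `slice_table` is hypothetical, and zero-based row counting is an assumption; the loader's own reading code may work differently.

```python
from typing import Optional


def slice_table(raw: bytes, start: int, length: Optional[int], as_rows: bool) -> bytes:
    """Cut a table out of raw file bytes following the start/length/as_rows convention."""
    # length=None means "everything from start to the end of the file".
    stop = None if length is None else start + length
    if as_rows:
        # start and length count rows (lines), not bytes (zero-based: an assumption).
        rows = raw.splitlines(keepends=True)
        return b"".join(rows[start:stop])
    # Otherwise start and length count bytes.
    return raw[start:stop]


raw = b"HDR\r\n1,2\r\n3,4\r\n5,6\r\n"
print(slice_table(raw, 1, 2, as_rows=True))
print(slice_table(raw, 5, None, as_rows=False))
```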
Source code in pdr/loaders/queries.py
492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 | |
loaders.table
Functions for the nitty-gritty byte-juggling parts of TABLE/SPREADSHEET/ARRAY/HISTOGRAM loading.
PAD_CHARACTERS = ' \t",'
module-attribute
Characters we want to strip from the beginning/end of every element of an ASCII table.
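A small demonstration of the effect of stripping these characters. Using `str.strip` here is an illustrative assumption; the loader may apply the constant through pandas or another mechanism.

```python
PAD_CHARACTERS = ' \t",'  # value as documented above

raw_fields = ['  "MARS"', '\t12.5,', '"",']
# Remove padding characters from both ends of every field.
cleaned = [field.strip(PAD_CHARACTERS) for field in raw_fields]
print(cleaned)
```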
_interpret_as_ascii(fn: str, fmtdef: pd.DataFrame, block: MultiDict, table_props: dict)
Load text from a file and parse it as an ASCII table.
Source code in pdr/loaders/table.py
248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 | |
_interpret_as_binary(fn, fmtdef, dt, block, start_byte)
Source code in pdr/loaders/table.py
113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 | |
_read_as_delimited(sep: str, string_buffer: StringIO, fmtdef: pd.DataFrame) -> Optional[pd.DataFrame]
Attempt to read an ASCII table as a delimiter-separated file. We always try this first before moving to a fixed-width parser.
Source code in pdr/loaders/table.py
137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 | |
_read_fwf_with_colspecs(fmtdef: pd.DataFrame, string_buffer: StringIO) -> pd.DataFrame
Attempt to read an ASCII table as a fixed-width file using column boundaries specified by or inferred from its format definition.
Source code in pdr/loaders/table.py
184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 | |
_read_table_from_stringio(fmtdef: pd.DataFrame, block: MultiDict, string_buffer: StringIO) -> pd.DataFrame
Attempt to parse a string buffer, presumably containing an ASCII table, as a pandas DataFrame. First try to treat it as a delimiter-separated table; fall back to fixed-width parsing if that doesn't work.
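The delimited-first, fixed-width-fallback strategy can be sketched with the standard library alone. This is a simplified illustration returning lists of strings rather than a DataFrame; all function names are hypothetical, and the "wrong field count means not delimited" test is an assumed heuristic, not necessarily the one the loader uses.

```python
import csv
from io import StringIO
from typing import Optional


def read_delimited(buffer: StringIO, sep: str, n_columns: int) -> Optional[list[list[str]]]:
    """Return parsed rows, or None if any row has the wrong number of fields."""
    buffer.seek(0)
    rows = [row for row in csv.reader(buffer, delimiter=sep) if row]
    if not rows or any(len(row) != n_columns for row in rows):
        return None
    return rows


def read_fixed_width(buffer: StringIO, colspecs: list[tuple[int, int]]) -> list[list[str]]:
    """Slice every non-empty line at the given (start, stop) character positions."""
    buffer.seek(0)
    return [
        [line[start:stop].strip() for start, stop in colspecs]
        for line in buffer.read().splitlines()
        if line
    ]


def read_table_text(text: str, sep: str, colspecs: list[tuple[int, int]]) -> list[list[str]]:
    """Try delimiter-separated parsing first; fall back to fixed-width slicing."""
    buffer = StringIO(text)
    rows = read_delimited(buffer, sep, len(colspecs))
    if rows is None:
        rows = read_fixed_width(buffer, colspecs)
    return rows


print(read_table_text("1,2,3\n4,5,6\n", ",", [(0, 1), (2, 3), (4, 5)]))
print(read_table_text("  1  22\n 10 333\n", ",", [(0, 3), (3, 7)]))
```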
Source code in pdr/loaders/table.py
221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 | |
read_array(fn, block, start_byte, fmtdef_dt)
Read an array object from this product and return it as a numpy array.
Source code in pdr/loaders/table.py
37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 | |
read_table(identifiers, fn, fmtdef_dt, table_props, block, start_byte)
Read a table. Parse the label format definition and then decide whether to treat the table as text or binary.
Source code in pdr/loaders/table.py
73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 | |
loaders.text
Pointy-end functions for text-handling Loader subclasses.
ignore_if_pdf(fn: Union[str, Path]) -> Optional[str]
Read text from a file if it's not a PDF.
Source code in pdr/loaders/text.py
131 132 133 134 135 136 137 138 139 | |
read_header(fn: Union[str, Path], table_props: dict, name: str = 'HEADER') -> str
Read a text header from a file.
Source code in pdr/loaders/text.py
30 31 32 33 34 35 36 | |
read_label(fn: Union[str, Path], fmt: Optional[str] = 'text') -> Union[str, 'PVLModule']
Read the entirety of a PDS3 label, optionally using pvl to parse it as
completely as possible into Python objects. This is not intended for use
in the primary pdr.Metadata initialization workflow, but rather to
handle cases when the user explicitly requests the entirety of the label
(typically by accessing the "LABEL" key of a pdr.Data object).
Source code in pdr/loaders/text.py
39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 | |
read_text(target: str, fn: Union[list[str], str]) -> Union[list[str], str]
Read text from a file or list of files.
Source code in pdr/loaders/text.py
15 16 17 18 19 20 21 22 23 24 25 26 27 | |
skeptically_load_header(fn: Union[Path, str], table_props: dict, name: str = 'header', fmt: Optional[str] = 'text') -> Union[str, 'PVLModule', None]
Attempt to read a text HEADER object from a file. PDS3 does not give a strict definition of the HEADER object, so there is no way to consistently load HEADERs in a coherent, well-formatted fashion. However, providers generally use HEADER to denote either attached file/product-level metadata, column headers for an ASCII table, or object-level contextualizing metadata for ASCII tables.
By default, simply read the designated byte range as Unicode text. If
fmt is "pvl", also attempt to parse this text as PVL. (This will fail
on most products, because most HEADER objects are not PVL, but is useful
for some ancillary attached labels, especially ISIS labels.)
NOTE: HEADERs defined in labels very often do not actually exist and are never essential for loading primary data objects, so this function is always "optional", even in debug mode. If it fails, it will simply issue a UserWarning and return None.
WARNING: this function is not intended to load metadata of standard file formats (such as TIFF tags or FITS headers). These headers should always be handled by a format-specific parser. More generally, it will never work on binary files.
Source code in pdr/loaders/text.py
60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 | |
loaders.utility
Support objects for 'utility' Loader subclasses.
is_trivial(pointer: str) -> bool
Returns True if this is the name of a data object we want to handle trivially, in the sense that we never ever want to load it directly.
Source code in pdr/loaders/utility.py
79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 | |
looks_like_this_kind_of_file(filename: str, kind_extensions: Collection[str]) -> bool
Does this file have any of these extensions?
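One way such a check could be written is shown below. The name `has_extension_sketch` is hypothetical, and the case-insensitive comparison is an assumption rather than documented behavior.

```python
from typing import Collection


def has_extension_sketch(filename: str, kind_extensions: Collection[str]) -> bool:
    """Case-insensitive check of a filename against a collection of extensions."""
    lowered = filename.lower()
    return any(lowered.endswith(ext.lower()) for ext in kind_extensions)


print(has_extension_sketch("IMG001.FITS", (".fits", ".fit")))
```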
Source code in pdr/loaders/utility.py
71 72 73 74 75 76 | |
tbd(name: str, block: MultiDict, *_, **__)
This is a placeholder function for objects that are not explicitly supported elsewhere. It throws a warning and passes just the value of the pointer.
Source code in pdr/loaders/utility.py
61 62 63 64 65 66 67 68 | |
trivial(*_, **__)
This is a trivial loader. It does not load. It is intended for any pointers we don't want to load and instead simply want ignored.
Source code in pdr/loaders/utility.py
53 54 55 56 57 58 | |
np_utils
Methods for working with numpy objects, primarily intended as components of pdr's image- and table-loading routines.
casting_to_float(array: np.ndarray, *operands: Number) -> bool
check: will this operation cast the array to float? return True if array is integer-valued and any operands are not integers.
Source code in pdr/np_utils.py
56 57 58 59 60 61 62 63 | |
enforce_order_and_object(array: np.ndarray, inplace=True) -> np.ndarray
Make an ndarray compatible for use with pandas or other similarly-strict interfaces. Determine which, if any, of the array's fields are in nonnative byteorder and swap them; also convert any void dtypes to object.
Source code in pdr/np_utils.py
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 | |
ibm32_to_np_f32(ibm)
Convert an array of IBM System 360-style 32-bit floats (expressed as 32-bit unsigned integers) to numpy float64.
Source code in pdr/np_utils.py
121 122 123 124 125 126 | |
ibm64_to_np_f64(ibm)
Convert an array of IBM System 360-style 64-bit floats (expressed as 64-bit unsigned integers) to numpy float64.
Source code in pdr/np_utils.py
129 130 131 132 133 134 | |
ibm_to_np(ibm: np.ndarray, sreg: int, ereg: int, mmask: int) -> np.ndarray
Convert an array composed of IBM System 360-style floats (expressed as 4- or 8-byte unsigned integers, as appropriate for byte width) to numpy float64.
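To make the format concrete, here is a scalar, pure-Python decoding of a single 32-bit IBM System/360 float (sign bit, 7-bit base-16 exponent biased by 64, 24-bit fraction with no hidden bit). It is a sketch of the number format only: the function name is hypothetical, and it does not reproduce the vectorized implementation or the sreg/ereg/mmask parameters of ibm_to_np.

```python
def ibm32_word_to_float(word: int) -> float:
    """Decode one IBM System/360 single-precision float given as a 32-bit unsigned int."""
    sign = -1.0 if (word >> 31) & 0x1 else 1.0
    exponent = (word >> 24) & 0x7F   # 7-bit base-16 exponent, biased by 64
    fraction = word & 0x00FFFFFF     # 24-bit fraction; no hidden leading bit
    return sign * (fraction / 2**24) * 16.0 ** (exponent - 64)


print(ibm32_word_to_float(0xC276A000))
print(ibm32_word_to_float(0x41100000))
```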
Source code in pdr/np_utils.py
104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 | |
make_c_contiguous(arr: np.ndarray) -> np.ndarray
If an ndarray isn't C-contiguous, reorder it as C-contiguous. If it is, don't mess with it.
Source code in pdr/np_utils.py
93 94 95 96 97 98 99 100 | |
np_from_buffered_io(buffered_io: BufferedIOBase, dtype: Union[np.dtype, str], offset: Optional[int] = None, count: Optional[int] = None) -> np.ndarray
Read a 1D numpy array of the specified dtype, size, and offset from a buffered IO object.
Source code in pdr/np_utils.py
70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 | |
parselabel
parselabel.pds3
Parsing utilities for PDS3 labels.
STRUCTUREPAT = re.compile('\\^(?:(?:\\w|_)+_)?STRUCTURE$')
module-attribute
regex pattern for format file pointers
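The pattern can be tried against a few pointer-like names. The pattern itself is copied from the attribute above; the candidate names and the use of `re.match` are illustrative choices, not taken from the library.

```python
import re

# Pattern copied from the module attribute documented above.
STRUCTUREPAT = re.compile('\\^(?:(?:\\w|_)+_)?STRUCTURE$')

candidates = ["^STRUCTURE", "^BIT_STRUCTURE", "^IMAGE", "STRUCTURE", "^STRUCTURE_FILE"]
# Keep only names that look like format-file pointers.
matches = [name for name in candidates if STRUCTUREPAT.match(name)]
print(matches)
```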
BlockParser
Utility class for stateful recursive parsing and aggregation of a series of PVL statements.
Source code in pdr/parselabel/pds3.py
82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 | |
__init__()
Source code in pdr/parselabel/pds3.py
87 88 89 | |
_step_in(name)
Enter a block.
Source code in pdr/parselabel/pds3.py
95 96 97 98 | |
_step_out()
Exit a block.
Source code in pdr/parselabel/pds3.py
91 92 93 | |
add_statement(parameter, value)
Add a statement.
Source code in pdr/parselabel/pds3.py
100 101 102 103 | |
parse_statements(statements) -> tuple[MultiDict[str, Any], list[str]]
Parse a series of PVL statements into a (possibly nested) MultiDict and a flattened list of all keys at all levels of that MultiDict.
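The general shape of this kind of stateful parsing can be sketched with a stack of plain dicts. This is a simplified illustration, not the BlockParser source: it uses dict instead of MultiDict (so repeated keys overwrite each other), and the choice to record block names in the flat key list is an assumption.

```python
from typing import Any


def parse_statements_sketch(
    statements: list[tuple[str, str]]
) -> tuple[dict[str, Any], list[str]]:
    """Fold (parameter, value) pairs into nested dicts, tracking every key seen."""
    root: dict[str, Any] = {}
    stack = [root]  # stack[-1] is the block currently being filled
    params: list[str] = []
    for parameter, value in statements:
        if parameter in ("OBJECT", "GROUP"):
            # Step in: open a child block named by the statement's value.
            child: dict[str, Any] = {}
            stack[-1][value] = child
            stack.append(child)
            params.append(value)
        elif parameter in ("END_OBJECT", "END_GROUP"):
            # Step out: return to the enclosing block.
            stack.pop()
        elif parameter == "END":
            break
        else:
            stack[-1][parameter] = value
            params.append(parameter)
    return root, params


statements = [
    ("PDS_VERSION_ID", "PDS3"),
    ("OBJECT", "IMAGE"),
    ("LINES", "10"),
    ("END_OBJECT", "IMAGE"),
    ("END", ""),
]
mapping, params = parse_statements_sketch(statements)
print(mapping, params)
```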
Source code in pdr/parselabel/pds3.py
105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 | |
chunk_statements(trimmed_lines: Iterable[str]) -> list[tuple[str, str]]
chunk trimmed lines from a pvl-text into assignment statements.
Source code in pdr/parselabel/pds3.py
61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 | |
depointerize(string: str) -> str
prevent a string from starting with ^
Source code in pdr/parselabel/pds3.py
370 371 372 | |
extract_pvl_block_terminal(line: str) -> Optional[str]
get the PVL block terminator, if any, from a string
Source code in pdr/parselabel/pds3.py
33 34 35 36 37 38 | |
get_pds3_pointers(label: Optional[MultiDict] = None) -> tuple[str]
attempt to get all PDS3 "pointers" -- PVL parameters starting with "^" -- from a MultiDict generated from a PDS3 label. These typically specify physical data locations, and in most cases correspond to data object definitions later in the label (common exceptions include "^STRUCTURE"-type pointers and "^DATA_SET_MAP_PROJECTION").
Source code in pdr/parselabel/pds3.py
350 351 352 353 354 355 356 357 358 359 360 361 362 | |
index_duplicate_pointers(pointers: Collection[str], mapping: MultiDict[str, Any], params: list[str]) -> tuple[MultiDict[str, Any], list[str]]
Although technically illegal, some PDS3 objects have multiple data objects with the same name. This produces counterintuitive results. This function appends ascending integers to any duplicate members of a specified set of "pointer" keys of a MultiDict, and also their "depointerized" versions, in order to distinguish data objects. This can potentially fail if duplicate-named object pointers and their corresponding object definitions are not given in the same order in a label, but we have not yet encountered that case.
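The renaming idea can be shown on a flat list of names. This sketch is hypothetical throughout: the `_0`, `_1` suffix format and zero-based start are assumptions, and the real function also rewrites the MultiDict and its "depointerized" keys, which is omitted here.

```python
from collections import Counter


def index_duplicates_sketch(names: list[str]) -> list[str]:
    """Append _0, _1, ... to every name that occurs more than once, in order."""
    totals = Counter(names)
    seen: Counter = Counter()
    indexed = []
    for name in names:
        if totals[name] > 1:
            indexed.append(f"{name}_{seen[name]}")
            seen[name] += 1
        else:
            indexed.append(name)
    return indexed


print(index_duplicates_sketch(["^IMAGE", "^TABLE", "^IMAGE"]))
```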
Source code in pdr/parselabel/pds3.py
375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 | |
is_an_assignment_line(line: str) -> bool
pick lines that begin assignment statements.
in PDS labels, it never (?) seems to be the case that people use delimiters to put multiple assignment statements on a line
there is an issue with people who put '=' in text blocks -- looking for a block of capital letters is usually good enough
Source code in pdr/parselabel/pds3.py
41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 | |
literalize_pvl(obj: Union[str, MultiDict[str, Any]]) -> Union[MultiDict[str, Any], str, int, float, set, tuple]
attempt to interpret string representations of PVL values or aggregations
as Python objects. if obj is a MultiDict, attempt to interpret all its
values, diving recursively into any contained MultiDicts.
permissive; if parsing fails, simply return the string.
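The "permissive" behavior, try to interpret and otherwise hand the string back, can be sketched with `ast.literal_eval`. This is only an analogy for the fallback pattern, under the assumption that the value happens to look like a Python literal; the name `literalize_sketch` is hypothetical and this is not how the library parses PVL.

```python
import ast
from typing import Any


def literalize_sketch(text: str) -> Any:
    """Try to read a value string as a Python literal; fall back to the string."""
    try:
        return ast.literal_eval(text.strip())
    except (ValueError, SyntaxError, TypeError):
        # Permissive: anything we cannot interpret stays a string.
        return text


for raw in ["42", "3.5", "(1, 2)", "MARS"]:
    print(repr(literalize_sketch(raw)))
```

PVL has syntax that Python does not share. For instance, Python treats `#` as a comment marker, so a based integer such as 2#1001# would not be read correctly by this sketch, which is one reason dedicated parsers are needed for such values.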
Source code in pdr/parselabel/pds3.py
303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 | |
literalize_pvl_block(block: MultiDict[str, Any]) -> MultiDict[str, Any]
Parse the values of an entire (possibly-nested) MultiDict whose values are PVL strings into Python objects.
Source code in pdr/parselabel/pds3.py
335 336 337 338 339 340 341 342 343 344 345 346 347 | |
looks_pvl(filename) -> bool
Is this probably a PVL file?
Source code in pdr/parselabel/pds3.py
134 135 136 | |
multidict_dig_and_edit(input_multidict: MultiDict, target: Any = None, input_object: Any = None, predicate: Callable[[Any, Any, Any], bool] = None, setter_function: Callable = None, key_editor: bool = False, keep_values: bool = True, mtypes: tuple[type, ...] = (MultiDict,)) -> MultiDict
This function produces a modified copy of a MultiDict (or other mapping,
but may produce unintended results). It searches through
a MultiDict's items, recursively continuing into any children that are
an instance of mtypes, and checking for keys for which
predicate(key, value, target) is True. If predicate is None,
the behavior reverts to predicate == key.
If "key_editor" is False, the function changes the values associated with those keys. If it is True, the function changes the key names themselves.
If "setter_function" is not None, it replaces those keys/values with the output of "setter_function", executed with "input_object" and the original key/value as arguments. If it is None, it will simply replace them with "input_object".
If "keep_values" is not True, the returned MultiDict will contain only edited values, causing this to also act as a filtering function.
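A much-reduced sketch of the recursive dig-and-edit pattern follows, using plain nested dicts. It covers only value editing with the default key-equals-target test; the key_editor, predicate, and mtypes options are left out, the argument order passed to the setter is an assumption, and the function name is hypothetical.

```python
from typing import Any, Callable, Optional


def dig_and_edit_sketch(
    mapping: dict,
    target: Any,
    input_object: Any,
    setter_function: Optional[Callable[[Any, Any], Any]] = None,
    keep_values: bool = True,
) -> dict:
    """Return an edited copy of a nested dict; the input is left untouched."""
    output = {}
    for key, value in mapping.items():
        if isinstance(value, dict):
            # Recurse into child mappings.
            child = dig_and_edit_sketch(
                value, target, input_object, setter_function, keep_values
            )
            if keep_values or child:
                output[key] = child
        elif key == target:
            # Default test: the key equals the target.
            if setter_function is None:
                output[key] = input_object
            else:
                output[key] = setter_function(input_object, value)
        elif keep_values:
            output[key] = value
    return output


label = {"NAME": "A", "IMAGE": {"LINES": 10, "NAME": "B"}}
print(dig_and_edit_sketch(label, "NAME", "X"))
print(dig_and_edit_sketch(label, "NAME", "X", keep_values=False))
```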
Source code in pdr/parselabel/pds3.py
207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 | |
parse_non_base_10(text: str) -> int
Convert a PVL representation of a non-base-10 integer to a base-10 Python integer.
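As a rough sketch, assuming the `radix#digits#` notation for based integers (for example 2#1001011# or 16#4B#, with any sign placed after the first `#`), the conversion could look like this. The function name is hypothetical and the library's own parsing may be stricter or more lenient.

```python
def parse_based_integer_sketch(text: str) -> int:
    """Convert e.g. '16#4B#' or '2#1001011#' to a base-10 int."""
    # Drop the trailing '#', then separate the radix from the digits.
    radix, digits = text.strip().rstrip("#").split("#")
    return int(digits, int(radix))


for raw in ["2#1001011#", "8#113#", "16#4B#", "16#-4B#"]:
    print(raw, parse_based_integer_sketch(raw))
```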
Source code in pdr/parselabel/pds3.py
parse_non_base_10_collection(class_: Union[Type[set], Type[tuple]], obj: str) -> Union[tuple[int], set[int]]
Convert a collection of PVL representations of non-base-10 integers to a collection (of the same class) of base-10 Python integers.
Source code in pdr/parselabel/pds3.py
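As an illustration of what parsing a non-base-10 PVL integer involves, here is a minimal sketch that assumes the ODL "radix#digits#" notation (for example 16#FF#). The function names are hypothetical, and pdr's own parsers may handle more edge cases than this.

```python
def parse_based_integer(text: str) -> int:
    """Convert 'radix#digits#' (e.g. '16#FF#') to a base-10 Python int."""
    radix, digits = text.strip().rstrip("#").split("#", 1)
    # int() accepts an optional sign in the digit string.
    return int(digits, int(radix))


def parse_based_collection(text: str):
    """Convert '(2#101#, 16#FF#)' or '{...}' to a tuple or set of ints."""
    text = text.strip()
    # Braces denote a PVL set, parentheses a PVL sequence.
    class_ = set if text.startswith("{") else tuple
    return class_(parse_based_integer(item) for item in text[1:-1].split(","))
```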
parse_pvl(label: str, deduplicate_pointers: bool = True) -> tuple[MultiDict[str, Any], list[str]]
Parse a PVL-text into a MultiDict and a flattened list of keys.
Source code in pdr/parselabel/pds3.py
parse_pvl_quantity_object(obj: str) -> dict[str, Union[str, Number]]
Parse a PVL quantity string into a dict like {'value': 2, 'units': 'km'}.
Source code in pdr/parselabel/pds3.py
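The mapping returned by parse_pvl_quantity_object() can be pictured with a small regex-based sketch. It assumes quantities are written as a number followed by units in angle brackets, such as 2 <km>. The pattern and the name parse_quantity are illustrative, not pdr's actual code.

```python
import re

# A value, optional whitespace, then units enclosed in angle brackets.
QUANTITY = re.compile(r"^\s*(?P<value>[^<>]+?)\s*<(?P<units>[^<>]+)>\s*$")


def parse_quantity(text: str) -> dict:
    """Parse a string like '2 <km>' into {'value': 2, 'units': 'km'}."""
    match = QUANTITY.match(text)
    if match is None:
        raise ValueError(f"not a PVL quantity: {text!r}")
    raw = match["value"]
    try:
        value = int(raw)
    except ValueError:
        # Fall back to float; this raises ValueError for non-numeric text.
        value = float(raw)
    return {"value": value, "units": match["units"].strip()}
```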
parse_pvl_quantity_statement(statement: str) -> Any
parse pvl statements including quantities. returns quantities as mappings.
this will also handle statements that do not consist entirely of
quantities, notably including tuples of the form '("A5.DAT", 1000
Source code in pdr/parselabel/pds3.py
parse_unusual_collection(obj: str) -> Union[tuple[Union[int, str]], set[Union[int, str]]]
Parse a PVL collection of non-base-10 numbers or unquoted strings.
Source code in pdr/parselabel/pds3.py
pointerize(string: str) -> str
make a string start with ^ if it didn't already
Source code in pdr/parselabel/pds3.py
read_pvl(filename: Union[str, Path], deduplicate_pointers: bool = True, max_size: int = DEFAULT_PVL_LIMIT, default_strict_decode: bool = True) -> tuple[MultiDict, list[str]]
Read and parse a file containing a PVL-text.
Source code in pdr/parselabel/pds3.py
set_key_index(pointer_range: list[int], key: str) -> str
utility setter function for multidict_dig_and_edit() as called by
index_duplicate_pointers(); appends a number from a list to a string
Source code in pdr/parselabel/pds3.py
parselabel.pds4
Simple utilities for preprocessing pds4_tools-produced label objects for the pdr.Metadata constructor.
reformat_pds4_tools_label(label: 'Label') -> tuple[MultiDict, list[str]]
Convert a pds4_tools Label object into a MultiDict and a list of parameters suitable for constructing a pdr.Metadata object. This is not just a type conversion; it also rearranges some nested data structures (in particular, repeated child elements become multiple keys of a MultiDict rather than a list of OrderedDicts).
Source code in pdr/parselabel/pds4.py
unpack_to_multidict(packed: Mapping, mtypes: tuple[type, ...] = (dict,)) -> MultiDict
Recursively unpack any Mapping into a MultiDict. Unpacks all list or tuple
values at any level into multiple keys at that level. This is an unusual-
sounding behavior but is generally appropriate for PDS4 labels, and
specifically for the pds4_tools representation of XML labels. PDS4 types
with cardinality > 1 always (?) represent multiple distinct entities /
properties rather than an array of properties. The list can also always be
retrieved from the resulting multidict with MultiDict.get_all().
Example:
>>> unpack_to_multidict({'a': 1, 'b': [{'c': 2}, 3]})
<MultiDict('a': 1, 'b': <MultiDict('c': 2)>, 'b': 3)>
Source code in pdr/parselabel/pds4.py
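To make the unpacking rule concrete without depending on a MultiDict implementation, the sketch below represents a multidict as a list of (key, value) pairs. Both unpack_to_pairs and get_all are hypothetical helpers that only mirror the behavior described above.

```python
def unpack_to_pairs(packed: dict) -> list:
    """Unpack a nested dict into (key, value) pairs, repeating keys for lists."""
    pairs = []
    for key, value in packed.items():
        # A list or tuple value becomes several entries under the same key.
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            if isinstance(item, dict):
                item = unpack_to_pairs(item)
            pairs.append((key, item))
    return pairs


def get_all(pairs: list, key) -> list:
    """Recover every value stored under a repeated key."""
    return [value for k, value in pairs if k == key]
```

Applied to the docstring's input, {'a': 1, 'b': [{'c': 2}, 3]}, this yields two separate 'b' entries, and get_all(pairs, 'b') recovers the original list contents.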
parselabel.utils
DEFAULT_PVL_LIMIT = 1000 * 1024
module-attribute
heuristic for max label size. we know it's not a real rule.
KNOWN_LABEL_ENDINGS = (re.compile(b'\nEND {0,8}(\r| {8})'), re.compile(b'\x00{3}'), b'\nEND\n')
module-attribute
Fast regex patterns for generic PVL label endings. They work for almost all PVL labels in the PDS.
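A rough sketch of how patterns like these can locate the end of a label in a byte buffer follows. The patterns are copied from the value shown above. The helper find_label_end is hypothetical, and the way it tries patterns in order and returns the offset just past the match is an assumption, not a description of trim_label().

```python
import re
from typing import Optional

ENDINGS = (
    re.compile(rb"\nEND {0,8}(\r| {8})"),
    re.compile(rb"\x00{3}"),
    b"\nEND\n",
)


def find_label_end(buf: bytes) -> Optional[int]:
    """Return the offset just past the first ending that matches, else None."""
    for ending in ENDINGS:
        if isinstance(ending, bytes):
            # Plain byte strings are searched literally.
            index = buf.find(ending)
            if index != -1:
                return index + len(ending)
        else:
            match = ending.search(buf)
            if match is not None:
                return match.end()
    return None
```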
_scan_to_end_of_label(buf: IO, max_size: int, text: bytes, raise_no_ending: bool)
Subroutine of trim_label()
Source code in pdr/parselabel/utils.py
trim_label(fn: Union[IO, Path, str], max_size: int = DEFAULT_PVL_LIMIT, strict_decode: bool = True, raise_no_ending: bool = False, special_encoding: str = 'utf-8') -> str
Look for a PVL label at the top of a file.
Source code in pdr/parselabel/utils.py
pd_utils
Methods for working with pandas objects, primarily intended for use in TABLE/ARRAY/SPREADSHEET/HISTOGRAM-loading workflows.
_apply_item_offsets(fmtdef: pd.DataFrame) -> pd.Series
Select item offsets (for a column or container with multiple items). If the specification didn't give item offsets, just assume they're equal to the byte width (i.e. there's no variable padding between fields).
Source code in pdr/pd_utils.py
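The fallback rule for item offsets can be sketched in plain Python. The argument names echo the PDS3 keywords START_BYTE, ITEMS, ITEM_BYTES, and ITEM_OFFSET, but the function item_start_bytes itself is hypothetical and works on scalars rather than on a format-definition DataFrame.

```python
from typing import Optional


def item_start_bytes(
    start_byte: int,
    items: int,
    item_bytes: int,
    item_offset: Optional[int] = None,
) -> list:
    """Start byte of each repeated item in a column."""
    # With no explicit offset, assume items are packed with no padding.
    stride = item_bytes if item_offset is None else item_offset
    return [start_byte + ix * stride for ix in range(items)]
```

For instance, three 4-byte items starting at byte 1 begin at bytes 1, 5, and 9. With an explicit offset of 6 they begin at 1, 7, and 13.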
booleanize_booleans(table: pd.DataFrame, fmtdef: pd.DataFrame) -> pd.DataFrame
We generally load boolean columns from binary tables as uint8 of value 0 or 1. This converts all such columns of a DataFrame to np.bool.
Source code in pdr/pd_utils.py
compute_offsets(fmtdef: pd.DataFrame) -> pd.DataFrame
PDS3 TABLE/SPREADSHEET/ARRAY specifications do not explicitly give the correct byte offsets for CONTAINERs, COLLECTIONs, anything loaded in by reference from a STRUCTURE, or repeated elements of a COLUMN. Byte offsets in these cases always refer to their parent containers, which can repeat, have children with their own repetitions, etc., etc. This function 'unpacks' a format definition as necessary and adds an SB_OFFSET column giving the correct byte offsets (from record start) for each field of the data table/array.
Source code in pdr/pd_utils.py
construct_nested_array_format(fmtdef: pd.DataFrame) -> pd.DataFrame
ARRAY objects can be deeply nested. This function computes the correct byte offsets and dtypes (including array shape) for any nested subelements.
Source code in pdr/pd_utils.py
convert_ebcdic(table: pd.DataFrame, fmtdef: pd.DataFrame) -> pd.DataFrame
Decode any columns of a DataFrame that contain bytestrings constructed from IBM S/360-style EBCDIC-encoded text to Python strings.
Source code in pdr/pd_utils.py
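For a single bytestring, EBCDIC decoding can be done with Python's built-in codecs. The sketch below assumes code page 037. Which code page convert_ebcdic() actually uses is not stated here, so treat the codec argument as an assumption.

```python
def decode_ebcdic(raw: bytes, codec: str = "cp037") -> str:
    """Decode EBCDIC-encoded bytes to a Python string."""
    # cp037 and cp500 are EBCDIC code pages shipped with Python.
    return raw.decode(codec)
```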
convert_ibm_reals(df: pd.DataFrame, fmtdef: pd.DataFrame) -> pd.DataFrame
Converts all IBM reals in a dataframe from packed 32- or 64-bit integer form to np.float32 or np.float64.
Source code in pdr/pd_utils.py
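The arithmetic behind such a conversion can be shown for one 32-bit value. This scalar sketch assumes the standard IBM hexadecimal floating-point layout (1 sign bit, a 7-bit excess-64 exponent in base 16, and a 24-bit fraction). The function name is hypothetical, and pdr's version operates on whole DataFrame columns.

```python
def ibm32_to_float(word: int) -> float:
    """Convert a packed 32-bit IBM hexadecimal float to a Python float."""
    sign = -1.0 if word >> 31 else 1.0
    # Seven exponent bits, biased by 64, as a power of 16.
    exponent = (word >> 24) & 0x7F
    # The fraction is the low 24 bits read as 0.f in binary.
    fraction = (word & 0x00FFFFFF) / float(1 << 24)
    return sign * fraction * 16.0 ** (exponent - 64)
```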
convert_vax_reals(data: pd.DataFrame, properties: pd.DataFrame) -> pd.DataFrame
If any columns in a DataFrame are in 32-bit VAX real format, convert them to 32-bit float.
Source code in pdr/pd_utils.py
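One common way to convert a single VAX F-float is to swap its two 16-bit words, reinterpret the result as an IEEE 754 single, and divide by 4. The sketch below uses that trick on 4 raw bytes. It is an illustration under that assumption rather than pdr's column-wise code, and it does not handle the largest VAX exponents, which have no direct IEEE counterpart.

```python
import struct


def vax_f_to_float(raw: bytes) -> float:
    """Convert 4 bytes holding a VAX F-float to a Python float."""
    # The first little-endian word holds sign, exponent, and high fraction.
    high, low = struct.unpack("<HH", raw)
    if (high & 0x7F80) == 0:
        # A zero exponent encodes zero (or a reserved operand).
        return 0.0
    ieee_bits = (high << 16) | low
    ieee_value = struct.unpack("<f", struct.pack("<I", ieee_bits))[0]
    # The VAX bias and 0.1f mantissa make the value one quarter of the IEEE reading.
    return ieee_value / 4.0
```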
fmtdef_to_dtype(fmtdef: pd.DataFrame) -> np.dtype
Construct a structured (but ideally never nested, see
construct_nested_array_format()) dtype from a format definition.
Source code in pdr/pd_utils.py
insert_sample_types_into_df(fmtdef: pd.DataFrame, identifiers: DataIdentifiers) -> tuple[pd.DataFrame, np.dtype]
Insert numpy-compatible data type strings into a TABLE/ARRAY format definition DataFrame. Also generate a numpy dtype object from that DataFrame.
Source code in pdr/pd_utils.py
numeric_columns(df: pd.DataFrame) -> list[Hashable]
Return names of all 'numeric' columns in a DataFrame.
Source code in pdr/pd_utils.py
rectified_rec_df(array: np.ndarray) -> pd.DataFrame
Attempt to 'flatten' a 1- or 2D ndarray, possibly with a structured dtype but with no nested arrays, into a DataFrame, typecasting as necessary for pandas compatibility.
Source code in pdr/pd_utils.py
reindex_df_values(df: pd.DataFrame, column='NAME') -> pd.DataFrame
give unique string identifiers to every value in a particular column of a DataFrame by appending an underscore and an incrementing number if necessary.
include START_BYTE in string for values marked as RESERVED.
Source code in pdr/pd_utils.py
structured_array_to_df(array: np.ndarray) -> pd.DataFrame
Attempt to convert an ndarray with a structured dtype to a DataFrame, flattening any nested 1- or 2-D arrays into blocks of columns and typecasting as necessary for pandas compatibility. This does not attempt to flatten nested elements with dimensionality > 2, and will raise a NotImplementedError if it encounters them.
Source code in pdr/pd_utils.py
pdr
Data
Core pdr class.
Source code in pdr/pdr.py
__contains__(name: str) -> bool
True if self contains a data object with the name 'name'.
Source code in pdr/pdr.py
__getattribute__(attr: str) -> Any
Get an attribute of self; known data objects can be referred to using attribute notation.
Source code in pdr/pdr.py
__getitem__(name: str) -> Any
Return the contained data object with the name 'name'.
Source code in pdr/pdr.py
__init__(fn: Union[Path, str], *, debug: bool = False, label_fn: Optional[Union[Path, str]] = None, search_paths: Union[Collection[str], str] = (), skip_existence_check: bool = False, pvl_limit: int = DEFAULT_PVL_LIMIT, tracker: Optional[TrivialTracker] = None, strict_label_decode: bool = True)
Source code in pdr/pdr.py
__iter__() -> Iterator[Any]
Iterate over all the data objects contained in self. Iteration all the way to the end will cause all of the data objects to be loaded, which may run your computer out of memory. For this reason, iteration over Data objects is deprecated and will be removed in six months.
Source code in pdr/pdr.py
__len__()
Return the number of data objects contained in self.
Source code in pdr/pdr.py
__repr__()
Source code in pdr/pdr.py
__str__()
Source code in pdr/pdr.py
_add_loaded_objects(obj: Mapping[str, Any])
Helper for load(). Ingests objects returned by a Loader.
Source code in pdr/pdr.py
_associate_prefix_tables(imname, preobjs)
Check for underspecified line prefix table objects associated with a PDS3 image specification.
Source code in pdr/pdr.py
_check_compressed_file_pointer(object_name: str) -> tuple[bool, Optional[tuple[Path, ...]]]
When PDS3 labels describe data objects in compressed files, they often give the names that the compressed files would have, were someone to decompress them, as the physical locations of those objects. This can be confusing, because you cannot load an object from a merely hypothetical file.
However, this is by no means a strict convention, so we can't just assume that it's the case -- we have to check all the file names mentioned for that object in the label, including those not given as top-level pointers.
Source code in pdr/pdr.py
_file_not_found(object_name: str)
Implements default file-not-found behavior.
Source code in pdr/pdr.py
_find_fits_header_pds4_id(start_byte: int) -> Optional[str]
Given the start byte of an HDU's data segment, check whether the PDS4 product associated with self includes that HDU's header as a distinct data object with a local identifier. If it does, return the PDS4 local identifier of that object. If not, return None.
Source code in pdr/pdr.py
_find_objects()
Add all top-level data objects mentioned in the label to this object's
index, except for 'trivial' ones (see loaders.utility.is_trivial()).
Also check for interleaved objects not defined at top level (such as
some line prefix tables).
TODO: check for ISIS-style axplane objects.
Source code in pdr/pdr.py
_init_pds4()
use pds4_tools to open pds4 files, but in our interface idiom.
Source code in pdr/pdr.py
_init_primary_format()
Initialization handler for "primary" format modes (cases in which
Data offers an interface to a file or files in a standard format).
Currently only supports FITS and 'desktop' image formats.
Source code in pdr/pdr.py
_init_search_path() -> str
Set initial path this object will check for additional files (just the directory that contains its "primary" file).
Source code in pdr/pdr.py
_load_pds4(object_name: str)
Load this object however pds4_tools wants to load this object, then reformat to DataFrame, expose the array handle in accordance with our type conventions, etc.
If the object is from a FITS file, preempt all that behavior and send it to our internal FITS-loading workflow.
Source code in pdr/pdr.py
_load_primary_fits(object_name: str) -> Union[np.ndarray, pd.DataFrame, None]
Handle loading an HDU from a FITS file in "primary" FITS mode.
Source code in pdr/pdr.py
_object_to_filename(object_name: str) -> Union[str, list[str], Optional[tuple[Path, ...]]]
Construct one or more on-disk search paths for the file that contains a named data object. Does not actually check if files exist at those paths (typically performed by calls to utils.check_cases()).
Source code in pdr/pdr.py
_target_path(object_name: str, cached: bool = True, raise_missing: bool = False) -> Optional[Union[Path, list[Path], str]]
Considering all known search paths and treating filenames as case-insensitive, attempt to find a filesystem path to a file or files in which a particular named data object might exist. This autopopulates self.file_mapping[object_name] if it finds one or more files, and by default treats this value as cached on subsequent calls (which can improve performance significantly, especially on networked filesystems).
Source code in pdr/pdr.py
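The idea of a case-insensitive search across several directories can be sketched with pathlib alone. The helper below is hypothetical. It ignores caching and the file_mapping bookkeeping that _target_path() performs, and simply lists matching files.

```python
from pathlib import Path


def find_case_insensitive(filename, search_paths) -> list:
    """Return every file in search_paths whose name matches, ignoring case."""
    target = Path(filename).name.lower()
    hits = []
    for directory in map(Path, search_paths):
        if not directory.is_dir():
            # Skip search paths that do not exist.
            continue
        hits += [
            path for path in sorted(directory.iterdir())
            if path.name.lower() == target
        ]
    return hits
```

Listing directories is the expensive part on networked filesystems, which is why caching the result of a search like this one can matter.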
dump_browse(prefix: Optional[Union[str, Path]] = None, outpath: Optional[Union[str, Path]] = None, scaled: bool = True, purge: bool = False, **browse_kwargs: Any) -> None
attempt to dump all data objects associated with this Data object to disk.
By default, writes files to the working directory.
By default, assigns filenames like: {filename stem}_{object name}.{file extension}
So, for instance, a browse version of a TABLE object referenced from "jn23a1.lbl" would be written to "jn23a1_TABLE.csv".
If prefix is not None, filenames will begin with the value of prefix rather than the original filename stem.
If outpath is not None, files will be written to the value of outpath rather than to the working directory.
By default, attempts to apply scaling/offset factors and special constant masking before writing images. If scaled is False, does not do that. If scaled == "both", writes both scaled and unscaled versions, adding "_scaled" and "_unscaled" to their respective filenames before the file extension. Note that some types of load operations (like for FITS files) may have already applied scaling factors, in which case recovering the unscaled image is not possible.
if purge is True, objects are deleted as soon as they are dumped, rendering this Data object 'empty' afterward.
**browse_kwargs are passed directly to browsify.browsify(), and offer various ways to modify image dumping behavior:
- image_clip: Union[float, tuple[float, float], None] = None
  Applies a percentile clip to the image at clip = (low_percentile, 100-high_percentile). If clip is a single value, low_percentile=high_percentile in the above formula. If it's a tuple, low_percentile is the first value in the tuple.
  The default None value causes 'nice' clipping: it clips the image at (1, 1), but if this results in the clipped image containing only a single value, it uses the original image instead. Pass 0 if absolutely no clipping is desired.
- mask_color: Optional[tuple[int, int, int]] = (0, 255, 255)
  Allows specification of RGB color for masked arrays (default cyan).
- band_ix: Optional[int] = None
  The index of the band to be exported in a multiband image. If None, the middle band of the image is exported. If there are 3-4 bands in the image and the override_rgba argument is False, this value is ignored.
  When set equal to "burst", returns a separate browse product for each band of a multiband image, appending numbers to the filenames prior to the file extension.
- save: bool = True
  If False, renders images in memory but does not save them to disk. Not generally useful when passed to this method except for testing.
- override_rgba: bool = False
  Allows use of band_ix when there are 3-4 bands in the image. Otherwise, the image will be returned as a stacked RGB image (the assumed 'alpha' channel is always dropped). Setting this to True is useful when a 3/4 band image is not actually RGB(A) (e.g. XYZ spatial products).
  This argument has no effect on images that do not have 3-4 bands.
- image_format: str = "jpg"
  Sets the image extension, which determines the format pillow will save the browse image in.
- slice_axis: int = 0
  Allows specification of which axis to slice along for the dump_browse image. The default slices at axis 0 (which is usually the axis labelled "BAND").
- rgb_channels: Optional[tuple[int, int, int]] = None
  Allows specification of the bands used to create an RGB image. By default the first three bands of a 3-4 band image are used for the red, green, and blue channels respectively (equivalent to manually specifying rgb_channels=(0,1,2)).
  If this argument is used, band_ix and override_rgba are ignored. It can also be used on multiband images with >4 bands to output an RGB image.
Source code in pdr/pdr.py
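The default naming scheme for dump_browse() outputs can be sketched with pathlib. The helper browse_path and its suffix argument are hypothetical. They only mirror the "{filename stem}_{object name}.{file extension}" rule and the prefix, outpath, and "_scaled"/"_unscaled" variations described above.

```python
from pathlib import Path


def browse_path(label_fn, object_name, extension,
                prefix=None, outpath=None, suffix="") -> Path:
    """Build an output path for a browse product."""
    # Use the label's filename stem unless a prefix overrides it.
    stem = Path(label_fn).stem if prefix is None else str(prefix)
    # Write to the working directory unless outpath overrides it.
    directory = Path.cwd() if outpath is None else Path(outpath)
    return directory / f"{stem}_{object_name}{suffix}.{extension}"
```

For example, browse_path("jn23a1.lbl", "TABLE", "csv").name gives "jn23a1_TABLE.csv", matching the example in the description.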
find_special_constants(object_name: str) -> dict[str, Number]
look up or infer special constants for one of our data objects. in general, only works well on ndarrays.
Source code in pdr/pdr.py
get_absolute_paths(filename: Union[str, Path]) -> list[str]
Construct Paths for a filename in all our search paths. (These are
places we can look for that file).
Source code in pdr/pdr.py
get_scaled(object_name: str, inplace: bool = False, float_dtype: Optional[np.dtype] = None) -> np.ndarray
Fetches a copy of the data object named object_name, masks special constants, then applies any scale and offset specified in the label. Only relevant to arrays.
if inplace is True, does calculations in-place on original array,
with attendant memory savings and destructiveness.
Source code in pdr/pdr.py
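The order of operations (mask first, then scale) can be illustrated on a plain list. This sketch assumes the usual PDS3 convention that the physical value is stored value times SCALING_FACTOR plus OFFSET, and it uses None where the real method would use a masked array element. scale_values is a hypothetical name.

```python
def scale_values(values, scale=1.0, offset=0.0, specials=()):
    """Mask special constants, then apply scale and offset to the rest."""
    return [
        # Special constants are masked before any arithmetic is applied.
        None if value in specials else value * scale + offset
        for value in values
    ]
```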
getattr(attr)
get an attribute of self without either lazy-loading on failure or risking infinite loops inside lazy-load behaviors.
Source code in pdr/pdr.py
keys() -> list[str]
Returns names of all data objects defined in the label (or inferred while loading an object, like FITS headers).
Source code in pdr/pdr.py
load(name: str, reload: bool = False, **load_kwargs: Any)
Explicitly load an identified data object by name; alternatively
name="all" means "load every identified object". Does not return the
object; just assigns it to the attribute of self named by name. The
Data.__getitem__() interface lazy-loads by calling this function
with default arguments in response to data['NOTYETLOADED'] etc.
Source code in pdr/pdr.py
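The lazy-loading pattern described for load() and __getitem__() can be shown with a toy class. LazyProduct is hypothetical and far simpler than Data. Its loaders are zero-argument callables standing in for real file readers, and it stores loaded objects in a dict rather than as attributes.

```python
class LazyProduct:
    """Toy container that loads named objects only on first access."""

    def __init__(self, loaders):
        self._loaders = dict(loaders)  # name -> zero-argument callable
        self._loaded = {}

    def keys(self):
        return list(self._loaders)

    def unloaded(self):
        return tuple(k for k in self._loaders if k not in self._loaded)

    def load(self, name, reload=False):
        # "all" loads every known object; nothing is returned.
        names = self.keys() if name == "all" else [name]
        for key in names:
            if reload or key not in self._loaded:
                self._loaded[key] = self._loaders[key]()

    def __getitem__(self, name):
        # Load on first access, then serve the stored object.
        if name not in self._loaded:
            self.load(name)
        return self._loaded[name]
```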
load_all()
Handler (and alias) for Data.load("all").
Source code in pdr/pdr.py
load_from_pointer(pointer: str, **load_kwargs: Any) -> dict[str, Union[pd.DataFrame, np.ndarray, str, MultiDict, 'PVLModule']]
PDS3 data object-loading handler. Set up the appropriate Loader for
the object, set up load flow tracking, call the loader, and perform
basic cleanup.
Source code in pdr/pdr.py
metablock(text: str, warn: bool = True) -> Optional[Mapping]
Get the first value from this object's metadata whose key exactly
matches text, even if it is nested inside a mapping, provided that the
value is itself a mapping (e.g., nested PVL block, XML 'area', etc.).
Evaluate it using self.metadata.formatter. If there is no key matching
'text', evaluate and return the metadata as a whole.
WARNING: this function's return values are memoized for performance.
updating elements of self.metadata that have already been accessed
with this function will not update future calls to this function.
Source code in pdr/pdr.py
metablock_(text: str) -> Optional[Mapping]
quiet-by-default version of metablock
Source code in pdr/pdr.py
metaget(text: str, default: Any = None, warn: bool = True) -> Any
get the first value from this object's metadata whose key exactly
matches text, even if it is nested inside a mapping. evaluate it
using self.metadata.formatter.
Warning
this function's return values are memoized for performance. updating elements of self.metadata that have already been accessed with this function will not update future calls to this function.
Source code in pdr/pdr.py
metaget_(text: str, default: Any = None) -> Any
quiet-by-default version of metaget
Source code in pdr/pdr.py
read_metadata(pvl_limit: int = DEFAULT_PVL_LIMIT, strict_decode: bool = True) -> Metadata
Attempt to ingest a product's metadata. If it is a PDS4 product, pds4_tools will already have ingested its detached XML label in Data._init_pds4(). In that case, simply preprocess it for Metadata.__init__. Otherwise, if it has a detached PDS3/PVL label, ingest it with pdr.parselabel.pds3.read_pvl. Then, if we found no detached label, look for an attached PVL label (also using read_pvl). If we are in a "primary" mode, ignore all that and ingest the product's metadata with the appropriate format-specific functions. Then, construct a Metadata object from whatever we loaded and add all the objects it implies to our index.
Source code in pdr/pdr.py
show(object_name: str = None, scaled: bool = True, **browse_kwargs: Any) -> Image
Produce an Image from a data object associated with this product. A convenient way to quickly look at data.
Source code in pdr/pdr.py
unloaded() -> tuple[str]
Return names of all identified but unloaded data objects.
Source code in pdr/pdr.py
DebugExceptionPreempted
Bases: Exception
Stub Exception subclass for selectively ignoring Exceptions from load failures when not in debug mode.
Source code in pdr/pdr.py
Metadata
Bases: MultiDict
MultiDict subclass intended primarily as a helper class for Data. includes various convenience methods for handling metadata syntaxes, common access and display interfaces, etc.
Source code in pdr/pdr.py
__init__(mapping_params: tuple[Mapping, Collection[str]], standard: Literal['PDS3', 'PDS4', 'FITS'] = 'PDS3', **kwargs)
Source code in pdr/pdr.py
__repr__()
Source code in pdr/pdr.py
__str__()
Source code in pdr/pdr.py
_init_identifiers() -> DataIdentifiers
Initializes common PDS3 data identifiers for use in special-case checks.
Source code in pdr/pdr.py
metablock(text: str, warn: bool = True) -> Optional[Mapping]
Get the first value from this object whose key exactly
matches text, even if it is nested inside a mapping, provided that the
value is itself a mapping (e.g., nested PVL block, XML 'area', etc.).
Evaluate it using self.formatter. Raise a warning if there are
multiple keys matching this.
If there is no key matching 'text', evaluate and return the
metadata as a whole.
Warning
This function's return values are memoized for performance.
Updating elements of a Metadata object's underlying mapping
that have already been accessed with this function will not update
future calls to this function.
Source code in pdr/pdr.py
metablock_(text: str) -> Optional[Mapping]
Quiet-by-default version of metablock.
Source code in pdr/pdr.py
metaget(text: str, default: Any = None, warn: bool = True) -> Any
Get the first value from this object whose key exactly matches text,
even if it is nested inside a mapping. Optionally evaluate it using
self.formatter. Raise a warning if multiple keys match text.
Warning
This function's return values are memoized for performance.
Updating elements of a Metadata object's underlying mapping
that have already been accessed with this function will not update
future calls to this function.
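The caching caveat in this warning can be illustrated with a toy class. This is a minimal sketch, not pdr's implementation: `ToyMetadata`, `first_match`, and the dict-based cache are hypothetical stand-ins, and the real Metadata class may cache differently.

```python
_MISSING = object()  # sentinel distinguishing "no such key" from a stored None


def first_match(mapping, key):
    """Depth-first search for the first value stored under `key`."""
    for name, value in mapping.items():
        if name == key:
            return value
        if isinstance(value, dict):
            found = first_match(value, key)
            if found is not _MISSING:
                return found
    return _MISSING


class ToyMetadata:
    """Hypothetical stand-in for illustration; NOT pdr's Metadata class."""

    def __init__(self, mapping):
        self.mapping = mapping
        self._cache = {}  # results of earlier lookups, keyed by search text

    def metaget(self, text, default=None):
        if text not in self._cache:
            self._cache[text] = first_match(self.mapping, text)
        found = self._cache[text]
        return default if found is _MISSING else found


meta = ToyMetadata({"IMAGE": {"LINES": 100}})
before = meta.metaget("LINES")        # looked up in the mapping, then cached
meta.mapping["IMAGE"]["LINES"] = 200  # edit the underlying mapping afterwards
after = meta.metaget("LINES")         # served from the cache, not re-read
fresh = ToyMetadata(meta.mapping).metaget("LINES")  # new object, empty cache
```

In this sketch `after` still holds the value seen on the first lookup, while `fresh` reflects the edit because it comes from a new object with an empty cache.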
Source code in pdr/pdr.py
metaget_(text: str, default: Any = None) -> Any
Quiet-by-default version of metaget.
Source code in pdr/pdr.py
metaget_fuzzy(text: str) -> Any
Like metaget(), but fuzzy-matches key names.
Source code in pdr/pdr.py
_metablock_factory(metadata: Metadata) -> Callable[[str], Mapping]
Factory function for an internal component of metablock(). Reduces the
risk that the metadata access cache will create reference cycles.
Source code in pdr/pdr.py
pdrtypes
Axname: TypeAlias = Literal['BAND', 'LINE', 'SAMPLE']
module-attribute
Conventional names for image axes.
BandStorageType: TypeAlias = Literal['BAND_SEQUENTIAL', 'LINE_INTERLEAVED', 'SAMPLE_INTERLEAVED', None]
module-attribute
Codes for physical storage layout of 3-D arrays. Also known as BSQ/band sequential, BIL/band interleaved by line, BIP/band interleaved by pixel. None implies either that the storage layout is unknown or that the array is not 3-D.
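To make the three layouts concrete, the sketch below computes where one element of a 3-D array sits in the flat file under each code. It follows the conventional BSQ/BIL/BIP definitions; the function `flat_offset` is a hypothetical helper written for this illustration and is not part of pdr. Offsets are counted in elements, not bytes.

```python
def flat_offset(layout, band, line, sample, n_bands, n_lines, n_samples):
    """Element offset of (band, line, sample) in a flat buffer.

    Hypothetical helper for illustration; not a pdr function.
    """
    if layout == "BAND_SEQUENTIAL":     # BSQ: axes ordered band, line, sample
        return (band * n_lines + line) * n_samples + sample
    if layout == "LINE_INTERLEAVED":    # BIL: axes ordered line, band, sample
        return (line * n_bands + band) * n_samples + sample
    if layout == "SAMPLE_INTERLEAVED":  # BIP: axes ordered line, sample, band
        return (line * n_samples + sample) * n_bands + band
    raise ValueError(f"unknown or non-3-D storage layout: {layout!r}")


# Position of the first pixel of the second band in a 2-band, 3-line,
# 4-sample cube under each layout.
shape = dict(n_bands=2, n_lines=3, n_samples=4)
bsq = flat_offset("BAND_SEQUENTIAL", 1, 0, 0, **shape)
bil = flat_offset("LINE_INTERLEAVED", 1, 0, 0, **shape)
bip = flat_offset("SAMPLE_INTERLEAVED", 1, 0, 0, **shape)
```

The same logical pixel lands a full band plane into the buffer in BSQ, one line of samples in under BIL, and immediately after its band-0 counterpart under BIP.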
ByteOrder: TypeAlias = Literal['<', '>']
module-attribute
Byte order codes: '<' for little-endian (least significant byte first), '>' for big-endian (most significant byte first).
LoaderFunction: TypeAlias = Callable[..., Union[str, 'MultiDict', 'pd.DataFrame', 'np.ndarray']]
module-attribute
Signature of a Loader's load function
PDRLike: TypeAlias = Union['Data', 'Metadata']
module-attribute
Something with a pdr-style metadata-getting interface
PhysicalTarget: TypeAlias = Union[list[str, int], tuple[str, int], int, str, dict[str, Union[str, int]]]
module-attribute
Expected formats of 'pointer' parameters, i.e. ^WHATEVER = PhysicalTarget
DataIdentifiers
Bases: TypedDict
Standard PDS3 'identifiers' that Data looks for in its Metadata on initialization (if it is made from a PDS3 product). Used primarily to make special-case checks more compact. These are taken directly from the label, then stringified if they are sets or tuples. All keys are always present, but a value may be None if the parameter is not actually in the label.
Source code in pdr/pdrtypes.py
ImageProps
Bases: TypedDict
Standard image properties dict used in image-processing workflows.
Source code in pdr/pdrtypes.py
pil_utils
Utilities for dealing with 'desktop'-format images using pillow.
Not all of this will ultimately live here. Also, we might want to use opencv
for some things instead.
pvl_utils
Utilities for working with the pvl library.
TimelessOmniDecoder
Bases: OmniDecoder
Source code in pdr/pvl_utils.py
cached_pvl_load(reference)
cached
Source code in pdr/pvl_utils.py
utils
Generic I/O, parsing, and functional utilities.
SUPPORTED_COMPRESSION_EXTENSIONS = ('.gz', '.bz2', '.zip')
module-attribute
compression 'types' we support
append_repeated_object(obj: Union[Sequence, Mapping], fields: MutableSequence, repeat_count: int) -> MutableSequence
Polymorphic function to append obj repeat_count times to fields.
If obj is a non-string sequence, its elements are instead concatenated onto fields repeat_count times.
For instance:
>>> append_repeated_object([1, 2], [4], 3)
[4, 1, 2, 1, 2, 1, 2]
>>> append_repeated_object({"a": "b"}, ["a"], 3)
['a', {'a': 'b'}, {'a': 'b'}, {'a': 'b'}]
NOTE: This function treats repeat_count values < 1 as 1.
WARNING: this function does not copy obj or any of its elements, even if
they are mutable. This is not a bug, but can cause unexpected behavior, so
take care (and in particular, always go depth-first if you are using this
function in a recursive operation).
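The no-copy behavior described in the warning can be seen with a small stand-in. The function below, `append_repeated_sketch`, is a hypothetical reimplementation written only from the description above; pdr's actual code may differ in detail.

```python
from collections.abc import Sequence


def append_repeated_sketch(obj, fields, repeat_count):
    """Stand-in written from the description above; not pdr's own code."""
    for _ in range(max(repeat_count, 1)):  # values below 1 are treated as 1
        if isinstance(obj, Sequence) and not isinstance(obj, str):
            fields.extend(obj)   # non-string sequence: concatenate its items
        else:
            fields.append(obj)   # anything else: append the object itself
    return fields


column = {"NAME": "COUNTS"}
fields = append_repeated_sketch(column, [], 2)

# Both entries are the same dict object, so editing one edits the other.
fields[0]["NAME"] = "RENAMED"
shared = fields[1]["NAME"]
```

Because every repetition refers to the same object, a caller that needs independent entries would have to copy them (for example with `copy.deepcopy`) after the call.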
Source code in pdr/utils.py
associate_label_file(data_filename: str, label_filename: Optional[str] = None, skip_check: bool = False) -> Optional[str]
Source code in pdr/utils.py
check_cases(filenames: Union[Collection[Union[Path, str]], Union[Path, str]], skip: bool = False) -> str
Check for oddly-cased versions of a specified filename in the local path; case mismatches between PDS3 labels and actual archive contents are very common. Similarly, check for common compression extensions.
If skip is True, the function simply returns the filename.
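The case-matching idea can be sketched as follows. This is an illustration only: `find_case_insensitive` is a hypothetical helper, it handles a single filename, and it omits the compression-extension check and the skip option that check_cases provides.

```python
import tempfile
from pathlib import Path


def find_case_insensitive(filename):
    """Return an existing path whose name matches `filename`, ignoring case.

    Hypothetical helper for illustration; not pdr's check_cases.
    """
    target = Path(filename)
    if target.exists():
        return str(target)
    wanted = target.name.lower()
    for candidate in target.parent.iterdir():
        if candidate.name.lower() == wanted:
            return str(candidate)
    raise FileNotFoundError(filename)


with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "IMAGE.LBL").write_text("PDS_VERSION_ID = PDS3\n")
    # The label is requested in lowercase but is stored in uppercase.
    match = find_case_insensitive(Path(folder) / "image.lbl")
    matched_name = Path(match).name
```

On a case-sensitive filesystem `matched_name` is the on-disk spelling; on a case-insensitive one the requested spelling already resolves, so the first branch returns it unchanged.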
Source code in pdr/utils.py
check_primary_fmt(data_filename: str) -> Union[str, None]
Source code in pdr/utils.py
decompress(filename)
Open FILENAME. If its name suffix indicates one of the supported compression algorithms, transparently decompress it.
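Suffix-based transparent decompression can be sketched with the standard library alone. The function `open_maybe_compressed` is a hypothetical stand-in, and two choices in it are assumptions rather than documented pdr behavior: suffixes are compared case-insensitively, and a .zip archive yields its first member.

```python
import bz2
import gzip
import os
import tempfile
import zipfile


def open_maybe_compressed(filename):
    """Open `filename` for binary reading, decompressing by suffix.

    Hypothetical stand-in for illustration; not pdr's decompress.
    """
    lowered = str(filename).lower()
    if lowered.endswith(".gz"):
        return gzip.open(filename, "rb")
    if lowered.endswith(".bz2"):
        return bz2.open(filename, "rb")
    if lowered.endswith(".zip"):
        # Assumption: the product is the first member of the archive.
        archive = zipfile.ZipFile(filename)
        return archive.open(archive.namelist()[0])
    return open(filename, "rb")


with tempfile.TemporaryDirectory() as folder:
    packed = os.path.join(folder, "table.tab.gz")
    with gzip.open(packed, "wb") as stream:
        stream.write(b"1,2,3\n")
    # The caller reads plain bytes without caring about the compression.
    with open_maybe_compressed(packed) as stream:
        content = stream.read()
```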
Source code in pdr/utils.py
find_repository_root(absolute_path)
Source code in pdr/utils.py
head_file(fn_or_reader: Union[IO, Path, str], nbytes: Union[int, None] = None, offset: int = 0, tail: bool = False) -> BytesIO
Source code in pdr/utils.py
import_best_gzip()
Source code in pdr/utils.py
prettify_multidict(multi, sep=' ', indent=0)
Source code in pdr/utils.py
read_hex(hex_string: str, fmt: str = '>I') -> Number
Return the numeric value of a hexadecimal string interpreted in a given number format (expressed as a struct-style format string; the default, '>I', is a big-endian unsigned 32-bit integer).
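The effect of the struct-style format string is easiest to see with a few values. The function below, `read_hex_sketch`, is a hypothetical equivalent built on the standard `struct` module and assumed to behave like read_hex for these inputs; it is not copied from pdr.

```python
import struct


def read_hex_sketch(hex_string, fmt=">I"):
    """Interpret a hex string as one number in the given struct format.

    Hypothetical equivalent for illustration; not pdr's own code.
    """
    return struct.unpack(fmt, bytes.fromhex(hex_string))[0]


big_endian = read_hex_sketch("000000FF")           # default '>I'
little_endian = read_hex_sketch("000000FF", "<I")  # same bytes, reversed order
as_float = read_hex_sketch("3F800000", ">f")       # IEEE 754 single precision
```

The same four bytes give 255 when read big-endian and 4278190080 when read little-endian, and the bit pattern 3F800000 read as a 32-bit float is 1.0.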
Source code in pdr/utils.py
stem_path(path: Path)
Convert a Path to lowercase and remove any compression extension from it, producing a stem for loose matching.
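A minimal sketch of this kind of loose stemming follows. `stem_sketch` is a hypothetical stand-in; returning the lowercased file name as a string is an assumption, since the return type is not documented here.

```python
from pathlib import Path

# Mirrors the SUPPORTED_COMPRESSION_EXTENSIONS tuple listed for this module.
COMPRESSION_EXTENSIONS = (".gz", ".bz2", ".zip")


def stem_sketch(path):
    """Lowercase a path's file name and drop a trailing compression suffix.

    Hypothetical stand-in for illustration; not pdr's stem_path.
    """
    name = path.name.lower()
    for extension in COMPRESSION_EXTENSIONS:
        if name.endswith(extension):
            return name[: -len(extension)]
    return name


compressed = stem_sketch(Path("DATA/IMAGE.IMG.gz"))
plain = stem_sketch(Path("data/image.img"))
```

Both calls reduce to the same string, which is what lets a compressed, differently-cased file be matched against the name given in a label.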
Source code in pdr/utils.py
with_extension(fn: Union[str, Path], new_suffix: str) -> str
Source code in pdr/utils.py