Senior Big Data Engineer Resume
VA
SUMMARY:
- Over 8 years of progressive professional experience in data analysis, system design and development using Big Data/Hadoop, Teradata and mainframe technologies
- More than 4 years of in-depth knowledge and hands-on experience with Big Data/Hadoop core components like MapReduce, HDFS, Spark, Hive, Impala, Sqoop, Oozie and Hue
- Experienced in working with business experts to identify, prioritize and implement solutions to improve efficiency and support new business
- Experienced in supporting ad-hoc requests and creating reusable queries using MS Excel and SQL
- Good at analyzing gaps in current processes and working with SMEs to resolve issues and process data on priority
- Experienced professional with a successful career in banking, finance and insurance domain
- Good acumen in software development cycles involving system study, analysis, development, enhancement, implementation and support activities
- Experienced in CI/CD related activities involving GIT, BitBucket, Jenkins, Artifactory, Ansible & JIRA
- Experienced in Agile and Waterfall methodology
- Vast experience in mapping client requirements and designing solutions by understanding the core of the change
- Experienced in ETL processing via Spark, Hive and ETL tools
- Experienced in writing Unix Shell scripting for builds and deployment in different environments
- Strong experience working with relational databases like Teradata and with mainframes
- Good exposure to integrated testing, data analysis and data validation on Hadoop Environment
- Proactive nature that has earned recognition from clients
- Excellent interpersonal skills that help in clearly stating and recording ideas
- Strong analytical, organizational and leadership skills that have earned vital roles
TECHNICAL SKILLS:
Hadoop Technology: Cloudera Hadoop, MapReduce (MR1, MR2- YARN), Spark, HDFS, Hive, Impala, Pig, Sqoop, Oozie, Hue, Cloudera Manager, Kafka, Flume, HCatalog, Spark Streaming, PySpark, HBase, Druid, TEZ, Ambari, Jupyter Hub / Zeppelin Notebooks
Operating System: UNIX, Linux, MS-DOS, Windows, OS/390 Mainframe
Database: Teradata, DB2, IMS, Presto DB
ETL Tool: Syncsort DMExpress (DMX-h) Hadoop ETL tool, Talend, IBM DataStage, IBM Big SQL
CI/CD Tool: JIRA, SVN, BitBucket, GIT, Jenkins, Artifactory, Ansible, CVS
Scheduling Tool: Autosys, CA7, Crontab, Tidal
Language: Java, Scala, JCL, COBOL, SQL, Unix shell script, Python, IMS
Other Software: Eclipse, Maven, SharePoint, Maximo/Remedy, Teradata SQL assistant, TSO/ISPF, MS Office Tools, QlikView
PROFESSIONAL EXPERIENCE:
Confidential, VA
Senior Big Data Engineer
Responsibilities
- Attending business meetings and collaborating with business teams to understand & articulate the project requirements and assess them with the development team
- Helping to create high quality documentation supporting the design/coding task (Data Lineage, Data Mapping, High/Low Level Design etc.)
- Developing the new ETL/ELT framework using Big Data technologies (Hadoop, Teradata, Informatica, IBM DataStage, etc.)
- Performing data migration/data ingestion from source systems to distributed file systems using various tools like NiFi, Sqoop, IBM DataStage, TDCH, etc. to achieve the best results and maximum throughput
- Performing POC for real time streaming data using Kafka and Spark Streaming
- Creating a data lake with staging and base layers on Hadoop to maintain various datasets from different lines of business
- Performing ETL/ELT operations via Spark (PySpark) using Spark SQL, RDD operations, etc. and storing the results as Hive tables (a minimal PySpark sketch follows this list)
- Creating Hive tables (external/managed) with partitioning/bucketing based on the amount of data being processed
- Deciding the ideal storage platform for the application being designed based on the type of the data (historical or incremental), format of the data (structured, semi-structured and unstructured), compression requirements, data frequency, pattern and consumer of the data
- Preparing a security framework to maintain the data privacy for the data stored on the distributed file system
- Creating reusable components to be used to perform similar set of operations
- Performing data archival by creating components using DistCp and the Hadoop archive command to reduce NameNode utilization (see the archival sketch after this list)
- Getting the data ready for the data visualization tools like Tableau, QlikView, Jupyter Hub / Zeppelin Notebooks
- Performing performance tuning on existing applications and increasing their throughput through various techniques
- Working with the system admins to change cluster configuration/settings in order to achieve optimal performance from the cluster
- Using Presto DB to perform analytical queries for business users
- Using JIRA, SVN, BitBucket, GIT for CI/CD related activities
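The PySpark ETL/ELT bullet above refers to a pattern like the following minimal sketch; the paths, column names and the base.transactions table are hypothetical placeholders, not actual project artifacts.

```python
# Minimal PySpark ETL sketch (illustrative only; paths, columns and the
# table name "base.transactions" are hypothetical).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("curated_load_sketch")
         .enableHiveSupport()          # needed to persist results as Hive tables
         .getOrCreate())

# Read a staged dataset from the data lake staging layer
raw = spark.read.parquet("/data/staging/txn_raw")

# Example DataFrame transformation: cleanse and stamp a load date
curated = (raw
           .filter(F.col("amount").isNotNull())
           .withColumn("load_dt", F.current_date()))

# Persist the result as a partitioned, managed Hive table in the base layer
(curated.write
        .mode("overwrite")
        .partitionBy("load_dt")
        .saveAsTable("base.transactions"))
```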
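The archival bullet above follows a pattern like this small driver, assuming a plain hadoop distcp copy followed by hadoop archive; all paths and the archive name are hypothetical.

```python
# Archival sketch: wrap "hadoop distcp" and "hadoop archive" in a Python
# driver (illustrative; paths and the archive name are hypothetical).
import subprocess

SOURCE = "hdfs:///data/base/transactions/2016"
ARCHIVE_AREA = "hdfs:///archive/transactions"
HAR_DEST = "hdfs:///archive/har"

# Copy the aged partitions to the archive area
subprocess.run(["hadoop", "distcp", SOURCE, ARCHIVE_AREA], check=True)

# Pack the many small files into a single HAR to cut NameNode object count
subprocess.run(
    ["hadoop", "archive",
     "-archiveName", "transactions_2016.har",
     "-p", ARCHIVE_AREA, "2016",
     HAR_DEST],
    check=True,
)
```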
Confidential, NC
Lead Big Data Developer cum Analyst (Teradata & Hadoop)
Responsibilities
- Gathering necessary information from users on Anti-Money Laundering (AML) via transaction, customer profiling and posting data
- Doing end-to-end data analysis to understand the business logic and design the approach to build a new data model
- Creating mapping documents for AML domain for data lineage involving Teradata and Hadoop as Source and Target systems respectively
- Cataloging and documenting the data sources applicable for use cases to form the data layer
- Creating data flow diagrams using MS Visio
- Creating data models for the landing zone using the documentation
- Working with the Business Team to gather the requirements and prioritize their needs
- Developing and implementing data collection reports that optimized statistical efficiency and data quality
- Working with clients on initiatives involving Architecture, Data Warehousing, Data Platform Migration, Performance & Optimization, Data Analysis, ETL Development, and Hadoop Data Integration leveraging my knowledge of Hadoop, Teradata, ETL and Analytics to solve Customers' problems
- Understanding the various sources involved in formulating use cases like Panama Papers, FinCEN 314(a), Foreign Terrorist Fighters, etc.
- Understanding the various data sources involving transaction modes like wire, cash, card, etc.
- Proposing technical solutions and laying out the plan for successful implementation
- Preparing High-Level and Low-Level Design documents
- Supporting the daily and incremental loads from Teradata to the Hadoop data layer
- Embedding data quality checks using Teradata, Hive, Spark, etc. (a reconciliation sketch follows this list)
- Performing unit testing and tuning the code as required
- Preparing the necessary technical standards and functional manuals for the application
- Scheduling the jobs using Autosys scheduler
- Using JIRA, SVN, BitBucket, GIT, Jenkins, Artifactory & Ansible for CI/CD related activities
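The data quality bullet above can be illustrated with a row-count reconciliation between the Teradata source and the Hive landing table, as in this minimal sketch; it assumes the Teradata JDBC driver is on the Spark classpath, and the JDBC URL, credentials and table names are hypothetical.

```python
# Row-count reconciliation sketch for a daily Teradata -> Hadoop load
# (illustrative; URL, credentials and table names are hypothetical).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("dq_row_count_check")
         .enableHiveSupport()
         .getOrCreate())

# Source-side count, read over JDBC from Teradata
src_cnt = (spark.read.format("jdbc")
           .option("url", "jdbc:teradata://td-host/DATABASE=aml")
           .option("driver", "com.teradata.jdbc.TeraDriver")
           .option("dbtable", "(SELECT COUNT(*) AS cnt FROM aml.wire_txn) q")
           .option("user", "svc_user")
           .option("password", "****")
           .load()
           .first()["cnt"])

# Target-side count from the Hive landing-zone table
tgt_cnt = spark.sql("SELECT COUNT(*) AS cnt FROM landing.wire_txn").first()["cnt"]

if src_cnt != tgt_cnt:
    raise ValueError(f"Row count mismatch: Teradata={src_cnt}, Hive={tgt_cnt}")
```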
Confidential, NC
Lead Big Data Developer cum Analyst (Teradata & Hadoop)
Responsibilities
- Converting the existing Mainframe-Teradata ETL to Hadoop ETL in order to offload Teradata compute and storage
- Doing end-to-end data analysis to understand the business logic and design the approach to build a new data model
- Creating mapping documents for data lineage involving Teradata and Hadoop as Source and Target systems respectively
- Cataloging and documenting the data sources applicable for use cases to form the data layer
- Creating data flow diagrams using MS Visio
- Creating data models for the landing zone using the documentation
- Proposing technical solutions and laying out the plan for successful implementation
- Preparing High-Level and Low-Level Design documents
- Using Syncsort’s DMX-H ETL tool to facilitate application development in HDFS
- Developing MapReduce and Spark code to support the use cases
- Using Java and Scala for programming
- Developing Hive scripts equivalent to the Teradata processes
- Using Sqoop to transfer data in and out of Teradata
- Developing automated scripts for all jobs to complete loading data from the mainframe to Teradata after processing in Hadoop
- Handling data from Flume and Kafka sources via Spark Streaming (a streaming sketch follows this list)
- Scheduling the Hadoop jobs using Oozie and Autosys
- Developing customized Hive UDFs
- Handling fixed-block, variable-block, text-delimited, binary, Avro and Parquet files (see the fixed-width parsing sketch after this list)
- Using Network Data Movement (NDM) / Connect Direct to move data across servers
- Developing MapReduce and Spark code to structure the data
- Using Impala for end-user queries and validation
- Solving issues raised by other application teams via Nexus request
- Building archival and recovery jobs for DR purposes
- Building reusable common components that reduce application coding effort
- Preparing the necessary technical standards and functional manuals for the application
- Using JIRA, SVN, BitBucket, GIT, Jenkins, Artifactory & Ansible for CI/CD related activities
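The streaming bullet above maps to a consumer like the following sketch, written here in the Structured Streaming style rather than the original DStream-based Spark Streaming; the broker, topic and paths are hypothetical.

```python
# Kafka ingestion sketch (Structured Streaming style; broker, topic and
# paths are hypothetical, and the spark-sql-kafka package must be available).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka_ingest_sketch").getOrCreate()

stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "txn_events")
          .load())

# Kafka delivers key/value as binary; cast the payload to string
events = stream.select(F.col("value").cast("string").alias("payload"))

# Land the raw events on HDFS with checkpointing for recovery
query = (events.writeStream
         .format("parquet")
         .option("path", "/data/streaming/txn_events")
         .option("checkpointLocation", "/data/checkpoints/txn_events")
         .start())
query.awaitTermination()
```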
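The fixed-width file handling above can be sketched in PySpark as below; the field positions and paths are hypothetical and stand in for an actual copybook layout.

```python
# Fixed-width record parsing sketch (illustrative; layout, positions and
# paths are hypothetical, not a real copybook).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("fixed_width_sketch").getOrCreate()

# Each line is one fixed-width record landed from the mainframe extract
lines = spark.read.text("/data/landing/acct_extract.txt")

# Slice fields by position; substring() is 1-based in Spark SQL
records = lines.select(
    F.substring("value", 1, 10).alias("account_id"),
    F.substring("value", 11, 8).alias("open_date"),
    F.trim(F.substring("value", 19, 30)).alias("account_name"),
    F.substring("value", 49, 12).cast("decimal(12,2)").alias("balance"),
)

records.write.mode("overwrite").parquet("/data/structured/acct_extract")
```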
Confidential
Lead Developer cum Analyst
Responsibilities
- Used Hadoop as a data processing layer when moving the data from the mainframe to Teradata
- Used Syncsort’s DMX-H ETL tool to facilitate application development in HDFS
- Developed MapReduce jobs in Java for data manipulation
- Used Hive, Oozie and Sqoop extensively for ETL processing
- Created a batch calculation process with the help of historical data consisting of the customer's account balances, aggregated deposits and investments
- Designed the model and flow to achieve the requirement
- Changed the BTEQ/MLOAD/TPUMP/FLOAD/FASTEXPORT/TPT/JCL scripts as per requirements
- Wrote and executed Teradata SQL scripts to validate the end data (a validation sketch follows this list)
- Created views on the tables along with access categories to provide data access to the users
- Prepared design, test plan, implementation plan, test scripts, validation script and unit testing documents
- Prepared job flow diagrams in MS Visio in order to hand over the implementation to the production support team
- Tuned poorly performing Teradata SQL queries and inefficient collect stats
- Provided root cause analysis on critical and non-critical issues that occurred in production
- Analyzed the dashboard and performance metrics
- Prepared necessary technical and functional manuals for the application
- Used JIRA, SVN, BitBucket, GIT, Jenkins, Artifactory & Ansible for CI/CD related activities
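The validation bullet above is illustrated below with the teradatasql Python driver rather than the BTEQ scripts used on the project; the host, credentials and table names are hypothetical.

```python
# Post-load validation sketch using the teradatasql driver (illustrative;
# host, credentials and table names are hypothetical).
import teradatasql

with teradatasql.connect(host="td-prod", user="svc_user", password="****") as con:
    with con.cursor() as cur:
        # Row count actually loaded today
        cur.execute(
            "SELECT COUNT(*) FROM edw.account_balance WHERE load_dt = CURRENT_DATE")
        loaded = cur.fetchone()[0]
        # Expected control total recorded by the load process
        cur.execute(
            "SELECT expected_rows FROM edw.load_control WHERE load_dt = CURRENT_DATE")
        expected = cur.fetchone()[0]

if loaded != expected:
    raise ValueError(f"Validation failed: loaded {loaded} rows, expected {expected}")
```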