Senior Big Data Engineer Resume
VA
SUMMARY:
- Over 8 years of progressive professional experience in data analysis, system design and development using Big Data/Hadoop, Teradata and mainframe technologies
- More than 4 years of in-depth knowledge and hands-on experience with Big Data/Hadoop core components like MapReduce, HDFS, Spark, Hive, Impala, Sqoop, Oozie and Hue
- Experienced in working with business experts to identify, prioritize and implement solutions to improve efficiency and support new business
- Experienced in supporting ad-hoc requests and creating reusable queries using MS Excel and SQL
- Good at analyzing gaps in current processes and working with SMEs to resolve issues and process data on priority
- Experienced professional with a successful career in banking, finance and insurance domain
- Good acumen in software development cycles involving system study, analysis, development, enhancement, implementation and support activities
- Experienced in CI/CD related activities involving GIT, BitBucket, Jenkins, Artifactory, Ansible & JIRA
- Experienced in Agile and Waterfall methodology
- Vast experience in mapping client requirements and designing solutions by understanding the core of the change
- Experienced in ETL processing via Spark, Hive and ETL tools
- Experienced in writing Unix Shell scripting for builds and deployment in different environments
- Strong experience working with relational databases like Teradata and with mainframes
- Good exposure to integrated testing, data analysis and data validation on Hadoop Environment
- Proactive nature that has earned recognition from clients
- Excellent interpersonal skills that help in clearly stating and recording ideas
- Strong analytical, organizational and leadership skills that have earned vital roles
TECHNICAL SKILLS:
Hadoop Technology: Cloudera Hadoop, MapReduce (MR1, MR2- YARN), Spark, HDFS, Hive, Impala, Pig, Sqoop, Oozie, Hue, Cloudera Manager, Kafka, Flume, HCatalog, Spark Streaming, PySpark, HBase, Druid, TEZ, Ambari, Jupyter Hub / Zeppelin Notebooks
Operating System: UNIX, Linux, MS-DOS, Windows, OS/390 Mainframe
Database: Teradata, DB2, IMS, Presto DB
ETL Tool: Syncsort DMExpress (DMX-h) Hadoop ETL tool, Talend, IBM DataStage, IBM Big SQL
CI/CD Tool: JIRA, SVN, BitBucket, GIT, Jenkins, Artifactory, Ansible, CVS
Scheduling Tool: Autosys, CA7, Crontab, Tidal
Language: Java, Scala, JCL, COBOL, SQL, Unix shell script, Python, IMS
Other Software: Eclipse, Maven, SharePoint, Maximo/Remedy, Teradata SQL assistant, TSO/ISPF, MS Office Tools, QlikView
PROFESSIONAL EXPERIENCE:
Confidential, VA
Senior Big Data Engineer
Responsibilities
- Attending business meetings and collaborating with business teams to understand & articulate the project requirements and assess them with the development team
- Helping to create high quality documentation supporting the design/coding task (Data Lineage, Data Mapping, High/Low Level Design etc.)
- Developing the new ETL/ELT framework using Big Data technologies (Hadoop, Teradata, Informatica, IBM DataStage, etc.)
- Performing data migration/data ingestion from source systems to distributed file systems using various tools like NiFi, Sqoop, IBM DataStage, TDCH, etc. to achieve the best results and maximum throughput
- Performing POC for real time streaming data using Kafka and Spark Streaming
- Creating a data lake with staging and base layers on Hadoop to maintain various datasets from different lines of business
- Performing ETL/ELT operations via Spark (PySpark) using Spark SQL, RDD operations, etc. and storing the results as Hive tables (a minimal PySpark sketch follows this list)
- Creating Hive tables (external/managed) with partitioning/bucketing based on the amount of data being processed
- Deciding the ideal storage platform for the application being designed based on the type of the data (historical or incremental), format of the data (structured, semi-structured and unstructured), compression requirements, data frequency, pattern and consumer of the data
- Preparing a security framework to maintain the data privacy for the data stored on the distributed file system
- Creating reusable components to be used to perform similar set of operations
- Performing data archival by creating components using DistCp and the Hadoop archive command to reduce NameNode utilization (see the archival sketch after this list)
- Getting the data ready for the data visualization tools like Tableau, QlikView, Jupyter Hub / Zeppelin Notebooks
- Performing performance tuning on existing applications and increasing their throughput through various techniques
- Working with the system admins to change cluster configuration/settings in order to achieve optimal performance from the cluster
- Using Presto DB to perform analytical queries for business users
- Using JIRA, SVN, BitBucket, GIT for CI/CD related activities
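The PySpark ETL/ELT bullet above refers to a pattern like the following minimal sketch; the paths, column names and the base.transactions table are hypothetical placeholders, not actual project artifacts.

```python
# Minimal PySpark ETL sketch (illustrative only; paths, columns and the
# table name "base.transactions" are hypothetical).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("curated_load_sketch")
         .enableHiveSupport()          # needed to persist results as Hive tables
         .getOrCreate())

# Read a staged dataset from the data lake staging layer
raw = spark.read.parquet("/data/staging/txn_raw")

# Example DataFrame transformation: cleanse and stamp a load date
curated = (raw
           .filter(F.col("amount").isNotNull())
           .withColumn("load_dt", F.current_date()))

# Persist the result as a partitioned, managed Hive table in the base layer
(curated.write
        .mode("overwrite")
        .partitionBy("load_dt")
        .saveAsTable("base.transactions"))
```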
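The archival bullet above follows a pattern like this small driver, assuming a plain hadoop distcp copy followed by hadoop archive; all paths and the archive name are hypothetical.

```python
# Archival sketch: wrap "hadoop distcp" and "hadoop archive" in a Python
# driver (illustrative; paths and the archive name are hypothetical).
import subprocess

SOURCE = "hdfs:///data/base/transactions/2016"
ARCHIVE_AREA = "hdfs:///archive/transactions"
HAR_DEST = "hdfs:///archive/har"

# Copy the aged partitions to the archive area
subprocess.run(["hadoop", "distcp", SOURCE, ARCHIVE_AREA], check=True)

# Pack the many small files into a single HAR to cut NameNode object count
subprocess.run(
    ["hadoop", "archive",
     "-archiveName", "transactions_2016.har",
     "-p", ARCHIVE_AREA, "2016",
     HAR_DEST],
    check=True,
)
```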
Confidential, NC
Lead Big Data Developer cum Analyst (Teradata & Hadoop)
Responsibilities
- Gathering necessary information from users on Anti-Money Laundering (AML) via transaction, customer profiling and posting data
- Doing end-to-end data analysis to understand the business logic and design the approach to build a new data model
- Creating mapping documents for AML domain for data lineage involving Teradata and Hadoop as Source and Target systems respectively
- Cataloging and documenting the data sources applicable for use cases to form the data layer
- Creating data flow diagrams using MS Visio
- Creating data models for the landing zone using the documentation
- Working with the Business Team to gather the requirements and prioritize their needs
- Developing and implementing data collection reports that optimized statistical efficiency and data quality
- Working with clients on initiatives involving Architecture, Data Warehousing, Data Platform Migration, Performance & Optimization, Data Analysis, ETL Development, and Hadoop Data Integration leveraging my knowledge of Hadoop, Teradata, ETL and Analytics to solve Customers' problems
- Understanding the various sources involved in formulating use cases like Panama Papers, FinCEN 314(a), Foreign Terrorist Fighters, etc.
- Understanding the various data sources involving transaction modes like wire, cash, card, etc.
- Proposing technical solutions and laying out the plan for successful implementation
- Preparing High-Level and Low-Level Design documents
- Supporting the daily and incremental loads from Teradata to the Hadoop data layer
- Embedding data quality checks using Teradata, Hive, Spark, etc. (a reconciliation sketch follows this list)
- Performing unit testing and tuning the code as required
- Preparing the necessary technical standards and functional manuals for the application
- Scheduling the jobs using Autosys scheduler
- Using JIRA, SVN, BitBucket, GIT, Jenkins, Artifactory & Ansible for CI/CD related activities
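The data quality bullet above can be illustrated with a row-count reconciliation between the Teradata source and the Hive landing table, as in this minimal sketch; it assumes the Teradata JDBC driver is on the Spark classpath, and the JDBC URL, credentials and table names are hypothetical.

```python
# Row-count reconciliation sketch for a daily Teradata -> Hadoop load
# (illustrative; URL, credentials and table names are hypothetical).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("dq_row_count_check")
         .enableHiveSupport()
         .getOrCreate())

# Source-side count, read over JDBC from Teradata
src_cnt = (spark.read.format("jdbc")
           .option("url", "jdbc:teradata://td-host/DATABASE=aml")
           .option("driver", "com.teradata.jdbc.TeraDriver")
           .option("dbtable", "(SELECT COUNT(*) AS cnt FROM aml.wire_txn) q")
           .option("user", "svc_user")
           .option("password", "****")
           .load()
           .first()["cnt"])

# Target-side count from the Hive landing-zone table
tgt_cnt = spark.sql("SELECT COUNT(*) AS cnt FROM landing.wire_txn").first()["cnt"]

if src_cnt != tgt_cnt:
    raise ValueError(f"Row count mismatch: Teradata={src_cnt}, Hive={tgt_cnt}")
```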
Confidential, NC
Lead Big Data Developer cum Analyst (Teradata & Hadoop)
Responsibilities
- Converting the existing Mainframe-Teradata ETL to Hadoop ETL in order to offload Teradata compute and storage
- Doing end-to-end data analysis to understand the business logic and design the approach to build a new data model
- Creating mapping documents for data lineage involving Teradata and Hadoop as Source and Target systems respectively
- Cataloging and documenting the data sources applicable for use cases to form the data layer
- Creating data flow diagrams using MS Visio
- Creating data models for the landing zone using the documentation
- Proposing technical solutions and laying out the plan for successful implementation
- Preparing High-Level and Low-Level Design documents
- Using Syncsort’s DMX-H ETL tool to facilitate application development in HDFS
- Developing MapReduce and Spark code to support the use cases
- Using Java and Scala for programming
- Developing Hive scripts equivalent to the Teradata processes
- Using Sqoop to transfer data in and out of Teradata
- Developing automated scripts for all jobs to complete loading data from the mainframe to Teradata after processing in Hadoop
- Handling data from Flume and Kafka sources via Spark Streaming (a streaming sketch follows this list)
- Scheduling the Hadoop jobs using Oozie and Autosys
- Developing customized Hive UDFs
- Handling fixed-block, variable-block, text-delimited, binary, Avro and Parquet files (see the fixed-width parsing sketch after this list)
- Using Network Data Movement (NDM) / Connect Direct to move data across servers
- Developing MapReduce and Spark code to structure the data
- Using Impala for end-user queries and validation
- Solving issues raised by other application teams via Nexus request
- Building archival and recovery jobs for DR purposes
- Building reusable common components that reduce application coding effort
- Preparing the necessary technical standards and functional manuals for the application
- Using JIRA, SVN, BitBucket, GIT, Jenkins, Artifactory & Ansible for CI/CD related activities
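The streaming bullet above maps to a consumer like the following sketch, written here in the Structured Streaming style rather than the original DStream-based Spark Streaming; the broker, topic and paths are hypothetical.

```python
# Kafka ingestion sketch (Structured Streaming style; broker, topic and
# paths are hypothetical, and the spark-sql-kafka package must be available).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka_ingest_sketch").getOrCreate()

stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "txn_events")
          .load())

# Kafka delivers key/value as binary; cast the payload to string
events = stream.select(F.col("value").cast("string").alias("payload"))

# Land the raw events on HDFS with checkpointing for recovery
query = (events.writeStream
         .format("parquet")
         .option("path", "/data/streaming/txn_events")
         .option("checkpointLocation", "/data/checkpoints/txn_events")
         .start())
query.awaitTermination()
```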
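The fixed-width file handling above can be sketched in PySpark as below; the field positions and paths are hypothetical and stand in for an actual copybook layout.

```python
# Fixed-width record parsing sketch (illustrative; layout, positions and
# paths are hypothetical, not a real copybook).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("fixed_width_sketch").getOrCreate()

# Each line is one fixed-width record landed from the mainframe extract
lines = spark.read.text("/data/landing/acct_extract.txt")

# Slice fields by position; substring() is 1-based in Spark SQL
records = lines.select(
    F.substring("value", 1, 10).alias("account_id"),
    F.substring("value", 11, 8).alias("open_date"),
    F.trim(F.substring("value", 19, 30)).alias("account_name"),
    F.substring("value", 49, 12).cast("decimal(12,2)").alias("balance"),
)

records.write.mode("overwrite").parquet("/data/structured/acct_extract")
```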
Confidential
Lead Developer cum Analyst
Responsibilities
- Used Hadoop as a data processing layer when moving the data from the mainframe to Teradata
- Used Syncsort’s DMX-H ETL tool to facilitate application development in HDFS
- Developed MapReduce jobs in Java for data manipulation
- Used Hive, Oozie and Sqoop extensively for ETL processing
- Created a batch calculation process with the help of historical data consisting of the customer's account balances, aggregated deposits and investments
- Designed the model and flow to achieve the requirement
- Changed the BTEQ/MLOAD/TPUMP/FLOAD/FASTEXPORT/TPT/JCL scripts as per requirements
- Wrote and executed Teradata SQL scripts to validate the end data (a validation sketch follows this list)
- Created views on the tables along with access categories to provide data access to the users
- Prepared design, test plan, implementation plan, test scripts, validation script and unit testing documents
- Prepared job flow diagrams in MS Visio in order to hand over the implementation to the production support team
- Tuned poorly performing Teradata SQL queries and inefficient collect stats
- Provided root cause analysis on critical and non-critical issues that occurred in production
- Analyzed the dashboard and performance metrics
- Prepared necessary technical and functional manuals for the application
- Used JIRA, SVN, BitBucket, GIT, Jenkins, Artifactory & Ansible for CI/CD related activities
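The validation bullet above is illustrated below with the teradatasql Python driver rather than the BTEQ scripts used on the project; the host, credentials and table names are hypothetical.

```python
# Post-load validation sketch using the teradatasql driver (illustrative;
# host, credentials and table names are hypothetical).
import teradatasql

with teradatasql.connect(host="td-prod", user="svc_user", password="****") as con:
    with con.cursor() as cur:
        # Row count actually loaded today
        cur.execute(
            "SELECT COUNT(*) FROM edw.account_balance WHERE load_dt = CURRENT_DATE")
        loaded = cur.fetchone()[0]
        # Expected control total recorded by the load process
        cur.execute(
            "SELECT expected_rows FROM edw.load_control WHERE load_dt = CURRENT_DATE")
        expected = cur.fetchone()[0]

if loaded != expected:
    raise ValueError(f"Validation failed: loaded {loaded} rows, expected {expected}")
```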