Senior Data Engineer Resume

McClellan, CA

SUMMARY:

  • Big Data Hadoop, Business Intelligence, Enterprise Data Warehousing, Master Data Management, Data Governance, Data Visualization
  • 13+ years of total experience in the Banking, Insurance, and Manufacturing business domains, working with Big Data, Data Lake, Cloud, Mobile, Enterprise Application, and Data Integration technologies.
  • Experience in Hadoop/Big Data and Cloud frameworks and their enterprise-wide implementations, integrations, and applications.
  • Expertise in Big Data architecture design, planning, installation, application development, deployment, and migration of traditional Data Warehouse solutions to Hadoop-based Integrated Data Lakes and Enterprise Data Hub (EDH).
  • Strong data and event-driven architecture skills to lay out both tactical and strategic roadmaps for large enterprises that integrate with existing systems and processes for building scalable, low-latency, and fault-tolerant systems.
  • 4+ years of hands-on experience designing large data processing frameworks with Hadoop, Spark, Pig, Hive, Flume, Sqoop, MapReduce, and NoSQL data stores on Linux; proficient with the Hadoop ecosystem and large data applications that run on multi-cluster environments.
  • Hands-on experience working on HDFS and NoSQL databases such as MongoDB, HBase & Cassandra.
  • Hands-on architectural experience with AWS (S3, EMR (MapR), API Gateway, Lambda, and so on): designing and building scalable Big Data infrastructure and platforms to gather and process very large amounts of structured and unstructured data, including streaming real-time data.
  • Extensive expertise in benchmarking, debugging, monitoring, and performance tuning.
  • Expertise in building real-time and near-real-time complex stream processing platforms using tools such as Spark, Flume, and Kafka for scalable, fault-tolerant systems and integration with legacy systems (Mainframes/DB2).
  • Actively led the integration of cross-domain applications, overseeing and supporting the project delivery team through end-to-end project implementation.
  • Led and managed the successful implementation of Data Science practices using open-source tools such as R, Spark ML, and RStudio that leveraged Big Data clusters for model execution.
  • Strong understanding of Data Security, Compliance, Vendor certifications and capabilities across product platforms, Data Governance, Audit and Data Lineage requirements.

TECHNICAL SKILLS:

Big Data Technologies: Hadoop, MapReduce (MRv1, MRv2), HDFS, YARN, HBase, Zookeeper, Hive, Pig, Sqoop, Oozie, Spark, Spark Streaming, Spark SQL (DataFrames)

Architectures: Lambda, Kappa

Machine Learning: R, Spark ML

NoSQL: HBase, Cassandra, MongoDB

Data Lineage: Cloudera Search, Solr, Navigator

Distributions: HDP (2.X), CDH (4.x, 5.x), MapR (4.x)

Data Visualization: Tableau, MicroStrategy

ETL: Talend Big Data, Informatica

Cloud: AWS (EMR, MapR), Lambda, S3, RDS, ELB, API Gateway, Elastic Beanstalk, RedShift, Kinesis, Data Pipeline

Languages: Scala, XML, Shell, Python, R

Databases: Oracle (8i, 10g), MySQL, Teradata

Build & Configuration (CI): Ant, Maven, Chef, Puppet, Jenkins

Code Review: Sonar, FishEye, Crucible

Code Repositories: CVS, CodeCommit (AWS)

Platforms: RedHat (6.7), CentOS, Solaris, Windows, Ubuntu.

Big Data: Hadoop, Business Intelligence, Data Lake, Enterprise Data Warehousing, Master Data Management, Data Governance, Data Visualization

Project Management: Project Planning, Communication, Risk Management, Problem solving and Decision making, SDLC methodologies - Waterfall, Agile (SCRUM, LEAN)

Banking: Collateral & Private Wealth Management, Investment Banking, Value At Risk reports, BSA/AML

Regulatory Reporting Projects: BSA/AML, EVARE, SWIFT

Insurance: Insight into business and general insurance products and web-based applications

PROFESSIONAL EXPERIENCE:

SENIOR DATA ENGINEER

Confidential, McClellan, CA

Responsibilities:

  • To manage and monitor cluster availability, implementation and support of the Enterprise Hadoop environment.
  • To engage in design, capacity planning, cluster setup, monitoring, structure planning, scaling and administration of Hadoop components (Cloudera CDH5, YARN, HDFS, HBase, Zookeeper, Storm, Kafka, Spark, Pig and Hive).
  • To work with core production support personnel in IT and Engineering to automate deployment and operation of the infrastructure.
  • To deliver on analysis, development and maintenance of the Data Lake / Data Hub as well as the feed to/from all subscriber applications using the Hadoop ecosystem.
  • Hands-on development and support of integrations with multiple systems and ensuring accuracy and quality of data by implementing business and technical reconciliations.
  • To offer hands on support for development and maintenance of the Hadoop Platform and various associated components for data ingestion, transformation and processing.
  • To ensure data quality and accuracy by implementing business/ technical reconciliations via scripts and data analysis.
  • To develop and support RDBMS objects and code for data profiling, extraction, load and updates.
  • To work both independently and collaboratively with various teams and global stakeholders (Business Analysts/ Architects/ Support/ Business) while working on projects or data quality issues.
  • To perform data ingestion using ETL tools, specifically Sqoop Big Data Edition, and Hadoop transformations (using Cloudera CDH5, Spark/Scala).
  • To work in Unix/Linux as well as Windows environments.
  • To be involved in profiling and ingestion of data from all parts of the organization in order to hydrate the growing Data Lake using Big Data technologies.
  • To drive data source discovery, identify potential data integration opportunities, identify gaps in the existing data processes, and provide cost-effective solutions.
  • To drive and support analysis and data processing of huge datasets using both real-time and batch processing methods.
  • To extract data from HDFS into Spark RDD.
  • To apply transformations to logically move the data into a pair RDD.
  • To monitor the health of the batch, real time and cloud components of the Data Lake environment and troubleshoot and take actions when issues arise.
  • To eliminate manual production support activities through improved processes and technologies with an eye toward automation, driving efficiencies.
  • To perform unit, system and integration testing as well as participating in peer reviews.

DELIVERY PROJECT LEAD/ BIG DATA ARCHITECT

Confidential, CA

Responsibilities:

  • Responsible for Enterprise and Systems Architecture, Design and Application development initiatives, the strategic roadmap, and solution delivery, aligned with the overall IT strategy of the Hadoop implementation.
  • Worked with Enterprise Architecture Board in creating the technology road map, reference architecture and deriving the project strategy.
  • Introduced the Lambda architecture at Confidential, which combines batch (slow data) and real-time event (fast data) processing to provide the current state of the data.
  • Led the application development of Batch Ingestion frameworks using custom MapReduce programs, Hive, and HDFS, and used Pig scripting for data cleansing and transformations. Used Oozie as the workflow scheduler for the batch data pipeline.
  • Developed the warehouse-specific Data Lake using Hive and Pig scripting, as well as ETL pipelines for populating the Data Marts for user/business consumption using Hive and Spark.
  • Led the migration of historical data from existing warehouses to Hadoop using Sqoop for scalable processing of the data; the resulting insights are exported back using Sqoop.
  • Architected the Elasticsearch (ELK) stack and integrated it with CDH to index data on ecosystem services.
  • Provided direction to Compliance and IT Security groups in designing, documenting and implementing Perimeter Security, Access Management, Auditing and Data Encryption for Hadoop clusters.
  • Provided direction to Infrastructure teams in setting up CDH 5.x Hadoop clusters, Configuration Management, Capacity Planning, Disaster Recovery (DR) and Business Continuity (BC) clusters, and Cluster Management among the Prod, Perf, QA and DEV clusters.
  • Provided direction to Admin teams for configuring and implementing Knox, Kerberos, Sentry, and Data Encryption solutions on the Hadoop cluster.
  • Responsible for evaluation, internal branding, piloting, and full-scale implementation of 3rd party vendors (BI, Visualization, Analytics, Machine Learning, ETL, etc.) that can integrate with and leverage Hadoop as a processing platform.
  • Responsible for keeping an overview of new technologies and their possible integration to build robust, scalable and configurable technology solutions that could be leveraged by new products enabled for Confidential's banking customers.
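
The Lambda architecture mentioned above merges a precomputed batch view with recent real-time deltas to serve the current state. The following is a minimal, illustrative sketch of that merge step only, not the implementation used on the project; the function name `merge_views`, the account keys, and the balance values are all hypothetical.

```python
from typing import Dict


def merge_views(batch_view: Dict[str, int], speed_view: Dict[str, int]) -> Dict[str, int]:
    """Combine a batch view (slow data) with real-time deltas (fast data).

    Hypothetical serving-layer merge: the batch view holds totals computed
    by the last batch run, the speed view holds changes seen since then.
    """
    merged = dict(batch_view)  # copy so the batch view is left untouched
    for key, delta in speed_view.items():
        merged[key] = merged.get(key, 0) + delta
    return merged


# Illustrative data only.
batch_view = {"acct-1": 100, "acct-2": 250}
speed_view = {"acct-2": -50, "acct-3": 75}
current_state = merge_views(batch_view, speed_view)
print(current_state)
```

In a real deployment the batch view would typically come from Hive or HDFS output and the speed view from a streaming job; plain dictionaries stand in for both here.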

DELIVERY PROJECT LEAD/ BIG DATA ARCHITECT

Confidential

Responsibilities:

  • To collaborate with IT teams and management to devise a data strategy that addresses industry requirements.
  • To design, implement and help maintain enterprise solutions on the big data analytics platform.
  • To demonstrate extensive knowledge of distributed systems, process flows and procedures to aid analyses and recommendations for solution offerings.
  • To demonstrate thorough understanding of data structures, algorithm design and architectural design.
  • To design and implement data processing pipelines using Big Data technologies such as Hadoop, Cloudera CDH5, MapReduce, Hive, NoSQL, and so on.
  • To design and implement different architectural models for scalable data processing & data storage.
  • To build an inventory of data needed to implement the architecture and research on opportunities for data acquisition.
  • To implement measures to ensure data accuracy and accessibility.
  • To constantly monitor, refine and report on the performance of data management systems.
  • To drive and lead performance optimization initiatives.
  • To maintain a repository of all data architecture artifacts and procedures.
  • To demonstrate vigilance in implementing data privacy and data security.
  • To own the generation and presentation of the daily status dashboard to stakeholders, including leadership teams, flagging issues/risks and clearly detailing their impact in a timely manner.

ETL DELIVERY PROJECT LEAD

Confidential

Responsibilities:

  • Participated in business user meetings, gathered Business requirements & specifications for the Data-warehouse design, translated the user inputs into ETL design docs.
  • Involved in the creation of Informatica mappings to extract data from Oracle, SQL Server, and flat files for loading. Worked on data mapping, data cleansing, program development for loads, and verification of converted data against legacy data.
  • Involved in error handling, performance tuning of mappings, testing of Stored Procedures and Functions, and testing of Informatica Sessions and the Target Data.
  • Ensured data was loaded into the Data Warehouse/Data Marts using Informatica utilities.
  • Demonstrated extensive knowledge in performing Data Analysis, Data Verification and Validation of ETL Applications by carrying out Backend/Database Testing.
  • Tested and debugged all ETL and Teradata objects to evaluate performance and to check that the code met the business requirements.
  • Responsible for code migrations/reviews and for managing Defect Analysis and Issue resolution.

TEAM LEAD

Confidential

Responsibilities:

  • Design multi-dimensional data-marts, cubes, dimensions and measures.
  • Design and Develop the ETL process.
  • Anticipate potential technical problems and build systems, procedures, and programs to maintain operational requirements.
  • Improve the performance of the existing SQL.
  • Apply fixes for production incidents.
  • Add new source systems to the existing DWS.

SENIOR TECHNICAL ASSOCIATE

Confidential

Responsibilities:

  • Designed and developed ETL jobs. Designed multi-dimensional Data marts, Cubes, dimensions & measures.
  • Designed and developed UNIX shell scripts as part of the ETL process to compare control totals and to automate loading, pulling and pushing data from/to different servers. Involved in optimization and performance tuning of source/target mappings and sessions to increase session efficiency, and scheduled workflows using AutoSys.
  • Implemented complex mappings such as Slowly Changing Dimensions (Type 2) using a flag.
  • Designed and developed pre-session and post-session routines for Informatica sessions to drop and recreate indexes and key constraints for Bulk Loading.
  • Extended production support and ensured timely resolution of issues.
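
For readers unfamiliar with flag-based Slowly Changing Dimensions (Type 2), the idea can be sketched as follows: when a tracked attribute changes, the current row is expired and a new row is appended with the current flag set. This is a hedged illustration in plain Python rather than an Informatica mapping; `DimRow`, `apply_scd2`, and the customer/city fields are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DimRow:
    """One version of a dimension record (hypothetical schema)."""
    customer_id: int
    city: str
    effective_from: date
    effective_to: Optional[date]
    is_current: bool  # the Type 2 "flag"


def apply_scd2(dim: List[DimRow], customer_id: int, city: str, as_of: date) -> List[DimRow]:
    """Return a new dimension table with the incoming record applied."""
    result: List[DimRow] = []
    needs_insert = True
    for row in dim:
        if row.customer_id == customer_id and row.is_current:
            if row.city == city:
                needs_insert = False  # no change, keep the current row
                result.append(row)
            else:
                # Expire the old version instead of overwriting it.
                result.append(replace(row, effective_to=as_of, is_current=False))
        else:
            result.append(row)
    if needs_insert:
        result.append(DimRow(customer_id, city, as_of, None, True))
    return result


# Illustrative data only.
dim = [DimRow(1, "Sacramento", date(2020, 1, 1), None, True)]
dim = apply_scd2(dim, 1, "Fresno", date(2021, 6, 1))
for row in dim:
    print(row)
```

History is preserved because the old row is only closed out, never updated in place; queries for the current state filter on the flag.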