Lead Data Engineer Resume
San Antonio, TX
SUMMARY
- Over 14 years of professional IT experience, including 5 years with the Big Data Hadoop ecosystem covering ingestion, storage, querying, processing, and analysis of big data.
- Experience in importing and exporting data with Sqoop between HDFS and relational database systems.
- Experience with leveraging Hadoop ecosystem components including Pig and Hive for data analysis, Sqoop for data migration, Oozie for scheduling and HBase as a NoSQL data store.
- Experience in installation, configuration, support, and management of Hadoop components using Cloudera.
- Experience in developing data pipelines using Kafka and the Spark API.
- Hands-on experience with AWS EC2, S3, Redshift, EMR, RDS, Glue, and DynamoDB.
- Strong hands-on exposure to AWS analytics services, including Athena, AWS Glue, and EMR.
- Experience with the HDFS Balancer, Hadoop shell commands, writing MapReduce programs, and verifying, managing, and reviewing Hadoop log files.
- Experience in understanding Hadoop security requirements and integrating with Kerberos authentication and authorization infrastructure.
- Experience in big data analysis using Scala and Hive, with a working understanding of Sqoop.
- Experienced in performance tuning of Spark applications: setting the right batch interval, choosing the correct level of parallelism, and tuning memory.
- Optimized existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, and pair RDDs.
- Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for high data volumes.
- Tuned Spark job performance by changing configuration properties and using broadcast variables (see the sketch after this list).
- Performed transformations and actions on RDDs and Spark Streaming data.
- Extensive knowledge of tuning SQL queries and improving database performance.
- Experience in managing Hadoop clusters using Cloudera Manager Tool.
- Managed all CM tools (JIRA, Confluence, Maven, Jenkins, Git, GitHub, Eclipse) and their usage and processes, ensuring traceability, repeatability, quality, and support.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Ability to adapt to evolving technology, with a strong sense of responsibility and accomplishment.
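As a minimal illustration of the broadcast-variable tuning noted above: a sketch of a Scala Spark job that broadcasts a small lookup table into a join so the large table is not shuffled. The paths, table names, and join column are hypothetical, not taken from any specific project.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object BroadcastJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("BroadcastJoinSketch")
      .getOrCreate()

    // Large fact table and a small lookup table; paths and columns are placeholders.
    val transactions = spark.read.parquet("/data/transactions")
    val branches     = spark.read.parquet("/data/branches")

    // Broadcasting the small table ships a copy to every executor,
    // so the join avoids shuffling the large table.
    val enriched = transactions.join(broadcast(branches), Seq("branch_id"))

    enriched.write.mode("overwrite").parquet("/data/enriched_transactions")
    spark.stop()
  }
}
```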
TECHNICAL SKILLS
Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Oozie, Scala, Spark, Spark SQL, Spark Streaming, Kafka, ZooKeeper
Languages: Java, SQL, XML, C++, C, HTML, CSS, PL/SQL, COBOL, PL/1, JCL, Easytrieve, REXX, SPUFI, QMF, File-AID, QPP, Shell Script
Design and Modeling: UML and Rational Rose.
Web Services: SOAP, WSDL, Postman
Cloud: AWS EC2, S3, Redshift, EMR, RDS, Glue, DynamoDB
Version Control and Integration: Eclipse, StarTeam, CVS, ClearCase, SVN, Git, Jenkins, ChangeMan, Endevor, SCM
Databases: Oracle 10g/9i/8i, SQL Server, DB2, MS-Access
Development Approach: Agile, Waterfall
Environments: UNIX, Red Hat Linux, Windows 2000/Server 2008/2007, Windows XP.
PROFESSIONAL EXPERIENCE
Lead Data Engineer
Confidential, San Antonio, TX
Responsibilities:
- Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
- Used Apache Sqoop to import and export data to and from HDFS and external relational databases.
- Used the Spark application master to monitor Spark jobs and capture their logs.
- Implemented Spark jobs in both PySpark and Scala with Spark SQL for faster testing and processing of data.
- Hands-on experience with major Hadoop ecosystem components including MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Oozie, and Flume.
- Improved the performance of existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Hands-on experience with AWS EC2, S3, Redshift, EMR, and RDS.
- Migrated the required data from Oracle and MySQL into HDFS using Sqoop and imported flat files of various formats into HDFS.
- Proposed an automated system using shell scripts to run the Sqoop jobs.
- Developed a data pipeline for data processing using the Spark API (a sketch follows the Environment line below).
- Developed a strategy for full and incremental loads using Sqoop.
- Implemented a POC to migrate MapReduce jobs into Spark RDD transformations.
- Good exposure to the Agile software development process.
- Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
Environment: YARN, Hive, Scala, Sqoop, Spark Core, Spark SQL, MySQL, ADF, Git, Agile, Apache Hadoop, HDFS, Oracle, Kafka-Spark, SparkR, ZooKeeper, Oozie
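A minimal sketch, in Scala with Spark Structured Streaming, of the kind of Kafka-fed pipeline referenced above; the broker address, topic name, event schema, and HDFS paths are illustrative assumptions rather than project details.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{StringType, StructType, TimestampType}

object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KafkaToHdfsSketch")
      .getOrCreate()

    // Hypothetical event schema; the real payload layout is not specified here.
    val eventSchema = new StructType()
      .add("event_id", StringType)
      .add("event_type", StringType)
      .add("event_ts", TimestampType)

    // Read raw JSON events from a Kafka topic (broker and topic are placeholders).
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "events")
      .load()

    // Parse the Kafka value column into typed fields.
    val events = raw
      .select(from_json(col("value").cast("string"), eventSchema).as("e"))
      .select("e.*")

    // Land the parsed events on HDFS as Parquet, with a checkpoint for recovery.
    val query = events.writeStream
      .format("parquet")
      .option("path", "/data/events")
      .option("checkpointLocation", "/checkpoints/events")
      .start()

    query.awaitTermination()
  }
}
```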
Big Data Engineer
Confidential, San Antonio, TX
Responsibilities:
- Developed Hive queries for the analysts.
- Executed parameterized Pig, Hive, Impala, and UNIX batches in production.
- Responsible for developing the Hive (HiveQL) scripts for the corresponding querying processes.
- Used Apache Sqoop to import and export data to and from HDFS and external relational databases.
- Responsible for performance tuning.
- Hands-on experience setting up workflows with the Apache Oozie workflow engine for managing and scheduling Hadoop jobs.
- Experienced in working with the Spark ecosystem, using Spark SQL with Scala and PySpark queries on different formats such as text and CSV files (see the sketch after the Environment line below).
Environment: YARN, Hive, Java, Sqoop, Spark Core, Spark SQL, MySQL, ADF, Git, Agile, Apache Hadoop, HDFS, Pig, Oracle, ZooKeeper, Oozie
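A minimal sketch of querying a CSV file through Spark SQL in Scala, in the spirit of the work described above; the file path, column names, and query are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object CsvWithSparkSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CsvWithSparkSqlSketch")
      .getOrCreate()

    // Load a delimited file with a header row (path and columns are illustrative).
    val claims = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/data/claims.csv")

    // Register a temporary view so the file can be queried with plain SQL.
    claims.createOrReplaceTempView("claims")

    val totalsByState = spark.sql(
      """SELECT state, COUNT(*) AS claim_count
        |FROM claims
        |GROUP BY state
        |ORDER BY claim_count DESC""".stripMargin)

    totalsByState.show(20)
    spark.stop()
  }
}
```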
Mainframe Technical Lead
Confidential, San Antonio, TX
Responsibilities:
- Improved user experience and performance by retiring legacy databases, re-engineering the application with new databases, and removing unnecessary calls.
- Re-engineered the Monitor diary and CRS diary into new work item tasks stored in an IBM DB2 database instead of the legacy IMS DB, improving system performance, using TCS MasterCraft™ Enterprise Apps Maker.
- Retired legacy DB2 tables, the IMS diary database, and the IMS loss report database.
- Retired these legacy databases to accelerate the digital transformation.
- Client Interaction, Onsite/Offshore Coordination
- Analysis of the existing system and model creation in the MasterCraft tool
- Development & Integration Testing
- Participated in and led Agile ceremonies
- Maintained a healthy environment
Environment: COBOL, PL/1, JCL, Easytrieve, REXX, SPUFI, QMF, File-AID, CPP, QPP, Java, Eclipse, Putty, Jira, RTC, StarTeam, TCS MasterCraft, qTest, Splunk, DB2, IMS, Kibana, Postman
Mainframe Technical Lead
Confidential, San Antonio, TX
Responsibilities:
- These projects aimed to replace the existing financial Auto and Property transactions. The new system supports all the business processes the current system supports, including the following.
- Open, payment, and close efforts, and setting of reserves
- Modify IMS transactions to support new process for Property and Auto Line of Business.
- Provide interfaces for batch processes that need new Property financial data.
- Provide interface support of the new auto financial data to other financial areas (Loss Reserving, Underwriting and Policy Administration System).
- Position Claims to retire the databases that will not be needed after the financial projects are implemented.
- Provide support for other miscellaneous claims database functionality (CRS, Workers Comp, CEA, SDS, Activity Translation, 1099, Outbound and Escheat)
- Identify reports necessary to support business operations and replace or provide interfaces for those reports.
- Client Interaction, Onsite/Offshore Coordination
- Requirements Study, Analysis & Design
- Development & Integration Testing
- Defect Fixing, QA/Prod support
Environment: COBOL, PL/1, JCL, Easytrieve, REXX, SPUFI, QMF, File-AID, CPP, QPP, Java, Eclipse, Putty, Jira, RTC, StarTeam, TCS MasterCraft, qTest, Splunk, DB2, IMS
Mainframe Technical Lead
Confidential, San Antonio, TX
Responsibilities:
- Client Interaction, Onsite/Offshore Coordination
- Requirements Study, Analysis & Design
- Development & Integration Testing
- Defect Fixing, QA/Prod support
- Open, Payment and Close effort, Setting of Reserves
- Modify IMS transactions to support new process for Property and Auto Line of Business.
- Provide interfaces for batch processes that need new Property financial data.
- Provide interface support of the new auto financial data to other financial areas (Loss Reserving, Underwriting and Policy Administration System).
- Position Claims to retire the databases that will not be needed after the financial projects are implemented.
- Provide support for other miscellaneous claims database functionality (ISO, Medicare, SDS, Activity Translation, 1099, Outbound and Escheat)
- Identify reports necessary to support business operations and replace or provide interfaces for those reports.
- Implement work item changes necessary to support the new financial accounting info.
- Migrated the legacy losses to new CLR losses.
- Involved creating new database queries to insert into the new tables.
- Used comparison tools to compare the legacy and new databases.
Environment: COBOL, PL/1, JCL, Easytrieve, REXX, SPUFI, QMF, File-AID, CPP, QPP, Java, Eclipse, Putty, Jira, RTC, StarTeam, TCS MasterCraft, qTest, Splunk, DB2, IMS
Mainframe Developer
Confidential, NJ
Responsibilities:
- Client Interaction, Onsite/Offshore Coordination
- Requirements Study, Analysis & Design
- Development & Integration Testing
- Defect Fixing, QA/Prod support
- The DCB system handles settling of claims against policies issued by the Confidential.
- This involves receiving the claim request, creating a case in the system, allocating appropriate reserves and making payments against the reserves created, and finally closing the case once all payments are made.
- The claims created via the CICS screens are processed by overnight batch jobs. The system interfaces with underwriting systems and other claims systems.
Environment: COBOL, PL/1, JCL, Easytrieve, REXX, SPUFI, QMF, File-AID, CPP, QPP, Java, Eclipse, Putty, Jira, RTC, StarTeam, TCS MasterCraft, qTest, Splunk, ChangeMan, Endevor, VSAM