Senior Data Engineer Resume Costa Mesa, CA - Hire IT People

SUMMARY:

Over 7 years of professional experience in designing, developing, integrating and testing software applications, including 5 years of experience with Hadoop Big Data technologies such as Map-Reduce, Hive, Spark (Core and Spark SQL), Impala and Sqoop, and 3+ years of experience in programming.
Hands-on experience working with Big Data in the Hadoop ecosystem using Spark, Hive, Impala and Map-Reduce.
Hands-on experience programming in Java, Scala and Python, with strong knowledge of object-oriented and functional programming concepts.
In-depth knowledge of implementing data visualization techniques in Spark using Apache Zeppelin.
Highly skilled at SQL and shell scripting operations.
Good knowledge of data architecture including data ingestion pipeline designs, Lambda architecture, data streams, data lakes and data warehouses.
Good knowledge of data modeling and advanced data processing techniques for structured, semi-structured and unstructured data.
Able to assess business rules, collaborate with stakeholders and perform source-to-target data mapping, design and review with strong analytical and communication skills.
Hands-on experience tuning mappings, with expertise in identifying and resolving performance bottlenecks at various levels.
Excellent skills in analyzing system architecture usage and in defining and implementing procedures.
A quick learner, punctual and trustworthy.
Motivated problem solver and resourceful team member with decent written and verbal communication skills.

TECHNICAL SKILLS:

Hadoop Platforms: Hortonworks (HDP) and Cloudera (CDH)

Filesystems: HDFS, S3

Databases: Hive, Impala

Scheduling: Oozie and crontab

Data Ingestion and Streaming: Sqoop, Flume and Kafka

Querying Engines: Beeline, Phoenix

NoSQL DBs: HBase, MongoDB and Cassandra

Other Tools: Apache Zeppelin, Tableau, Arcadia, Microsoft Office (PowerPoint, Word & Excel) and GitHub.

Development Technologies: Spark, Python, Scala, Java, SQL and Shell Scripting.

IDEs, FTP and SSH tools: Eclipse, IntelliJ, PyCharm, DbVisualizer, Xshell, PuTTY, FileZilla and WinSCP.

Build tools: SBT, Maven

File Formats: Structured (delimiter-separated values), Semi-Structured (JSON, XML, HTML, etc.), Compressed (Gzip, Snappy, LZO, etc.) and Binary (Sequence, Avro, Parquet, ORC, etc.).

Operating Systems: Mac, Ubuntu, Linux and Windows.

PROFESSIONAL EXPERIENCE:

Senior Data Engineer

Confidential, Costa Mesa, CA

Environment: Cloudera (CDH), Hadoop, PuTTY, Spark, Beeline, Impala, MySQL, HDFS, Python, Scala, SQL scripting, Sqoop, Linux shell scripting, Eclipse, IntelliJ, PyCharm, SBT, and Maven.

Responsibilities:

Designed an ETL pipeline using Spark, Hive and HBase components.
Worked on batch data ingestion by creating a data pipeline using Sqoop and Spark.
Worked on near-real-time data ingestion using a data pipeline built with Kafka and Spark.
Worked on integrating Hive with HBase using the HBaseStorageHandler to enable inserting and updating data in the NoSQL HBase store.
Worked on enabling transactional tables in Hive to support row-level updates.
Worked on encrypting sensitive data using Hive, in addition to providing a mechanism to encrypt the Sqoop passwords used to connect to the legacy systems.
Designed a framework that records RDBMS and JDBC connection details in MySQL to automate and unify the data-pull process for Sqoop ingestions.
Used Oozie and crontab to schedule Hadoop application jobs so that the cluster was used effectively.
Worked on real-time data analytics in Spark Streaming for streaming text data by integrating Flume and Kafka with Spark Streaming.
Worked on schema tuning, performance triage/troubleshooting and data distribution for the ingested and existing data in the enterprise data platform.
Worked on performance tuning, debugging and optimization of Hive queries.

Hadoop Solutions Engineer

Confidential, Colorado Springs, CO

Environment: Hortonworks (HDP), Hadoop, PuTTY, Oracle, MySQL, HDFS, Spark, Hive, Arcadia, Python, Scala, SQL scripting, Sqoop, Linux shell scripting, Eclipse, IntelliJ, PyCharm, SBT, and Maven.

Responsibilities:

Created an ETL pipeline using Spark, Hive and Arcadia components.
Stored and retrieved data in HDFS in different formats such as text, JSON, Sequence, Avro, Parquet and ORC, as well as in compressed formats.
Created apps and visuals in Arcadia Data to enable data visualization and descriptive analytics.
Tuned Spark RDD parallelism to improve the performance of Spark jobs on the Hadoop cluster.
Designed Hive table schemas using partitioning and bucketing, storing tables as both external and internal tables.
Worked on developing Hive UDFs in Python to define custom analytical functions.
Worked on programming Spark applications using Python and Scala, in addition to optimizing the memory parameters for efficient cluster utilization.
Worked on moving data between RDBMS and HDFS using Spark and JDBC connectors to integrate Hadoop with MySQL and Oracle.

Hadoop Developer

Confidential, Southlake, TX, US

Environment: Cloudera (CDH), Hadoop, PuTTY, Oracle, MySQL, HDFS, Spark, Hive, Impala, Python, Scala, SQL scripting, Linux shell scripting, Eclipse, IntelliJ, PyCharm, SBT, and Maven.

Responsibilities:

Worked on parsing and filtering semi-structured data such as JSON using DataFrames/Spark SQL, with case classes and with programmatically specified schemas.
Worked on modeling data with Avro schemas and writing it to Parquet format using Spark SQL.
Worked on real-time data analytics in Spark Streaming for streaming text and Kafka topic data.
Worked on data preparation methods in Spark DataFrames using set operations, regular expressions, sorting, parsing arbitrary date/time inputs and converting JSON array values into lists.
Worked on performance tuning, debugging and optimization of Hive queries by changing the default YARN values.
Developed queries to analyze data in different formats in Impala and Hive.
Worked on moving data between RDBMS and HDFS using Spark and JDBC connectors.
Transferred data between local systems and HDFS.

Hadoop Developer

Confidential, Mayfield Village, OH

Environment: Big Data Platform - CDH 5.0.3, Hadoop HDFS, Map Reduce, Hive, Sqoop, Spark, Impala, Java, Shell Scripts, Oracle 10g, Eclipse, Tableau, PuTTY and IntelliJ.

Responsibilities:

Prepared technical design documents and data flow diagrams based on business requirements.
Implemented new designs as per technical specifications.
Integrated Hadoop with Oracle in order to load and then cleanse raw unstructured data in the Hadoop ecosystem, making it suitable for processing in Oracle using stored procedures and functions.
Used the Map-Reduce programming model for batch processing of data stored in HDFS.
Developed Java Map-Reduce programs to transform log data into a structured form for finding user location, login/logout times, time spent and errors.
Loaded and transformed large sets of structured, semi-structured and unstructured data.
Used Sqoop for importing data into HDFS and exporting data from HDFS to an Oracle database.
Built reusable Hive UDF libraries for business requirements, enabling users to call these UDFs in Hive queries.
Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
Optimized the Hive tables using techniques such as partitioning and bucketing to provide better performance with HiveQL queries.
Developed Spark Scala scripts for ETL operations on captured data and for delta record processing between newly arrived data and existing data in HDFS.
Extensively used Pig for data cleansing.
Used PySpark for transformations, event joins, bot traffic filtering and some pre-aggregations before storing the data in HDFS.
Extended Hive and Pig core functionality by writing custom UDFs in Java and Python.
Created action filters, parameters and calculated sets for preparing dashboards and worksheets in Tableau.
Worked extensively on performance optimization by adopting or deriving appropriate design patterns for the Map-Reduce jobs, based on analysis of I/O latency, map time, combiner time, reduce time, etc.
Troubleshooting: Used Hadoop logs to debug the scripts.

Hadoop Developer

Confidential, Houston, TX

Environment: Big Data Platform - CDH 4.2.1, Hadoop HDFS, Map Reduce, Hive, Sqoop, IBM DB2, PL/SQL, UNIX, Python, Eclipse.

Responsibilities:

Integrated, managed and optimized utility systems, including assets, devices, networks, servers, applications and data.
Ensured quality integration of smart meter functions into the system's data acquisition and processing.
Enabled the use of metering data for a variety of applications such as billing, outage detection and recovery, fraud detection, finance, energy efficiency, customer care and a variety of analytics.
Analyzed large amounts of raw data to create information. Compiled technical specifications that allowed IT to create data systems, which supported the smart metering system.
Conducted technical reviews and provided quick-fix solutions to the customer for production defects.
Developed Map-Reduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the enterprise data warehouse (EDW).
Imported and exported data between HDFS and relational database systems using Sqoop.
Worked on developing Scala code to filter and parse raw data in HDFS using Spark.
Wrote Map-Reduce Java programs to analyze log data for large-scale weather data sets.
Involved in testing Map-Reduce programs using MRUnit and JUnit testing frameworks.
Customized the parser/loader application for data migration to HBase.
Provided support for data analysts in running ad hoc Pig and Hive queries.
Developed PL/SQL procedures, functions and packages using Oracle utilities such as PL/SQL and SQL*Loader, with exception handling, to implement key business logic.
Utilized the PL/SQL bulk collect feature to optimize ETL performance. Fine-tuned and optimized a number of SQL queries and performed code debugging.
Developed UNIX and SQL scripts to load large volumes of data for data mining and data warehousing.

Hadoop Developer

Confidential, NC

Environment: Big Data Platform - CDH 4.0.1, XML, Hadoop HDFS, Spark, Hive, Sqoop, Impala, Oracle 10g, Java, Eclipse.

Responsibilities:

Involved in design and development of server-side layer using XML, JDBC and JDK patterns using Eclipse IDE.
Involved in unit testing, system integration testing and enterprise user testing.
Extensively used Core Java, Servlets, and JDBC.
Developed data pipeline using Hive, Sqoop, Spark and Map Reduce to ingest customer behavioral data and purchase histories into HDFS for analysis.
Worked with NoSQL databases such as HBase, creating tables to load large sets of semi-structured data coming from various sources.
Wrote MRUnit test cases to test and debug Map Reduce programs on a local machine.
Involved in creating Hive tables, loading data and running Hive queries on that data.
Imported data using Sqoop to load data from Oracle to HDFS on a regular basis.
Developed scripts and batch jobs to schedule various Hadoop programs.
Wrote Hive queries for data analysis to meet the business requirements.
Implemented partitioning, dynamic partitions and buckets in Hive.
Wrote Hive queries to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
Developed Pig UDFs to pre-process data for analysis.
Developed complex, multi-step data pipelines using Spark.
Wrote Spark SQL queries for data analysis.

Hadoop Developer

Confidential, NY

Environment: Big Data Platform - CDH 3, Map-Reduce, Hive, Spark Scripting, JDK 1.6, and Oracle.

Responsibilities:

Involved in analysis, design and development of data collection, data ingestion, data profiling and data aggregation.
Worked on development of the controller, batch and logging modules using JDK 1.6.
Worked on development of data ingestion process using FS Shell and data loading into HDFS.
Worked on defining Hive queries for different profiling rules such as business checks, outlier checks, and domain and data range validation.
Worked on automating the generation of Hive queries and Map-Reduce programs.
Developed user-defined functions in Java and Python to facilitate data analysis in Hive and Pig.
Managed the end-to-end delivery across the different phases of the software implementation.
Involved in initial POC implementation using Hadoop - Map Reduce, Spark Scripting, and Hive Scripting.
Designed the framework for data ingestion, data profiling and generating the Risk Aggregation report based on various business entities.
Mapped the business requirements and rules with the Risk Aggregation System.
Used JDBC for database connectivity to Oracle and to invoke stored procedures.
Debugged code and created documentation for future use.
