Lead Data Engineer Resume
Dallas, TX
SUMMARY
- Overall 10 years of experience in Information Technology, including 8+ years with Big Data and Hadoop ecosystem technologies.
- Hands-on experience configuring HDFS and Hadoop ecosystem components such as HBase, Solr, Hive, Tez, Sqoop, Pig, Flume, Oozie, and ZooKeeper.
- Hands-on experience writing shell scripts and Hive Query Language (HQL).
- Experience in database development using SQL and PL/SQL, working with databases such as Oracle, Informix, and SQL Server.
- Upgraded Hadoop clusters from CDH3 to CDH4, set up high-availability clusters, and integrated Hive with existing applications.
- Expert in data analysis, gap analysis, business coordination, requirements gathering, and preparation of technical documents. Experience with multiple distributions, including Hortonworks and Cloudera.
- Hands-on experience with build tools such as Jenkins, Maven, and Ant; virtualization and containers (Docker); and hypervisors (ESXi, ESX).
- Experience working with NoSQL databases such as HBase, with knowledge of Cassandra, Redis, and MongoDB.
- Experience using Sqoop to import data into HDFS from RDBMS and vice-versa.
- In-depth knowledge of Cassandra and hands-on experience installing, configuring, and monitoring DataStax Enterprise clusters.
- Experience with Hadoop architecture and its components, including HDFS, NameNode, DataNode, JobTracker, TaskTracker, YARN, and MapReduce.
- Developed a data pipeline using Kafka, HBase, Mesos, Spark, and Hive to ingest, transform, and analyze customer behavioral data.
- Working experience with databases such as Oracle, SQL Server, Sybase, and DB2 in the areas of object-relational DBMS architecture, physical and logical database structures, application tuning, and query optimization.
- Worked on creating virtual machines using VMware and CHP software.
- Handled virtual machine migrations from Azure Classic to Azure Resource Manager (ARM) using PowerShell.
- Experience with installation and configuration of WebSphere, WebLogic, and Tomcat, and with deployment of 3-tier applications.
- Proficient in SQL and PL/SQL using Oracle, DB2, Sybase and SQL Server.
- Installed and configured Talend ETL in single- and multi-server environments.
- Experienced with Apache Airflow; created multiple data pipelines as Airflow DAGs (a brief DAG sketch follows this summary).
- Created standards and best practices for Talend ETL components and jobs.
- Effective team player with excellent communication skills and the insight to determine priorities, schedule work, and meet critical deadlines.
- Strong technical and architectural knowledge in solution development.
- Effective in working independently and collaboratively in teams.
- Good analytical, communication, problem solving and interpersonal skills.
- Flexible and ready to take on new challenges.
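The Airflow pipelines referenced above can be illustrated with a minimal, hypothetical sketch, assuming Airflow 2.x; the DAG id, task ids, and script paths are placeholders rather than details from an actual engagement.

```python
# Minimal Airflow DAG sketch: a daily two-step ingest-then-transform pipeline.
# DAG id, task ids, and script paths are hypothetical placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="daily_customer_ingest",      # hypothetical pipeline name
    default_args=default_args,
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Pull the day's extract from the source system (placeholder script path).
    ingest = BashOperator(
        task_id="ingest_raw_data",
        bash_command="python /opt/pipelines/ingest_raw_data.py --date {{ ds }}",
    )

    # Transform the landed data once ingestion has completed.
    transform = BashOperator(
        task_id="transform_to_curated",
        bash_command="python /opt/pipelines/transform_to_curated.py --date {{ ds }}",
    )

    ingest >> transform
```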
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, HDFS, HBase, ZooKeeper, Hive, Pig, Sqoop, Spark, Oozie, Flume, Impala, Tez, Kafka, Storm, Sonar, HCatalog, YARN, Cassandra, and Mesos
Big Data Distributions: Hortonworks, Cloudera, Apache
Programming: Python and PL/SQL
Databases: Oracle 10g, DB2, SQL, NoSQL (MongoDB, Cassandra, HBase), Snowflake
Web/App Servers: WebSphere Application Server 7.0, Apache Tomcat 5.x/6.0, JBoss 4.0
Web Languages: XML, XSL, HTML, JavaScript, jQuery, and JSON
ETL: Talend, Informatica 9.x/8.x (Integration Service / PowerCenter), Infoworks (IWX)
Messaging Systems: JMS, Kafka and IBM MQ Series
Version Tools: Git, SVN, and CVS
Scripts: Shell, Python, Maven and ANT
OS & Others: Windows, Linux, SVN, Clear Case, Putty, WinSCP and FileZilla
Cloud: AWS (EC2, S3, CloudWatch, RDS, ElastiCache, ELB, IAM), Rackspace, OpenStack, Cloud Foundry
PROFESSIONAL EXPERIENCE
Confidential, Dallas, TX
Lead Data Engineer
Responsibilities:
- Hands-on experience with various Big Data technologies, including Hadoop, MapReduce, and HDFS.
- Working with data analysts to create metrics based on business requirements, evaluating them in the Prodstage environment, and then promoting them to the production environment.
- Playing a vital role in creating technical SQL for Snowflake based on the business SQL provided by the data analysts.
- Troubleshooting production support issues post-deployment and providing solutions as required.
- Implementing CI/CD using in-house automation frameworks with GitHub as the version control system.
- Reviewing software documentation to ensure technical accuracy, compliance, or completeness, with a focus on mitigating risks.
- Designing test plans, scenarios, scripts, and/or procedures to determine product quality or release readiness
- Performing test execution and capturing results by running shell scripts on EMR.
- Worked on auto-scaling instances to design cost-effective, fault-tolerant, and highly reliable systems.
- Implemented Amazon EMR for processing Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
- Developed AWS CloudFormation templates to create custom-sized VPCs, subnets, EC2 instances, ELBs, and security groups.
- Expertise in extracting, transforming, and loading data from JSON, DB2, SQL Server, Excel, and flat files.
- Developed streaming solutions based on the Kafka Streams API.
- Implemented windowing techniques using the Confluent Kafka Streams API.
- Experienced in integrating Kafka with Spark Structured Streaming (a brief sketch follows this section).
- Enhanced the performance of queries and daily ETL jobs through efficient design of partitioned tables.
- Developed data extraction pipelines using pandas DataFrames in Python.
- Performing initial debugging by reviewing configuration files, logs, or code to determine the source of failures.
Environments/Tools: GitHub, SQL, AWS (EC2, S3, EMR), Docker, Control-M, UNIX, shell scripts, Python, Spark, Snowflake, Nebula (Master Data Management tool), Jupyter Notebooks.
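The Kafka with Spark Structured Streaming integration noted above is sketched below. This is a hypothetical PySpark example, assuming the spark-sql-kafka connector is on the classpath; the broker address, topic name, and event schema are placeholders.

```python
# Sketch: read a Kafka topic with Spark Structured Streaming and apply a
# tumbling-window aggregation. Broker, topic, and schema are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, window
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-structured-streaming-sketch").getOrCreate()

event_schema = StructType([
    StructField("user_id", StringType()),
    StructField("action", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "customer-events")             # placeholder topic
    .load()
)

# Kafka delivers bytes; parse the JSON payload into typed columns.
events = (
    raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Count actions per user in 10-minute tumbling windows, tolerating late data.
counts = (
    events
    .withWatermark("event_time", "15 minutes")
    .groupBy(window(col("event_time"), "10 minutes"), col("user_id"))
    .count()
)

query = (
    counts.writeStream
    .outputMode("update")
    .format("console")
    .start()
)
query.awaitTermination()
```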
Confidential, Atlanta, GA
Lead Hadoop Developer
Responsibilities:
- Developed Spark SQL scripts for handling different data sets and verified their performance against MapReduce jobs.
- Imported real-time data into Hadoop using Kafka and implemented the corresponding Oozie jobs.
- Involved in loading data from the Linux file system into HDFS.
- Wrote MapReduce jobs for text mining and worked with the predictive analytics team to validate output against requirements.
- Developed custom aggregate functions using Spark SQL and performed interactive querying.
- Used Sqoop to load data into HBase and Hive.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode high availability, capacity planning, and slot configuration.
- Creating Hive tables, dynamic partitions, and buckets for sampling, and working on them using HiveQL.
- Implemented partitioning, dynamic partitions, and bucketing in Hive (a brief sketch follows this section).
- Developed Hive Scripts to create the views and apply transformation logic in the Target Database.
- Monitored multiple Hadoop cluster environments using Cloudera Manager; tracked workload and job performance and collected cluster metrics as required.
- Designed ETL data pipeline flows to ingest data from RDBMS sources into Hadoop using shell scripts, Sqoop, and MySQL.
- Optimized Hive tables using techniques such as partitioning and bucketing to provide better query performance.
- Managed and supported the Teradata EDW, including client tools such as Teradata Studio and SQL Assistant, connecting to the data lake.
- Designed a workflow that can download binary files directly into a cluster or local directory.
- Developed a MapR document that spins up in the OpenShift environment when the job runs successfully.
- Used Sonar to check for code issues, improving code clarity.
Environments/Tools: Apache Hadoop 2.x, HDFS, YARN, MapReduce, Hive, HBase, Splunk, Kafka, LDAP, Kerberos, Oracle Server, MySQL Server, Elasticsearch, Crontab, Core Java, Linux, Bash scripts
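The Hive partitioning and bucketing mentioned above can be sketched as follows. This is a hypothetical example expressed through Spark's Hive support in Python rather than the project's actual HQL scripts; the database, table, and column names are placeholders.

```python
# Sketch: create a partitioned, bucketed Hive table and load it with a
# dynamic-partition insert via Spark's Hive support. Names are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-partitioning-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Allow non-strict dynamic partitioning for the insert below.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

spark.sql("""
    CREATE TABLE IF NOT EXISTS curated.orders_part (
        order_id STRING,
        amount   DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (order_id) INTO 16 BUCKETS
    STORED AS ORC
""")

# Dynamic-partition insert: partitions are derived from the load_date column,
# which must be the last column in the SELECT list.
spark.sql("""
    INSERT OVERWRITE TABLE curated.orders_part PARTITION (load_date)
    SELECT order_id, amount, load_date
    FROM staging.orders_raw
""")
```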
Confidential, Dallas, TX
Hadoop Developer
Responsibilities:
- Ingested data from sources such as Oracle and DB2, performed data transformations, and exported the transformed data to cubes as per the business requirements.
- Handled importing of data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Oracle, DB2, and Teradata into HDFS using Sqoop.
- Involved in creating Hive tables and loading them with data using HQL scripts, which run internally as MapReduce jobs.
- Wrote custom Hive UDFs in Java where the required functionality was too complex.
- Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and buckets.
- Ran various Hive queries on the data dumps and generated aggregated datasets for downstream systems for further analysis.
- Applied window functions, aggregations, and time and date functions on the data as per the business logic (a brief sketch follows this section).
- Developed dynamically partitioned Hive tables and stored data by timestamp and source type for efficient performance tuning.
- Scheduled Sqoop ingestions and Hive transformations (HQL scripts) using the Oozie and Maestro schedulers.
- Worked with different file formats such as TEXTFILE and ORC for Hive querying and processing.
- Queried data using Spark SQL on top of the Spark engine and implemented Spark RDDs in Scala.
- Worked on Apache Spark, writing Python applications to parse and convert txt and xls files.
- Integrated Apache Kafka with Apache Spark for real-time processing.
- Used NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Performed various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Worked on importing and exporting data between Oracle/DB2 and HDFS/Hive using Sqoop for analysis, visualization, and report generation.
Environment: HDFS, Hive, MapReduce, Java, HBase, Pig, Sqoop, Oozie, MySQL, SQL Server, Windows, and Linux.
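The window-function and date-function work noted above is illustrated with a minimal, hypothetical sketch. The original logic was written in HQL; this restates the same idea in PySpark for consistency with the other sketches, and the table and column names are placeholders.

```python
# Sketch: window functions plus date functions over a Hive table via PySpark.
# Table and column names are hypothetical placeholders.
from pyspark.sql import SparkSession, Window
from pyspark.sql.functions import col, row_number, sum as sum_, to_date

spark = SparkSession.builder.appName("windowing-sketch").enableHiveSupport().getOrCreate()

txns = spark.table("curated.transactions")   # placeholder Hive table

# Latest transaction per account, plus a running total ordered by timestamp.
w_latest = Window.partitionBy("account_id").orderBy(col("txn_ts").desc())
w_running = (
    Window.partitionBy("account_id")
    .orderBy("txn_ts")
    .rowsBetween(Window.unboundedPreceding, Window.currentRow)
)

result = (
    txns
    .withColumn("txn_date", to_date(col("txn_ts")))
    .withColumn("rn", row_number().over(w_latest))
    .withColumn("running_amount", sum_("amount").over(w_running))
)

result.filter(col("rn") == 1).select("account_id", "txn_date", "running_amount").show()
```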