Big Data Engineer Resume
Dallas, TX
SUMMARY
- 6 years of IT work experience, including development, testing, implementation, and documentation of Big Data and Hadoop ecosystem applications.
- Experience with Hadoop ecosystem components such as HDFS, MapReduce, Hive/HiveQL, Sqoop, and Spark.
- Experience in writing shell scripts to extract, transform and load data onto HDFS for processing.
- Excellent knowledge of data mapping and of extract, transform, and load (ETL) processes from different data sources.
- Experience in writing HiveQL queries to store processed data into Hive tables for analysis.
- Hands-on experience analyzing large data sets with Python scripts and Hive queries.
- Extensive experience with HiveQL and join operations, writing custom UDFs, and optimizing Hive queries.
- Developed Spark (Scala) and Python code for a regular-expression (regex) project in a Hadoop/Hive environment on Linux.
- Designed and implemented data ingestion techniques for real-time and batch processing of structured and unstructured data sources into Hadoop ecosystems and HDFS clusters.
- Good knowledge of using Apache NiFi to automate data movement between different Hadoop systems.
- Designed, implemented, and improved data pipelines across the data platform, from ingestion through the endpoints that make data actionable.
- Experience with data ingestion technologies such as Sqoop and NiFi.
- Analyzed SQL scripts and designed solutions to implement them with PySpark.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python (see the first sketch following this list).
- Good experience handling data manipulation with Python scripts.
- Solid experience with partitioning and bucketing concepts in Hive; designed both managed and external tables to optimize performance (see the second sketch following this list).
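The Hive-to-Spark conversion work above, as a minimal PySpark sketch; the table and column names (viewership, device_id, watch_minutes) are hypothetical stand-ins, not from any actual project.

```python
# Hedged sketch: rewriting a Hive aggregation as Spark RDD transformations.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive-to-spark").enableHiveSupport().getOrCreate()

# Original HiveQL:
#   SELECT device_id, SUM(watch_minutes) FROM viewership GROUP BY device_id;

# Equivalent RDD transformations over the same Hive table:
totals = (spark.table("viewership").rdd
          .map(lambda row: (row["device_id"], row["watch_minutes"]))
          .reduceByKey(lambda a, b: a + b))

for device_id, total in totals.take(10):
    print(device_id, total)
```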
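And a minimal sketch of the partitioning, bucketing, and managed-vs-external table design noted in the last bullet, expressed as Hive DDL issued through PySpark (the same statements run in Hive directly); the database, table, and column names are hypothetical.

```python
# Hedged sketch of Hive partitioning/bucketing DDL; all names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Managed table: Hive owns the data; partitioned by load date and
# bucketed by customer_id to speed up joins and sampling.
spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.usage_managed (
        customer_id BIGINT,
        watch_minutes INT
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# External table: Hive tracks only metadata, so dropping the table
# leaves the underlying HDFS files intact.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS analytics.usage_raw (
        customer_id BIGINT,
        watch_minutes INT
    )
    PARTITIONED BY (load_date STRING)
    STORED AS ORC
    LOCATION '/data/raw/usage'
""")
```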
TECHNICAL SKILLS
Hadoop Ecosystem Tools: HDFS, MapReduce, Spark, NiFi, Hive, Sqoop.
Languages: SQL, Shell Scripting, Python and Scala
Databases: MySQL, Oracle, MS SQL Server
Cloud: AWS
Operating Systems: Linux, Windows XP, Windows Server 2003/2008.
Version Control: GitHub and SVN
Defect Tracking: JIRA
Monitoring Tools: Grafana, Kibana
Tools: Eclipse, Tableau, SQL Developer, TOAD, Notepad++, PyCharm, Sublime Text.
PROFESSIONAL EXPERIENCE
Big Data Engineer
Confidential, Dallas, TX
Responsibilities:
- As part of the DFW Viewership project, AT&T collects customers' home locations, work locations, and general day-to-day movement patterns in order to send alerts/offers for special events and targeted advertisements in their specific locations.
- Involved in data-driven decision-making Big Data projects on HDP 2.6 and 3.1 data lake clusters related to DIRECTV, viewership data, Call Detail Record (CDR) data, wireless roaming, and sales data, with tables holding billions of records.
- As part of this project, developed Spark scripts to collect raw data in structured, semi-structured, and unstructured forms, populate staging tables, and store the refined data in partitioned tables in the KM data lake, where data scientists access it to build ML models.
- Actively collaborated with the infrastructure, network, and application teams to ensure data quality and availability.
- Created Hive queries that helped analysts and scientists spot trends by comparing fresh data with EDW tables and historical metrics.
- Vast exposure to telecom, clickstream, TV viewership, and TV program data.
- Extensively designed and developed Snapshot, Data compression and Data lookup processes.
- Moved data into and out of HDFS using Sqoop import and export. Created a generic Python/Scala application to pick up raw data files, process them into ORC files, and store them in HDFS (see the first sketch following this list).
- Used Spark DataFrames in applications where iteration over the data was necessary, and worked on enhancements using Hive and TWS jobs.
- Developed pipelines in NiFi to ingest data from the Data Router into HDFS.
- Used PySpark and Spark SQL to analyze peak usage of TV viewership data (see the second sketch following this list).
- Created visualizations and reports for the business intelligence team.
- Tuned Hive query performance with several parameters to handle roughly 90 million viewership records per day.
- Developed complex queries in Hive and Spark SQL using Scala, and in Oracle, to perform data analytics and provide meaningful insights to business groups.
- Worked on NiFi data pipelines to process large data sets and configured lookups for data validation and integrity.
- Developed Spark code using Scala and Spark SQL for faster testing and processing of data.
- Involved in production support: monitored server and error logs, anticipated and prevented potential issues, and escalated issues when necessary.
- Configured S3 buckets with lifecycle policies to archive infrequently accessed data to cheaper storage classes as required (see the third sketch following this list).
- Wrote unit tests with unittest, and resolved bugs and other defects using Firebug.
- Worked with the offshore team as part of day-to-day responsibilities.
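A minimal sketch of the generic raw-to-ORC conversion described above, assuming delimited raw files landed on HDFS; the paths, schema, and partition column are hypothetical.

```python
# Hedged sketch: refine raw delimited files into date-partitioned ORC on HDFS.
from pyspark.sql import SparkSession
from pyspark.sql.functions import to_date, col

spark = SparkSession.builder.appName("raw-to-orc").enableHiveSupport().getOrCreate()

# Read pipe-delimited raw files from the landing zone.
raw = (spark.read
       .option("header", "true")
       .option("delimiter", "|")
       .csv("hdfs:///landing/viewership/"))

# Light refinement: type the event date so it can drive partitioning.
refined = raw.withColumn("event_date", to_date(col("event_ts")))

# Write ORC, partitioned by date, for downstream analyst and ML access.
(refined.write
 .mode("overwrite")
 .partitionBy("event_date")
 .orc("hdfs:///datalake/viewership/"))
```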
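A minimal sketch of the peak-usage analysis with PySpark; the table and column names are hypothetical.

```python
# Hedged sketch: find the peak viewing hours from a viewership table.
from pyspark.sql import SparkSession
from pyspark.sql.functions import hour, col

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

viewership = spark.table("datalake.viewership")

# Viewing events per hour of day, ordered to surface the peak hours.
peak = (viewership
        .withColumn("hour_of_day", hour(col("event_ts")))
        .groupBy("hour_of_day")
        .count()
        .orderBy(col("count").desc()))

peak.show(24)
```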
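A minimal sketch of an S3 lifecycle configuration applied with boto3; the bucket name, prefix, and transition days are hypothetical.

```python
# Hedged sketch: archive infrequently accessed objects to cheaper storage.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="viewership-archive",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-cold-data",
                "Filter": {"Prefix": "processed/"},
                "Status": "Enabled",
                "Transitions": [
                    # Rarely read after a month: move to Standard-IA.
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    # Effectively cold after a quarter: move to Glacier.
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)
```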
Data Engineer
Confidential, PA
Responsibilities:
- As part of the Subscribers Optimization Insights project, collaborated with stakeholders and data scientists, performed source-to-target mapping, and created a massive DataFrame that integrates all of the subscribers' data.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with historical metrics.
- Implemented Hive generic UDFs to incorporate business logic into Hive queries (see the first sketch following this list).
- Developed Spark and Spark SQL code in Python for faster processing and testing of raw data, and executed performance scripts.
- Exported data from relational databases to the data lake using Sqoop.
- Used Tableau for visualization and for building and publishing dashboards.
- Developed Spark jobs and Hive jobs to summarize and transform data.
- Experienced in loading and transforming large sets of structured data using Spark.
- Exported data to relational databases using Sqoop to support visualization and report generation (see the second sketch following this list).
- Managed and reviewed Hadoop log files.
- Developed complex queries in Hive and Spark SQL using Scala, and in Oracle, to perform data analytics and provide meaningful insights to business groups.
- Developed a Spark streaming pipeline to perform aggregations on real-time CDR data and messages, identifying chronic customers for retail groups (see the third sketch following this list).
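Hive generic UDFs are Java classes; as a same-spirit, Python-side sketch, the equivalent business logic can be registered as a Spark SQL UDF instead. The tiering rule, table, and column names here are hypothetical.

```python
# Hedged sketch: business logic exposed to SQL as a registered UDF.
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

def usage_tier(minutes):
    """Classify a subscriber's monthly usage into a tier."""
    if minutes is None:
        return "unknown"
    if minutes >= 3000:
        return "heavy"
    if minutes >= 500:
        return "regular"
    return "light"

spark.udf.register("usage_tier", usage_tier, StringType())

# Callable from SQL just like a Hive UDF:
spark.sql("""
    SELECT subscriber_id, usage_tier(monthly_minutes) AS tier
    FROM subscribers
""").show(5)
```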
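A minimal sketch of invoking a Sqoop export from Python to push summarized results back to a relational database for reporting; the connection details, table, and paths are hypothetical.

```python
# Hedged sketch: wrap the Sqoop CLI to export a data lake directory to MySQL.
import subprocess

subprocess.run(
    [
        "sqoop", "export",
        "--connect", "jdbc:mysql://reporting-db:3306/marts",
        "--username", "etl_user",
        "--password-file", "/user/etl/.sqoop.pwd",
        "--table", "subscriber_summary",
        "--export-dir", "/datalake/subscriber_summary",
        "--input-fields-terminated-by", "\t",
        "--num-mappers", "4",
    ],
    check=True,  # raise if the Sqoop job fails
)
```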
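A minimal sketch of a streaming aggregation over CDR data, assuming events arrive on a Kafka topic; the broker, topic, schema, and "chronic" threshold logic are hypothetical.

```python
# Hedged sketch: windowed dropped-call counts per customer from a CDR stream.
from pyspark.sql import SparkSession
from pyspark.sql.functions import window, col, from_json
from pyspark.sql.types import StructType, StringType, TimestampType

spark = SparkSession.builder.appName("cdr-stream").getOrCreate()

schema = (StructType()
          .add("customer_id", StringType())
          .add("call_status", StringType())
          .add("event_ts", TimestampType()))

cdrs = (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "cdr-events")
        .load()
        .select(from_json(col("value").cast("string"), schema).alias("cdr"))
        .select("cdr.*"))

# Dropped-call counts per customer over 15-minute windows; repeatedly high
# counts flag "chronic" customers for the retail groups.
chronic = (cdrs.filter(col("call_status") == "DROPPED")
           .withWatermark("event_ts", "30 minutes")
           .groupBy(window(col("event_ts"), "15 minutes"), col("customer_id"))
           .count())

query = (chronic.writeStream
         .outputMode("update")
         .format("console")
         .start())
query.awaitTermination()
```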
Java Developer
Confidential
Responsibilities:
- Involved in all phases of the SDLC.
- Developed code using Agile Scrum methodology, with work divided into succinct cadences (sprints), typically two to three weeks in duration, and daily stand-ups to track progress and issues pertaining to the project.
- Involved in requirements meetings with business partners to understand business expectations.
- Responsible for design, development, and implementation of modules.
- Designed and developed a Java-based web application using Java, J2EE, and Spring MVC.
- Designed and developed Java batch jobs using the Spring Batch framework.
- Implemented critical functionality using Java and Oracle packages and stored procedures.
- Wrote SQL queries to retrieve data from the database for analysis.
- Integrated JUnit with the Apache Ant build tool to automate unit and regression testing and ensure system stability.
- Used several design patterns, such as Business Delegate and Front Controller, in the development process.
- Developed automated UNIX deployment scripts for the QA and Dev environments.
- Designed and developed UNIX scripts to invoke the Java batches.
- Worked on documentation for initial planning, estimation, impact analysis, and production rollouts.
