
Big Data Engineer Resume


Dallas, TX

SUMMARY

  • 6 years of IT experience, including development, testing, implementation, and documentation of Big Data and Hadoop ecosystem applications.
  • Experience with Hadoop ecosystem components such as HDFS, MapReduce, Hive (HiveQL), Sqoop, and Spark.
  • Experience in writing shell scripts to extract, transform, and load data onto HDFS for processing.
  • Excellent knowledge of data mapping and of extracting, transforming, and loading data from different data sources.
  • Experience in writing HiveQL queries to store processed data in Hive tables for analysis.
  • Experience analyzing large data sets by writing Python scripts and Hive queries.
  • Extensive experience with HiveQL, including join operations, writing custom UDFs, and optimizing Hive queries.
  • Developed Spark (Scala) and Python code for a regular expression (regex) project in a Hadoop/Hive environment on Linux for Big Data resources.
  • Designed and implemented data ingestion techniques for real-time and batch processing of structured and unstructured data sources into Hadoop ecosystems and HDFS clusters.
  • Good knowledge of Apache NiFi for automating data movement between Hadoop systems.
  • Designed, implemented, and improved data pipelines across the data platform, from ingestion through the endpoints that make data actionable.
  • Experience with data ingestion technologies such as Sqoop and NiFi.
  • Analyzed SQL scripts and designed solutions for implementation in PySpark.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python.
  • Good experience in data manipulation using Python scripts.
  • Strong understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
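
The regex-based extraction work described above can be sketched in plain Python. The record layout, field names, and delimiter here are hypothetical, for illustration only:

```python
import re

# Hypothetical raw record: "2021-03-15 14:02:11|DFW|STB-1001|CH_ESPN|3600"
RECORD_RE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2})\|"
    r"(?P<market>\w+)\|(?P<device>[\w-]+)\|(?P<channel>\w+)\|(?P<seconds>\d+)"
)

def parse_record(line):
    """Extract fields from a raw record; return None for malformed input."""
    m = RECORD_RE.match(line.strip())
    if m is None:
        return None
    row = m.groupdict()
    row["seconds"] = int(row["seconds"])  # cast the one numeric field
    return row
```

Malformed lines return `None` rather than raising, so a driver script can count and skip bad records while loading the rest onto HDFS.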

TECHNICAL SKILLS

Hadoop Ecosystem Tools: HDFS, MapReduce, Spark, NiFi, Hive, Sqoop.

Languages: SQL, Shell Scripting, Python and Scala

Databases: MySQL, Oracle, MS SQL Server

Cloud: AWS

Operating Systems: Linux, Windows XP, Server 2003, Server 2008.

Version Control: GitHub and SVN

Defect Tracking: JIRA

Monitoring Tools: Grafana, Kibana

Tools: Eclipse, Tableau, SQL Developer, TOAD, Notepad++, PyCharm, Sublime Text.

PROFESSIONAL EXPERIENCE

Big Data Engineer

Confidential, Dallas, TX

Responsibilities:

  • As part of the DFW Viewership project, AT&T collects customers' home locations, work locations, and general day-to-day movement patterns in order to send alerts/offers for special events, or targeted advertisements, in their specific locations.
  • Involved in data-driven Big Data projects on HDP 2.6 and 3.1 data lake clusters related to DirecTV, viewership data, Call Detail Record (CDR) data, wireless roaming, and sales data, with tables containing billions of records.
  • As part of this project, developed Spark scripts to collect raw data (structured, semi-structured, and unstructured), populate staging tables, and store the refined data in partitioned tables in the KM data lake for data scientists to use in building ML models.
  • Actively collaborated with the infrastructure, network, and application teams to ensure data quality and availability.
  • Created Hive queries that helped the analysts/scientists to spot the trends by comparing fresh data with EDW tables and historical metrics.
  • Vast exposure to telecom, clickstream, TV viewership, and TV program data.
  • Extensively designed and developed snapshot, data compression, and data lookup processes.
  • Extracted and updated data in HDFS using Sqoop import and export. Created a generic Python/Scala application to retrieve raw data files and process them into ORC files stored in HDFS.
  • Used Spark DataFrames in applications where iteration over data was necessary, and worked on enhancements using Hive and TWS jobs.
  • Developed pipelines in NiFi to ingest data from the data router into HDFS.
  • Used PySpark, Spark SQL to analyze the peak usage of TV Viewership data.
  • Create visualizations and reports for the business intelligence team.
  • Used several parameters to tune the performance of Hive queries that handle 90 million viewership records per day.
  • Developed complex queries in Hive and Spark SQL, using Scala and Oracle, to perform data analytics and provide meaningful insights to business groups.
  • Worked on NiFi data pipelines to process large data sets and configured lookups for data validation and integrity.
  • Developed Spark code using Scala and Spark SQL for faster testing and processing of data.
  • Involved in production support: monitoring server and error logs, foreseeing and preventing potential issues, and escalating issues when necessary.
  • Configured S3 buckets with lifecycle policies to archive infrequently accessed data to appropriate storage classes based on requirements.
  • Wrote unit tests, and resolved bugs and other defects using Firebug.
  • Worked with an offshore team as part of day-to-day responsibilities.
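
The Hive-style partitioned storage described in these bullets can be illustrated in plain Python. This is a simplified sketch only: the schema and `dt=` partition key are hypothetical, and the production jobs used Spark writing ORC files rather than CSV:

```python
import csv
import os

def write_partitioned(records, base_dir):
    """Write records into Hive-style dt=YYYY-MM-DD partition directories.

    records: iterable of dicts sharing the same keys, each with a 'dt'
    partition key, e.g. {"dt": "2021-03-15", "device": "STB-1001"}.
    """
    writers = {}   # one CSV writer per partition directory
    handles = []   # keep file handles so we can close them at the end
    for row in records:
        part_dir = os.path.join(base_dir, f"dt={row['dt']}")
        if part_dir not in writers:
            os.makedirs(part_dir, exist_ok=True)
            f = open(os.path.join(part_dir, "part-00000.csv"), "w", newline="")
            handles.append(f)
            writers[part_dir] = csv.DictWriter(f, fieldnames=list(row))
            writers[part_dir].writeheader()
        writers[part_dir].writerow(row)
    for f in handles:
        f.close()
```

Partitioning by date this way is what lets Hive prune to a single day's directory instead of scanning the full table when a query filters on `dt`.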

Data Engineer

Confidential, PA

Responsibilities:

  • As part of the Subscriber Optimization Insights project, collaborated with stakeholders and data scientists, performed source-to-target mapping, and created a consolidated DataFrame integrating all subscriber data.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with historical metrics.
  • Implemented Hive generic UDFs to incorporate business logic into Hive queries.
  • Developed Spark and Spark SQL code in Python for faster processing and testing of raw data, and executed performance scripts.
  • Exported data from relational databases to data lake using Sqoop.
  • Used Tableau for visualization and for building dashboards.
  • Developed Spark jobs and Hive jobs to summarize and transform data.
  • Experienced in loading and transforming large sets of structured data using Spark.
  • Exported data to relational databases using Sqoop for visualization and to generate reports.
  • Managed and reviewed Hadoop log files.
  • Developed complex queries in Hive and Spark SQL, using Scala and Oracle, to perform data analytics and provide meaningful insights to business groups.
  • Developed a Spark pipeline to perform aggregations on real-time CDR data and messages to identify chronic customers for retail groups.
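
The aggregation logic behind the chronic-customer pipeline can be sketched in plain Python. The threshold, field names, and status values are hypothetical; the production version ran this logic as a Spark job over real-time CDR data:

```python
from collections import Counter

def chronic_customers(cdr_records, min_dropped_calls=3):
    """Flag customers whose dropped-call count meets a threshold.

    cdr_records: iterable of (customer_id, call_status) tuples,
    where call_status is e.g. "completed" or "dropped".
    """
    dropped = Counter(
        cust for cust, status in cdr_records if status == "dropped"
    )
    return {cust for cust, n in dropped.items() if n >= min_dropped_calls}
```

In Spark the same shape is a filter on call status followed by a count per customer key; the plain-Python version just makes the grouping and threshold explicit.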

Java Developer

Confidential

Responsibilities:

  • Involved in all phases of the SDLC. Developed code using Agile Scrum methodology, with work divided into succinct cadences (sprints) of two to three weeks and daily meetings to track progress and project issues.
  • Involved in requirements meetings with business partners to understand business expectations.
  • Responsible for design, development, and implementation of modules.
  • Designed and developed a Java-based web application using Java, J2EE, and Spring MVC.
  • Designed and developed Java batch jobs using the Spring Batch framework.
  • Implemented critical functionality using Java and Oracle packages and stored procedures.
  • Wrote SQL queries to retrieve data from the database for analysis.
  • Integrated the JUnit testing tool with the Apache Ant build tool to automate unit and regression testing and ensure system stability.
  • Used several design patterns, such as Business Delegate and Front Controller, in the development process.
  • Developed automated UNIX deployment scripts for QA and Dev environments.
  • Designed and developed UNIX-based scripts to invoke the Java batches.
  • Worked on documentation for initial planning, estimation, impact analysis, and production rollouts.
