Big Data Engineer Resume
Dallas, TX
SUMMARY
- 6 years of IT work experience, including development, testing, implementation, and documentation of Big Data and Hadoop ecosystem applications.
- Experience with Hadoop ecosystem components such as HDFS, MapReduce, Hive/HiveQL, Sqoop, and Spark.
- Experience in writing shell scripts to extract, transform and load data onto HDFS for processing.
- Excellent knowledge of data mapping and of extract, transform, and load (ETL) processes from different data sources.
- Experience in writing HiveQL queries to store processed data into Hive tables for analysis.
- Hands-on experience analyzing large data sets with Python scripts and Hive queries.
- Extensive experience with HiveQL and join operations, writing custom UDFs, and optimizing Hive queries.
- Developed Spark (Scala) and Python code for a regular-expression (regex) project in a Hadoop/Hive environment on Linux.
- Designed and implemented data ingestion techniques for real-time and batch processing of structured and unstructured data sources into Hadoop ecosystems and HDFS clusters.
- Good knowledge of using Apache NiFi to automate data movement between different Hadoop systems.
- Designed, implemented, and improved data pipelines across the data platform, from ingestion through the endpoints that make data actionable.
- Experience with data ingestion technologies such as Sqoop and NiFi.
- Analyzed SQL scripts and designed solutions to implement them with PySpark.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python (see the first sketch following this list).
- Good experience handling data manipulation with Python scripts.
- Solid experience with partitioning and bucketing concepts in Hive; designed both managed and external tables to optimize performance (see the second sketch following this list).
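The Hive-to-Spark conversion work above, as a minimal PySpark sketch; the table and column names (viewership, device_id, watch_minutes) are hypothetical stand-ins, not from any actual project.

```python
# Hedged sketch: rewriting a Hive aggregation as Spark RDD transformations.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive-to-spark").enableHiveSupport().getOrCreate()

# Original HiveQL:
#   SELECT device_id, SUM(watch_minutes) FROM viewership GROUP BY device_id;

# Equivalent RDD transformations over the same Hive table:
totals = (spark.table("viewership").rdd
          .map(lambda row: (row["device_id"], row["watch_minutes"]))
          .reduceByKey(lambda a, b: a + b))

for device_id, total in totals.take(10):
    print(device_id, total)
```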
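And a minimal sketch of the partitioning, bucketing, and managed-vs-external table design noted in the last bullet, expressed as Hive DDL issued through PySpark (the same statements run in Hive directly); the database, table, and column names are hypothetical.

```python
# Hedged sketch of Hive partitioning/bucketing DDL; all names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Managed table: Hive owns the data; partitioned by load date and
# bucketed by customer_id to speed up joins and sampling.
spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.usage_managed (
        customer_id BIGINT,
        watch_minutes INT
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# External table: Hive tracks only metadata, so dropping the table
# leaves the underlying HDFS files intact.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS analytics.usage_raw (
        customer_id BIGINT,
        watch_minutes INT
    )
    PARTITIONED BY (load_date STRING)
    STORED AS ORC
    LOCATION '/data/raw/usage'
""")
```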
TECHNICAL SKILLS
Hadoop Ecosystem Tools: HDFS, MapReduce, Spark, NiFi, Hive, Sqoop.
Languages: SQL, Shell Scripting, Python and Scala
Databases: MySQL, Oracle, MS SQL Server
Cloud: AWS
Operating Systems: Linux, Windows XP, Windows Server 2003/2008.
Version Control: GitHub and SVN
Defect Tracking: JIRA
Monitoring Tools: Grafana, Kibana
Tools: Eclipse, Tableau, SQL Developer, TOAD, Notepad++, PyCharm, Sublime Text.
PROFESSIONAL EXPERIENCE
Big Data Engineer
Confidential, Dallas, TX
Responsibilities:
- As part of the DFW Viewership project, AT&T collects customers' home locations, work locations, and general day-to-day movement patterns in order to send alerts/offers for special events and targeted advertisements in their specific locations.
- Involved in data-driven decision-making Big Data projects on HDP 2.6 and 3.1 data lake clusters related to DIRECTV, viewership data, Call Detail Record (CDR) data, wireless roaming, and sales data, with tables holding billions of records.
- As part of this project, developed Spark scripts to collect raw data in structured, semi-structured, and unstructured forms, populate staging tables, and store the refined data in partitioned tables in the KM data lake, where data scientists access it to build ML models.
- Actively collaborated with the infrastructure, network, and application teams to ensure data quality and availability.
- Created Hive queries that helped analysts and scientists spot trends by comparing fresh data with EDW tables and historical metrics.
- Vast exposure to telecom, clickstream, TV viewership, and TV program data.
- Extensively designed and developed Snapshot, Data compression and Data lookup processes.
- Moved data into and out of HDFS using Sqoop import and export. Created a generic Python/Scala application to pick up raw data files, process them into ORC files, and store them in HDFS (see the first sketch following this list).
- Used Spark DataFrames in applications where iteration over the data was necessary, and worked on enhancements using Hive and TWS jobs.
- Developed pipelines in NiFi to ingest data from the Data Router into HDFS.
- Used PySpark and Spark SQL to analyze peak usage of TV viewership data (see the second sketch following this list).
- Created visualizations and reports for the business intelligence team.
- Tuned Hive query performance with several parameters to handle roughly 90 million viewership records per day.
- Developed complex queries in Hive and Spark SQL using Scala, and in Oracle, to perform data analytics and provide meaningful insights to business groups.
- Worked on NiFi data pipelines to process large data sets and configured lookups for data validation and integrity.
- Developed Spark code using Scala and Spark SQL for faster testing and processing of data.
- Involved in production support: monitored server and error logs, anticipated and prevented potential issues, and escalated issues when necessary.
- Configured S3 buckets with lifecycle policies to archive infrequently accessed data to cheaper storage classes as required (see the third sketch following this list).
- Wrote unit tests with unittest, and resolved bugs and other defects using Firebug.
- Worked with the offshore team as part of day-to-day responsibilities.
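A minimal sketch of the generic raw-to-ORC conversion described above, assuming delimited raw files landed on HDFS; the paths, schema, and partition column are hypothetical.

```python
# Hedged sketch: refine raw delimited files into date-partitioned ORC on HDFS.
from pyspark.sql import SparkSession
from pyspark.sql.functions import to_date, col

spark = SparkSession.builder.appName("raw-to-orc").enableHiveSupport().getOrCreate()

# Read pipe-delimited raw files from the landing zone.
raw = (spark.read
       .option("header", "true")
       .option("delimiter", "|")
       .csv("hdfs:///landing/viewership/"))

# Light refinement: type the event date so it can drive partitioning.
refined = raw.withColumn("event_date", to_date(col("event_ts")))

# Write ORC, partitioned by date, for downstream analyst and ML access.
(refined.write
 .mode("overwrite")
 .partitionBy("event_date")
 .orc("hdfs:///datalake/viewership/"))
```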
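A minimal sketch of the peak-usage analysis with PySpark; the table and column names are hypothetical.

```python
# Hedged sketch: find the peak viewing hours from a viewership table.
from pyspark.sql import SparkSession
from pyspark.sql.functions import hour, col

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

viewership = spark.table("datalake.viewership")

# Viewing events per hour of day, ordered to surface the peak hours.
peak = (viewership
        .withColumn("hour_of_day", hour(col("event_ts")))
        .groupBy("hour_of_day")
        .count()
        .orderBy(col("count").desc()))

peak.show(24)
```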
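A minimal sketch of an S3 lifecycle configuration applied with boto3; the bucket name, prefix, and transition days are hypothetical.

```python
# Hedged sketch: archive infrequently accessed objects to cheaper storage.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="viewership-archive",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-cold-data",
                "Filter": {"Prefix": "processed/"},
                "Status": "Enabled",
                "Transitions": [
                    # Rarely read after a month: move to Standard-IA.
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    # Effectively cold after a quarter: move to Glacier.
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)
```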
Data Engineer
Confidential, PA
Responsibilities:
- As part of the Subscribers Optimization Insights project, collaborated with stakeholders and data scientists, performed source-to-target mapping, and created a massive DataFrame that integrates all of the subscribers' data.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with historical metrics.
- Implemented Hive generic UDFs to incorporate business logic into Hive queries (see the first sketch following this list).
- Developed Spark and Spark SQL code in Python for faster processing and testing of raw data, and executed performance scripts.
- Exported data from relational databases to the data lake using Sqoop.
- Used Tableau for visualization and for building and publishing dashboards.
- Developed Spark jobs and Hive jobs to summarize and transform data.
- Experienced in loading and transforming large sets of structured data using Spark.
- Exported data to relational databases using Sqoop to support visualization and report generation (see the second sketch following this list).
- Managed and reviewed Hadoop log files.
- Developed complex queries in Hive and Spark SQL using Scala, and in Oracle, to perform data analytics and provide meaningful insights to business groups.
- Developed a Spark streaming pipeline to perform aggregations on real-time CDR data and messages, identifying chronic customers for retail groups (see the third sketch following this list).
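Hive generic UDFs are Java classes; as a same-spirit, Python-side sketch, the equivalent business logic can be registered as a Spark SQL UDF instead. The tiering rule, table, and column names here are hypothetical.

```python
# Hedged sketch: business logic exposed to SQL as a registered UDF.
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

def usage_tier(minutes):
    """Classify a subscriber's monthly usage into a tier."""
    if minutes is None:
        return "unknown"
    if minutes >= 3000:
        return "heavy"
    if minutes >= 500:
        return "regular"
    return "light"

spark.udf.register("usage_tier", usage_tier, StringType())

# Callable from SQL just like a Hive UDF:
spark.sql("""
    SELECT subscriber_id, usage_tier(monthly_minutes) AS tier
    FROM subscribers
""").show(5)
```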
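A minimal sketch of invoking a Sqoop export from Python to push summarized results back to a relational database for reporting; the connection details, table, and paths are hypothetical.

```python
# Hedged sketch: wrap the Sqoop CLI to export a data lake directory to MySQL.
import subprocess

subprocess.run(
    [
        "sqoop", "export",
        "--connect", "jdbc:mysql://reporting-db:3306/marts",
        "--username", "etl_user",
        "--password-file", "/user/etl/.sqoop.pwd",
        "--table", "subscriber_summary",
        "--export-dir", "/datalake/subscriber_summary",
        "--input-fields-terminated-by", "\t",
        "--num-mappers", "4",
    ],
    check=True,  # raise if the Sqoop job fails
)
```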
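A minimal sketch of a streaming aggregation over CDR data, assuming events arrive on a Kafka topic; the broker, topic, schema, and "chronic" threshold logic are hypothetical.

```python
# Hedged sketch: windowed dropped-call counts per customer from a CDR stream.
from pyspark.sql import SparkSession
from pyspark.sql.functions import window, col, from_json
from pyspark.sql.types import StructType, StringType, TimestampType

spark = SparkSession.builder.appName("cdr-stream").getOrCreate()

schema = (StructType()
          .add("customer_id", StringType())
          .add("call_status", StringType())
          .add("event_ts", TimestampType()))

cdrs = (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "cdr-events")
        .load()
        .select(from_json(col("value").cast("string"), schema).alias("cdr"))
        .select("cdr.*"))

# Dropped-call counts per customer over 15-minute windows; repeatedly high
# counts flag "chronic" customers for the retail groups.
chronic = (cdrs.filter(col("call_status") == "DROPPED")
           .withWatermark("event_ts", "30 minutes")
           .groupBy(window(col("event_ts"), "15 minutes"), col("customer_id"))
           .count())

query = (chronic.writeStream
         .outputMode("update")
         .format("console")
         .start())
query.awaitTermination()
```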
Java Developer
Confidential
Responsibilities:
- Involved in all phases of the SDLC.
- Developed code using Agile Scrum methodology, with work divided into succinct cadences (sprints), typically two to three weeks in duration, and daily stand-ups to track progress and issues pertaining to the project.
- Involved in requirements meetings with business partners to understand business expectations.
- Responsible for design, development, and implementation of modules.
- Designed and developed a Java-based web application using Java, J2EE, and Spring MVC.
- Designed and developed Java batch jobs using the Spring Batch framework.
- Implemented critical functionality using Java and Oracle packages and stored procedures.
- Wrote SQL queries to retrieve data from the database for analysis.
- Integrated JUnit with the Apache Ant build tool to automate unit and regression testing and ensure system stability.
- Used several design patterns, such as Business Delegate and Front Controller, in the development process.
- Developed automated UNIX deployment scripts for the QA and Dev environments.
- Designed and developed UNIX scripts to invoke the Java batches.
- Worked on documentation for initial planning, estimation, impact analysis, and production rollouts.
