Hadoop Developer Resume
Plano, TX
PROFESSIONAL SUMMARY:
- Around 6 years of overall IT experience in a variety of industries, including 4+ years of hands-on experience in Big Data technologies designing and implementing MapReduce and Spark architectures, and 1+ year as a SQL/BI Developer.
- Good knowledge of Hadoop development and its components, such as HDFS, JobTracker, TaskTracker, DataNode, NameNode, MapReduce, and Spark concepts.
- Hands-on experience with major components of the Hadoop ecosystem, including MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Sqoop, Oozie, Spark, Kafka, and Flume.
- Experience in installing, configuring, managing, supporting, and monitoring Hadoop clusters using Hortonworks Data Platform.
- Experienced with Apache Spark for advanced processing such as text analytics, using its in-memory computing capabilities with Scala.
- Hands-on experience with Spark components such as Spark SQL and Spark Streaming.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs, Spark SQL, and Scala (see the sketch after this summary).
- Used Spark to improve the performance and optimization of existing Hadoop algorithms with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Experience analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
- Wrote MapReduce programs with custom logic and wrote custom UDFs in Pig and Hive based on user requirements.
- Implemented NoSQL databases such as HBase to store and process data in different formats.
- Used Oozie to write workflows and schedule jobs.
- Wrote Hive queries for data analysis and to prepare data for visualization.
- Experience supporting data analysis projects using Elastic MapReduce (EMR) on the Amazon Web Services (AWS) cloud, including exporting and importing data to and from S3.
- Installed Spark and analyzed HDFS data, caching datasets in memory to perform a large variety of complex computations interactively.
- Experience importing and exporting data in different formats between RDBMS databases and HDFS/HBase.
- Good interpersonal and communication skills. Team player with strong problem-solving skills.
- Excellent technical skills; consistently delivered ahead of schedule while continuing to strengthen interpersonal and communication skills.
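A minimal Scala sketch of the Hive-to-Spark conversion mentioned above: the same aggregation expressed once through Spark SQL and once as DataFrame transformations. The table and column names are hypothetical placeholders, not details from any specific project.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, col}

object HiveQueryToSpark {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveQueryToSpark")
      .enableHiveSupport()
      .getOrCreate()

    // The original Hive query, run as-is through Spark SQL.
    val hiveStyle = spark.sql(
      "SELECT dept, AVG(salary) AS avg_salary FROM hr.employees WHERE status = 'ACTIVE' GROUP BY dept")
    hiveStyle.show()

    // The same logic expressed as DataFrame transformations.
    val dataFrameStyle = spark.table("hr.employees")
      .filter(col("status") === "ACTIVE")
      .groupBy(col("dept"))
      .agg(avg(col("salary")).alias("avg_salary"))
    dataFrameStyle.show()

    spark.stop()
  }
}
```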
TECHNICAL SKILLS:
Big Data Ecosystem: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, Cassandra, Scala, Oozie, ZooKeeper, Amazon Web Services, MRUnit, Spark, Kafka, NiFi.
Databases: Oracle, MySQL, MS SQL Server, HBase, Teradata.
Web Services: Spring
Operating Systems: Unix, Windows, Linux.
Web Technologies: HTML, XML, XHTML, CSS
Languages: Java 1.8, SQL, PL/SQL, R, Python, Shell Scripting
IDEs and Tools: Eclipse, IntelliJ, HDP, GIT.
Methodologies: Agile and Waterfall Model.
PROFESSIONAL EXPERIENCE:
Hadoop Developer
Confidential, Plano, TX
Responsibilities:
- Involved in business analysis and technical design sessions with business and technical staff to develop data models, requirements documents, and ETL specifications.
- Defined technical designs/specifications/solutions to address the business requirements.
- Created prototypes of solutions and provided algorithms to showcase the implementation approach using Hadoop, Hive, Python, Unix, and Sqoop.
- Involved in the complete big data flow of the application, from ingesting upstream data into HDFS to processing and analyzing the data in HDFS.
- Designed and developed ETL code to load data from heterogeneous source systems such as flat files, XML, MS Access files, and Oracle into an Oracle staging area, then into the data warehouse, and then into data mart tables for reporting.
- Developed ETL with SCDs, caches, and complex joins using optimized SQL queries.
- Imported and exported data between different databases and HDFS using Sqoop for analysis, visualization, and report generation.
- Created SCD-type tables and merge logic with insert/update.
- Used shell and Python scripting in the ETL process to call HQL commands and to handle pre- and post-ETL steps such as validating, zipping, massaging, and archiving the source and target files, and used UNIX scripting to manage the file systems.
- Automated and scheduled ETL batch jobs using Control-M, with file watchers and daily, weekly, and on-request schedules.
- Tuned job performance to reduce wait times.
- Worked closely with QA and production support teams by providing components, documentation, validation, and knowledge transfer on new projects and by debugging issues.
- Started working with Kafka and NiFi to bring live streaming data into HDFS (see the sketch after this section).
- Participated in high-level code reviews to ensure that the implemented solution meets the defined standards as well as the business requirements.
- Ensured smooth implementation of the product in the production environment.
- Created value for the customer through process improvement and automation.
- Performed data analytics and provided input to help the business understand data patterns and make informed decisions.
- Worked with product managers, development managers, and other Confidential teams.
Environment: Hadoop, HDFS, Hive, Scala, Sqoop, Control-M, Netezza, DB2, MySQL, Kafka, NiFi, Elastic Search, Tableau.
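A minimal sketch of the kind of Kafka-to-HDFS ingestion described above, written as a Spark Structured Streaming job in Scala. The broker address, topic name, and HDFS paths are hypothetical placeholders, not details from the original project.

```scala
import org.apache.spark.sql.SparkSession

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KafkaToHdfs")
      .getOrCreate()

    // Read a Kafka topic as a streaming DataFrame (broker and topic are hypothetical).
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    // Persist the stream to HDFS as Parquet, with checkpointing for recovery.
    val query = events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/raw/events")
      .option("checkpointLocation", "hdfs:///checkpoints/events")
      .start()

    query.awaitTermination()
  }
}
```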
Data/EDW Engineer
Confidential, Chicago, IL
Responsibilities:
- Analyzed all tables in the database and listed the C1, C2, and C3 columns (data classification). Created hashing algorithms in Python to hash these columns.
- Changed the existing ETL pipeline to become GDPR compliant and created user views to enforce data abstraction on Hive tables.
- Optimized Teradata queries and ETL jobs to reduce the main pipeline runtime by 30%.
- Developed new Opswise workflows to move the ETL pipelines away from pulling PII data from the source, and deprecated the older workflows.
- Designed and developed global transactions to capture financial metrics across the organization.
- Developed performance dashboards in Tableau that encompass key metrics reviewed by senior leadership and sales management.
- Supported production systems to resolve incidents and performed root cause analysis.
- Developed Scala scripts using both DataFrames/Spark SQL and RDDs in Spark for data aggregation and queries, writing data back to the OLTP system through Sqoop (see the sketch after this section).
- Worked with a Big Data Hadoop application using Tableau in the cloud through Amazon Web Services (AWS) EC2 and S3.
- Performance-tuned Spark applications by setting the right batch interval, the correct level of parallelism, and appropriate memory settings.
- Extensively used Apache Sqoop to import and export data between the analytical cluster and MySQL.
Environment: Hadoop, HDFS, Hive, Spark, Scala, Sqoop, Opswise, Zombie Runner, Teradata, MySQL, Tableau.
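A minimal Scala sketch of the DataFrame-based aggregation described above, with a SHA-256 hash applied to a PII column before the metrics are written back to Hive (the original hashing work was done in Python; this Spark version is for illustration only). Database, table, and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sha2, sum}

object DailySalesAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DailySalesAggregation")
      .enableHiveSupport()
      .getOrCreate()

    // Read the Hive source table (table and column names are hypothetical).
    val txns = spark.table("edw.transactions")

    // Mask a PII column with SHA-256, then aggregate financial metrics.
    val daily = txns
      .withColumn("customer_email", sha2(col("customer_email"), 256))
      .groupBy(col("txn_date"), col("region"))
      .agg(sum(col("amount")).alias("total_amount"))

    // Write results back to Hive; a downstream Sqoop export can push them to the OLTP system.
    daily.write.mode("overwrite").saveAsTable("edw.daily_sales_metrics")

    spark.stop()
  }
}
```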
Hadoop Developer
Confidential, Herndon, VA
Responsibilities:
- Responsible for building a framework that ingests data into Hadoop from a variety of data sources, providing high storage efficiency and an optimized layout for analytics.
- Implemented a POC to showcase ETL processing capability through Spark RDD transformations in Scala.
- Used Spark Streaming APIs to perform transformations and actions on the fly for building a common learner data model that gets data from Kafka in near real time and persists it to HBase (see the sketch after this section).
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Performed different types of transformations and actions on RDDs to meet the business requirements.
- Developed a data pipeline using Kafka, Spark, HBase, and Hive to ingest, transform, and analyze data, orchestrated with Oozie workflows.
- Loaded transformed data into Hive tables, which provide SQL-like access to the data.
- Developed Spark programs using the Spark SQL library to perform analytics on data in Hive.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Streamed parsed error data into a Kafka error topic, which serves as input for Apache Sentry to send out notifications.
- Used Hive to analyze the partitioned data and compute various metrics for reporting.
- Managed and reviewed Hadoop log files.
- Extensively used Apache Sqoop for efficiently transferring bulk data from Hive to MySQL.
- Used PowerBI for report generation.
Environment: Hadoop, Spark, Scala, Kafka, Sqoop, HDFS, Hive, Pig, Oozie, MySQL, Impala, Sentry.
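A minimal Scala sketch of the Kafka-to-HBase flow described above, using the Spark Streaming (DStream) Kafka integration and the HBase client API. The broker, topic, consumer group, table, and column family names are hypothetical placeholders.

```scala
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object KafkaToHBase {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToHBase"), Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "learner-model")

    // Subscribe to a Kafka topic (topic name is hypothetical).
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("learner-events"), kafkaParams))

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // Open one HBase connection per partition and write each record as a Put.
        val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = connection.getTable(TableName.valueOf("learner_events"))
        records.foreach { record =>
          // Fall back to a generated row key when the Kafka message has no key.
          val rowKey = Option(record.key()).getOrElse(java.util.UUID.randomUUID().toString)
          val put = new Put(Bytes.toBytes(rowKey))
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes(record.value()))
          table.put(put)
        }
        table.close()
        connection.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```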
Hadoop Developer
Confidential
Responsibilities:
- Analyzed the Hadoop cluster and different Big Data analytic tools, including Pig, Hive, and Sqoop.
- Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala (see the sketch after this section).
- Developed an ETL framework using Kafka, Spark, and Hive.
- Developed Spark programs for batch and real-time processing to consume incoming streams of data from Kafka sources, transform them into DataFrames, and load those DataFrames into Hive and HDFS.
- Developed SQL scripts using Spark for handling different data sets and verified the performance of MapReduce jobs.
- Developed Spark programs using the Spark SQL library to perform analytics on data in Hive.
- Developed various Java UDFs for use in both Hive and Impala to simplify a variety of requirements.
- Created multiple MapReduce jobs in Pig and Hive for data cleaning and preprocessing.
- Created Hive views/tables to provide a SQL-like interface.
- Loaded files into Hive and HDFS from Oracle and SQL Server using Sqoop.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Used Hive to analyze the partitioned data and compute various metrics for reporting.
- Transformed Impala queries into Hive scripts that can be run directly from shell commands for better performance.
- Created shell scripts that can be scheduled using Oozie workflows and Oozie coordinators.
- Developed the Oozie workflows to generate monthly report files automatically.
- Managed and reviewed Hadoop log files.
- Exported data from the HDFS environment into RDBMS using Sqoop for report generation and visualization.
Environment: Hadoop, Spark, Scala, Kafka, MapReduce, Sqoop, HDFS, Hive, Pig, Oozie, Java, Oracle 10g, MySQL, Impala.
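A minimal Scala sketch of the kind of MapReduce-to-Spark migration mentioned above: a log-count job whose map and reduce phases become RDD transformations. The input layout and HDFS paths are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LogCountByStatus {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LogCountByStatus"))

    // Input/output paths and the log field layout are hypothetical.
    val lines = sc.textFile("hdfs:///data/logs/access/*")

    // Map phase equivalent: emit (statusCode, 1) per record.
    // Reduce phase equivalent: sum the counts per status code.
    val countsByStatus = lines
      .map(_.split("\\s+"))
      .filter(_.length > 8)
      .map(fields => (fields(8), 1L))
      .reduceByKey(_ + _)

    countsByStatus.saveAsTextFile("hdfs:///data/reports/status_counts")
    sc.stop()
  }
}
```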
SQL Developer
Confidential
Responsibilities:
- Analyzed Business Requirements and identified mapping documents required for the system.
- Performed requirements gathering and analyzed and negotiated customer requirements.
- Developed the user interface using HTML, CSS, JavaScript, and JSP for various functionalities.
- Developed stored procedures and triggers using SQL and PL/SQL to store and retrieve data, and implemented the J2EE architecture using Struts based on the MVC pattern (see the sketch after this section).
- Hands-on experience with IDE tools such as RAD (Rational Application Developer) and Eclipse.
- Used Struts Actions along with design patterns such as Session Facade and Singleton for the business flow.
- Responsible for creating, sending, and receiving messages using SOAP protocols.
- Configured Hibernate and Spring through the required configuration and XML files.
- Extensively used Spring Inversion-of-Control for Dependency Injection.
- Experience with WebSphere, Apache, and IBM HTTP Server; used Log4j for logging and JUnit for unit testing.
- Wrote SQL queries using joins and stored procedures.
- Experience with the Waterfall software development lifecycle model.
Environment: Servlets, Struts, Hibernate, Eclipse, JUnit, DB2, HTML, CSS, JSP, SQL, PL/SQL.
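A minimal Scala JDBC sketch of invoking a stored procedure like the ones described above (the original work was done in Java and PL/SQL; this is an illustrative sketch only). The connection URL, credentials handling, and procedure/parameter names are hypothetical.

```scala
import java.sql.{Connection, DriverManager, Types}

object CallStoredProcedure {
  def main(args: Array[String]): Unit = {
    // Connection details are hypothetical; the DB2 JDBC driver must be on the classpath.
    val url = "jdbc:db2://dbhost:50000/SALESDB"
    var connection: Connection = null
    try {
      connection = DriverManager.getConnection(url, "app_user", sys.env.getOrElse("DB_PASSWORD", ""))

      // Call a stored procedure that returns an order total for a given customer.
      val statement = connection.prepareCall("{ call GET_ORDER_TOTAL(?, ?) }")
      statement.setInt(1, 1001)                        // IN: customer id
      statement.registerOutParameter(2, Types.DECIMAL) // OUT: order total
      statement.execute()

      println(s"Order total: ${statement.getBigDecimal(2)}")
      statement.close()
    } finally {
      if (connection != null) connection.close()
    }
  }
}
```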