
Big Data / Hadoop Developer Resume


New York

PROFESSIONAL SUMMARY:

  • Around 5 years of IT experience in various domains with Big Data, the Hadoop ecosystem, and Java/J2EE technologies.
  • Strong hands-on experience in Spark Core and Spark SQL, with good knowledge of Spark Streaming and Spark machine learning using Scala and PySpark.
  • Solid understanding of RDD operations in Apache Spark, including transformations and actions, persistence (caching), accumulators, broadcast variables, and broadcast optimization (see the sketch after this list).
  • Tuned Spark RDD parallelism techniques to improve the performance and optimization of Spark jobs on Hadoop clusters.
  • In-depth understanding of Apache Spark job execution components such as the DAG, lineage graph, DAG scheduler, task scheduler, stages, and tasks.
  • Hands-on experience in the testing and implementation phases of big data technologies.
  • Experience with Apache NiFi sandbox instances.
  • Solid understanding of open-source monitoring tools such as Apache Ambari.
  • Analyzed and developed transformation logic for handling large sets of structured, semi-structured, and unstructured data using Spark and Spark SQL.
  • Hands-on experience working with input file formats such as ORC, Parquet, JSON, and Avro.
  • Hands-on experience with NoSQL databases HBase and Cassandra.
  • Good working knowledge of creating Hive tables and using HiveQL for data analysis to meet business requirements.
  • Experience in managing and reviewing Hadoop log files.
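
The following is a minimal Scala sketch of the RDD concepts listed above (lazy transformations, actions, caching, and broadcast variables); the sample data and the country-code lookup are hypothetical.

    import org.apache.spark.sql.SparkSession

    object RddBasicsSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("RddBasicsSketch")
          .master("local[*]")          // local mode for illustration only
          .getOrCreate()
        val sc = spark.sparkContext

        // Hypothetical lookup table shipped to every executor as a broadcast variable
        val countryNames = sc.broadcast(Map("US" -> "United States", "IN" -> "India"))

        // Transformations are lazy: nothing runs until an action is called
        val events = sc.parallelize(Seq(("US", 3), ("IN", 5), ("US", 7)))
        val enriched = events
          .map { case (code, count) => (countryNames.value.getOrElse(code, code), count) }
          .reduceByKey(_ + _)
          .cache()                     // persist the RDD because it is reused below

        // Actions trigger execution of the lineage graph
        println(enriched.collect().mkString(", "))
        println(s"distinct countries: ${enriched.count()}")

        spark.stop()
      }
    }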

TECHNICAL SKILLS:

Big Data Frameworks: Hadoop, Spark, Scala, Hive, Kafka, AWS, Cassandra, HBase, Flume, Pig, Sqoop, MapReduce

Big Data Distributions: Hortonworks, Amazon EMR, Cloudera

Programming languages: Java, Scala, Python Scripting, SQL, Shell Scripting

Operating Systems: Windows, Linux

Databases: Oracle, SQL Server

IDEs: IntelliJ IDEA, Eclipse

Java Technologies: JSP, Servlets, Junit, Spring, Hibernate

Web Technologies: XML, HTML, JavaScript, JVM, jQuery, JSON

Linux Experience: System Administration Tools

Web Services: Web Service (RESTful and SOAP)

Development methodologies: Agile

Logging Tools: Log4j

Application / Web Servers: CherryPy, Apache Tomcat, WebSphere

Messaging Services: Flume, Kafka

Version Tools: Git, SVN

PROFESSIONAL EXPERIENCE:

Confidential, New York

Big Data / Hadoop Developer

Responsibilities:

  • Analyzed the Integrated Data Facility (IDF) platform and made recommendations on the Scala source code for storing large amounts of data in Parquet format in S3 buckets.
  • Debugged defects such as count mismatches and missing tables on S3.
  • Analyzed nightly, daily, weekly, and monthly Spark jobs triggered through Autosys on the IDF platform.
  • Used EMR edge nodes in the development and QA environments to debug issues.
  • Analyzed Spears Automated Surveillance, a rule-based monitoring system built on top of the IDF platform.
  • Developed various test case scenarios based on rules provided by the business using the Spark Scala API; worked closely with business analysts and enterprise architects to understand those rules.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Worked with several semi-structured formats such as JSON, XML, and CSV to process datasets in S3.
  • Wrote Spark SQL in Scala to extract data from AWS S3, transform it, and load it back to AWS S3 (see the sketch after this list).
  • Worked extensively with AWS components such as EC2, S3, and Elastic MapReduce (EMR).
  • Used Spark SQL to load JSON data, create a schema RDD, and load it into Hive tables.
  • Developed Spark scripts in Scala as per requirements.
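
The following is a minimal Scala sketch, under assumed bucket paths, column names, and table names, of the S3 read-transform-write flow described above: Spark SQL over JSON in S3, with results written back to S3 as Parquet and registered in Hive.

    import org.apache.spark.sql.SparkSession

    object S3JsonToParquetSketch {
      def main(args: Array[String]): Unit = {
        // Hive support is only needed for the final saveAsTable step
        val spark = SparkSession.builder()
          .appName("S3JsonToParquetSketch")
          .enableHiveSupport()
          .getOrCreate()

        // Read semi-structured JSON from S3 (bucket and prefix are placeholders)
        val raw = spark.read.json("s3://my-bucket/input/events/")

        // Example transformation expressed in Spark SQL (column names are assumptions)
        raw.createOrReplaceTempView("events")
        val daily = spark.sql(
          """SELECT event_date, event_type, COUNT(*) AS event_count
            |FROM events
            |GROUP BY event_date, event_type""".stripMargin)

        // Write the result back to S3 in Parquet and register it as a Hive table
        daily.write.mode("overwrite").parquet("s3://my-bucket/output/daily_events/")
        daily.write.mode("overwrite").saveAsTable("analytics.daily_events")

        spark.stop()
      }
    }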

Environment: Spark Core, Spark SQL, Scala, Hive, S3, EMR, Autosys, Oracle, Sqoop

Confidential, California

Hadoop and Data Science Platform Engineer

Responsibilities:

  • Performed benchmarking of federated queries in Spark and compared their performance by running the same queries on Presto.
  • Defined Spark configurations to optimize federated queries by tuning the number of executors, executor memory, and executor cores.
  • Created partitioned and bucketed Hive tables for data analysis.
  • Migrated data from Hive to a MemSQL database via the Spark engine, the largest table being 1.2 TB.
  • Ran benchmark queries on the MemSQL database and measured the performance of each query.
  • Compared the performance of each benchmark query across solutions such as Spark, Teradata, MemSQL, Presto, and Hive (on the Tez engine) by creating bar graphs in Numbers.
  • Migrated data from Teradata to MemSQL using Spark by persisting DataFrames to MemSQL (see the sketch after this list).
  • Provided a solution using Hive and Sqoop (to export/import data) for faster data loads, replacing the traditional ETL process with HDFS for loading data into target tables.
  • Developed Spark scripts in Scala as per requirements.
  • Developed Java code using RDDs as well as DataFrames/SQL/Datasets in Spark 1.6 and Spark 2.1 for data aggregation, queries, and writing data.
  • Used Grafana to analyze the usage of Spark executors for different queues on different clusters.
  • Handled large datasets using partitions, Spark in-memory capabilities, broadcasts, efficient joins, transformations, and other techniques during the ingestion process itself.
  • Developed Hive queries to process the data and generate data cubes for visualization.
  • Converted SQL code to Spark code using Java and Spark SQL/Streaming for faster testing and processing of data; imported and indexed data from HDFS for secure searching, reporting, analysis, and visualization in Splunk.
  • Worked extensively with Hive, SQL, Scala, Spark, and shell scripting.
  • Completed assigned Radar tickets on time and stored the code in a Git repository.
  • Tested the Python, R, Livy, and Teradata JDBC interpreters by executing sample paragraphs.
  • Performed CI/CD builds of Zeppelin, Azkaban, and Notebook using Ansible.
  • Built new versions of Zeppelin by applying Git patches and updating artifacts using Maven.
  • Wrote shell scripts to determine the status of various components in the data science platform.
  • Performed data copying activities in a distributed environment using Ansible.
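
Below is a minimal Scala sketch of the Hive-to-MemSQL migration pattern described above: the Hive table is read into a DataFrame and persisted to MemSQL over its MySQL-compatible JDBC endpoint. The table names, host, and credentials are placeholders, the executor settings in the comment illustrate the tuning knobs mentioned above, and the MySQL JDBC driver is assumed to be on the classpath.

    import org.apache.spark.sql.{SaveMode, SparkSession}

    object HiveToMemSqlSketch {
      def main(args: Array[String]): Unit = {
        // Executor sizing would typically be passed on spark-submit, e.g.:
        //   --num-executors 20 --executor-memory 8g --executor-cores 4
        val spark = SparkSession.builder()
          .appName("HiveToMemSqlSketch")
          .enableHiveSupport()
          .getOrCreate()

        // Read the source Hive table (database/table names are placeholders)
        val source = spark.table("warehouse.large_fact_table")

        // Persist the DataFrame to MemSQL through its MySQL-compatible JDBC endpoint
        source.write
          .format("jdbc")
          .option("url", "jdbc:mysql://memsql-host:3306/analytics")  // placeholder host/db
          .option("dbtable", "large_fact_table")
          .option("user", "app_user")          // placeholder credentials
          .option("password", "app_password")
          .mode(SaveMode.Overwrite)
          .save()

        spark.stop()
      }
    }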

Environment: Spark Core, Spark SQL, MemSQL, Presto, Teradata, Hive, Apache Zeppelin, Maven, GitHub, IntelliJ, Nginx, Redis, Monit, Linux, Shell Scripting, Ansible.

Confidential, Orlando, Florida

Bigdata Developer

Responsibilities:

  • Extracted the data from Teradata into HDFS using Sqoop.
  • Used Flume to collect, aggregate, and store web log data from different sources such as web servers and mobile and network devices, and pushed it into HDFS.
  • Implemented MapReduce programs to transform log data into a structured form and extract user information.
  • Wrote Pig scripts to transform raw data from several data sources into baseline data.
  • Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, and visit duration (see the sketch after this list).
  • Worked extensively on the ETL process, consisting of data transformation, sourcing, mapping, conversion, and loading using Informatica.
  • Used ETL processes to load data from flat files into the target database, applying business logic in transformation mappings for inserting and updating records.
  • Created Talend ETL jobs to read data from an Oracle database and import it into HDFS.
  • Worked with data serialization formats for converting complex objects into sequences of bits using the Avro, RC, and ORC file formats.
  • Modified stored procedures to set up the control table used to generate the package ID, batch ID, and status for each batch.
  • Created DDLs for the migration tables residing in MySQL and MS SQL databases.
  • Performed batch processing on large sets of data.
  • Built an Apache NiFi flow to migrate data from MS SQL and MySQL databases to the staging tables.
  • Performed transformations on large data sets using the Apache NiFi Expression Language.
  • Unit tested the migration of MySQL and MS SQL tables using the NiFi flow.
  • Used DBeaver to connect to the different source databases.
  • Ran queries to verify the data types of the columns being migrated to the staging tables.
  • Responsible for monitoring data from source to target.
  • Populated the staging tables in the MySQL database without any data mismatch errors.
  • Worked in an Agile methodology using VersionOne, attending scrums and sprint planning.
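
The following is a minimal Scala sketch of the web log analysis described above; the HiveQL is wrapped in a spark.sql call for consistency with the other sketches, and the weblog database, table, and column names are assumptions.

    import org.apache.spark.sql.SparkSession

    object WebLogAnalysisSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("WebLogAnalysisSketch")
          .enableHiveSupport()
          .getOrCreate()

        // Assumed weblog table with columns: visitor_id, page_url, visit_date, duration_secs
        val dailyStats = spark.sql(
          """SELECT visit_date,
            |       COUNT(DISTINCT visitor_id) AS unique_visitors,
            |       COUNT(page_url)            AS page_views,
            |       AVG(duration_secs)         AS avg_visit_duration_secs
            |FROM weblogs.page_visits
            |GROUP BY visit_date
            |ORDER BY visit_date""".stripMargin)

        dailyStats.show(10, truncate = false)
        spark.stop()
      }
    }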

Environment: Apache Hadoop, Hortonworks HDP 2.0, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBase, Oozie, Teradata, Avro, Java, Talend, Linux, Apache NiFi, MySQL, MS SQL, Agile, DBeaver.

Confidential - Omaha, NE

Hadoop Developer

Responsibilities:

  • Imported data from our relational data stores to Hadoop using Sqoop.
  • Created various MapReduce jobs for performing ETL transformations on the transactional and application specific data sources.
  • Wrote Pig scripts and executed them using the Grunt shell.
  • Worked on converting existing MapReduce batch applications for better performance.
  • Performed big data analysis using Pig and user-defined functions (UDFs).
  • Worked on loading tables into Impala for faster retrieval using different file formats.
  • The system was initially developed using Java; the Java filtering program was restructured to package the business rule engine in a JAR that can be called from both Java and Hadoop.
  • Created reports and dashboards using structured and unstructured data.
  • Upgraded the operating system and/or Hadoop distribution as new versions were released, using Puppet.
  • Performed joins, group-bys, and other operations in MapReduce using Java and Pig.
  • Processed the output from Pig and Hive and formatted it before writing it to the Hadoop output file.
  • Used Hive table definitions to map the output files to tables.
  • Set up and benchmarked Hadoop/HBase clusters for internal use.
  • Wrote data ingesters and MapReduce programs.
  • Reviewed HDFS usage and the system design for future scalability and fault tolerance.
  • Wrote MapReduce/HBase jobs.
  • Worked with HBase, a NoSQL database (see the sketch after this list).
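
Below is a minimal Scala sketch of basic HBase client reads and writes, assuming the HBase 1.x Java client API and a hypothetical user_events table with column family d.

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
    import org.apache.hadoop.hbase.util.Bytes

    object HBaseClientSketch {
      def main(args: Array[String]): Unit = {
        // Picks up hbase-site.xml from the classpath for the ZooKeeper quorum settings
        val conf = HBaseConfiguration.create()
        val connection = ConnectionFactory.createConnection(conf)
        val table = connection.getTable(TableName.valueOf("user_events"))  // placeholder table

        // Write one cell: row key, column family "d", qualifier "status"
        val put = new Put(Bytes.toBytes("user#1001"))
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("active"))
        table.put(put)

        // Read the cell back
        val result = table.get(new Get(Bytes.toBytes("user#1001")))
        val status = Bytes.toString(result.getValue(Bytes.toBytes("d"), Bytes.toBytes("status")))
        println(s"status = $status")

        table.close()
        connection.close()
      }
    }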

Environment: Apache Hadoop 2.x, MapReduce, HDFS, Hive, Pig, HBase, Sqoop, Flume, Linux, Java 7, Eclipse, NoSQL

Confidential

Java Developer

Responsibilities:

  • Worked with the business community to define business requirements and analyze the possible technical solutions.
  • Requirement gathering, Business Process flow, Business Process Modeling and Business Analysis.
  • Extensively used UML and Rational Rose to design and develop various use cases, class diagrams, and sequence diagrams.
  • Used JavaScript for client-side validations, and AJAX to create interactive front-end GUI.
  • Developed application using Spring MVC architecture.
  • Developed custom tags for a table utility component.
  • Used various Java, J2EE APIs including JDBC, XML, Servlets, and JSP.
  • Designed and implemented the UI using Java, HTML, JSP and JavaScript.
  • Designed and developed web pages using Servlets and JSPs, and used XML/XSL/XSLT as the repository.
  • Involved in Java application testing and maintenance in development and production.
  • Developed the customer form data tables and maintained customer support and customer data in MySQL database tables.
  • Involved in mentoring specific projects in application of the new SDLC based on the Agile Unified Process, especially from the project management, requirements and architecture perspectives.
  • Designed and developed Views, Model and Controller components implementing MVC Framework.

Environment: JDK 1.3, J2EE, JDBC, Servlets, JSP, XML, XSL, CSS, HTML, DHTML, JavaScript, UML, Eclipse 3.0, Tomcat 4.1, MySQL.
