Big Data / Hadoop Developer Resume
New York
PROFESSIONAL SUMMARY:
- Around 5 years of IT experience across various domains with Big Data, the Hadoop ecosystem, and Java/J2EE technologies.
- Strong hands-on experience in Spark Core and Spark SQL, with good knowledge of Spark Streaming and Spark machine learning using Scala and PySpark.
- Solid understanding of RDD operations in Apache Spark, i.e., transformations and actions, persistence (caching), accumulators, broadcast variables, and broadcast optimization (illustrative sketch after this summary).
- Tuned Spark RDD parallelism techniques to improve the performance and optimization of Spark jobs on the Hadoop cluster.
- In-depth understanding of Apache Spark job execution components such as the DAG, lineage graph, DAG scheduler, task scheduler, stages, and tasks.
- Hands-on experience in the testing and implementation phases of big data technologies.
- Experience with the Apache NiFi sandbox instance.
- Solid understanding of open-source monitoring tools such as Apache Ambari.
- Analyzed and developed transformation logic for handling large sets of structured, semi-structured, and unstructured data using Spark and Spark SQL.
- Hands-on experience working with input file formats such as ORC, Parquet, JSON, and Avro.
- Hands-on experience with NoSQL databases HBase and Cassandra.
- Good working knowledge of creating Hive tables and using HQL for data analysis to meet business requirements.
- Experience in managing and reviewing Hadoop log files.
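A minimal Scala sketch of the RDD concepts listed above (transformations vs. actions, caching, and broadcast variables). The input path, column layout, and lookup map are hypothetical placeholders, not code from any of the projects below.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: illustrates transformations, actions, caching, and broadcast variables.
object RddBasicsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rdd-basics").getOrCreate()
    val sc = spark.sparkContext

    // Broadcast a small lookup table once per executor instead of shipping it with every task
    val countryNames = sc.broadcast(Map("US" -> "United States", "IN" -> "India"))

    val lines = sc.textFile("hdfs:///data/events.csv")      // lazy: nothing runs yet

    // Transformations only build the lineage graph; they execute when an action is called
    val byCountry = lines
      .map(_.split(","))
      .filter(_.length > 1)
      .map(cols => (countryNames.value.getOrElse(cols(1), "Unknown"), 1L))
      .reduceByKey(_ + _)
      .cache()                                              // persist for reuse across actions

    // Actions trigger execution of the DAG
    println(s"distinct countries: ${byCountry.count()}")
    byCountry.saveAsTextFile("hdfs:///output/events_by_country")

    spark.stop()
  }
}
```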
TECHNICAL SKILLS:
Big Data Frameworks: Hadoop, Spark, Scala, Hive, Kafka, AWS, Cassandra, HBase, Flume, Pig, Sqoop, MapReduce
Big Data Distributions: Hortonworks, Amazon EMR, Cloudera
Programming Languages: Java, Scala, Python, SQL, Shell Scripting
Operating Systems: Windows, Linux
Databases: Oracle, SQL Server
Development Tools (IDEs): IntelliJ, Eclipse
Java Technologies: JSP, Servlets, JUnit, Spring, Hibernate
Web Technologies: XML, HTML, JavaScript, JVM, jQuery, JSON
Linux Experience: System Administration Tools
Web Services: Web Service (RESTful and SOAP)
Development methodologies: Agile
Logging Tools: Log4j
Application / Web Servers: CherryPy, Apache Tomcat, WebSphere
Messaging Services: Flume, Kafka
Version Tools: Git, SVN
PROFESSIONAL EXPERIENCE:
Confidential, New York
Big Data / Hadoop Developer
Responsibilities:
- Analyzed the Integrated Data Facility (IDF) platform and made recommendations to the Scala source code for storing large amounts of data in Parquet format in S3 buckets.
- Debugged defects such as count mismatches and missing tables on S3.
- Analyzed nightly, daily, weekly, and monthly Spark jobs triggered using Autosys on the IDF platform.
- Used EMR edge nodes in development and QA environments for debugging issues.
- Analyzed Spears Automated Surveillance, a rule-based monitoring system built on top of the IDF platform.
- Developed various test case scenarios based on rules provided by the business using the Spark Scala APIs; worked closely with business analysts and enterprise architects to understand those rules.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Processed datasets in S3 in several semi-structured formats such as JSON, XML, and CSV.
- Wrote Spark SQL in Scala to extract data from AWS S3, transform it, and load it back to AWS S3.
- Worked extensively with AWS components such as EC2, S3, and Elastic MapReduce (EMR).
- Used Spark SQL to load JSON data, create a schema RDD, and load it into Hive tables (illustrative sketch after this section).
- Developed Spark scripts using Scala as per requirements.
Environment: Spark Core, Spark SQL, Scala, Hive, S3, EMR, Autosys, Oracle, Sqoop
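A minimal Scala sketch of the S3 JSON extract / Spark SQL transform / Parquet and Hive load flow described in this section. Bucket names, paths, column names, and the Hive table name are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: read semi-structured JSON from S3, transform with Spark SQL,
// write Parquet back to S3 and into a Hive table.
object IdfJsonToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("idf-json-to-parquet")
      .enableHiveSupport()            // allows writing into Hive-managed tables
      .getOrCreate()

    // Extract: read JSON from an S3 prefix
    val raw = spark.read.json("s3://example-idf-bucket/incoming/events/")

    // Transform: Spark SQL over a temporary view
    raw.createOrReplaceTempView("events")
    val curated = spark.sql(
      """SELECT event_id, event_type, CAST(event_ts AS timestamp) AS event_ts
        |FROM events
        |WHERE event_id IS NOT NULL""".stripMargin)

    // Load: Parquet back to S3, plus a Hive table for downstream queries
    curated.write.mode("overwrite").parquet("s3://example-idf-bucket/curated/events/")
    curated.write.mode("overwrite").saveAsTable("idf.curated_events")

    spark.stop()
  }
}
```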
Confidential, California
Hadoop and Data Science Platform Engineer
Responsibilities:
- Performed benchmarking of federated queries in Spark and compared their performance by running the same queries on Presto.
- Defined Spark configurations to optimize federated queries by tuning the number of executors, executor memory, and executor cores (see the sketch after this section).
- Created partitioned and bucketed Hive tables for data analysis.
- Successfully migrated data from Hive to a MemSQL database via the Spark engine, the largest table being 1.2 TB.
- Ran benchmarking queries on the MemSQL database and measured the performance of each query.
- Compared the performance of each benchmark query across solutions such as Spark, Teradata, MemSQL, Presto, and Hive (on the Tez engine) by creating bar graphs in Numbers.
- Migrated data from Teradata to MemSQL using Spark by persisting DataFrames to MemSQL.
- Provided a solution using Hive and Sqoop (to export/import data) for faster data loads, replacing the traditional ETL process with HDFS for loading data into target tables.
- Developed Spark scripts using Scala as per the requirements.
- Developed Java code using both RDDs and DataFrames/SQL/Datasets in Spark 1.6 and Spark 2.1 for data aggregation, queries, and writing data.
- Used Grafana to analyze Spark executor usage for different queues on different clusters.
- Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcast variables, efficient joins, and transformations during the ingestion process itself.
- Developed Hive queries to process the data and generate data cubes for visualization.
- Converted SQL code to Spark code using Java and Spark SQL/Streaming for faster testing and processing of data; imported and indexed data from HDFS for secure searching, reporting, analysis, and visualization in Splunk.
- Worked extensively with Hive, SQL, Scala, Spark, and shell scripting.
- Completed assigned Radar items on time and stored the code in a Git repository.
- Tested the Python, R, Livy, and Teradata JDBC interpreters by executing sample paragraphs.
- Performed CI/CD builds of Zeppelin, Azkaban and Notebook using Ansible.
- Built a new version of Zeppelin by applying Git patches and updating the artifacts using Maven.
- Wrote shell scripts to determine the status of various components in the data science platform.
- Performed data copying activities in a distributed environment using Ansible.
Environment: Spark Core, Spark SQL, MemSQL, Presto, Teradata, Hive, Apache Zeppelin, Maven, GitHub, IntelliJ, Nginx, Redis, Monit, Linux, Shell Scripting, Ansible.
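A minimal Scala sketch of the executor tuning and the Hive-to-MemSQL migration described above. Executor sizes, the JDBC URL, credentials, and table names are hypothetical placeholders; MemSQL is reached here through its MySQL-compatible JDBC driver.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: tune executor resources, read a Hive table, persist the DataFrame to MemSQL over JDBC.
object HiveToMemsql {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-memsql")
      .config("spark.executor.instances", "20")   // number of executors (placeholder value)
      .config("spark.executor.memory", "8g")      // executor memory (placeholder value)
      .config("spark.executor.cores", "4")        // cores per executor (placeholder value)
      .enableHiveSupport()
      .getOrCreate()

    // Read the source table from Hive
    val df = spark.table("warehouse.large_fact_table")

    // Persist the DataFrame to MemSQL via JDBC
    df.write
      .format("jdbc")
      .option("url", "jdbc:mysql://memsql-host:3306/analytics")
      .option("driver", "com.mysql.jdbc.Driver")
      .option("dbtable", "large_fact_table")
      .option("user", "etl_user")
      .option("password", sys.env.getOrElse("MEMSQL_PASSWORD", ""))
      .mode("append")
      .save()

    spark.stop()
  }
}
```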
Confidential, Orlando, Florida
Big Data Developer
Responsibilities:
- Extracted the data from Teradata into HDFS using Sqoop.
- Used Flume to collect, aggregate, and store web log data from different sources such as web servers, mobile, and network devices, and pushed it into HDFS.
- Implemented MapReduce programs on log data to transform it into a structured form and extract user information.
- Wrote Pig scripts to transform raw data from several data sources into baseline data.
- Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, and visit duration (see the sketch after this section).
- Extensive work on the ETL process consisting of data transformation, data sourcing, mapping, conversion, and loading using Informatica.
- Extensively used ETL processes to load data from flat files into the target database, applying business logic in transformation mappings to insert and update records during the load.
- Created Talend ETL jobs to read data from an Oracle database and import it into HDFS.
- Worked with data serialization formats for converting complex objects into sequences of bits using the Avro, RC, and ORC file formats.
- Modified stored procedures to set up the control table used to generate the package ID, batch ID, and status for each batch.
- Created DDLs for the migration tables residing in MySQL and MSSQL databases.
- Performed batch processing on large sets of data.
- Built an Apache NiFi flow for migrating data from MSSQL and MySQL databases to the staging table.
- Performed transformations on large data sets using the Apache NiFi Expression Language.
- Unit tested the migration of MySQL and MSSQL tables using the NiFi flow.
- Used DBeaver to connect to the different databases across different sources.
- Ran queries to verify the data types of the columns being migrated to the staging table.
- Responsible for monitoring data from source to target.
- Populated the staging tables in the MySQL database without any data mismatch errors.
- Worked in an Agile methodology with VersionOne, attending scrums and sprint planning.
Environment: Apache Hadoop, Hortonworks HDP 2.0, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBase, Oozie, Teradata, Avro, Java, Talend, Linux, Apache NiFi, MySQL, MSSQL, Agile, DBeaver.
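A sketch of the web-log aggregation described in this section. The original work ran HiveQL directly; the same style of query is expressed here through Spark SQL in Scala to keep these sketches in one language, and the table and column names are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: unique visitors per day, page views, and average visit duration from a web-log table.
object WebLogAnalysisSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("weblog-analysis")
      .enableHiveSupport()
      .getOrCreate()

    val daily = spark.sql(
      """SELECT to_date(event_ts)            AS log_date,
        |       COUNT(DISTINCT visitor_id)   AS unique_visitors,
        |       COUNT(*)                     AS page_views,
        |       AVG(visit_duration_seconds)  AS avg_visit_duration
        |FROM weblogs.page_events
        |GROUP BY to_date(event_ts)
        |ORDER BY log_date""".stripMargin)

    daily.show(30, truncate = false)
    spark.stop()
  }
}
```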
Confidential - Omaha, NE
Hadoop Developer
Responsibilities:
- Imported data from our relational data stores to Hadoop using Sqoop.
- Created various MapReduce jobs for performing ETL transformations on the transactional and application specific data sources.
- Wrote Pig scripts and executed them using the Grunt shell.
- Worked on the conversion of existing MapReduce batch applications for better performance.
- Performed big data analysis using Pig and user-defined functions (UDFs).
- Worked on loading tables into Impala for faster retrieval using different file formats.
- The system was initially developed in Java; the Java filtering program was restructured so the business rule engine lives in a JAR that can be called from both Java and Hadoop.
- Created Reports and Dashboards using structured and unstructured data.
- Upgraded the operating system and/or Hadoop distribution as new versions were released, using Puppet.
- Performed joins, group-bys, and other operations in MapReduce using Java and Pig.
- Processed the output from Pig and Hive and formatted it before writing to the Hadoop output file.
- Used Hive definitions to map the output file to tables.
- Set up and benchmarked Hadoop/HBase clusters for internal use.
- Wrote data ingesters and MapReduce programs.
- Reviewed the HDFS usage and system design for future scalability and fault-tolerance
- Wrote MapReduce/HBase jobs.
- Worked with HBase, a NoSQL database.
Environment: Apache Hadoop 2.x, MapReduce, HDFS, Hive, Pig, HBase, Sqoop, Flume, Linux, Java 7, Eclipse, NoSQL
Confidential
Java Developer
Responsibilities:
- Worked with the business community to define business requirements and analyze the possible technical solutions.
- Requirement gathering, Business Process flow, Business Process Modeling and Business Analysis.
- Extensively used UML and Rational Rose for designing to develop various use cases, class diagrams and sequence diagrams.
- Used JavaScript for client-side validations, and AJAX to create interactive front-end GUI.
- Developed application using Spring MVC architecture.
- Developed custom tags for a table utility component.
- Used various Java, J2EE APIs including JDBC, XML, Servlets, and JSP.
- Designed and implemented the UI using Java, HTML, JSP and JavaScript.
- Designed and developed web pages using Servlets and JSPs, and used XML/XSL/XSLT as the repository.
- Involved in Java application testing and maintenance in development and production.
- Involved in developing the customer form data tables; maintained customer support and customer data in MySQL database tables.
- Mentored specific projects in applying the new SDLC based on the Agile Unified Process, especially from the project management, requirements, and architecture perspectives.
- Designed and developed Views, Model and Controller components implementing MVC Framework.
Environment: JDK 1.3, J2EE, JDBC, Servlets, JSP, XML, XSL, CSS, HTML, DHTML, JavaScript, UML, Eclipse 3.0, Tomcat 4.1, MySQL.