Sr. Hadoop/Spark Developer Resume, New York City, NY - Hire IT People

PROFESSIONAL SUMMARY:

7+ years of IT experience across the software development life cycle (analysis, design, development, testing, deployment and support) using Waterfall and Agile methodologies, including 3+ years of data analysis using Hadoop ecosystem components (Spark, HDFS, MapReduce, Pig, Sqoop, Kafka, Hive, Cassandra and HBase) in the financial, retail and healthcare sectors.
Experience in Hadoop components like HDFS, MapReduce, JobTracker, NameNode, DataNode, TaskTracker and Apache Spark.
Experience in importing data from existing relational databases (Oracle, MySQL and Teradata) that provide SQL interfaces using Sqoop.
Hands-on experience with Avro, Parquet and RC file formats, and with combiners, counters, dynamic partitions and bucketing for performance improvement.
Experience in developing MapReduce programs using the Java API and in using Hive and Pig to perform data analysis, data cleaning and data transformation.
Designed Hive queries and Pig scripts to perform data analysis, data transfer and table design to load data into the Hadoop environment.
Expertise in writing Hive UDFs and generic UDFs to incorporate complex business logic into Hive queries.
Implemented Sqoop scripts for large dataset transfer between Hadoop and RDBMS.
Expertise in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
Worked on different file formats (ORCFILE, TEXTFILE) and different compression codecs (GZIP, SNAPPY, BZIP).
Experience in composing shell scripts to dump the shared information from MySQL servers to HDFS.
Performed ETL operations using Pig to join, clean, aggregate and analyze data.
Worked with Maven for the build process.
Extensive experience in importing and exporting data using stream processing platforms like Flume and Kafka.
Experience with the Oozie workflow scheduler and ZooKeeper to manage Hadoop jobs as a Directed Acyclic Graph (DAG) of actions with control flows.
Knowledge of creating Impala views on top of Hive tables for faster access when analyzing data.
Integrated BI tools such as Tableau with Impala and analyzed the data.
Experience with NoSQL databases like HBase, Cassandra and MongoDB.
Hands-on experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive.
Experience in collecting log data from different sources (web servers and social media) using Flume and Kafka and storing it in HDFS for MapReduce jobs.
Used Spark API over Cloudera Hadoop YARN to perform analytics on data.
Exposure to working with DataFrames in Spark.
Hands-on experience with Spark SQL queries and DataFrames: importing data from data sources, performing transformations and read/write operations, and saving the results to an output directory in HDFS.
Extensive experience in working with Cloudera (CDH4 and CDH5), Hortonworks and Amazon EMR Hadoop distributions on multi-node clusters.
Good knowledge of AWS infrastructure services: Amazon Simple Storage Service (Amazon S3), EMR, and Amazon Elastic Compute Cloud (Amazon EC2).
Knowledge of creating dashboards with business intelligence tools such as Tableau.
Very good understanding and working knowledge of Object Oriented Programming (OOP), J2SE, multithreading in Core Java, HTML, Servlets, JSP and JDBC.
Experience in working with different relational databases like MySQL and Oracle.
Working knowledge in database design, writing complex SQL Queries and Stored Procedures.
Capable of using AWS utilities such as EMR, S3 and CloudWatch to run and monitor Hadoop/Spark jobs on AWS.
Knowledge of using PyCharm and the Python shell to develop Spark-based applications in Python.
Expertise in various phases of software development including analysis, design, development and deployment of applications using Servlets, JSP, Java Beans, Struts, Spring Framework and JDBC.
Experience with development environments such as Eclipse and NetBeans.
Involved in Agile methodologies, daily scrum meetings and sprint planning.
Good analytical, communication and problem-solving skills; enjoy learning new technical and functional skills.

TECHNICAL SKILLS:

Big Data Ecosystem: HDFS, MapReduce, Pig, Hive, Impala, YARN, Hue, Oozie, ZooKeeper, Solr, Apache Spark, Apache Storm, Apache Kafka, Sqoop, Flume.

NoSQL Databases: HBase, Cassandra, and MongoDB

Hadoop Distributions: Cloudera, Hortonworks

Programming languages: Java, C, Scala, Pig Latin, HiveQL.

Scripting Languages: Shell Scripting

Databases: MySQL, Oracle, Teradata, DB2

Build Tools: Maven, Ant, sbt

Reporting Tool: Tableau

Version control Tools: SVN, Git, GitHub

Cloud: AWS, Azure

App/Web servers: WebSphere, WebLogic, JBoss and Tomcat

Web Design Tools: HTML, AJAX, JavaScript, JQuery, CSS and JSON.

Operating Systems: Windows 10/8/Vista/XP

Development IDEs: NetBeans, Eclipse IDE, Python (IDLE)

Packages: Microsoft Office, PuTTY, MS Visual Studio

WORK EXPERIENCE:

Confidential, New York City, NY

Sr. Hadoop/Spark Developer

Responsibilities:

Developed a data pipeline using Kafka, Sqoop, Hive and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
Developed Sqoop scripts for importing and exporting data into HDFS and Hive.
Developed design documents considering the possible approaches and identifying the best of them.
Responsible for managing data coming from different sources.
Developed business logic using Scala.
Responsible for loading data from UNIX file systems to HDFS.
Involved in creating Hive tables, loading them with data and writing Hive queries that run as MapReduce jobs in the backend.
Worked with NoSQL databases like HBase in creating HBase tables to load large sets of semi structured data coming from various sources.
Developed scripts to automate data management end to end and keep data in sync between all the clusters.
Explored Spark to improve the performance and optimization of existing algorithms in Hadoop.
Imported data from sources such as HDFS and HBase into Spark RDDs.
Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
Developed functional programs in Scala for connecting the streaming data application and gathering web data.
Implemented the workflows using Apache Oozie framework to automate tasks.
Configured the connection between Hive and Tableau using Impala for BI development.
Worked in Agile methodology and used JIRA to maintain project stories.
Experience in automating database activities with UNIX shell scripts.
Working experience with Linux distributions such as Red Hat and CentOS.
Good analytical, communication and problem-solving skills; enjoy learning new technical and functional skills.

Environment: Hadoop, MapReduce, Hive, Java, Maven, Impala, Pig, Spark, Oozie, Oracle, YARN, GitHub, Tableau, Unix, Cloudera, Kafka, Sqoop, Scala, HBase.

Confidential, Rocky Hills, CT

Hadoop Developer

Responsibilities:

Created Hive queries for extracting data and sending it to clients.
Created Scala programs to develop reports for business users.
Created Hive UDFs in Scala for formatting data.
Performed distributed programming with Spark, specifically in Scala.
Performed transformation and analysis in Hive/Pig and parsed raw data using MapReduce and Spark.
Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster.
Worked on capturing transactional changes in the data using MapReduce and HBase.
Studied the existing enterprise data warehouse setup and provided design and architecture suggestions for converting it to Hadoop using MapReduce, Hive, Spark, Sqoop and Pig Latin.
Familiar with AWS components like EC2 and S3.
Worked with Sqoop import and export functionalities to handle large data set transfers between the DB2 database and HDFS.
Worked on ingesting data from different sources.
Supported multiple application extracts coming out of Big Data Platform.
Followed agile methodology during project delivery.
Knowledge of CodeHub and Git.
Coordinated with the offshore team to complete tasks.
Understanding of the ServiceNow tool for submitting change requests and incidents for application deployments.

Environment: MapR, Hive, Pig, Spark, Scala, MapReduce, UNIX scripting, HBase, Talend.

Confidential, Cranston, RI

Hadoop Developer

Responsibilities:

Implemented technical architecture and developed various Big Data workflows using custom MapReduce, Pig, Hive, Cassandra and Sqoop.
Deployed an on-premise cluster and tuned it for optimal job execution performance when processing large data sets.
Built reusable Hive UDF libraries for business requirements, which enabled business analysts to use these UDFs in Hive queries.
Used Kafka to dump the application server logs into HDFS.
The logs that are stored on HDFS are analyzed and the cleaned data is imported into Hive warehouse which enabled end business analysts to write Hive queries.
Configured various big data workflows to run on top of Hadoop using Oozie; these workflows comprise heterogeneous jobs like Pig, Hive, Sqoop and MapReduce.
Experience in working with NoSQL database HBase in getting real time data analytics.
Used Maven extensively for building jar files of MapReduce programs and deployed to Cluster.
Resolved defects found in testing the new application and existing applications.
Analyzed requirements and designed and developed solutions.
Managed the project team in achieving project goals, including resource allocation, resolving technical issues and mentoring team members.
Used a Linux (Ubuntu) machine for designing, developing and deploying Java modules.

Environment: MapReduce, Pig, Hive, Sqoop, Kafka, Flume, HBase, JDK 1.6, Maven, Linux

Confidential - Rolling Meadows, IL

Hadoop Developer

Responsibilities:

Designed docs and specs for the near real-time data analytics using Hadoop and HBase.
Installed Cloudera Manager on the clusters.
Used a 15-node cluster with Cloudera Hadoop distribution on Amazon EC2.
Developed ad-clicks based data analytics, for keyword analysis and insights.
Crawled public posts from Facebook and tweets.
Used Flume and Kafka to get the streaming data from Twitter and Facebook.
Worked on MapReduce jobs with the data science team to analyze this data.
Converted output to structured data and imported to Tableau with analytics team.
Defined problems to look for right data and analyze results to make room for new project.

Environment: Hadoop, HBase, HDFS, MapReduce, Flume, Java, Tableau, Cloudera Manager, Amazon EC2.

Confidential

Java Developer

Responsibilities:

Interaction with business team for detailed specifications on the requirements and issue resolution.
Developed user interfaces using HTML, XML, CSS, JSP, JavaScript and Struts Tag Libraries and defined common page layouts using custom tags.
Developed client-side validations using JavaScript.
Implemented Struts MVC Paradigm components such as Action Mapping, Action class, Action Form, Validation Framework, Struts Tiles and Struts Tag Libraries.
Involved in the development of the front end of the application using Struts framework and interaction with controller java classes.
Provided development support for System Testing, User Acceptance Testing and Production and deployed application on JBoss Application Server.
Wrote and executed efficient SQL queries (CRUD operations), JOINs on multiple tables, to create and test sample test data in Oracle Database using Oracle SQL Developer.
Used CVS for check-in, check-out of files to control versions of files.
Used Eclipse as an IDE.
Used HP Quality Center to track activities and defects.
Implemented logging with Log4j.
Used Maven to compile and build project.
Developed style sheets to make the pages dynamic, performed extensive unit and system testing using JUnit, and fixed critical bugs.
Utilized the base UML methodologies and Use cases modeled by architects to develop the front-end interface. The class, sequence and state diagrams were developed using Visio.

Environment: Java, Struts 1.2, Hibernate 3.0, JSP, JavaScript, HTML, XML, Oracle, Eclipse, JBoss Application Server, ANT, CVS, and SQL Developer.

Confidential

SQL Developer

Responsibilities:

Involved in installation and configuration of SQL Server 2005 on database servers.
Developed database objects like Tables, Views, User-defined Functions and Triggers to handle complex business rules, history data and audit analysis.
Worked with complex T-SQL queries, subqueries, correlated subqueries and joins to fetch data as per the functional requirements.
Used common table expressions for hierarchical data and complex stored procedures.
Created various integrity constraints like Primary Key, Foreign Keys, Unique and Check to support application functionality.
Worked with command shell to invoke executables in SQL stored procedures.
Actively participated in gathering of User Requirement and System Specification.
Maintained User account administration for Different domains.
Involved in creating SQL reports and generating emails through DB Mail.
Loaded data from Excel using OPENROWSET commands.
Created and maintained indexes for a fast and efficient reporting process.
Created SSIS packages to load data from flat files and DB2 using Lookup, Derived Column, Data Conversion and Conditional Split transformations.
Maintained the physical Databases by monitoring Performance, space utilization and physical integrity.
Generated reports as per the requirements using SSRS.

Environment: MS SQL Server 2005, Microsoft Visual Studio 2005, SSIS, SSRS, DB2, Windows Server 2005, Performance Monitor and MS Office.
