
Hadoop/Spark Developer Resume

New York

SUMMARY:

  • 9+ years of professional IT experience in the Big Data ecosystem, covering the complete project life cycle (design, development, testing, and implementation) of client-server and web applications.
  • Over 4 years of work experience in ingestion, storage, querying, processing, and analysis of Big Data, with hands-on experience in Hadoop ecosystem development including MapReduce, HDFS, Hive, Pig, HBase, Sqoop, Hue, Flume, Oozie, Zookeeper, Spark, Spark SQL, and Spark Streaming.
  • Hands-on experience with Hadoop distributions such as Hortonworks, AWS Elastic MapReduce, and Cloudera.
  • Knowledge of Hadoop architecture and its components, such as HDFS, NameNode, DataNode, JobTracker, and TaskTracker, and of the MapReduce programming paradigm.
  • In-depth knowledge of the Big Data stack: HDFS, MapReduce, YARN, Hive, HBase, Sqoop, Flume, Kafka, Spark, Spark DataFrames, Spark SQL, Spark Streaming, etc.
  • Experience using Spark with Scala to improve the performance and optimization of existing algorithms in Hadoop via SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN (see the sketch after this list).
  • Hands-on experience with HiveQL.
  • Hands-on experience with Apache Spark using Scala (spark-shell) and Python (PySpark).
  • Hands-on experience building Spark applications using build tools such as SBT, Maven, and Gradle.
  • Good experience working with different file formats such as text, SequenceFile, RCFile, ORC, Parquet, Avro, and JSON.
  • Good experience with different compression formats such as GZip, LZO, BZip2, and Snappy.
  • Good knowledge of relational databases such as MySQL and Oracle, and NoSQL databases such as HBase, MongoDB, and Cassandra.
  • Good knowledge of installing, configuring, and using Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Zookeeper, and Flume.
  • Working knowledge of UNIX/Linux systems, including experience with shell scripting.
  • Experience in handling semi-structured and unstructured data from different data sources.
  • Experience in developing MapReduce programs using combiners, map-side joins, reduce-side joins, distributed cache, compression techniques, and multiple inputs and outputs.
  • Good exposure to Hive, Pig scripting, distributed applications, and HDFS.
  • Experienced in performing ad-hoc analysis on structured data using HiveQL, joins, and custom Hive UDFs; good exposure to counters, shuffle and sort parameters, dynamic partitions, and bucketing for performance improvement.
  • Expertise in using IDEs such as NetBeans, Eclipse, Visual Studio, and IntelliJ.
  • Excellent knowledge of Java and SQL for application development and deployment.
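
A minimal, illustrative sketch of the Spark-with-Scala work summarized above: parsing raw records from HDFS into an RDD, converting them to a DataFrame with named columns, and querying them with Spark SQL. The path, case class, and column names are hypothetical, not taken from any client project.

    import org.apache.spark.sql.SparkSession

    object SummarySketch {
      // Hypothetical record layout, used only for illustration.
      case class Event(userId: String, action: String, amount: Double)

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("summary-sketch")
          .getOrCreate()
        import spark.implicits._

        // Raw text records from HDFS, parsed into a typed RDD.
        val events = spark.sparkContext
          .textFile("hdfs:///data/events/*.txt")
          .map(_.split(","))
          .filter(_.length == 3)
          .map(a => Event(a(0), a(1), a(2).toDouble))

        // Convert the RDD to a DataFrame with named columns and query it with Spark SQL.
        val df = events.toDF()
        df.createOrReplaceTempView("events")
        spark.sql("SELECT userId, SUM(amount) AS total FROM events GROUP BY userId").show()

        spark.stop()
      }
    }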

TECHNICAL SKILLS:

Big Data Ecosystem: HDFS, MapReduce, Pig, Hive, Sqoop, Flume, HBase, Cassandra, MongoDB, Oozie, Apache Spark, Spark SQL, Spark Streaming.

Process/Data Modeling: MS Visio, UML Diagrams and ER Studio

Cluster Manager Tools: HDP Ambari, Cloudera Manager, Hue

ETL/ELT/Databases: HBase, Spark SQL, MS Access, Oracle, DB2, MySQL, SQL Developer, SQL Server 2000/2005/2008, and Toad

Languages: C, C++, Java, PL/SQL, Python, Scala

WebTechnologies: HTML, DHTML, XML, CSS

Microsoft Technologies: ASP.NET, C#.Net, VB.Net, ADO.NET, SharePoint, Word, Excel and PowerPoint.

Operating Systems: Linux, Ubuntu, RHEL, Windows 2000/2003/2008/XP/7/8/10.

IDE: NetBeans, Eclipse, Visual Studio, and IntelliJ

PROFESSIONAL EXPERIENCE:

Confidential, Chicago

Hadoop/Spark Developer

Responsibilities:
  • Gathered requirements from the client and created technical documentation.
  • Imported data from relational databases into HDFS using Sqoop.
  • Involved in creating Hive tables and Spark applications and in loading data into HDFS using Sqoop.
  • Performed data profiling, identified data quality issues, and validated rules regarding data integrity and data quality as they relate to business requirements.
  • Worked on building Spark applications.
  • Used Spark SQL to process large volumes of structured data (see the sketch after this list).
  • In-depth knowledge of SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Responsible for writing Hive queries to analyze data in the Hive warehouse using Hive Query Language (HQL).
  • Worked with Hive tables, Hive queries, partitioning, and bucketing.
  • Created rich dashboards using Tableau.
  • Connected to Tableau Server to publish dashboards to a central location for portal integration.
  • Created metrics, attributes, filters, reports, and dashboards, and built advanced chart types, visualizations, and complex calculations to manipulate the data.
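
A minimal sketch, in Scala, of the kind of Spark SQL processing described above, assuming a Hive-enabled SparkSession and a hypothetical Hive database and table populated by the Sqoop imports; the schema and names are placeholders, not the actual client data model.

    import org.apache.spark.sql.SparkSession

    // Hive-enabled session so Spark SQL can read tables loaded via Sqoop/Hive.
    val spark = SparkSession.builder()
      .appName("spark-sql-over-hive-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Aggregate a (hypothetical) orders table with Spark SQL.
    val orderCounts = spark.sql(
      """SELECT customer_id, COUNT(*) AS order_count
        |FROM sales_db.orders
        |WHERE order_date >= '2018-01-01'
        |GROUP BY customer_id""".stripMargin)

    // Persist the result back to Hive for downstream reporting (e.g. Tableau).
    orderCounts.write.mode("overwrite").saveAsTable("sales_db.order_counts")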

Environment: Cloudera Manager, Apache Hadoop, Java (JDK 1.8), Pig, Hive, Spark, Spark SQL, Scala, Tableau, Shell Scripting, HDFS, Sqoop.

Confidential, New York

Hadoop/Spark Developer

Responsibilities:
  • Involved in Design and Implementation of Enterprise applications and Web based applications using Java and Big Data platform.
  • Implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies such as Hadoop, Map Reduce Frameworks, Hive.
  • Loaded and transformed large sets of structured, semi structured and unstructured data using Hadoop/Big Data concepts.
  • Worked with Apache Hadoop ecosystem components such as HDFS, Hive, and Sqoop, and worked with Spark and Scala.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Used the DataFrame API in Scala to convert distributed collections of data into named columns and developed predictive analytics using the Apache Spark Scala APIs (see the sketch after this list).
  • Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive; used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Utilized Oozie workflows to run Hive jobs; extracted files through Sqoop, placed them in HDFS, and processed them.
  • Responsible for importing log files from various sources into HDFS using Flume.
  • Developed code for importing and exporting data into HDFS and Hive using Sqoop.
  • Used Spring JDBC DAO as a data access technology to interact with the database.
  • Installed and configured Hadoop; responsible for maintaining the cluster and managing and reviewing Hadoop log files.
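
A minimal sketch, in Scala, of converting a distributed collection (RDD) into a DataFrame with named columns, as described above; the account data and column names are invented for illustration only.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().appName("dataframe-sketch").getOrCreate()
    import spark.implicits._

    // A pair-style RDD of (accountId, balance) tuples, e.g. parsed from files in HDFS.
    val balances = spark.sparkContext.parallelize(Seq(
      ("acct-1", 1200.0), ("acct-2", 85.5), ("acct-1", 300.0)))

    // Name the columns when converting the RDD into a DataFrame.
    val df = balances.toDF("account_id", "balance")

    // A simple aggregate metric of the kind fed into reporting dashboards.
    df.groupBy("account_id")
      .agg(sum("balance").as("total_balance"))
      .orderBy(desc("total_balance"))
      .show()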

Environment: Hadoop, Spark, HDFS, Scala, Hive, Java, Spring, Map Reduce, Sqoop, Spring MVC, Big Data, Spark SQL, JDBC, Oozie, Pig, Flume

Confidential, Seattle

Hadoop Developer

Responsibilities:
  • Worked on analyzing data using different big data analytic tools including Pig, Hive and MapReduce.
  • Created Pig Latin scripts to sort, group, join, and filter the enterprise-wide data.
  • Implemented partitioning, dynamic partitions, and buckets in Hive on Avro files to meet the business requirements.
  • Extended Hive and Pig core functionality by writing custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregate Functions (UDAFs) in Java (see the sketch after this list).
  • Led the design, development, and implementation of data transformation processes in MapReduce.
  • Implemented data integrity and data quality checks using Linux scripts.
  • Used Flume to tail application log files into HDFS.
  • Used ServiceNow to provide support for existing clients.
  • Involved in scheduling Hive and Pig jobs using Oozie workflows.
  • Worked with MySQL for storing metadata.
  • Involved in performance tuning and memory optimization of map-reduce applications.
  • Worked on end-to-end automation of the application.
  • Responsible for continuous Build/Integration with Jenkins and deployment using XL Deploy.
  • Actively involved in code review and bug fixes and enhancements.
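
A minimal sketch of a custom Hive UDF of the kind described above; the original work was done in Java, but it is shown here in Scala for consistency with the other sketches, and the class name and normalization logic are hypothetical. It assumes the classic org.apache.hadoop.hive.ql.exec.UDF API.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Trims and upper-cases a free-text column before analysis.
    class NormalizeText extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) null
        else new Text(input.toString.trim.toUpperCase)
      }
    }

After packaging the class into a JAR, it would be registered in Hive roughly as: ADD JAR normalize-udf.jar; CREATE TEMPORARY FUNCTION normalize_text AS 'NormalizeText'; and then called from HiveQL like any built-in function.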

Environment: Hadoop, HDFS, Apache Hive, Pig, MapReduce, MySQL, Core Java, Shell Scripting, Eclipse, Git, Jenkins.

Confidential

Hadoop Developer

Responsibilities:
  • Involved in configuring the Hadoop environment through Amazon Web Services in the cloud.
  • Designed, planned and delivered proof of concept and business function/division based implementation of Big Data roadmap and strategy project (Apache Hadoop stack with Tableau) using Hadoop and its ecosystem tools.
  • Developed MapReduce jobs in Java for data cleaning and preprocessing.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Used Bash Shell Scripting, Sqoop, Hive, Pig, Java, Map/Reduce as a daily routine to develop ETL, batch processing, and data storage functionality.
  • Responsible for developing data pipeline using Flume, Sqoop and Pig to extract the data from weblogs and store in HDFS.
  • Worked with HBase, a NoSQL database in Hadoop ecosystem.
  • Worked on loading various data tables from the reference source database schema through Sqoop.
  • Designed, coded, and configured server-side J2EE components using JSP, AWS, and Java.
  • Collected data residing in various relational databases (Oracle, MySQL) into HDFS.
  • Used Oozie and Zookeeper for workflow scheduling and monitoring.
  • Worked on Designing and Developing ETL Workflows using Java for processing data in HDFS/HBase using Oozie.
  • Experienced in managing and reviewing Hadoop log files.
  • Worked on extracting files from MongoDB through Sqoop, placing them in HDFS, and processing them.
  • Supported MapReduce programs running on the cluster.
  • Provided cluster coordination services through ZooKeeper.
  • Involved in loading data from the local file system (UNIX file system) into HDFS.
  • Installed and configured Hive and wrote Hive UDFs.
  • Created several Hive tables, DDLs, DMLs, and UDFs.
  • Worked on setting up Pig, Hive, Redshift and HBase on multiple nodes and developed using Pig.
  • Developed Simple to complex MapReduce Jobs using Hive and Pig.

Environment: Apache Hadoop, MapReduce, HDFS, Hive, Java (JDK 1.6), SQL, Pig, Zookeeper, Flat files, Oracle 11g/10g, MySQL, Windows NT, UNIX, Sqoop, Oozie, HBase.

Confidential

SQL/PL-SQL Developer

Responsibilities:
  • Performed data analysis, primarily identifying source data, source metadata, data definitions, and data formats.
  • Designed Physical & Logical Data model and Data flow diagrams.
  • Involved in the creation of database objects like Tables, Views, Stored Procedures, Functions, Packages, DB triggers, Indexes.
  • Created database objects like tables, views, sequences, synonyms, indexes using Oracle tools like SQL*Plus, SQL Developer and Toad.
  • Enforced data integrity using integrity constraints and database triggers.
  • Proficient in advanced Oracle 11g PL/SQL features such as records and collections, bulk binds, ref cursors, nested tables, and dynamic SQL.
  • Experience with SQL and PL/SQL tuning and query optimization tools such as SQL Trace, Explain Plan, and DBMS_PROFILER.
  • Extensively used packages and utilities such as DBMS_STATS, TKPROF, and DBMS_SCHEDULER.
  • Strong knowledge of the PL/SQL wrapper utility to protect PL/SQL procedures and packages.
  • Developed data entry, query and reports request screens and tuned the SQL queries.
  • Used joins and indexes effectively in WHERE clauses for query optimization.
  • Assisted in gathering requirements by performing system analysis of the requirements with the technology teams.
  • Extensively worked in analyzing logs created by database jobs, shell script-based jobs and sorted out the issues.
  • Performed unit testing of procedures and assisted in SIT, UAT, and functional testing.
  • Code migration and data migration for SIT, UAT.
  • Code deployment in Production and warranty support given to the Application Support Team and client.
  • Maintenance and support activities for existing releases.
  • Monitoring activities of the application health after the deployment.

Environment: Oracle 10g, SQL Developer, SQL*Plus, SQL*Loader, Shell Scripts, ETL, TOAD, Rally, Bitbucket.

Confidential

SQL/PL-SQL Developer

Responsibilities:
  • Created custom PL/SQL procedures to read data from flat files and load it into the Oracle database using SQL*Loader.
  • Developed PL/SQL Procedures and database triggers for the validation of input data and to implement business rules.
  • Created database objects like packages, procedures, and functions according to the client requirement.
  • Extracting the data from simple flat text files into operational database.
  • Used SSIS to create ETL packages to validate, extract, transform and load data to data warehouse databases, data mart databases to store data to OLAP databases.
  • Created PL/SQL packages, procedures, and functions applying the business logic to load data into the relevant tables, and converted data from different source systems (T-SQL) into Oracle format.
  • Created records, tables, collections for improving performance by reducing context switching.
  • Created and manipulated stored procedures, functions, packages and triggers using TOAD.
  • Wrote heavy stored procedures using dynamic SQL to populate data into temp tables from fact and dimension tables for reporting purposes.
  • Responsible for tuning ETL mappings to optimize load and query performance.
  • Developed Oracle Forms for end users using Oracle Forms Builder 10g.
  • Extensively used the advanced features of PL/SQL like Records, Tables, Object types and Dynamic SQL.
  • Advised requesters and implemented SQL and PL/SQL scripts to target the best audience for scheduled marketing/sales campaigns and events.
  • Tuned ETL procedures and star schemas to optimize load and query performance.

Environment: Oracle 10g, T-SQL, SQL*Plus, SQL*Loader, PL/SQL Developer, Web Services, SSIS, SSRS, TOAD, Agile.
