Hadoop / Big Data / Spark Developer Resume

New Haven, CT

SUMMARY

  • 6+ years of overall experience, including 3 years of comprehensive experience in application development and design using Hadoop ecosystem tools, Big Data and Big Data analytics.
  • Experience in installing, configuring and using ecosystem components such as Hadoop MapReduce, HDFS, Hive, Pig, Flume, Sqoop, Mahout, R packages and MLlib.
  • Strong knowledge of the Software Development Life Cycle (SDLC), including business interaction, requirement analysis, software architecture, design, development, testing and documentation phases.
  • Strong knowledge and understanding of Hadoop HDFS and MapReduce concepts and the Hadoop ecosystem.
  • Experience with Apache Spark and the Hortonworks (HDP) and Cloudera (CDH) distributions.
  • Created various use cases using massive public data sets. Ran performance tests to verify the efficacy of MapReduce, Pig and Hive in various modes: standalone, pseudo-distributed, cluster and cloud.
  • Worked in AWS environment for development and deployment of Custom Hadoop Applications.
  • Experience in Vertica, Impala, Solr and NoSQL databases such as HBase and Cassandra; also performed benchmarking on BigSQL, Impala and Hive.
  • Worked on distributed/cloud computing (MapReduce/Hadoop, Pig, HBase, Avro, ZooKeeper, etc.) and Amazon Web Services (S3, EC2, EMR, etc.).
  • Development expertise in RDBMSs such as Oracle, Sybase, Teradata, Netezza and MS SQL.
  • Experience with source control systems such as SVN, CVS and GitHub.
  • Sound experience in Microsoft Technologies and frameworks like ASP.NET, MVC, WCF, Web API, ADO.NET, LINQ, Web Services.
  • Expertise in importing and exporting data using Sqoop between HDFS and relational database systems/mainframes.
  • Expertise in writing Linux scripts, setting up Autosys jobs, and writing Pig scripts, Hive queries, Oozie workflows and MapReduce programs.
  • Good understanding of NoSQL databases and hands-on experience in writing applications on NoSQL databases such as Cassandra and MongoDB.
  • Experience in automating Hadoop installation and configuration and in maintaining the cluster using tools such as Puppet.
  • Experienced in application design using Unified Modeling Language (UML), sequence diagrams, use case diagrams and Entity Relationship Diagrams (ERD).

TECHNICAL SKILLS

Programming Languages: Python, Scala

Analytical programming languages & tools: R, Mahout, RapidMiner

Amazon Web Services: EC2, EMR, S3, Redshift

Apache Hadoop: HDFS, Hive, Pig, MapReduce, Flume, Sqoop, Kafka

Hadoop Solutions: Hortonworks, Cloudera, Apache Hadoop

NoSQL DB with MR: Cassandra, HBase, MongoDB

Relational Database: Oracle 12c/11g/10g, MySQL, SQL Server, PostgreSQL

Frameworks and Tools: SQL*Plus, SQL Developer, Toad, PuTTY, Oracle Enterprise Manager (OEM), SQL*Loader

Operating Systems: RHEL 6/7, Oracle Linux, CentOS, Windows

.NET Technologies: ASP.NET, MVC, WCF, Web API, ADO.NET, LINQ, Web Services

PROFESSIONAL EXPERIENCE

Confidential, New Haven, CT

Hadoop / Big Data / Spark Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop
  • Responsible for importing data to HDFS and loading data into Hive tables after aggregations and other ETL operations
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager
  • Analyzed the data by running Hive queries and Pig scripts to understand user behavior
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs
  • Worked on Big Data Integration and Analytics based on Hadoop and Spark
  • Extracted and loaded data into a Data Lake environment (Amazon S3) which was accessed by business users and data scientists. Developed Pig scripts to transform the raw data into intelligent data as specified by business users
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and extracted data from Teradata into HDFS using Sqoop
  • Worked extensively with Sqoop for importing metadata from Oracle
  • Configured Sqoop and developed scripts to extract data from DB2 into HDFS
  • Created Hive tables to store various data in the Parquet format
  • Managed cluster coordination services through ZooKeeper
  • Optimized Pig and Hive jobs by using different compression techniques and performance enhancers
  • Optimized complex joins in Pig by using techniques such as skewed joins and hash-based aggregations
  • Installed and configured Hive and wrote Hive UDFs in Java and Python
  • Trained and mentored analyst and test team on Hadoop framework, HDFS, Map Reduce concepts, Hadoop Ecosystem
  • Worked on data preparation and pre-processing data for machine learning
  • Worked on end-to-end development for data sets containing a variety of data
  • Wrote shell scripts and Python scripts for job automation
  • Assisted with the addition of Hadoop processing to the IT infrastructure
  • Performed data analysis using Hive and Pig

Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Cloudera Manager, Scala, Pig, Sqoop, Oozie, ZooKeeper, Teradata, PL/SQL, MySQL, Windows, Hortonworks

Confidential, Maryland Heights, MO

Hadoop / Big Data Developer

Responsibilities:

  • Designed, deployed and managed cluster nodes for our data platform operations (racking/stacking)
  • Installed and configured the cluster; set up Puppet for centralized configuration management.
  • Monitored the cluster using various tools to see how the nodes were performing.
  • Performed cluster tasks such as adding and removing nodes without any impact on running jobs and data. Involved in the pilot of a Hadoop cluster hosted on Amazon Web Services (AWS)
  • Wrote scripts to automate application deployments and configurations. Monitored YARN applications. Troubleshot and resolved cluster-related system problems.
  • Wrote MapReduce programs to clean and pre-process the data coming from different sources.
  • Implemented various output formats such as SequenceFile and Parquet in MapReduce programs. Also implemented multiple output formats in the same program to match the use cases.
  • Installed and configured Hive and wrote Hive UDFs in Java and Python
  • Developed Hadoop Streaming MapReduce jobs using Python.
  • Performed benchmarking of the NoSQL databases Cassandra and HBase.
  • Hands on experience with Lambda architectures.
  • Created a data model for structuring and storing the data efficiently. Implemented partitioning and bucketing of tables in Cassandra.
  • Implemented test scripts to support test driven development and continuous integration.
  • Converted text files into Avro and then to Parquet format for use with other Hadoop ecosystem tools.
  • Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
  • Exported the analyzed data to HBase using Sqoop to generate reports for the BI team.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Participated in the requirement gathering and analysis phase of the project, documenting the business requirements by conducting workshops/meetings with various business users.
  • Ongoing POC work using Spark and Kafka for real-time processing.
  • Developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
  • Ongoing POC work comparing the Cassandra and HBase NoSQL databases.

Environment: MapReduce, HDFS, Hive, Pig, Hue, Oozie, Core Java, Eclipse, HBase, Flume, Spark, Kafka, Cloudera Manager, Cassandra, Python, Greenplum DB, IDMS, VSAM, SQL*Plus, Toad, PuTTY, Windows NT, UNIX Shell Scripting, Pentaho, Talend, Big Data, YARN.

Confidential

Software Developer

Responsibilities:

  • Involved in design, coding, system testing, functional testing, multiuser testing and regression testing of the application.
  • Designed web page framework in ASP.NET
  • Worked on client-server projects.
  • Wrote SQL queries and validated the results based on the Functional Specification for new issues in the market.
  • Participated in role based security implementation
  • Coding & requirement understanding.
  • Worked on daily tasks and reviewed reports to complete them within the specified time limit.
  • Created web pages using HTML, jQuery and JavaScript functions for different functionalities.
  • Wrote nested queries and inline joins to get required data in Entity Framework.
  • Wrote stored procedures and added them to Entity Framework.

Confidential

Database Developer

Responsibilities:

  • Created and maintained tables, views, procedures, functions, and packages.
  • Worked on SQL*Loader to load data from flat files obtained from various facilities every day.
  • Used external tables to manipulate data obtained daily before loading it into the tables.
  • Developed UNIX Shell scripts to automate repetitive database processes.
  • Developed database objects like Tables, Views, Indexes, Synonyms and Sequences.
  • Involved in the creation of tables, join conditions, correlated subqueries, nested queries, views, sequences and synonyms for business application development.
  • Involved in the design and development of user interfaces and coding modules in PL/SQL.
  • Extensively used triggers, indexes, views and materialized views for the application design.
  • Prepared documentation and user support documents.
  • Prepared test plans and took part in unit testing, system integration testing, implementation and maintenance.
