Sr. Hadoop Developer/ Big Data Resume
Baltimore, MD
SUMMARY
- Around 9 years of IT industry experience, with 6 years of experience working with Apache Hadoop components like HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Oozie, Zookeeper, HBase, Cassandra, MongoDB and Amazon Web Services.
- Experience in performing in-memory data processing and real-time streaming analytics using Apache Spark with Scala, Java and Python.
- Developed applications for distributed environments using Hadoop, MapReduce and Python.
- Developed MapReduce jobs to automate transfer of data from HBase.
- Developed and maintained web applications using the Tomcat web server.
- Experience in integrating Hadoop with Ganglia and a good understanding of Hadoop metrics and visualization using Ganglia.
- Good experience working with the Hortonworks Distribution and Cloudera Distribution.
- Very good understanding/knowledge of Hadoop architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
- Experience in data extraction and transformation using MapReduce jobs.
- Proficient in working with Hadoop and HDFS and writing Pig scripts and Sqoop scripts.
- Performed data analysis using Hive and Pig.
- Expert in creating Pig and Hive UDFs using Java in order to analyze the data efficiently.
- Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice-versa.
- Strong understanding of NoSQL databases like HBase, MongoDB & Cassandra.
- Experience in working on various Hadoop data access components like MapReduce, Pig, Hive, HBase, Spark and Kafka.
- Strong understanding of Spark real-time streaming and Spark SQL, and experience in loading data from external data sources like MySQL and Cassandra for Spark applications.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Intensive working experience with Amazon Web Services (AWS), using S3 for storage, EC2 for compute, and RDS and EBS.
- Excellent programming skills at a higher level of abstraction using Scala and Java.
- Well versed with job workflow scheduling and monitoring tools like Oozie.
- Loaded streaming log data from various web servers into HDFS using Flume.
- Experience in using Sqoop, Oozie and Cloudera Manager.
- Experience with source control repositories like SVN, CVS and Git.
- Experience in improving the search focus and quality in Elasticsearch by using aggregations and Python scripts.
- Hands on experience in application development using RDBMS, and Linux shell scripting.
- Experience working with Amazon EMR and EC2 Spot Instances.
- Solid understanding of relational database concepts.
- Extensively worked with the Unified Modeling Language (UML) in designing use cases, activity flow diagrams, class diagrams, and sequence and object diagrams using Rational Rose and MS Visio.
- Very good experience in the complete project life cycle (design, development, testing and implementation) of client-server and web applications.
- Good knowledge of Hadoop cluster architecture and monitoring the cluster.
- Adequate knowledge of and working experience in Agile and Waterfall methodologies.
- Support development, testing, and operations teams during new system deployments.
- Practical knowledge of integrating Kafka with third-party systems, such as Spark and Hadoop.
- Good team player who can work efficiently in multiple team environments and on multiple products; easily adaptable to new systems and environments.
- Possess excellent communication and analytical skills along with a can-do attitude.
TECHNICAL SKILLS
Programming languages: C, Java, Python, Scala, SQL
HADOOP/BIG DATA: MapReduce, Spark, SparkSQL, PySpark, SparkR, Pig, Hive, Sqoop, HBase, Flume, Kafka, Cassandra, YARN, Oozie, Zookeeper, Elasticsearch
Databases: MySQL, PL/SQL, MongoDB, HBase, Cassandra.
Operating Systems: Windows, Unix, Linux, Ubuntu.
Web Development: HTML, JSP, JavaScript, jQuery, CSS, XML, AJAX.
Web/Application Servers: Apache Tomcat, Sun Java Application Server
Tools: IntelliJ, Eclipse, NetBeans, Nagios, Ganglia, Maven
Scripting: BASH, JavaScript
Version Controls: GIT, SVN
PROFESSIONAL EXPERIENCE
Sr. Hadoop Developer/ Big Data
Confidential - Baltimore, MD
Responsibilities:
- Analyzed SQL scripts and designed the solution to implement them using PySpark.
- Worked with Spark Streaming to consume ongoing information from Kafka and store the stream data to HDFS (a minimal sketch appears at the end of this job entry).
- Developed and Configured Kafka brokers to pipeline server logs data into Spark Streaming.
- Completed data extraction, aggregation and analysis in HDFS by using PySpark and stored the required data in Hive.
- Worked on MySQL database to retrieve information from storage using Python.
- Experienced in implementing and running Python code through shell scripting.
- Responsible for developing a data pipeline with Apache NiFi and Spark/Scala to extract the data from vendors and store it in HDFS and Redshift.
- Developed Python code to gather the data from HBase (Cornerstone) and designed the solution to implement it using PySpark.
- Worked on integrating Python with web development tools for developing web services in Python using XML and JSON.
- Optimized poorly written Spark/Scala jobs by monitoring the YARN UI and increasing parallelism.
- Worked with Python's unittest and pytest frameworks.
- Used Python/HTML/CSS to help the team implement dozens of new features in a massively scaled Google App Engine web application.
- Loaded all datasets into Hive and Cassandra from source CSV files using Spark/PySpark.
- Installed and configured MapReduce, HDFS, Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Strong experience with Python automation, automating REST APIs and UI testing using Selenium WebDriver.
- Good knowledge of Apache Cassandra for storing data in a cluster.
- Developed and analyzed SQL scripts and designed the solution to implement them using PySpark.
- Built an ingestion framework using Apache NiFi that ingests files from SFTP to HDFS, bringing financial data into HDFS.
- Developed a fully automated continuous integration system using Git, Jenkins.
- Performed job functions using Spark APIs in Scala for real-time analysis and fast querying.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Designed and developed scalable Azure APIs using the Flask web framework in Python and integrated them with Azure API Management, Logic Apps and other Azure services.
- Performed data analysis on NoSQL databases such as HBase and Cassandra.
- Provided a responsive, AJAX-driven design using JavaScript libraries such as jQuery, AngularJS and Bootstrap.js; used Subversion for version control.
- Developed and implemented core API services using Python and Spark (PySpark).
- Used PySpark to process and analyze the data.
- Wrote test cases using the PyUnit test framework and Selenium automation testing for better manipulation of test scripts.
- Leveraged cloud and GPU computing technologies, such as AWS and GCP, for automated machine learning and analytics pipelines.
- Conducted systems design and feasibility studies to recommend cost-effective cloud solutions such as Amazon Web Services (AWS), Microsoft Azure and Rackspace.
- Worked on microservices for a continuous delivery environment using Docker and Jenkins.
- Experience in building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP.
- Developed and deployed data pipelines in clouds such as AWS and GCP.
- Used Hive join queries to join multiple tables of a source system and load them into Elasticsearch tables.
- Performed batch processing of data sources using Apache Spark and Elasticsearch.
- Implemented a Python-based distributed random forest via Python streaming.
- Worked on the MySQL migration project to make the system completely independent of the database being used.
Environment: Python, MapReduce, Spark, Hadoop, HBase, Scala, Kafka, Hive, Pig, Sqoop, RBAC, ACL, Pandas, Docker, ReactJS, Google APIs, SOAP, REST, IntelliJ, Azure APIs, Shell Scripting, Selenium, AWS.
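A minimal PySpark sketch of the Kafka-to-HDFS streaming flow described in this role; the broker list, topic name and HDFS paths are placeholders, and it assumes Spark 2.x with the spark-sql-kafka package available on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Placeholder broker list, topic and HDFS paths -- not the actual project values.
BROKERS = "kafka-broker1:9092,kafka-broker2:9092"
TOPIC = "server_logs"
OUTPUT_PATH = "hdfs:///data/raw/server_logs"
CHECKPOINT_PATH = "hdfs:///checkpoints/server_logs"

spark = SparkSession.builder.appName("KafkaToHdfs").getOrCreate()

# Read the Kafka topic as a streaming DataFrame (key/value arrive as binary).
logs = (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", BROKERS)
        .option("subscribe", TOPIC)
        .option("startingOffsets", "latest")
        .load()
        .select(col("value").cast("string").alias("log_line")))

# Persist the stream to HDFS as Parquet, with a checkpoint for fault tolerance.
query = (logs.writeStream
         .format("parquet")
         .option("path", OUTPUT_PATH)
         .option("checkpointLocation", CHECKPOINT_PATH)
         .start())

query.awaitTermination()
```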
Sr. Hadoop Developer/ Big Data
Confidential - Denver, CO
Responsibilities:
- Involved in installation, configuration and maintenance of Hadoop clusters for application development with the Cloudera distribution.
- Developed Kafka consumer APIs in Scala for consuming data from Kafka topics.
- Developed end-to-end scalable distributed data pipelines that receive data through the distributed messaging system Kafka and persist it into HDFS with Apache Spark using Scala.
- Involved in performance tuning of Spark jobs using caching and taking full advantage of the cluster environment.
- In the framework, we just need to specify the table names, schemas, source file location, Sqoop parameters, etc., and the framework generates the entire code, including workflow.xml.
- Performed advanced operations like text analytics and processing, using in-memory computing capabilities of Spark using Scala.
- Experience in querying data using Spark SQL and implementing Spark RDDs in Scala.
- Experienced in working with different scripting technologies like Python and UNIX shell scripts.
- Performed a POC on writing Spark applications in the Scala, Python and R programming languages.
- Worked on partitioning, bucketing, parallel execution and map-side joins to optimize Hive queries (see the PySpark sketch at the end of this job entry).
- Used HiveQL to create Hive tables and write Hive queries for data analysis.
- Experience in collecting log data from web servers and pushing it to HDFS using Flume and the NoSQL database Cassandra.
- Used Oozie workflows to manage and schedule jobs on a Hadoop cluster and used Zookeeper for cluster coordination services.
- Started using Apache NiFi to copy the data from the local file system to HDP.
- Implemented best offer logic using Pig scripts and Pig UDFs.
- Developed a data pipeline using Kafka, Spark and Hive to ingest, transform and analyze data.
- Used NiFi for the transformation of data from different components of the big data ecosystem.
- Worked on different data sources like Oracle, Netezza, MySQL and flat files, and gained experience with AWS components like Amazon EC2 instances, S3 buckets and CloudFormation templates.
- Developed Python scripts to update content in the database and manipulate files.
- Used Qlik Sense to build customized interactive reports, worksheets and dashboards.
- Worked on different file formats like sequence files, XML files and map files using MapReduce programs.
- Developed and implemented core API services using Python and Spark (PySpark).
- Strong expertise in the MapReduce programming model with XML, JSON and CSV file formats.
- Involved in managing and organizing developers with regular code review sessions, utilizing Agile and Scrum methodologies. Implemented NiFi flow topologies to perform cleansing operations before moving data into HDFS.
- Experience in implementing Spark RDD transformations, actions, DataFrames and case classes on required data using Spark Core.
- Migrated the computational code in HQL to PySpark.
- Completed data extraction, aggregation and analysis in HDFS by using PySpark and stored the required data in Hive.
- Provided support to data analysts in running Pig and Hive queries.
- Involved in HiveQL and Pig Latin development.
- Performed various POCs in data ingestion, data analysis and reporting using Hadoop, MapReduce, Hive, Pig, Sqoop, Flume and Elasticsearch.
- Used Jira for bug tracking and Bitbucket for code hosting and code review.
- Performed data migration to GCP.
- Implemented an Apache Airflow DAG to find popular items in Redshift and ingest them into the main PostgreSQL database via a web service call.
- Implemented Spark applications in a data processing project to handle data from various sources, creating DStreams and DataFrames on input data received from streaming services like Kafka.
Environment: MapReduce, HDFS, Hive, Spark, Spark SQL, Sqoop, IntelliJ, Apache Kafka, Java 7, Cassandra, Scala, Apache Pig, Apache Hive, Oozie, NiFi, Linux, AWS EC2, Agile development, Oracle 11g/10g, UNIX Shell scripting, Ambari, Tez, Eclipse, Qlik Sense and Cloudera.
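A hedged PySpark sketch of the Hive partitioning and bucketing pattern referenced above; the database, table and column names are illustrative only, and Spark's bucketing metadata is not identical to Hive's native bucketing.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("HivePartitionBucketSketch")
         .enableHiveSupport()       # read/write tables in the Hive metastore
         .getOrCreate())

# Illustrative staging table; in practice this would be the Sqoop/NiFi-landed data.
orders = spark.table("staging.orders")

# Partition by date and bucket by customer_id so queries prune directories
# and joins on customer_id shuffle less data.
(orders.write
       .mode("overwrite")
       .partitionBy("order_date")
       .bucketBy(32, "customer_id")
       .sortBy("customer_id")
       .format("parquet")
       .saveAsTable("analytics.orders_part"))

# Partition pruning: only the matching order_date directory is scanned.
spark.sql("""
    SELECT customer_id, SUM(amount) AS total
    FROM analytics.orders_part
    WHERE order_date = '2020-01-01'
    GROUP BY customer_id
""").show()
```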
Hadoop Developer/ Big Data
Confidential - New York, NY
Responsibilities:
- Used Cassandra Query Language to design the Cassandra database and tables with various configuration options.
- Developed Pig UDFs for manipulating the data according to business requirements and also worked on developing custom Pig loaders.
- Experience in integrating Hadoop with Ganglia and a good understanding of Hadoop metrics and visualization using Ganglia.
- Involved in the review of functional and non-functional requirements.
- Practical experience in developing Spark applications in Eclipse with Maven.
- Strong understanding of Spark real-time streaming and Spark SQL.
- Used Apache Kafka for collecting, aggregating, and moving large amounts of data from application servers.
- Loaded data from external data sources like MySQL and Cassandra for Spark applications (see the sketch at the end of this job entry).
- Developed Python and shell scripts to automate the end-to-end implementation process of an AI project.
- Experience in selecting and configuring the right Amazon EC2 instances and accessing key AWS services using client tools and AWS SDKs.
- Responsible for setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Knowledge of using AWS Identity and Access Management to secure access to EC2 instances and configuring Auto Scaling groups using CloudWatch.
- Firm understanding of optimizations and performance-tuning practices while working with Spark.
- Good knowledge of compression and serialization to improve performance in Spark applications.
- Performed interactive querying using SparkSQL.
- Involved in designing Kafka for a multi-data-center cluster and monitoring it.
- Responsible for importing real-time data from sources into Kafka clusters.
- Developed predictive analytics using Apache Spark Scala APIs.
- Worked on different file formats like sequence files, XML files and map files using MapReduce programs.
- Strong expertise in the MapReduce programming model with XML, JSON and CSV file formats.
- Designed and implemented partitioning (static and dynamic) and bucketing in Hive.
- Practical knowledge of Apache Sqoop to import datasets from MySQL to HDFS and vice versa.
- Good knowledge of building predictive models focused on customer service using R programming.
- Experience in reviewing and managing Hadoop log files.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Experience in building batch and streaming applications with Apache Spark and Python.
- Used libraries built on MLlib to perform data cleaning and used R programming for dataset reorganization.
- Debugged CQL queries and implemented performance enhancement practices.
- Strong knowledge of Apache Oozie for scheduling tasks.
- Practical knowledge of integrating Kafka with third-party systems, such as Spark and Hadoop.
- Experience in configuring Kafka brokers, consumers and producers for optimal performance.
- Knowledge of creating Apache Kafka consumers and producers in Java.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Practical knowledge of monitoring a Hadoop cluster using Nagios and Ganglia.
- Experience with Git as a version control system.
- Involved in loading data from UNIX file system to HDFS and developed UNIX scripts for job scheduling, process management and for handling logs from Hadoop.
- Understanding technical specifications and documenting technical design documents.
- Strong skills in agile development and Test-Driven development.
- Practical knowledge of implementing Internet of Things (IoT) solutions.
Environment: Hadoop Cloudera Distribution (CDH4), Java 7, Hadoop, Spark, Spark SQL, MLlib, R programming, Scala, Cassandra, IoT, MapReduce, Apache Pig, Apache Hive, HDFS, Sqoop, IntelliJ, Oozie, Kafka, Maven, Eclipse, Nagios, Ganglia, Zookeeper, AWS EC2, Git, Ambari, Tez, UNIX Shell scripting, Oracle 11g/10g, Linux, Agile development.
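A brief PySpark sketch of loading the external sources mentioned above into Spark; it assumes the MySQL JDBC driver and the DataStax spark-cassandra-connector are on the classpath, and the hosts, credentials, keyspace and table names are placeholders.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("ExternalSourcesSketch")
         .config("spark.cassandra.connection.host", "cassandra-host")
         .getOrCreate())

# MySQL table as a DataFrame over JDBC.
customers = (spark.read
             .format("jdbc")
             .option("url", "jdbc:mysql://mysql-host:3306/sales")
             .option("dbtable", "customers")
             .option("user", "etl_user")
             .option("password", "etl_password")
             .load())

# Cassandra table as a DataFrame through the connector's data source.
events = (spark.read
          .format("org.apache.spark.sql.cassandra")
          .options(keyspace="analytics", table="click_events")
          .load())

# Join the two sources and run Spark SQL over the result.
customers.join(events, "customer_id").createOrReplaceTempView("customer_events")
spark.sql("SELECT customer_id, COUNT(*) AS clicks "
          "FROM customer_events GROUP BY customer_id").show()
```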
Hadoop Developer / Big Data
Confidential - Newport Beach, CA
Responsibilities:
- Developed Spark scripts using Scala as per the requirements.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Designed and implemented Incremental Imports into Hive tables.
- Developed and wrote Apache Pig scripts and Hive scripts to process the HDFS data.
- Involved in defining job flows, managing and reviewing log files.
- Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
- Supported MapReduce programs running on the cluster.
- As a Big Data Developer, implemented solutions for ingesting data from various sources and processing the data at rest utilizing big data technologies such as Hadoop, MapReduce frameworks, HBase, Hive, Oozie, Flume, Sqoop, etc.
- Configured, deployed and maintained multi-node Dev and Test Kafka clusters and implemented data ingestion and cluster handling for real-time processing using Kafka.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs (see the sketch at the end of this job entry).
- Developed Spark scripts using Scala shell commands as per the requirements.
- Imported bulk data into HBase using MapReduce programs.
- Performed analytics on time series data in HBase using the HBase API.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Extracted the data from Teradata into HDFS/databases/dashboards using Spark Streaming.
- Responsible for continuous monitoring and managing Elastic MapReduce cluster through AWS console.
- Wrote multiple Java programs to pull data from HBase.
- Involved with file processing using Pig Latin.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Created applications using Kafka that monitor consumer lag within Apache Kafka clusters.
- Experience in optimizing MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization for an HDFS cluster.
- Worked on debugging and performance tuning of Hive and Pig jobs.
- Used Hive to find correlations between customer's browser logs in different sites and analyzed them to build risk profile for such sites.
- Worked with the Spark ecosystem using Scala and Hive queries on different data formats like text files and Parquet.
- Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
Environment: Java, Hadoop, MapReduce, Pig, Hive, Linux, Sqoop, Flume, Eclipse, AWS EC2, and Cloudera CDH 4.
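A small PySpark sketch of the S3-to-RDD flow with transformations and actions noted above; the bucket, prefix, field layout and output path are assumptions, and S3 credentials are expected to come from the cluster configuration (for example an EMR instance profile).

```python
from pyspark import SparkContext

sc = SparkContext(appName="S3RddSketch")

# Placeholder bucket and prefix; each line is assumed to be a tab-separated log record.
raw = sc.textFile("s3a://example-bucket/incoming/logs/")

# Transformations are lazy: parse lines and keep only error records.
errors = (raw.map(lambda line: line.split("\t"))
             .filter(lambda fields: len(fields) > 2 and fields[2] == "ERROR")
             .cache())                      # reused by both actions below

# Actions trigger execution on the cluster.
print("error records:", errors.count())
errors.map(lambda fields: ",".join(fields)).saveAsTextFile("hdfs:///data/errors_csv")
```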
Java/Hadoop Developer
Confidential
Responsibilities:
- Processed data into HDFS by developing solutions.
- Analyzed the data using MapReduce, Pig and Hive and produced summary results from Hadoop for downstream systems.
- Used Pig as an ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
- Developed a data pipeline using Flume, Sqoop and Pig to extract the data from weblogs and store it in HDFS.
- Used Sqoop to import and export data from HDFS to RDBMS and vice-versa.
- Created Hive tables and involved in data loading and writing Hive UDFs.
- Exported the analyzed data to the relational database MySQL using Sqoop for visualization and to generate reports.
- Created HBase tables to load large sets of structured data.
- Managed and reviewed Hadoop log files.
- Worked on data ingestion to Kafka, and processed and stored the data using Spark Streaming.
- Involved in providing inputs for estimate preparation for the new proposal.
- Worked extensively with Hive DDLs and Hive Query Language (HQL).
- Developed UDF, UDAF and UDTF functions and implemented them in Hive queries.
- Implemented Sqoop for large dataset transfers between Hadoop and RDBMSs.
- Created MapReduce jobs to convert periodic XML messages into partitioned Avro data.
- Used Sqoop widely to import data from various systems/sources (like MySQL) into HDFS.
- Created components like Hive UDFs to supply functionality missing in Hive for analytics.
- Developed scripts and batch jobs to schedule a bundle (a group of coordinators) consisting of various coordinator jobs.
- Used different file formats like Text files, Sequence Files, Avro.
- Provided cluster coordination services through Zookeeper.
- Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Assisted in cluster maintenance, cluster monitoring, and adding and removing cluster nodes.
- Installed and configured Hadoop, MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and pre-processing (see the sketch at the end of this job entry).
Environment: Hadoop, HDFS, MapReduce, Hive, Kafka, Pig, Sqoop, HBase, Shell Scripting, Oozie, Oracle 11g.
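The data-cleaning jobs in this role were written in Java MapReduce; purely as an illustration of the same map-side cleaning pattern, here is a minimal Hadoop Streaming mapper in Python with a hypothetical tab-separated record layout.

```python
#!/usr/bin/env python
# Hypothetical mapper for a log-cleaning job run via Hadoop Streaming, e.g.:
#   hadoop jar hadoop-streaming.jar -mapper clean_mapper.py \
#       -input /raw/logs -output /clean/logs
# A companion reducer (also reading stdin, writing stdout) could deduplicate by key.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    # Drop malformed records and normalize the remaining fields.
    if len(fields) < 3 or not fields[0]:
        continue
    user_id = fields[0].strip()
    timestamp = fields[1].strip()
    action = fields[2].strip().lower()
    print("\t".join([user_id, timestamp, action]))
```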
Software Engineer
Confidential
Responsibilities:
- Designed, developed and executed Data Migration from Db2 Database to Oracle Database using Linux scripts, Java and SQL loader concepts.
- A key member of the team, playing a key role in articulating the design requirements for the development of automated tools that perform error-free configuration.
- Developed UNIX and Java utilities for data migration from DB2 to Oracle; sole developer and POC for the migration activity.
- Developed JSP pages, Servlets and HTML pages as per requirements.
- Developed the necessary Java Beans and PL/SQL procedures for the implementation of business rules.
- Developed the user interface using JavaServer Pages (JSP), HTML and JavaScript for the presentation tier.
- Developed JSP pages and client-side validation using JavaScript.
- Developed a custom realm for the Apache Tomcat server for authenticating users.
- Developed a front-end controller Servlet to handle all requests.
- Developed the web interface using JSP and developed Struts action classes.
- Responsible for both functional and non-functional requirements gathering, performing impact analysis and testing the solutions on a build-by-build basis.
- Coded using Java, JavaScript and HTML.
- Used JDBC to provide database connectivity to database tables in Oracle.
- Used WebSphere Application Server for application deployment.
- Implemented Software Development Life Cycle (Requirements Analysis, Design, Development, Testing, Deployment and Support).
Environment: J2EE, IBM DB2, IBM WebSphere Application Server, EJB, JSP, Servlets, HTML, CSS, JavaScript, Oracle database, Unix Scripting and Windows 2000.
Java Developer
Confidential
Responsibilities:
- Communicated with clients for requirements gathering and explained the requirements to team members.
- Analyzed the requirements and designed screen prototypes.
- Involved in Project Documentation.
- Involved in the creation of the basic DB architecture for the application.
- Involved in adding the solution to VSS.
- Designed and developed screens.
- Coded JS functions for client validations.
- Created user Controls for reusability.
- Created tables, views, packages, sequences and functions for all the modules of the project.
- Developed Crystal Reports.
- Integrated the functionality of all modules.
- Involved in deploying the application.
- Performed unit testing and integration testing.
- Designed test plans and test cases and checked the validation.
- Tested whether the application meets the business requirements.
- Implemented the system at the client location.
- Gave training to application users, interacted with the client, and handled change requests, if any, from the client.
- Responsible for Immediate Error Resolving.
Environment: Core Java, JavaScript, J2EE, Servlets, JSP, Design Patterns, JDBC, HTML, CSS, AJAX, Hibernate, WebLogic, Oracle 8i, ANT, LINUX, SVN, Windows XP