Sr. Hadoop Developer/ Big Data Resume
Baltimore, MD
SUMMARY
- Around 9 years of IT industry experience, with 6 years of experience working with Apache Hadoop components like HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Oozie, Zookeeper, HBase, Cassandra, MongoDB and Amazon Web Services.
- Experience in performing in-memory data processing and real-time streaming analytics using Apache Spark with Scala, Java and Python.
- Developed applications for distributed environments using Hadoop, MapReduce and Python.
- Developed MapReduce jobs to automate transfer of data from HBase.
- Developed and maintained web applications using the Tomcat web server.
- Experience in integrating Hadoop with Ganglia and a good understanding of Hadoop metrics and visualization using Ganglia.
- Good experience working with the Hortonworks Distribution and Cloudera Distribution.
- Very good understanding/knowledge of Hadoop architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
- Experience in data extraction and transformation using MapReduce jobs.
- Proficient in working with Hadoop and HDFS and writing Pig scripts and Sqoop scripts.
- Performed data analysis using Hive and Pig.
- Expert in creating Pig and Hive UDFs using Java in order to analyze the data efficiently.
- Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice-versa.
- Strong understanding of NoSQL databases like HBase, MongoDB & Cassandra.
- Experience in working on various Hadoop data access components like MapReduce, Pig, Hive, HBase, Spark and Kafka.
- Strong understanding of Spark real-time streaming and Spark SQL, and experience in loading data from external data sources like MySQL and Cassandra for Spark applications.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Intensive working experience with Amazon Web Services (AWS), using S3 for storage, EC2 for compute, and RDS and EBS.
- Excellent programming skills at a higher level of abstraction using Scala and Java.
- Well versed with job workflow scheduling and monitoring tools like Oozie.
- Loaded streaming log data from various web servers into HDFS using Flume.
- Experience in using Sqoop, Oozie and Cloudera Manager.
- Experience with source control repositories like SVN, CVS and Git.
- Experience in improving the search focus and quality in Elasticsearch by using aggregations and Python scripts.
- Hands on experience in application development using RDBMS, and Linux shell scripting.
- Experience working with Amazon EMR and EC2 Spot Instances.
- Solid understanding of relational database concepts.
- Extensively worked with the Unified Modeling Language (UML) in designing use cases, activity flow diagrams, class diagrams, and sequence and object diagrams using Rational Rose and MS Visio.
- Very good experience in the complete project life cycle (design, development, testing and implementation) of client-server and web applications.
- Good knowledge of Hadoop cluster architecture and monitoring the cluster.
- Adequate knowledge of and working experience in Agile and Waterfall methodologies.
- Support development, testing, and operations teams during new system deployments.
- Practical knowledge of integrating Kafka with third-party systems, such as Spark and Hadoop.
- Good team player who can work efficiently in multiple team environments and on multiple products; easily adaptable to new systems and environments.
- Possess excellent communication and analytical skills along with a can-do attitude.
TECHNICAL SKILLS
Programming languages: C, Java, Python, Scala, SQL
HADOOP/BIG DATA: MapReduce, Spark, SparkSQL, PySpark, SparkR, Pig, Hive, Sqoop, HBase, Flume, Kafka, Cassandra, YARN, Oozie, Zookeeper, Elasticsearch
Databases: MySQL, PL/SQL, MongoDB, HBase, Cassandra.
Operating Systems: Windows, Unix, Linux, Ubuntu.
Web Development: HTML, JSP, JavaScript, jQuery, CSS, XML, AJAX.
Web/Application Servers: Apache Tomcat, Sun Java Application Server
Tools: IntelliJ, Eclipse, NetBeans, Nagios, Ganglia, Maven
Scripting: BASH, JavaScript
Version Controls: GIT, SVN
PROFESSIONAL EXPERIENCE
Sr. Hadoop Developer/ Big Data
Confidential - Baltimore, MD
Responsibilities:
- Analyzed SQL scripts and designed the solution to implement them using PySpark.
- Worked with Spark Streaming to consume ongoing information from Kafka and store the stream data to HDFS (a minimal sketch appears at the end of this job entry).
- Developed and Configured Kafka brokers to pipeline server logs data into Spark Streaming.
- Completed data extraction, aggregation and analysis in HDFS by using PySpark and stored the required data in Hive.
- Worked on MySQL database to retrieve information from storage using Python.
- Experienced in implementing and running Python code through shell scripting.
- Responsible for developing a data pipeline with Apache NiFi and Spark/Scala to extract the data from vendors and store it in HDFS and Redshift.
- Developed Python code to gather the data from HBase (Cornerstone) and designed the solution to implement it using PySpark.
- Worked on integrating Python with web development tools for developing web services in Python using XML and JSON.
- Optimized poorly written Spark/Scala jobs by monitoring the YARN UI and increasing parallelism.
- Worked with Python's unittest and pytest frameworks.
- Used Python/HTML/CSS to help the team implement dozens of new features in a massively scaled Google App Engine web application.
- Loaded all datasets into Hive and Cassandra from source CSV files using Spark/PySpark.
- Installed and configured MapReduce, HDFS, Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Strong experience with Python automation, automating REST APIs and UI testing using Selenium WebDriver.
- Good knowledge of Apache Cassandra for storing data in a cluster.
- Developed and analyzed SQL scripts and designed the solution to implement them using PySpark.
- Built an ingestion framework using Apache NiFi that ingests files from SFTP to HDFS, bringing financial data into HDFS.
- Developed a fully automated continuous integration system using Git, Jenkins.
- Performed job functions using Spark APIs in Scala for real-time analysis and fast querying.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Designed and developed scalable Azure APIs using the Flask web framework in Python and integrated them with Azure API Management, Logic Apps and other Azure services.
- Performed data analysis on NoSQL databases such as HBase and Cassandra.
- Provided a responsive, AJAX-driven design using JavaScript libraries such as jQuery, AngularJS and Bootstrap.js; used Subversion for version control.
- Developed and implemented core API services using Python and Spark (PySpark).
- Used PySpark to process and analyze the data.
- Wrote test cases using the PyUnit test framework and Selenium automation testing for better manipulation of test scripts.
- Leveraged cloud and GPU computing technologies, such as AWS and GCP, for automated machine learning and analytics pipelines.
- Conducted systems design and feasibility studies to recommend cost-effective cloud solutions such as Amazon Web Services (AWS), Microsoft Azure and Rackspace.
- Worked on microservices for a continuous delivery environment using Docker and Jenkins.
- Experience in building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP.
- Developed and deployed data pipelines in clouds such as AWS and GCP.
- Used Hive join queries to join multiple tables of a source system and load them into Elasticsearch tables.
- Performed batch processing of data sources using Apache Spark and Elasticsearch.
- Implemented a Python-based distributed random forest via Python streaming.
- Worked on the MySQL migration project to make the system completely independent of the database being used.
Environment: Python, MapReduce, Spark, Hadoop, HBase, Scala, Kafka, Hive, Pig, Sqoop, RBAC, ACL, Pandas, Docker, ReactJS, Google APIs, SOAP, REST, IntelliJ, Azure APIs, Shell Scripting, Selenium, AWS.
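A minimal PySpark sketch of the Kafka-to-HDFS streaming flow described in this role; the broker list, topic name and HDFS paths are placeholders, and it assumes Spark 2.x with the spark-sql-kafka package available on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Placeholder broker list, topic and HDFS paths -- not the actual project values.
BROKERS = "kafka-broker1:9092,kafka-broker2:9092"
TOPIC = "server_logs"
OUTPUT_PATH = "hdfs:///data/raw/server_logs"
CHECKPOINT_PATH = "hdfs:///checkpoints/server_logs"

spark = SparkSession.builder.appName("KafkaToHdfs").getOrCreate()

# Read the Kafka topic as a streaming DataFrame (key/value arrive as binary).
logs = (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", BROKERS)
        .option("subscribe", TOPIC)
        .option("startingOffsets", "latest")
        .load()
        .select(col("value").cast("string").alias("log_line")))

# Persist the stream to HDFS as Parquet, with a checkpoint for fault tolerance.
query = (logs.writeStream
         .format("parquet")
         .option("path", OUTPUT_PATH)
         .option("checkpointLocation", CHECKPOINT_PATH)
         .start())

query.awaitTermination()
```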
Sr. Hadoop Developer/ Big Data
Confidential - Denver, CO
Responsibilities:
- Involved in installation, configuration and maintenance of Hadoop clusters for application development with the Cloudera distribution.
- Developed Kafka consumer APIs in Scala for consuming data from Kafka topics.
- Developed end-to-end scalable distributed data pipelines that receive data through the distributed messaging system Kafka and persist it into HDFS with Apache Spark using Scala.
- Involved in performance tuning of Spark jobs using caching and taking full advantage of the cluster environment.
- In the framework, we just need to specify the table names, schemas, source file location, Sqoop parameters, etc., and the framework generates the entire code, including workflow.xml.
- Performed advanced operations like text analytics and processing, using in-memory computing capabilities of Spark using Scala.
- Experience in querying data using Spark SQL and implementing Spark RDDs in Scala.
- Experienced in working with different scripting technologies like Python and UNIX shell scripts.
- Performed a POC on writing Spark applications in the Scala, Python and R programming languages.
- Worked on partitioning, bucketing, parallel execution and map-side joins to optimize Hive queries (see the PySpark sketch at the end of this job entry).
- Used HiveQL to create Hive tables and write Hive queries for data analysis.
- Experience in collecting log data from web servers and pushing it to HDFS using Flume and the NoSQL database Cassandra.
- Used Oozie workflows to manage and schedule jobs on a Hadoop cluster and used Zookeeper for cluster coordination services.
- Started using Apache NiFi to copy the data from the local file system to HDP.
- Implemented best offer logic using Pig scripts and Pig UDFs.
- Developed a data pipeline using Kafka, Spark and Hive to ingest, transform and analyze data.
- Used NiFi for the transformation of data from different components of the big data ecosystem.
- Worked on different data sources like Oracle, Netezza, MySQL and flat files, and gained experience with AWS components like Amazon EC2 instances, S3 buckets and CloudFormation templates.
- Developed Python scripts to update content in the database and manipulate files.
- Used Qlik Sense to build customized interactive reports, worksheets and dashboards.
- Worked on different file formats like sequence files, XML files and map files using MapReduce programs.
- Developed and implemented core API services using Python and Spark (PySpark).
- Strong expertise in the MapReduce programming model with XML, JSON and CSV file formats.
- Involved in managing and organizing developers with regular code review sessions, utilizing Agile and Scrum methodologies. Implemented NiFi flow topologies to perform cleansing operations before moving data into HDFS.
- Experience in implementing Spark RDD transformations, actions, DataFrames and case classes on required data using Spark Core.
- Migrated the computational code in HQL to PySpark.
- Completed data extraction, aggregation and analysis in HDFS by using PySpark and stored the required data in Hive.
- Provided support to data analysts in running Pig and Hive queries.
- Involved in HiveQL and Pig Latin development.
- Performed various POCs in data ingestion, data analysis and reporting using Hadoop, MapReduce, Hive, Pig, Sqoop, Flume and Elasticsearch.
- Used Jira for bug tracking and Bitbucket for code hosting and code review.
- Performed data migration to GCP.
- Implemented an Apache Airflow DAG to find popular items in Redshift and ingest them into the main PostgreSQL database via a web service call.
- Implemented Spark applications in a data processing project to handle data from various sources, creating DStreams and DataFrames on input data received from streaming services like Kafka.
Environment: MapReduce, HDFS, Hive, Spark, Spark SQL, Sqoop, IntelliJ, Apache Kafka, Java 7, Cassandra, Scala, Apache Pig, Apache Hive, Oozie, NiFi, Linux, AWS EC2, Agile development, Oracle 11g/10g, UNIX Shell scripting, Ambari, Tez, Eclipse, Qlik Sense and Cloudera.
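A hedged PySpark sketch of the Hive partitioning and bucketing pattern referenced above; the database, table and column names are illustrative only, and Spark's bucketing metadata is not identical to Hive's native bucketing.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("HivePartitionBucketSketch")
         .enableHiveSupport()       # read/write tables in the Hive metastore
         .getOrCreate())

# Illustrative staging table; in practice this would be the Sqoop/NiFi-landed data.
orders = spark.table("staging.orders")

# Partition by date and bucket by customer_id so queries prune directories
# and joins on customer_id shuffle less data.
(orders.write
       .mode("overwrite")
       .partitionBy("order_date")
       .bucketBy(32, "customer_id")
       .sortBy("customer_id")
       .format("parquet")
       .saveAsTable("analytics.orders_part"))

# Partition pruning: only the matching order_date directory is scanned.
spark.sql("""
    SELECT customer_id, SUM(amount) AS total
    FROM analytics.orders_part
    WHERE order_date = '2020-01-01'
    GROUP BY customer_id
""").show()
```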
Hadoop Developer/ Big Data
Confidential - New York, NY
Responsibilities:
- Used Cassandra Query Language to design the Cassandra database and tables with various configuration options.
- Developed Pig UDFs for manipulating the data according to business requirements and also worked on developing custom Pig loaders.
- Experience in integrating Hadoop with Ganglia and a good understanding of Hadoop metrics and visualization using Ganglia.
- Involved in the review of functional and non-functional requirements.
- Practical experience in developing Spark applications in Eclipse with Maven.
- Strong understanding of Spark real-time streaming and Spark SQL.
- Used Apache Kafka for collecting, aggregating, and moving large amounts of data from application servers.
- Loaded data from external data sources like MySQL and Cassandra for Spark applications (see the sketch at the end of this job entry).
- Developed Python and shell scripts to automate the end-to-end implementation process of an AI project.
- Experience in selecting and configuring the right Amazon EC2 instances and accessing key AWS services using client tools and AWS SDKs.
- Responsible for setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Knowledge of using AWS Identity and Access Management to secure access to EC2 instances and configuring Auto Scaling groups using CloudWatch.
- Firm understanding of optimizations and performance-tuning practices while working with Spark.
- Good knowledge of compression and serialization to improve performance in Spark applications.
- Performed interactive querying using SparkSQL.
- Involved in designing Kafka for a multi-data-center cluster and monitoring it.
- Responsible for importing real-time data from sources into Kafka clusters.
- Developed predictive analytics using Apache Spark Scala APIs.
- Worked on different file formats like sequence files, XML files and map files using MapReduce programs.
- Strong expertise in the MapReduce programming model with XML, JSON and CSV file formats.
- Designed and implemented partitioning (static and dynamic) and bucketing in Hive.
- Practical knowledge of Apache Sqoop to import datasets from MySQL to HDFS and vice versa.
- Good knowledge of building predictive models focused on customer service using R programming.
- Experience in reviewing and managing Hadoop log files.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Experience in building batch and streaming applications with Apache Spark and Python.
- Used libraries built on MLlib to perform data cleaning and used R programming for dataset reorganization.
- Debugged CQL queries and implemented performance enhancement practices.
- Strong knowledge of Apache Oozie for scheduling tasks.
- Practical knowledge of integrating Kafka with third-party systems, such as Spark and Hadoop.
- Experience in configuring Kafka brokers, consumers and producers for optimal performance.
- Knowledge of creating Apache Kafka consumers and producers in Java.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Practical knowledge of monitoring a Hadoop cluster using Nagios and Ganglia.
- Experience with Git as a version control system.
- Involved in loading data from UNIX file system to HDFS and developed UNIX scripts for job scheduling, process management and for handling logs from Hadoop.
- Understanding technical specifications and documenting technical design documents.
- Strong skills in agile development and Test-Driven development.
- Practical knowledge of implementing Internet of Things (IoT) solutions.
Environment: Hadoop Cloudera Distribution (CDH4), Java 7, Hadoop, Spark, Spark SQL, MLlib, R programming, Scala, Cassandra, IoT, MapReduce, Apache Pig, Apache Hive, HDFS, Sqoop, IntelliJ, Oozie, Kafka, Maven, Eclipse, Nagios, Ganglia, Zookeeper, AWS EC2, Git, Ambari, Tez, UNIX Shell scripting, Oracle 11g/10g, Linux, Agile development.
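A brief PySpark sketch of loading the external sources mentioned above into Spark; it assumes the MySQL JDBC driver and the DataStax spark-cassandra-connector are on the classpath, and the hosts, credentials, keyspace and table names are placeholders.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("ExternalSourcesSketch")
         .config("spark.cassandra.connection.host", "cassandra-host")
         .getOrCreate())

# MySQL table as a DataFrame over JDBC.
customers = (spark.read
             .format("jdbc")
             .option("url", "jdbc:mysql://mysql-host:3306/sales")
             .option("dbtable", "customers")
             .option("user", "etl_user")
             .option("password", "etl_password")
             .load())

# Cassandra table as a DataFrame through the connector's data source.
events = (spark.read
          .format("org.apache.spark.sql.cassandra")
          .options(keyspace="analytics", table="click_events")
          .load())

# Join the two sources and run Spark SQL over the result.
customers.join(events, "customer_id").createOrReplaceTempView("customer_events")
spark.sql("SELECT customer_id, COUNT(*) AS clicks "
          "FROM customer_events GROUP BY customer_id").show()
```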
Hadoop Developer / Big Data
Confidential - Newport Beach, CA
Responsibilities:
- Developed Spark scripts using Scala as per the requirements.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Designed and implemented Incremental Imports into Hive tables.
- Developed and wrote Apache Pig scripts and Hive scripts to process the HDFS data.
- Involved in defining job flows, managing and reviewing log files.
- Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
- Supported MapReduce programs running on the cluster.
- As a Big Data Developer, implemented solutions for ingesting data from various sources and processing the data at rest utilizing big data technologies such as Hadoop, MapReduce frameworks, HBase, Hive, Oozie, Flume, Sqoop, etc.
- Configured, deployed and maintained multi-node Dev and Test Kafka clusters and implemented data ingestion and cluster handling for real-time processing using Kafka.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs (see the sketch at the end of this job entry).
- Developed Spark scripts using Scala shell commands as per the requirements.
- Imported bulk data into HBase using MapReduce programs.
- Performed analytics on time series data in HBase using the HBase API.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Extracted the data from Teradata into HDFS/databases/dashboards using Spark Streaming.
- Responsible for continuous monitoring and managing Elastic MapReduce cluster through AWS console.
- Wrote multiple Java programs to pull data from HBase.
- Involved with file processing using Pig Latin.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Created applications using Kafka that monitor consumer lag within Apache Kafka clusters.
- Experience in optimizing MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization for an HDFS cluster.
- Worked on debugging and performance tuning of Hive and Pig jobs.
- Used Hive to find correlations between customer's browser logs in different sites and analyzed them to build risk profile for such sites.
- Worked with the Spark ecosystem using Scala and Hive queries on different data formats like text files and Parquet.
- Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
Environment: Java, Hadoop, MapReduce, Pig, Hive, Linux, Sqoop, Flume, Eclipse, AWS EC2, and Cloudera CDH 4.
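A small PySpark sketch of the S3-to-RDD flow with transformations and actions noted above; the bucket, prefix, field layout and output path are assumptions, and S3 credentials are expected to come from the cluster configuration (for example an EMR instance profile).

```python
from pyspark import SparkContext

sc = SparkContext(appName="S3RddSketch")

# Placeholder bucket and prefix; each line is assumed to be a tab-separated log record.
raw = sc.textFile("s3a://example-bucket/incoming/logs/")

# Transformations are lazy: parse lines and keep only error records.
errors = (raw.map(lambda line: line.split("\t"))
             .filter(lambda fields: len(fields) > 2 and fields[2] == "ERROR")
             .cache())                      # reused by both actions below

# Actions trigger execution on the cluster.
print("error records:", errors.count())
errors.map(lambda fields: ",".join(fields)).saveAsTextFile("hdfs:///data/errors_csv")
```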
Java/Hadoop Developer
Confidential
Responsibilities:
- Processed data into HDFS by developing solutions.
- Analyzed the data using MapReduce, Pig and Hive and produced summary results from Hadoop for downstream systems.
- Used Pig as an ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
- Developed a data pipeline using Flume, Sqoop and Pig to extract the data from weblogs and store it in HDFS.
- Used Sqoop to import and export data from HDFS to RDBMS and vice-versa.
- Created Hive tables and involved in data loading and writing Hive UDFs.
- Exported the analyzed data to the relational database MySQL using Sqoop for visualization and to generate reports.
- Created HBase tables to load large sets of structured data.
- Managed and reviewed Hadoop log files.
- Worked on data ingestion to Kafka, and processed and stored the data using Spark Streaming.
- Involved in providing inputs for estimate preparation for the new proposal.
- Worked extensively with Hive DDLs and Hive Query Language (HQL).
- Developed UDF, UDAF and UDTF functions and implemented them in Hive queries.
- Implemented Sqoop for large dataset transfers between Hadoop and RDBMSs.
- Created MapReduce jobs to convert periodic XML messages into partitioned Avro data.
- Used Sqoop widely to import data from various systems/sources (like MySQL) into HDFS.
- Created components like Hive UDFs to supply functionality missing in Hive for analytics.
- Developed scripts and batch jobs to schedule a bundle (a group of coordinators) consisting of various coordinator jobs.
- Used different file formats like Text files, Sequence Files, Avro.
- Provided cluster coordination services through Zookeeper.
- Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Assisted in cluster maintenance, cluster monitoring, and adding and removing cluster nodes.
- Installed and configured Hadoop, MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and pre-processing (see the sketch at the end of this job entry).
Environment: Hadoop, HDFS, MapReduce, Hive, Kafka, Pig, Sqoop, HBase, Shell Scripting, Oozie, Oracle 11g.
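The data-cleaning jobs in this role were written in Java MapReduce; purely as an illustration of the same map-side cleaning pattern, here is a minimal Hadoop Streaming mapper in Python with a hypothetical tab-separated record layout.

```python
#!/usr/bin/env python
# Hypothetical mapper for a log-cleaning job run via Hadoop Streaming, e.g.:
#   hadoop jar hadoop-streaming.jar -mapper clean_mapper.py \
#       -input /raw/logs -output /clean/logs
# A companion reducer (also reading stdin, writing stdout) could deduplicate by key.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    # Drop malformed records and normalize the remaining fields.
    if len(fields) < 3 or not fields[0]:
        continue
    user_id = fields[0].strip()
    timestamp = fields[1].strip()
    action = fields[2].strip().lower()
    print("\t".join([user_id, timestamp, action]))
```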
Software Engineer
Confidential
Responsibilities:
- Designed, developed and executed Data Migration from Db2 Database to Oracle Database using Linux scripts, Java and SQL loader concepts.
- A key member of the team, playing a key role in articulating the design requirements for the development of automated tools that perform error-free configuration.
- Developed UNIX and Java utilities for data migration from DB2 to Oracle; sole developer and POC for the migration activity.
- Developed JSP pages, Servlets and HTML pages as per requirements.
- Developed the necessary Java Beans and PL/SQL procedures for the implementation of business rules.
- Developed the user interface using JavaServer Pages (JSP), HTML and JavaScript for the presentation tier.
- Developed JSP pages and client-side validation using JavaScript.
- Developed a custom realm for the Apache Tomcat server for authenticating users.
- Developed a front-end controller Servlet to handle all requests.
- Developed the web interface using JSP and developed Struts action classes.
- Responsible for both functional and non-functional requirements gathering, performing impact analysis and testing the solutions on a build-by-build basis.
- Coded using Java, JavaScript and HTML.
- Used JDBC to provide database connectivity to database tables in Oracle.
- Used WebSphere Application Server for application deployment.
- Implemented Software Development Life Cycle (Requirements Analysis, Design, Development, Testing, Deployment and Support).
Environment: J2EE, IBM DB2, IBM WebSphere Application Server, EJB, JSP, Servlets, HTML, CSS, JavaScript, Oracle database, Unix Scripting and Windows 2000.
Java Developer
Confidential
Responsibilities:
- Communicated with clients for requirements gathering and explained the requirements to team members.
- Analyzed the requirements and designed screen prototypes.
- Involved in Project Documentation.
- Involved in the creation of the basic DB architecture for the application.
- Involved in adding the solution to VSS.
- Designed and developed screens.
- Coded JS functions for client validations.
- Created user Controls for reusability.
- Created tables, views, packages, sequences and functions for all the modules of the project.
- Developed Crystal Reports.
- Integrated the functionality of all modules.
- Involved in deploying the application.
- Performed unit testing and integration testing.
- Designed test plans and test cases and checked the validation.
- Tested whether the application meets the business requirements.
- Implemented the system at the client location.
- Gave training to application users, interacted with the client, and handled change requests, if any, from the client.
- Responsible for Immediate Error Resolving.
Environment: Core Java, JavaScript, J2EE, Servlets, JSP, Design Patterns, JDBC, HTML, CSS, AJAX, Hibernate, WebLogic, Oracle 8i, ANT, LINUX, SVN, Windows XP