Hadoop Developer Resume
New York, NY
SUMMARY
- Over 5 years of IT industry and software development experience, with 5+ years focused on Hadoop development.
- Experience in developing MapReduce programs with Apache Hadoop for analyzing big data according to requirements.
- Experienced in major Hadoop ecosystem projects such as Pig, Hive, and HBase, and in monitoring them with Cloudera Manager.
- Extensive experience in developing Pig Latin Scripts and using Hive Query Language for data analytics.
- Hands-on experience working with NoSQL databases including HBase and Cassandra and their integration with the Hadoop cluster.
- Experience in implementing Spark applications in Scala using higher-order functions for both batch and interactive analysis requirements (a minimal sketch follows this list).
- Good working experience using Sqoop to import data into HDFS from RDBMS and vice-versa.
- Good knowledge of job scheduling and monitoring tools like Oozie and ZooKeeper.
- Experience in Hadoop administration activities such as installation and configuration of clusters using Apache, Cloudera, and AWS.
- Experienced in designing, building, and deploying a multitude of applications utilizing almost all of the AWS stack (including EC2, Route 53, S3, RDS, DynamoDB, SQS, IAM, and EMR), focusing on high availability, fault tolerance, and auto-scaling.
- Experienced in MVC (Model View Controller) architecture and various J2EE design patterns like singleton and factory design patterns.
- Extensive experience in loading and analyzing large datasets with the Hadoop framework (MapReduce, HDFS, Pig, Hive, Flume, Sqoop, Spark, Impala) and NoSQL databases like MongoDB, HBase, and Cassandra.
- Solid understanding of Hadoop MRv1 and MRv2 (YARN) architecture.
- Hands-on experience in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
- Developed UML Diagrams for Object-Oriented Design: Use Cases, Sequence Diagrams and Class Diagrams using Rational Rose, Visual Paradigm, and Visio.
- Hands-on experience in solving software design issues by applying design patterns including the Singleton Pattern, Business Delegator Pattern, Controller Pattern, MVC Pattern, Factory Pattern, Abstract Factory Pattern, DAO Pattern, and Template Pattern.
- Developed web-based applications using Python, Amazon Web Services, jQuery, CSS, JavaScript, and Model View Controller frameworks like Django and Flask.
- Good experience with design, coding, debugging, reporting, and data analysis in Python, using Python libraries to speed up development.
- Experienced in creative and effective front-end development using JSP, JavaScript, HTML5, DHTML, XHTML, Ajax, and CSS.
- Good Working experience in using different Spring modules like Spring Core Container Module, Spring Application Context Module, Spring MVC Framework module, Spring ORM Module in Web applications.
- Used jQuery to select HTML elements, to manipulate HTML elements and to implement AJAX in Web applications. Used available plug-ins for extension of jQuery functionality.
- Working knowledge of databases such as Oracle 10g/11g/12c, Microsoft SQL Server, and DB2.
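A minimal sketch of the kind of Spark batch analysis in Scala referenced above, driven by higher-order functions over an RDD; the HDFS path, record layout, and object name are illustrative assumptions, not project artifacts:

```scala
// Minimal sketch: Spark batch aggregation in Scala using higher-order functions.
// The input path and record layout are hypothetical placeholders.
import org.apache.spark.sql.SparkSession

object BatchAnalysisSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("batch-analysis-sketch")
      .getOrCreate()

    // Read delimited text records from HDFS (path is illustrative).
    val lines = spark.sparkContext.textFile("hdfs:///data/events/*.csv")

    // Higher-order functions (map, filter, reduceByKey) drive the aggregation.
    val countsByKey = lines
      .map(_.split(","))              // split each line into fields
      .filter(_.length >= 2)          // drop malformed records
      .map(fields => (fields(0), 1L)) // key on the first column
      .reduceByKey(_ + _)             // count occurrences per key

    countsByKey.take(20).foreach { case (k, n) => println(s"$k -> $n") }
    spark.stop()
  }
}
```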
PROFESSIONAL EXPERIENCE
HADOOP DEVELOPER
Confidential, New York, NY
Responsibilities:
- Developed architecture documents, process documentation, server diagrams, and requisition documents.
- Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDF, Pig, Sqoop, Zookeeper and Spark.
- Gathered business requirements from business partners and subject matter experts.
- Developed an environmental search engine using Java, Apache Solr, and MySQL.
- Managed work including indexing data, tuning relevance, developing custom tokenizers and filters, and adding functionality such as playlists, custom sorting, and regionalization with the Solr search engine.
- Ingested data from RDBMS, performed data transformations, and then exported the transformed data to Cassandra per business requirements.
- Wrote multiple MapReduce programs for data extraction, transformation, and aggregation from multiple file formats including XML, JSON, CSV, and other compressed file formats.
- Developed automated processes for flattening the upstream data from Cassandra, which is in JSON format, using Hive UDFs.
- Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms
- Developed Pig UDFs for manipulating data according to business requirements, worked on developing custom Pig loaders, and implemented various requirements using Pig scripts.
- Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Created a POC using the Spark SQL and MLlib libraries.
- Developed a Spark Streaming module for consuming Avro messages from Kafka (see the Kafka/Avro sketch after this list).
- Implemented different machine learning techniques in Scala using the Scala machine learning library, creating POCs with Spark SQL and MLlib.
- Experienced in querying data using Spark SQL on top of the Spark engine and implementing Spark RDDs in Scala.
- Expertise in writing Scala code using higher-order functions for iterative algorithms in Spark with performance in mind.
- Experienced in managing and reviewing Hadoop log files
- Worked with different file formats such as TextFile, Avro, ORC, and Parquet for Hive querying and processing.
- Loaded data using the Teradata loader connection, wrote Teradata utility scripts (FastLoad, MultiLoad), and worked with loader logs.
- Monitored query run times using Teradata Performance Monitor.
- Involved in loading data into Teradata from legacy systems and flat files using complex MultiLoad and FastLoad scripts.
- Created and maintained Teradata tables, views, macros, triggers, and stored procedures.
- Monitored workload, job performance and capacity planning using Cloudera Distribution.
- Worked on data loading into Hive for data ingestion history and data content summaries.
- Involved in developing Impala scripts for extraction, transformation, and loading of data into the data warehouse.
- Used Hive and Impala to query the data in HBase.
- Created Impala tables and SFTP scripts and Shell scripts to import data into Hadoop.
- Developed an HBase Java client API for CRUD operations (see the HBase sketch after this list).
- Created Hive tables, loaded data, and wrote Hive UDFs, including UDFs for rating aggregation.
- Generated Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
- Provided ad-hoc queries and data metrics to business users using Hive and Pig.
- Performed various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop for analysis, visualization and to generate reports.
- Involved in file movements between HDFS and AWS S3 and worked extensively with S3 buckets in AWS.
- Experienced with AWS services to manage applications in the cloud and to create or modify instances.
- Created data pipelines to ingest, aggregate, and load consumer response data from an AWS S3 bucket into Hive external tables in HDFS to serve as a feed for Tableau dashboards.
- Used EMR (Elastic MapReduce) to perform big data operations in AWS.
- Worked on Apache Spark, writing Python applications to parse and convert txt and xls files.
- Developed Python scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark for data aggregation and queries, writing data back into the RDBMS through Sqoop.
- Loaded data from different sources (databases and files) into Hive using the Talend tool.
- Loaded and transformed large sets of structured data from Oracle and SQL Server into HDFS using Talend Big Data Studio.
- Implemented Spark applications in Python/Scala, utilizing Spark Core, Spark Streaming, and Spark SQL for faster data processing than Java MapReduce.
- Experience in integrating Apache Kafka with Apache Spark for real time processing.
- Exposure to using Apache Kafka to develop data pipelines of logs as streams of messages with producers and consumers.
- Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Worked on custom Pig loader and storage classes to handle a variety of data formats such as JSON and compressed CSV.
- Involved in running Hadoop Streaming jobs to process Terabytes of data
- Used JIRA for bug tracking and CVS for version control.
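A minimal sketch of a Spark Streaming consumer for Avro messages on Kafka, in the spirit of the streaming work above, using the spark-streaming-kafka-0-10 integration and Avro's GenericDatumReader; the broker address, topic name, group id, and Avro schema are illustrative assumptions:

```scala
// Sketch: Spark Streaming direct stream over Kafka with Avro-encoded values.
// Broker, topic, group id, and schema below are hypothetical placeholders.
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.io.DecoderFactory
import org.apache.kafka.common.serialization.{ByteArrayDeserializer, StringDeserializer}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

object AvroKafkaStreamSketch {
  val schemaJson =
    """{"type":"record","name":"Event","fields":[
      |{"name":"id","type":"string"},{"name":"amount","type":"double"}]}""".stripMargin

  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("avro-kafka-sketch"), Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[ByteArrayDeserializer],
      "group.id" -> "avro-stream-sketch")

    val stream = KafkaUtils.createDirectStream[String, Array[Byte]](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, Array[Byte]](Seq("events"), kafkaParams))

    // Decode each Avro payload into a GenericRecord on the executors.
    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        val schema = new Schema.Parser().parse(schemaJson)
        val reader = new GenericDatumReader[GenericRecord](schema)
        records.foreach { record =>
          val decoder = DecoderFactory.get().binaryDecoder(record.value(), null)
          val event = reader.read(null, decoder)
          println(s"id=${event.get("id")} amount=${event.get("amount")}")
        }
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```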
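A minimal sketch of CRUD operations against the standard HBase client API (the same Java Connection/Table classes referenced in the HBase bullet above, called here from Scala); the table name, column family, and row-key layout are illustrative assumptions:

```scala
// Minimal CRUD sketch against the standard HBase client API (HBase 1.x/2.x
// Connection/Table interfaces). Table and column family names are placeholders.
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Delete, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseCrudSketch {
  def main(args: Array[String]): Unit = {
    val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = connection.getTable(TableName.valueOf("ratings"))
    try {
      // Create / update: write one cell.
      val put = new Put(Bytes.toBytes("user1#item42"))
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("rating"), Bytes.toBytes("4.5"))
      table.put(put)

      // Read: fetch the cell back.
      val result = table.get(new Get(Bytes.toBytes("user1#item42")))
      val rating = Bytes.toString(result.getValue(Bytes.toBytes("d"), Bytes.toBytes("rating")))
      println(s"rating = $rating")

      // Delete: remove the row.
      table.delete(new Delete(Bytes.toBytes("user1#item42")))
    } finally {
      table.close()
      connection.close()
    }
  }
}
```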
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Oozie, Flume, HBase, Solr, Cloudera CDH3, Cassandra, Oracle/SQL, DB2, J2EE, JavaScript, Ajax, Unix/Linux, Eclipse IDE, CVS, JIRA.
HADOOP DEVELOPER
Confidential, Bellevue, WA
Responsibilities:
- Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, Hive and Sqoop.
- Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
- Responsible for managing data coming from different sources.
- Involved in gathering the business requirements from the Business Partners and Subject Matter Experts.
- Proactively monitored systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup and disaster recovery systems and procedures.
- Involved in work including indexing data, tuning relevance, developing custom tokenizers and filters, and adding functionality such as playlists and custom sorting.
- Supported MapReduce programs running on the cluster.
- Involved in HDFS maintenance and loading of structured and unstructured data.
- Involved in file movements between HDFS and AWS S3 and worked extensively with S3 buckets in AWS.
- Worked extensively on importing metadata into Hive using Python and migrated existing tables and applications to work on the AWS cloud (S3).
- Used Sqoop to import data from MySQL into HDFS on a regular basis.
- Expert knowledge of MongoDB NoSQL data modeling, tuning, and disaster recovery backups; used it for distributed storage and processing with CRUD operations.
- Extracted and restructured data into MongoDB using the import and export command-line utilities.
- Designed and Maintained Tez workflows to manage the flow of jobs in the cluster.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Wrote Hive queries for data analysis to meet business requirements.
- Loaded log data into HDFS using Flume and performed ETL integration.
- Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
- Developed a Flume ETL job handling data from an HTTP source with HDFS as the sink.
- Created Hive tables and worked on them using HiveQL.
- Good understanding of the DAG cycle for the entire Spark application flow in the Spark web UI.
- Developed Spark SQL scripts and was involved in converting Hive UDFs to Spark SQL UDFs (see the sketch after this list).
- Implemented procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
- Developed multiple Spark jobs in Scala/Python for data cleaning, pre-processing, and aggregation.
- Developed Spark programs in Scala, created Spark SQL queries, and developed Oozie workflows for Spark jobs.
- Pushed data as delimited files into HDFS using Talend Big Data Studio.
- Analyzed and performed data integration using Talend open integration suite.
- Wrote complex SQL queries to take data from various sources and integrated it with Talend.
- Used Storm as an automatic mechanism to retry downloading and manipulating data when there is a hiccup.
- Designed and developed technical architecture, requirements, and statistical models using R.
- Used Storm to analyze large amounts of non-unique data points with low latency and high throughput.
- Developed a UI application using AngularJS, integrated with Elasticsearch to consume REST services.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Utilized Agile and Scrum methodology to help manage and organize a team of developers with regular code review sessions.
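A minimal sketch of registering a Spark SQL UDF in Scala so that SQL previously backed by a Hive UDF can run on Spark, in the spirit of the UDF-conversion bullet above; the UDF logic, table, and column names are illustrative assumptions:

```scala
// Sketch: register a Spark SQL UDF so existing HiveQL-style queries keep working.
// The UDF body, table, and column names are hypothetical placeholders.
import scala.util.Try
import org.apache.spark.sql.SparkSession

object SparkSqlUdfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-sql-udf-sketch")
      .enableHiveSupport()   // assumes a Hive metastore is available
      .getOrCreate()

    // Example UDF: normalize free-text ratings such as " 4.5 stars " to a Double.
    spark.udf.register("parse_rating", (raw: String) => {
      val numericPrefix = Option(raw).getOrElse("").trim.takeWhile(c => c.isDigit || c == '.')
      Try(numericPrefix.toDouble).getOrElse(-1.0)   // -1.0 flags unparseable values
    })

    // The registered UDF is used directly in SQL, just like a Hive UDF.
    spark.sql(
      "SELECT item_id, AVG(parse_rating(rating_text)) AS avg_rating " +
      "FROM reviews GROUP BY item_id").show()

    spark.stop()
  }
}
```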
Environment: Hadoop, MapReduce, Hive, PIG, Sqoop, Java, Scala, Python, Spark, Spark-Streaming, Spark SQL, AWS EMR, AWS S3, AWS Redshift, MapR, Oozie, Flume, HBase, Nagios, Ganglia, Hue, Cloudera Manager, Zookeeper, Cloudera, Oracle, Kerberos and RedHat 6.5
HADOOP DEVELOPER
Confidential, New York, NY
Responsibilities:
- Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, Hive and Sqoop.
- Collaborated in identifying current problems, constraints, and root causes in data sets to develop descriptive and predictive solutions with Hadoop HDFS, MapReduce, Pig, Hive, and HBase, and further developed reports in Tableau.
- Architected the Hadoop cluster in pseudo-distributed mode working with ZooKeeper and Apache Hadoop, stored and loaded data from HDFS to Amazon AWS S3 with backups, and created tables in the AWS cluster with S3 storage.
- Evaluated existing infrastructure, systems, and technologies; provided gap analysis; documented requirements, evaluations, and recommendations for systems, upgrades, and technologies; and created proposed architectures and specifications along with recommendations.
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Installed and Configured Sqoop to import and export the data into MapR-FS, HBase and Hive from Relational databases.
- Administered large MapR Hadoop environments, including cluster setup, performance tuning, and monitoring in an enterprise environment.
- Installed and Configured MapR-zookeeper, MapR-cldb, MapR-jobtracker, MapR-tasktracker, MapR resource manager, MapR-node manager, MapR-fileserver, and MapR-webserver.
- Installed and configured the Knox gateway to secure Hive through ODBC, WebHCat, and Oozie services.
- Loaded data from relational databases into the MapR-FS file system and HBase using Sqoop and set up MapR metrics with a NoSQL database to log metrics data.
- Closely monitored and analyzed MapReduce job executions on the cluster at the task level and optimized Hadoop cluster components to achieve high performance.
- Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data into HDFS for analysis.
- Integrated HDP clusters with Active Directory and enabled Kerberos for authentication.
- Worked on commissioning and decommissioning of DataNodes, NameNode recovery, and capacity planning, and installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Worked on creating the HBase data model from the current Oracle data model (see the sketch after this list).
- Implemented high-availability and automatic-failover infrastructure utilizing ZooKeeper services to overcome the NameNode single point of failure.
- Leveraged Chef to manage and maintain builds in various environments and planned for hardware and software installation on production cluster and communicated with multiple teams to get it done.
- Monitored Hadoop cluster functioning through MCS and worked on NoSQL databases including HBase.
- Used Hive and created Hive tables and involved in data loading and writing Hive UDFs and worked with Linux server admin team in administering the server hardware and operating system.
- Worked closely with data analysts to construct creative solutions for their analysis tasks and managed and reviewed Hadoop and HBase log files.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports and worked on importing and exporting data from Oracle into HDFS and HIVE using Sqoop.
- Collaborated with application teams to install operating system and Hadoop updates, patches, and version upgrades when required.
- Automated workflows using shell scripts to pull data from various databases into Hadoop.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
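A minimal sketch of creating an HBase table whose row key is composed from the Oracle primary-key columns, in the spirit of the data-model bullet above; it assumes the HBase 2.x Admin API, and the table name, column families, and key layout are illustrative:

```scala
// Sketch: HBase table whose row key mirrors an Oracle composite primary key.
// Table name, column families, and key layout are hypothetical placeholders.
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ColumnFamilyDescriptorBuilder, ConnectionFactory, Put, TableDescriptorBuilder}
import org.apache.hadoop.hbase.util.Bytes

object HBaseModelSketch {
  def main(args: Array[String]): Unit = {
    val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val admin = connection.getAdmin
    try {
      val tableName = TableName.valueOf("customer_orders")
      if (!admin.tableExists(tableName)) {
        val descriptor = TableDescriptorBuilder.newBuilder(tableName)
          .setColumnFamily(ColumnFamilyDescriptorBuilder.of("order"))    // order attributes
          .setColumnFamily(ColumnFamilyDescriptorBuilder.of("customer")) // denormalized customer columns
          .build()
        admin.createTable(descriptor)
      }

      // Row key mirrors the Oracle composite key: CUSTOMER_ID + ORDER_ID.
      val table = connection.getTable(tableName)
      val put = new Put(Bytes.toBytes("C1001#O20001"))
      put.addColumn(Bytes.toBytes("order"), Bytes.toBytes("status"), Bytes.toBytes("SHIPPED"))
      table.put(put)
      table.close()
    } finally {
      admin.close()
      connection.close()
    }
  }
}
```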
Environment: Hadoop, HDFS, MapReduce, Hive, HBase, Kafka, ZooKeeper, Oozie, Impala, Cloudera, Teradata