
Hadoop/AWS Developer Resume


Hoboken, NJ

PROFESSIONAL SUMMARY:

  • Professional experience in software development, specializing in Big Data technologies. Strong background in consulting on and implementing big data solutions to meet client requirements using Java, Linux/Unix scripting and Python.
  • Results-oriented professional with 8 years of IT experience spanning analysis, design, development, integration, deployment and maintenance of quality software applications using Java/J2EE technologies and Big Data Hadoop technologies.
  • Around 3.5 years of experience with the Big Data Hadoop ecosystem, covering ingestion, storage, querying, processing and analysis of big data.
  • Excellent understanding of Hadoop architecture and its components such as Job Tracker, Task Tracker, Name Node, Secondary Name Node, Data Node and the MapReduce programming paradigm.
  • Hands-on experience in installing, configuring, monitoring and integrating Hadoop ecosystem components like MapReduce, HDFS, HBase, Pig, Hive, Oozie, Sqoop, Crunch, Avro, Flume, Spark, Splunk and ZooKeeper.
  • Experience building and maintaining multiple Hadoop clusters (production, development, etc.) of different sizes and configurations, and setting up the rack topology for large clusters.
  • Experience using various Hadoop distributions (Cloudera, MapR, etc.) to fully implement and leverage new Hadoop features.
  • Expertise in deploying Hadoop, YARN, Spark and Storm and integrating them with Cassandra, Ignite, RabbitMQ, Kafka, etc.
  • Experience in building and managing Hadoop EMR clusters on AWS.
  • Hands-on experience with AWS services (VPC, EC2, S3, RDS, Redshift, Data Pipeline, EMR, DynamoDB, Lambda, SNS, SQS).
  • Experience in migrating data from Hadoop/Hive/HBase to DynamoDB using Java automation.
  • Extensively used Apache Kafka to load log data from multiple sources directly into HDFS.
  • Expertise in writing custom UDFs and UDAFs to extend Pig and Hive core functionality (a minimal sketch follows this summary).
  • Implemented a database-enabled intranet web site using Linux, Apache and a MySQL database backend.
  • Experience in data load management, importing and exporting data between HDFS and AWS Redshift using Sqoop and Flume.
  • Experience with Elasticsearch and in creating custom Lucene/Solr query components.
  • Experienced in working with different file formats such as text, Avro, JSON, Parquet and ORC.
  • Good knowledge of machine learning algorithms; used Apache Mahout and Spark MLlib.
  • Good experience in Apache Storm-based streaming analytics.
  • Expertise in AWS Redshift, EMR and EC2 web services, which provide fast and efficient processing of big data.
  • Maintained the MySQL server and managed authentication for database users.
  • Experience in using Cloudera Manager for installation and management of single-node and multi-node Hadoop clusters (CDH3, CDH4 and CDH5).
  • Experience in cloud stacks such as Amazon AWS Redshift and the VMware stack.
  • Documented a tool to perform chunked uploads of big data into Google BigQuery.
  • Exported the analyzed data to various databases like Teradata (sales data warehouse) and SQL Server using Sqoop.
  • Proficient in writing build scripts using Ant & Maven.
  • Served as a liaison between finance, system-wide business units and the data warehousing team.
  • Strong understanding of data warehousing concepts, OLTP and OLAP data models.
  • Experience in developing Splunk dashboards, creating data models, summary indexes and forwarder management.
  • Very good experience with both MapReduce 1 (Job Tracker) and MapReduce 2 (YARN) setups.
  • Very good experience in monitoring and managing the Hadoop cluster using Cloudera Manager.
  • Excellent understanding and knowledge of NoSQL databases like DynamoDB, HBase, and Cassandra.
  • Implemented Kerberos authentication for Hadoop services.
  • Experience in using Impala to analyze stored data using HiveQL.
  • Strong experience in working with UNIX/LINUX environments, writing shell scripts.
  • Extensive experience in programming, deploying and configuring web servers such as IBM WebSphere, Apache Tomcat Server and Amazon EC2.
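
A minimal sketch of the kind of custom Hive UDF referenced in the summary above, assuming a simple string-normalization use case (the package, class and function names are hypothetical):

    package com.example.hive.udf;                 // hypothetical package

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Simple Hive UDF that trims and lower-cases a string column.
    // Registered in Hive with, e.g.:
    //   ADD JAR /path/to/udfs.jar;
    //   CREATE TEMPORARY FUNCTION normalize_str AS 'com.example.hive.udf.NormalizeString';
    public final class NormalizeString extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;                      // pass NULLs through unchanged
            }
            return new Text(input.toString().trim().toLowerCase());
        }
    }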

TECHNICAL SKILLS:

Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Hadoop Streaming, ZooKeeper, Apache Spark, HCatalog, Avro, Apache Phoenix, Cloudera Impala

Cloud Environment: Amazon Web Services (AWS)

Hadoop Distributions: Cloudera (CDH4/CDH5), MapR

Languages: C, C++, Java, SQL, PL/SQL, Pig Latin, HQL

IDE Tools: Eclipse, NetBeans

Frameworks: Hibernate, Spring, Struts, JUnit

Web Technologies: HTML5, CSS3, JavaScript, jQuery, AJAX, Servlets, JSP, JSON, XML, XHTML

Monitoring Tools: Cloudera Manager, Ambari, Nagios, Ganglia

Application Servers: JBoss, Tomcat, WebLogic, WebSphere, GlassFish

Databases: Oracle, MySQL, IBM DB2, Derby, PostgreSQL

NoSQL Databases: HBase, Cassandra, DynamoDB

Web Servers / Cloud Services: IIS 7.0, Tomcat, AWS (EC2, S3, RDS, Redshift, ELB)

Operating Systems: Windows (XP,7,8), UNIX, LINUX, Ubuntu, CentOS, Mac OS

PROFESSIONAL EXPERIENCE:

Hadoop/AWS Developer

Confidential, Hoboken, NJ

Responsibilities:

  • Migrated HBase data from an in-house data center to AWS DynamoDB using the Java API (a minimal sketch follows this list).
  • Responsible for API design and implementation for exposing data to and from DynamoDB.
  • Launched and configured Amazon Web Services EC2 cloud servers using AMIs and configured the servers for specified applications.
  • Responsible for Continuous Integration (CI) and Continuous Delivery (CD) process implementation using Jenkins and Chef, along with shell scripts to automate routine jobs.
  • Hands on experience in deploying applications to AWS using Terraform.
  • Created and Configured web applications on AWS.
  • Planned, deployed, monitored and maintained Amazon AWS cloud infrastructure consisting of multiple EC2 nodes and VMware VMs as required in the environment.
  • Designed, implemented and managed Amazon Web Services (AWS) infrastructure for clustered, high-performance cloud applications.
  • Implemented and maintained the monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.
  • Experience with CI (Continuous Integration) and CD (Continuous Deployment) methodologies using Jenkins/Hudson.
  • Developed parsers for each new client using regular expressions and the internal framework to map all the receipt information and eventually persist the data to SQL databases like MySQL and NoSQL databases like DynamoDB.
  • Good knowledge of relational and NoSQL databases like MySQL, SQL Server, Oracle, DynamoDB and HBase.
  • Created S3 buckets, defined bucket policies and IAM role-based policies, and customized the JSON policy templates.
  • Developed a Web API using Node.js hosted on multiple load-balanced API instances.
  • Set up databases in AWS using RDS, storage using S3 buckets, and configured instance backups to S3.
  • Utilized CloudWatch to monitor resources such as EC2 instances, CPU and memory, Amazon RDS DB services, DynamoDB tables and EBS volumes.
  • Used MySQL, DynamoDB and ElastiCache to perform basic database administration.
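
A minimal sketch of the HBase-to-DynamoDB migration described in the first bullet above, assuming the HBase client API and the AWS SDK for Java document API; the table, column-family and attribute names are hypothetical, and the real job also handled batching, retries and schema mapping:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
    import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
    import com.amazonaws.services.dynamodbv2.document.DynamoDB;
    import com.amazonaws.services.dynamodbv2.document.Item;

    public class HBaseToDynamoMigrator {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();      // reads hbase-site.xml
            AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();
            DynamoDB dynamo = new DynamoDB(client);
            com.amazonaws.services.dynamodbv2.document.Table target = dynamo.getTable("user_profiles");

            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table source = connection.getTable(TableName.valueOf("user_profiles"));
                 ResultScanner scanner = source.getScanner(new Scan())) {
                for (Result row : scanner) {
                    String rowKey = Bytes.toString(row.getRow());
                    byte[] email = row.getValue(Bytes.toBytes("d"), Bytes.toBytes("email"));
                    Item item = new Item()
                            .withPrimaryKey("userId", rowKey)
                            .withString("email", email == null ? "" : Bytes.toString(email));
                    target.putItem(item);           // one write per row; the real job batched writes
                }
            }
        }
    }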

Environment: Hadoop, HDFS, Hive, MapReduce, Sqoop, Spark, AWS EC2, Redshift, S3, Lambda, AWS CLI, Node.js, CloudWatch, VPC, IAM, Git, Linux, Cloudera, Big Data, Scala, Python, NoSQL, DynamoDB, HBase, Jenkins.

Hadoop Developer

Confidential, Eden Prairie, MN

Responsibilities:

  • Worked on a live 60-node Hadoop cluster running Cloudera CDH 5.4.
  • Performed both major and minor upgrades to the existing CDH cluster.
  • Worked with highly unstructured and semi-structured data of 90 TB in size (270 TB with a replication factor of 3).
  • Used Sqoop to move structured data from Teradata, Oracle and SQL Server.
  • Created Sqoop (version 1.4.3) jobs with incremental load to populate Hive external tables.
  • Extensive experience in writing Pig (version 0.11) scripts to transform raw data from several data sources into baseline data.
  • Experience configuring Storm to load data from MySQL to HBase using JMS.
  • Developed Hive (version 0.10) scripts for end-user/analyst requirements to perform ad hoc analysis.
  • As an architect, designed and implemented a large Hadoop/HDFS/HBase application to support the analytics platform.
  • Streamed data in real time using Spark with Kafka (a minimal sketch follows this list).
  • Involved in transforming data from mainframe tables to HDFS and HBase tables using Sqoop.
  • Involved in using Spark components including Spark SQL, Spark Streaming and MLlib.
  • Architected the solution on the Cloudera platform and helped move the enterprise database across thirty different work streams, including Amazon Web Services S3 and Elastic File System.
  • Tested different sources such as flat files, mainframe legacy flat files and SQL Server 2008 for loading into the Teradata data warehouse.
  • Extensive experience with Hadoop distributions like Cloudera.
  • Worked on converting HiveQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Involved in reading multiple data formats on HDFS using PySpark.
  • Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behaviour.
  • Developed asset-level, contract-level and short-funding modules and integrated them with the existing modules using shell scripting.
  • Worked on integrating Hadoop with Informatica, loading data to HDFS and Hive.
  • Worked with Architecture Team to understand the source and target data models.
  • Wrote RDD transformations in Scala to execute in Spark and measure the performance benefit over MapReduce.
  • Worked on Amazon Redshift, the data warehouse product that is part of AWS (Amazon Web Services).
  • Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
  • Strong data warehousing ETL experience using Informatica PowerCenter 9.1/8.6.1/8.5/8.1/7.1 client tools (Mapping Designer, Repository Manager, Workflow Manager/Monitor) and server tools (Informatica Server, Repository Server Manager).
  • Leveraged Flume to stream data from a Spool Directory source to an HDFS sink using the Avro protocol.
  • Experience in using Sqoop to import data into HBase tables from AWS Redshift.
  • Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping and aggregation and how they translate to MapReduce jobs.
  • Prepared, arranged and tested Splunk search strings and operational strings.
  • Developed Oozie workflow for scheduling and orchestrating the ETL process.
  • Working experience maintaining MySQL and SQL databases: creating databases, setting up users and maintaining backups of the cluster metadata databases.
  • Involved in Cluster coordination services through ZooKeeper and adding new nodes to an existing cluster.
  • Developed Enterprise Lucene/Solr based solutions to include custom type/object modelling and implementation into the Lucene/Solr analysis (Tokenizers/Filters) pipeline.
  • Created custom Solr Query components to enable optimum search matching.
  • Planned, deployed, monitored and maintained Amazon AWS cloud infrastructure consisting of multiple EC2 nodes and VMware VMs as required in the environment.
  • Set up MySQL master-slave replication and helped business applications maintain their data in MySQL servers.
  • Worked with data delivery teams to set up new Hadoop users, including setting up Linux users, setting up Kerberos principals and testing HDFS and Hive access.
  • Developed test cases for unit testing using JUnit and MRUnit, and performed integration and system testing.
  • Developed simple to complex MapReduce streaming jobs using Python, alongside Hive and Pig implementations.
  • Administered Tableau Server, backing up reports and granting privileges to users.
  • Worked on Tableau for generating reports on HDFS data.
  • Extracted and updated data in MongoDB using the MongoDB import and export command-line utilities.
  • Moved data between HDFS and AWS Redshift using Sqoop.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Implemented a POC using Apache Impala for data processing on top of Hive.
  • Configured High Availability in the cluster and configured security with Kerberos.
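
A minimal sketch of the Spark-with-Kafka streaming path mentioned above, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic name and HDFS output path are hypothetical:

    import java.util.*;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public class KafkaToHdfsStream {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("KafkaToHdfsStream");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "broker1:9092");       // hypothetical broker
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "clickstream-etl");
            kafkaParams.put("auto.offset.reset", "latest");

            JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
                    jssc,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.<String, String>Subscribe(Arrays.asList("clickstream"), kafkaParams));

            // Persist each micro-batch of raw messages to HDFS for downstream Hive/Spark jobs.
            stream.map(ConsumerRecord::value)
                  .foreachRDD((rdd, time) ->
                          rdd.saveAsTextFile("hdfs:///data/clickstream/batch-" + time.milliseconds()));

            jssc.start();
            jssc.awaitTermination();
        }
    }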

Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Splunk, Spark, Storm, AWS EC2, Redshift, Kafka, Solr, Linux, Cloudera, Big Data, Scala, Python, PySpark, SQL, HiveQL, NoSQL, Business Intelligence, MySQL, Tableau, Teradata, HBase.

Hadoop Developer

Confidential, Herndon, VA

Responsibilities:

  • Used Sqoop to transfer data between AWS Redshift and HDFS.
  • Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
  • Implemented complex MapReduce programs to perform map-side joins using the distributed cache (a minimal sketch follows this list).
  • Implemented multiple MapReduce jobs in Java for data cleansing and pre-processing.
  • Worked with the team to grow the cluster from 32 nodes to 46 nodes; the additional data nodes were configured through the Hadoop commissioning process.
  • Designed and implemented custom writables, custom input formats, custom partitioners and custom comparators in MapReduce.
  • Thoroughly tested MapReduce programs using the MRUnit and JUnit testing frameworks.
  • Worked with SQL Workbench to load and aggregate data from S3 to Redshift.
  • Developed data pipelines to process data from the source systems directly into the Redshift database.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Experienced in analyzing SQL scripts and designing solutions to be implemented using PySpark.
  • Set up Kerberos locally on a 5-node POC cluster using Ambari, evaluated cluster performance, and performed impact analysis of Kerberos enablement.
  • Involved in writing bash scripts to automate Solr deployment, logstash-forwarder and logstash setup, background batch processes that generate Solr node health and stats reports on demand, and indexing to Solr using the Solr API.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Imported data from different sources like HDFS and HBase into Spark RDDs.
  • Installed Hadoop, MapReduce and HDFS on AWS and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Integrated user data from Cassandra with data in HDFS, and integrated Cassandra with Storm for real-time user attribute lookups.
  • Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard.
  • Used Spark API over Hadoop YARN to perform analytics on data in Hive.
  • Created HBase tables to store variable data formats (Avro, JSON) of data coming from different portfolios using NoSQL.
  • Developed Pig Latin scripts for the analysis of semi-structured data.
  • Configured deployed and maintained multi-node Dev and Test Kafka Clusters.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Performed transformations, cleaning and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
  • Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Successfully integrated Hive tables and DynamoDB collections, and developed a web service that queries a MongoDB collection and returns the required data to the web UI. Involved in NoSQL database design, integration and implementation.
  • Involved in creating the test plan and testing module for the Java code in this project, and in load testing the Solr server and search service.
  • Helped the support team understand the functionality of the Solr server through KT sessions and documentation.
  • Loaded data into NoSQL database HBase.
  • Knowledge of handling Hive queries using Spark SQL integrated with the Spark environment.
  • Explored the Spark MLlib library to build a POC on recommendation engines.
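
A minimal sketch of a distributed-cache map-side join of the kind mentioned in the first bullet above; the file layout, field positions and cache path are hypothetical, and the production jobs added counters and error handling:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.net.URI;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class MapSideJoinMapper extends Mapper<LongWritable, Text, Text, Text> {
        private final Map<String, String> customers = new HashMap<>();

        @Override
        protected void setup(Context context) throws IOException {
            // The small customer file was added in the driver with
            //   job.addCacheFile(new URI("hdfs:///ref/customers.txt"));
            // and is symlinked into the task working directory.
            URI[] cacheFiles = context.getCacheFiles();
            String localName = new Path(cacheFiles[0].getPath()).getName();
            try (BufferedReader reader = new BufferedReader(new FileReader(localName))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] parts = line.split("\t", 2);        // customerId \t customerName
                    customers.put(parts[0], parts[1]);
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");      // orderId \t customerId \t amount
            String name = customers.getOrDefault(fields[1], "UNKNOWN");
            context.write(new Text(fields[0]), new Text(name + "\t" + fields[2]));
        }
    }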

Environment: Hadoop, CDH4, MapReduce, HDFS, Pig, Hive, Impala, Oozie, Java, Spark, Solr, Kafka, Flume, Storm, Linux, Scala, Maven, MySQL, Python, PySpark, Oracle 11g/10g, SVN.

Hadoop Developer

Confidential, Cleveland, OH

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Wrote multiple MapReduce programs in Java for data analysis.
  • Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files
  • Responsible for processing the usage logs and filter out unproductive data.
  • Experienced in migrating HiveQL into Impala to minimize query response time.
  • Experienced in data migration from relational database to Hadoop HDFS.
  • Responsible for writing and maintaining the algorithm to identify and filter bot sessions.
  • Responsible for developing search queries in Solr.
  • Responsible for indexing data into Solr.
  • Responsible for creating Hive tables, loading the structured data resulted from MapReduce jobs into the tables and writing Hive queries to further analyze the logs to identify issues and behavioural patterns.
  • Worked on Sequence files, RC files, Map side joins, bucketing, partitioning for Hive performance enhancement and storage improvement.
  • Performed extensive Data Mining applications using HIVE.
  • Used Impala to create and manage Parquet tables.
  • Loaded data into Solr and performed search queries to retrieve the data.
  • Responsible for performing extensive data validation using Hive
  • Sqoop jobs, PIG and Hive scripts were created for data ingestion from relational databases to compare with historical data.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources (a minimal sketch follows this list).
  • Implemented Hive generic UDFs to implement business logic.
  • Set up a Hadoop cluster on Amazon EC2 using Apache Whirr for a POC.
  • Developed the data model to manage the summarized data.
  • Simplified Hadoop management by using Ambari to provision, manage and monitor Apache Hadoop clusters through its intuitive, easy-to-use web UI backed by RESTful APIs.
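
A minimal sketch of the kind of Pig UDF written in Java referenced above, assuming a simple bot-session heuristic; the package, class name and threshold are hypothetical:

    package com.example.pig;                      // hypothetical package

    import java.io.IOException;

    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    // Pig UDF that flags a session as a bot based on its request count.
    // Usage in Pig Latin (after REGISTER of the jar):
    //   DEFINE IsBotSession com.example.pig.IsBotSession();
    //   flagged = FOREACH sessions GENERATE session_id, IsBotSession(request_count);
    public class IsBotSession extends EvalFunc<Boolean> {
        private static final long BOT_THRESHOLD = 500;   // hypothetical cutoff

        @Override
        public Boolean exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;
            }
            long requestCount = ((Number) input.get(0)).longValue();
            return requestCount > BOT_THRESHOLD;
        }
    }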

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Flume, Oozie, Java, Linux, Maven, Teradata, ZooKeeper, SVN, HBase, Cassandra.

Java Developer

Confidential

Responsibilities:

  • Involved in Analysis, Design and Implementation/translation of Business User requirements.
  • Designed, Developed & implemented Web services on State Street’s cloud platform.
  • Designed and developed various modules of the application with J2EE design architecture.
  • Developed a reporting framework using Java and J2EE that generates daily, monthly and yearly reports, and prepared the corresponding framework design documents.
  • Analyzed business requirements and existing software for High Level Design.
  • Worked in an agile development process, monthly Sprint and daily Scrum.
  • Used the Spring framework to build the application based on the MVC design paradigm.
  • Used Spring AOP to implement security where cross-cutting concerns were identified.
  • Developed JSPs, Servlets and custom tags for creating user interfaces.
  • Created a Java/J2EE application to view data in NoSQL.
  • Developed business logic with the help of Spring; data access was implemented using Hibernate.
  • Developed SQL queries and executed them using the JdbcTemplate provided by Spring (a minimal sketch follows this list).
  • Developed HQL queries to retrieve data using Hibernate; data manipulation operations were implemented using the HibernateTemplate provided by Spring.
  • Implemented O/R mapping of the Oracle database tables for one-to-one and many-to-one relations.
  • Utilized the Apache Tomcat server integrated with Eclipse for debugging and unit testing.
  • Developed RESTful services using Spring and used the JAXB API for XML parsing.
  • Worked on RESTful APIs and consumed web services based on EJBs.
  • Packaged and deployed builds through Ant scripts.
  • Test-Driven Development (TDD) was followed, and test coverage was maintained and validated using Clover and CruiseControl.
  • Created LDAP services for user authentication and authorization.
  • WebLogic Application Server was used as the business service tool in the middle tier.
  • Worked with a NoSQL database to perform many different operations.
  • Used Log4j for application logging, Harvest as the version control tool and ClearQuest for defect management.
  • Consumed and created REST Web services for Quick Quote Details
  • Involved in exposing, consuming and packaging Web services using Spring Framework
  • Involved in the code review process and updating the best practices document.
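
A minimal sketch of executing SQL through Spring's JdbcTemplate as mentioned above; the DAO class, table and column names are hypothetical, and an anonymous RowMapper is used to match the Java 1.6 era:

    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.List;

    import javax.sql.DataSource;

    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.jdbc.core.RowMapper;

    public class ReportDao {
        private final JdbcTemplate jdbcTemplate;

        public ReportDao(DataSource dataSource) {
            this.jdbcTemplate = new JdbcTemplate(dataSource);
        }

        // Fetches the names of reports created on or after the given date (yyyy-MM-dd).
        public List<String> findReportNamesSince(String fromDate) {
            return jdbcTemplate.query(
                    "SELECT report_name FROM daily_reports WHERE created_on >= ?",
                    new RowMapper<String>() {
                        public String mapRow(ResultSet rs, int rowNum) throws SQLException {
                            return rs.getString("report_name");
                        }
                    },
                    fromDate);
        }
    }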

Environment: Java 1.6, Jersey REST, WebLogic, Oracle 11, Spring MVC, IoC, Spring AOP, Hibernate, Scrum, NoSQL, Ant, SVN, JDeveloper, PuTTY.

Java Developer

Confidential

Responsibilities:

  • Responsible for design and development of Web Application using Struts Framework.
  • Wrote Action classes and Form Bean classes, and configured the application using the Struts configuration file.
  • Did technical design to conform to the Struts (MVC) framework.
  • Wrote server-side programs using Servlets and JSP.
  • Designed and developed the HTML front-end screens and validated forms using JavaScript.
  • Made use of Object Oriented concepts like Inheritance, polymorphism and Abstraction.
  • Application and user level configurations have been maintained by using XML Files.
  • Widely used HTML for web based design.
  • Implemented MVC using Struts Framework.
  • Utilized Servlets to handle various requests from the client browser and send responses.
  • Created and implemented PL/SQL stored procedures, triggers.
  • Designed and documented the stored procedures.
  • Coded test classes using JUnit for unit testing; performed functional, integration, system and validation testing.
  • Customized RESTful Web Service using Spring RESTful API, sending JSON format data packets between front-end and middle-tier controller.
  • Used JDBC PreparedStatements called from Servlets for database access (a minimal sketch follows this list).
  • Implemented the MVC and Session Facade design patterns in developing the application.
  • Developed Message Driven Beans for asynchronous processing of alerts.
  • Worked with business analyst in understanding business requirements, design and development of the project.
  • Implemented the Struts framework with MVC architecture.
  • Developed the presentation layer using JSP, HTML and CSS, with client-side validations using JavaScript.
  • Collaborated with the ETL/ Informatica team to determine the necessary data models and UI designs to support Cognos reports.
  • Used TeX for many typesetting tasks, especially in the form of LaTeX, ConTeXt and other template packages. Performed several data quality checks, found potential issues, and designed Ab Initio graphs to resolve them.
  • Applied J2EE design patterns like Business Delegate, DAO and Singleton.
  • Deployed and tested the application using Tomcat web server.
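
A minimal sketch of Servlet database access through a JDBC PreparedStatement as mentioned above; the servlet name, JNDI DataSource name, table and columns are hypothetical:

    import java.io.IOException;
    import java.io.PrintWriter;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    import javax.naming.InitialContext;
    import javax.naming.NamingException;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import javax.sql.DataSource;

    public class QuoteServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            long customerId = Long.parseLong(request.getParameter("customerId"));
            response.setContentType("text/plain");
            PrintWriter out = response.getWriter();
            try {
                // Container-managed connection pool looked up via JNDI (name is hypothetical).
                DataSource ds = (DataSource) new InitialContext().lookup("java:comp/env/jdbc/QuotesDS");
                Connection conn = ds.getConnection();
                try {
                    PreparedStatement ps = conn.prepareStatement(
                            "SELECT quote_id, premium FROM quotes WHERE customer_id = ?");
                    ps.setLong(1, customerId);
                    ResultSet rs = ps.executeQuery();
                    while (rs.next()) {
                        out.println(rs.getLong("quote_id") + "," + rs.getDouble("premium"));
                    }
                    rs.close();
                    ps.close();
                } finally {
                    conn.close();                 // always return the connection to the pool
                }
            } catch (NamingException e) {
                throw new ServletException("DataSource lookup failed", e);
            } catch (SQLException e) {
                throw new ServletException("Quote lookup failed", e);
            }
        }
    }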

Environment: Java, J2EE, JSP, Servlets, HTML, DHTML, XML, JavaScript, Struts 2.2, Eclipse, Apache Tomcat, PL/SQL, Oracle 9i.
