
Sr. Hadoop Developer/Technology Consultant Resume


Houston, TX

PROFILE SUMMARY

  • Around 9 years of professional IT experience developing, implementing, and maintaining web-based applications using Java and the Big Data ecosystem on Windows and Linux environments.
  • Around 3 years of experience with Hadoop/Big Data technologies, covering storage, querying, processing, and analysis of data.
  • Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Knowledge in installing, configuring, and using Hadoop ecosystem components like Hadoop Map Reduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, Zookeeper and Flume.
  • Experience in managing and reviewing Hadoop log files.
  • Experience in analyzing data using HiveQL, Pig Latin, HBase and custom Map Reduce programs in Java.
  • Experience in importing and exporting data using Sqoop from HDFS to relational database systems and vice versa.
  • Extending Hive and Pig core functionality by writing custom UDFs.
  • Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala (a brief sketch follows this summary).
  • Developed Apache Spark jobs using Scala in test environment for faster data processing and used Spark SQL for querying.
  • Experienced in Spark Core, Spark RDD, Pair RDD, Spark Deployment Architectures.
  • Experienced with performing real-time analytics on NoSQL databases like HBase and Cassandra.
  • Worked on AWS EC2, EMR and S3 to create clusters and manage data using S3.
  • Good knowledge in working with Impala, Storm and Kafka.
  • Experienced with Dimensional modeling, Data migration, Data cleansing, Data profiling, and ETL Processes features for data warehouses.
  • 5+ years of Java development experience.
  • Developed and maintained web applications using the Tomcat web server.
  • Experience on Source control repositories like SVN, CVS and GITHUB.
  • Good Experience on SDLC (Software Development Life cycle).
  • Knowledge of Google Cloud Platform.
  • Used MS Word and Excel for project documentation.
  • Experience in migrating on-premise systems to Windows Azure for DR in the cloud using Azure Recovery Vault and Azure Backup.
  • Strong knowledge of Informatica PowerCenter, data warehousing, and business intelligence.
  • Good experience in Core Java and JEE technologies such as JDBC, Servlets, and JSP.
  • Expert in developing web applications using the Struts, Hibernate, and Spring frameworks.
  • Hands on Experience in writing SQL and PL/SQL queries.
  • Performance tuning of Vertica DB clusters; installed, upgraded, and configured Vertica and handled Vertica DB account setup requests for analysts.
  • Good understanding of and experience with software development methodologies like Agile and Waterfall; performed unit, regression, white-box, and black-box testing.
  • Monitor the ETL process job and validate the data loaded in Vertica/Teradata DW.
  • Experience in Web Services using XML, HTML, and SOAP.
  • Involved in maintaining the Cognos 10.2 environment.
  • Administration of Hadoop and Vertica clusters for structured and unstructured data warehousing.
  • Worked on version control tools like CVS, GIT, SVN.
  • Well experienced in projects using JIRA, testing, and build tools such as Maven, MSBuild, and Jenkins.
  • Experience in developing web pages using Java, JSP, Servlets, JavaScript, jQuery, AngularJS, Node, JBoss 4.2.3, XML, WebLogic, SQL, PL/SQL, JUnit, Apache Tomcat, and WebSphere.
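
Below is a minimal, illustrative sketch of the MapReduce-to-Spark migration pattern mentioned above: a classic word count re-expressed as Spark RDD transformations in Scala. The application name and HDFS paths are hypothetical placeholders, not taken from any actual project.

  import org.apache.spark.sql.SparkSession

  object WordCountRdd {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder().appName("mapreduce-to-spark-poc").getOrCreate()
      val sc = spark.sparkContext

      // Mapper logic becomes flatMap/map; reducer logic becomes reduceByKey.
      val counts = sc.textFile("hdfs:///data/input/logs")        // hypothetical input path
        .flatMap(_.split("\\s+"))
        .map(word => (word, 1L))
        .reduceByKey(_ + _)

      counts.saveAsTextFile("hdfs:///data/output/word_counts")   // hypothetical output path
      spark.stop()
    }
  }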

TECHNICAL SKILLS:

Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, Ambari, Mahout, MongoDB, Cassandra, Avro, Storm, Parquet and Snappy.

Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache

Languages: Java, Python, JRuby, SQL, HTML, DHTML, Scala, JavaScript, XML and C/C++

No SQL Databases: Cassandra, MongoDB and HBase

Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and Struts

XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB

Methodology: Agile, Waterfall

Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, AngularJS, ExtJS, JSON and NodeJS

Development / Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit and Log4j

Frameworks: Struts, Spring and Hibernate

App/Web servers: WebSphere, WebLogic, JBoss and Tomcat

DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle

RDBMS: Teradata, Oracle 9i, 10g, 11i, MS SQL Server, MySQL and DB2

Operating systems: UNIX, LINUX, Mac OS and Windows Variants

ETL Tools: Talend, Informatica, Pentaho

PROFESSIONAL EXPERIENCE

Confidential - Houston, TX

Sr. Hadoop Developer/Technology Consultant

Responsibilities:

  • Involved in collecting business requirement and designing multiple data pipelines and monitoring the data flow in Hortonworks Ambari UI.
  • Highly motivated and versatile team player with the ability to work independently and adapt quickly to the environment.
  • Performed ad-hoc queries on structured data using HiveQL and used partitioning, bucketing, and join techniques with Hive for faster data access.
  • Designed and developed jobs to validate data post-migration, such as reporting fields from source and destination systems, using Spark SQL RDDs and DataFrames/Datasets.
  • Worked on Spark Streaming and Spark Structured Streaming with Apache Kafka for real-time data processing (see the sketch after this list).
  • Coordinated with the team in San Antonio on a daily basis through teleconference, discussing roadblocks, issues, and developments.
  • Involved in creating external Hive tables in compressed formats, both transactional and non-transactional.
  • Worked on query performance and optimized queries using aggregation and other tuning techniques.
  • Applied data quality check rules and logging techniques before and after executing the business requirements.
  • Coordinated with the TMS team to gather data from the Kafka producers team and wrote Spark Core jobs to meet the business requirements.
  • Used DBeaver as a SQL client to review sample data and the structure of data in the Hive database.
  • Involved in reading data in formats such as Gzip, Avro, and Parquet and compressing it according to the business logic by writing generic code.
  • Subscribed to multiple Kafka topics and imported data from non-transactional to transactional tables using a JDBC connection.
  • Knowledge on MLLib (Machine Learning Library) framework for auto suggestions.
  • Performed proofs of concept with NiFi to import data from Kafka to HDFS.
  • Used NiFi to export data from AWS S3 to RDBMS and Glacier.
  • Developed NiFi workflow templates to replace Oozie workflows for Sqoop, Kafka, and Glacier.
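
A minimal sketch of the Structured Streaming ingestion pattern described above, assuming the spark-sql-kafka connector is on the classpath; the broker address, topic name, and HDFS paths are hypothetical placeholders.

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions.col

  object KafkaToHdfs {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder().appName("kafka-to-hdfs").getOrCreate()

      // Subscribe to a Kafka topic and cast the binary key/value columns to strings.
      val events = spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092")   // hypothetical broker
        .option("subscribe", "tms_events")                    // hypothetical topic
        .load()
        .select(col("key").cast("string"), col("value").cast("string"), col("timestamp"))

      // Land the raw events on HDFS as Parquet, with checkpointing for fault tolerance.
      val query = events.writeStream
        .format("parquet")
        .option("path", "hdfs:///data/landing/tms_events")              // hypothetical path
        .option("checkpointLocation", "hdfs:///checkpoints/tms_events")
        .start()

      query.awaitTermination()
    }
  }
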
Environment: Hadoop, Hive 1.x, HDFS, Sqoop, Oozie, Spark 2.1.1, Spark Streaming, DBeaver, Kafka 0.14, Scala, Tez, Maven Build, MySQL, AWS, Agile-Scrum.

Confidential - Dallas, TX

Big Data Engineer

Responsibilities:

  • Gather business requirements and design and develop data ingestion layer and presentation layer.
  • Clearly and regularly communicate with management and technical support colleagues in developing business modules.
  • Support AllState technical and business team’s specific data and reporting needs on a global scale.
  • Developed Spark applications for data transformations and loading into HDFS using RDDs, DataFrames, and Datasets (see the sketch after this list).
  • Good understanding of Data Mining and Machine Learning techniques.
  • Worked on multiple clusters in managing the Data in HDFS for Data Analytics.
  • Performed advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark with Scala; performed clustering, regression, and classification using the machine learning libraries Mahout and MLlib (Spark).
  • Reviewing and managing Hadoop log files from multiple machines using Flume.
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, effective and efficient joins, transformations, and other optimizations during the ingestion process.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Developed shell scripts for dynamic partitions adding to hive stage table, verifying JSON schema change of source files, and verifying duplicate files in source location.
  • Configured the internal load balancer, load-balanced sets, and Azure Traffic Manager.
  • Expertise in extraction, transformation, and loading of data from Oracle, DB2, SQL Server, MS Access, Excel, flat files, and XML using Talend.
  • Used SQL Azure for Backend operations and data persistence.
  • Implemented Elastic Search on Hive data warehouse platform.
  • Involved in analyzing log data to predict the errors by using Apache Spark.
  • Experience in using ORC, Avro, Parquet, RCFile and JSON file formats and developed UDFs using Hive and Pig.
  • Experience with CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters.
  • Worked with and learned a great deal from Amazon Web Services (AWS) cloud services like EC2, S3, EMR, EBS, RDS, and VPC.
  • Integrated MapReduce with HBase to import bulk amount of data into HBase using MapReduce programs.
  • Used Impala and wrote queries to fetch data from Hive tables.
  • Developed Several MapReduce jobs using Java API.
  • Extracted data from Teradata into HDFS/databases/dashboards using Spark Streaming.
  • Well versed with the Database and Data Warehouse concepts like OLTP, OLAP, Star and Snow Flake Schema.
  • Worked with Apache SOLR to implement indexing and wrote Custom SOLR query segments to optimize the search.
  • Built near-real-time Solr indexes on HBase and HDFS.
  • Developed Pig and Hive UDFs to implement business logic for processing the data as per requirements.
  • Developed Oozie bundles to schedule Pig, Sqoop, and Hive jobs to create data pipelines.
  • Implemented the project by using Agile Methodology and Attended Scrum Meetings daily.
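
A minimal sketch of the Spark ingestion pattern described above: raw data loaded, lightly transformed, and written to a partitioned, Hive-registered Parquet table. The source path and the database, table, and column names are hypothetical.

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions.{col, to_date}

  object IngestClaims {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder()
        .appName("claims-ingestion")
        .enableHiveSupport()
        .getOrCreate()

      // Read raw JSON, drop records without a key, and derive a partition column.
      val raw = spark.read.json("hdfs:///data/stage/claims")   // hypothetical source path
      val cleaned = raw
        .filter(col("claim_id").isNotNull)
        .withColumn("load_date", to_date(col("event_ts")))

      // Append to a Parquet table partitioned by load_date in the Hive metastore.
      cleaned.write
        .mode("append")
        .format("parquet")
        .partitionBy("load_date")
        .saveAsTable("analytics.claims")                        // hypothetical table
    }
  }
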
Environment: Hadoop, Hive, HDFS, Pig, Sqoop, Oozie, Spark, Spark Streaming, Kafka, Apache Solr, Cassandra, Cloudera Distribution, Java, Impala, web servers, Maven Build, MySQL, AWS, Agile-Scrum.

Confidential - New York City, NY

Spark/Big Data Developer

Responsibilities:

  • Wrote the Map Reduce jobs to parse the web logs which are stored in HDFS.
  • Developed the services to run the Map-Reduce jobs as per the requirement basis.
  • Importing and exporting data into HDFS and HIVE using Sqoop.
  • Responsible to manage data coming from different sources.
  • Worked with Apache Spark, which provides a fast and general engine for large-scale data processing, integrated with the functional programming language Scala.
  • Responsible for loading data from UNIX file systems to HDFS. Installed and configured Hive and written Hive UDFs.
  • Responsible for the design and development of Spark SQL scripts based on functional specifications.
  • Responsible for Spark Streaming configuration based on the type of input source.
  • Experienced with AWS services to smoothly manage application in the cloud and creating or modifying instances.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python (see the sketch after this list).
  • Developed ETL processes using Spark, Scala, Hive, and HBase.
  • Developed workflows using Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Developed interactive shell scripts for scheduling various data cleansing and data loading process.
  • Used JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive tables.
  • Used codecs like Snappy and LZO to store data in HDFS to improve performance.
  • Expert knowledge of MongoDB NoSQL data modeling, tuning, disaster recovery, and backup.
  • Created HBase tables to store variable data formats of data coming from different Legacy systems.
  • Used HIVE to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
  • Developed Sqoop Jobs to load data from RDBMS into HDFS and HIVE.
  • Developed Oozie coordinators to schedule Pig and Hive scripts to create Data pipelines.
  • Involved in loading data from UNIX file system and FTP to HDFS.
  • Imported the data from different sources like AWS S3, LFS into Spark RDD.
  • Worked on Kerberos authentication to establish a more secure network communication on the cluster.
  • Performed troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
  • Worked with Network, database, application and BI teams to ensure data quality and availability.
  • Worked with Elastic MapReduce and set up environments on AWS EC2 instances.
  • Experienced with NoSQL databases like HBase and MongoDB, and with the Hortonworks distribution of Hadoop.
  • Developed ETL jobs to integrate data from various sources and load into the warehouse using Informatica 9.1.
  • Experienced in Creating ETL Mappings in Informatica.
  • Experienced in working with various transformations like Filter, Router, Expression, Update Strategy, etc., in Informatica.
  • Scheduled the ETL jobs using ESP scheduler.
  • Worked in Agile methodology and actively participated in daily Scrum meetings.
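
A minimal sketch of the Hive-to-Spark conversion mentioned above: the same aggregation expressed once as a Spark SQL query and once as equivalent DataFrame transformations. The database, table, and column names are hypothetical.

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions.{count, lit}

  object HiveQueryToSpark {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder()
        .appName("hive-to-spark")
        .enableHiveSupport()
        .getOrCreate()

      // Original HiveQL, executed through Spark's SQL engine.
      val viaSql = spark.sql(
        """SELECT region, count(*) AS orders
          |FROM sales.web_orders
          |GROUP BY region""".stripMargin)

      // The same logic as DataFrame transformations.
      val viaDf = spark.table("sales.web_orders")
        .groupBy("region")
        .agg(count(lit(1)).alias("orders"))

      viaSql.show()
      viaDf.show()
    }
  }
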
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, HBase, MongoDB, Flume, Apache Spark, Accumulo, Oozie, Kerberos, AWS, Tableau, Java, Informatica, Elastic Search, Git, Maven.

Confidential, FL

Hadoop/Big Data Engineer

Responsibilities:

  • Developed Hive queries on external tables to perform various analyses.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Mentored analysts and the test team in writing Hive queries.
  • Coded complex Oracle stored procedures, functions, packages, and cursors for client-specific applications.
  • Involved in database migrations to transfer data from one database to another and in the complete virtualization of many client applications.
  • Prepare Developer (Unit) Test cases and execute Developer Testing.
  • Create/Modify shell scripts for scheduling various data cleansing scripts and ETL loading process.
  • Supported and assisted QA engineers in understanding, testing, and troubleshooting.
  • Wrote build scripts using Ant and participated in the deployment of one or more production systems.
  • Production rollout support, which included monitoring the solution post go-live and resolving any issues discovered by the client and client services teams.
  • Designed and documented operational problems by following standards and procedures, using the software reporting tool JIRA.
  • Experience with professional software engineering practices and best practices for the full software development life cycle, including coding standards, code reviews, source control management, and build processes.
  • Work closely with various levels of individuals to coordinate and prioritize multiple projects. Estimate scope, schedule and track projects throughout SDLC.
  • Worked in a team for Big Data Hadoop cluster implementation and data integration in developing large-scale system software.
  • Assess existing and available data warehousing technologies and methods to ensure our Data warehouse/BI architecture meets the needs of the business unit and enterprise and allows for business growth.
  • Involved in source system analysis, data analysis, and data modeling for ETL (Extract, Transform, and Load).
  • Experienced in working with data analytics, web Scraping and Extraction of data in Python.
  • Designed & Implemented database Cloning using Python and Built backend support for Applications using Shell scripts.
  • Worked on various compression techniques like GZIP and LZO.
  • Design and Implementation of Batch jobs using Sqoop, MR2, PIG and Hive.
  • Implemented HBase on top of HDFS to perform real time analytics.
  • Handled Avro data files using Avro Tools and MapReduce (see the sketch after this list).
  • Developed Data pipelines by using Chained Mappers.
  • Developed Custom Loaders and Storage Classes in PIG to work with various data formats like JSON, XML, CSV etc.
  • Active involvement in SDLC phases (design, development, testing), code reviews, etc.
  • Active involvement in Scrum meetings and Followed Agile Methodology for implementation.
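
A minimal sketch of the Avro-and-compression handling described above, assuming the spark-avro module is available on the classpath; the input and output paths are hypothetical.

  import org.apache.spark.sql.SparkSession

  object AvroCompact {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder().appName("avro-compact").getOrCreate()

      // Read Avro input files (requires the external spark-avro module).
      val events = spark.read.format("avro").load("hdfs:///data/raw/events")   // hypothetical path

      // Rewrite the data as Snappy-compressed Parquet for faster downstream scans.
      events.write
        .mode("overwrite")
        .option("compression", "snappy")
        .parquet("hdfs:///data/curated/events")                                // hypothetical path
    }
  }
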
Environment: HDFS, MapReduce, Hive, Flume, Pig, Sqoop, Oozie, HBase, RDBMS/DB, flat files, MySQL, CSV, Avro data files.

Confidential

Java/Hadoop Developer

Responsibilities:

  • Used Sqoop for importing and exporting data between MySQL and Oracle 11g and HDFS/Hive.
  • Optimized and performance-tuned Hive queries and implemented complex transformations by writing UDFs in Pig and Hive (a brief UDF sketch follows this list).
  • Involved in running MapReduce jobs for processing millions of records.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Created partitions and buckets on hive tables to improve performance while running Hive queries.
  • Developed Custom Loaders and Storage Classes in PIG to work with various data formats like JSON, XML, CSV etc.
  • Responsible for analyzing the performance of Hive queries using Impala.
  • Developed a Flume ETL job to handle data from an HTTP source with HDFS as the sink.
  • Automated the Hadoop pipeline using Oozie and scheduled using coordinator for time frequency and data availability.
  • Monitoring of Hadoop Cluster using Cloudera Manager.
  • Load and transform large sets of semi-structured and unstructured data that includes sequence files and xml files and worked on Avro and Parquet file formats using compression techniques like Snappy, Gzip and Zlib.
  • Worked on building Hadoop cluster in AWS Cloud on multiple EC2 instances.
  • Used Amazon Simple Storage Service(S3) for storing and accessing data to Hadoop cluster.
  • Used JIRA for bug tracking and Bitbucket to check in and check out code changes.
  • Responsible for developing SQL Queries required for the JDBC.
  • Designed the database and worked on DB2 and executed DDLS and DMLS.
  • Active participation in architecture framework design and coding and test plan development.
  • Strictly followed the Waterfall development methodology for implementing projects.
  • Thoroughly documented the detailed process flow with UML diagrams and flow charts for distribution across various teams.
  • Involved in developing training presentations for developers (off shore support), QA, Production support.
  • Presented the process logical and physical flow to various teams using PowerPoint and Visio diagrams.
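
A minimal sketch of the kind of Hive UDF mentioned above, written in Scala against Hive's classic UDF API; the class name and the normalization logic are purely illustrative.

  import org.apache.hadoop.hive.ql.exec.UDF
  import org.apache.hadoop.io.Text

  // Hypothetical UDF: strips non-digit characters from a phone-number column.
  class NormalizePhone extends UDF {
    def evaluate(input: Text): Text = {
      if (input == null) null
      else new Text(input.toString.replaceAll("[^0-9]", ""))
    }
  }

Once packaged into a jar, a UDF like this would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in queries.
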
Environment: Java JDK 1.5, Java J2EE, Informatica, Oracle 11g (TOAD and SQL Developer), Servlets, JBoss Application Server, Waterfall, JSPs, EJBs, DB2, RAD, XML, Web Server, JUnit, Hibernate, MS Access, Microsoft Excel.

Confidential

Java Developer

Responsibilities:

  • Implemented several design patterns such as the Observer, Factory, Singleton, and Facade patterns.
  • Utilized Agile Methodologies to manage full life-cycle development of the project.
  • Interacted with the business analyst and host to understand the requirements, using Agile methodologies and Scrum meetings to keep track of and optimize end-client needs.
  • Created Use Case Diagrams, Class Diagrams, Activity Diagrams during the design phase.
  • Developed the project using MVC Design pattern.
  • Designed and Developed Server-side Components (DAO, Session Beans) Using J2EE.
  • Worked with Core Java concepts like Collections Framework, multithreading, memory management.
  • Used JDBC connectivity with JDBC Statements, PreparedStatements, and CallableStatements for querying, inserting, updating, and deleting data in Oracle databases (see the sketch after this list).
  • Developed Front-end Screens using HTML, CSS, and JavaScript.
  • Developed Date Time Picker using Object Oriented JavaScript extensively.
  • Code reviews and refactoring were done during development, and the checklist was strictly adhered to.
  • Used Jenkins for continuous integration.
  • Used Subversion as the version control system for the application.
  • Used Log4j for logging purposes and Tracing the code.
  • Client side Validations are done using JavaScript.
  • Optimized XML parsers like SAX and DOM for the production data.
  • Have good understanding of Teradata MPP architecture such as Partitioning and Primary Indexes.
  • Good knowledge in Teradata Unity, Teradata Data Mover, OS PDE Kernel internals, Backup and Recovery.
  • Implemented the JMS Topic to receive the input in the form of XML and parsed them through a common XSD.
  • Used JDBC Connections and WebSphere Connection pool for database access.
  • Developed and modified several Database Procedures, Triggers and views to implement the business logic for the application.
  • TOAD is used to monitor the turnaround times of queries and to test all the connections.
  • Prepared the test plans and executed test cases for unit, integration and system testing.
  • Developed multiple unit and integrations tests using Mockito and Easy Mock.
  • Used JIRA for reporting bugs in the application.
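
A minimal sketch of the JDBC PreparedStatement usage described above, written here in Scala against the standard java.sql API; the connection URL, credentials, table, and column names are hypothetical.

  import java.sql.DriverManager

  object UpdateAccountStatus {
    def main(args: Array[String]): Unit = {
      // Hypothetical Oracle thin-driver URL and credentials; requires the Oracle JDBC driver on the classpath.
      val conn = DriverManager.getConnection(
        "jdbc:oracle:thin:@//dbhost:1521/ORCL", "app_user", "secret")
      try {
        // Parameterized update: the driver binds the values, avoiding SQL injection.
        val ps = conn.prepareStatement("UPDATE accounts SET status = ? WHERE account_id = ?")
        ps.setString(1, "ACTIVE")
        ps.setLong(2, 1001L)
        val rows = ps.executeUpdate()
        println(s"Updated $rows row(s)")
        ps.close()
      } finally {
        conn.close()
      }
    }
  }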

Environment: Java, J2EE, Servlets, JSP, Struts, Spring, Hibernate, JDBC, JMS, JavaScript, XSLT, HTML, CSS, SAX, DOM, XML, UML, TOAD, Mockito, Oracle, Eclipse RCP, JIRA, WebSphere, Unix/Windows.

Confidential

Junior Java Developer

Responsibilities:

  • Extensive Involvement in Requirement Analysis and system implementation.
  • Actively involved in SDLC phases like Analysis, Design and Development.
  • Responsible for developing modules and assist in deployment as per the client’s requirements.
  • The application was implemented using JSP, with Servlets used to implement the business logic.
  • Developed utility and helper classes and server-side functionality using Servlets.
  • Created DAO classes and wrote various SQL queries to perform DML operations on the data as per requirements.
  • Created Custom Exceptions and implemented Exception handling using Try, Catch and Finally Blocks.
  • Developed user interface using JSP, JavaScript and CSS Technologies.
  • Implemented User Session tracking in JSP.
  • Involved in Designing DB Schema for the application.
  • Implemented Complex SQL Queries, Reusable Triggers, Functions, Stored procedures using PL/SQL.
  • Worked in pair programming, Code reviewing and Debugging.
  • Involved in Tool development, Testing and Bug Fixing.
  • Performed unit testing for various modules.
  • Involved in UAT and production deployments and support activities.
Environment: Java, J2EE, Servlets, JSP, SQL, PL/SQL, HTML, JavaScript, CSS, Eclipse, Oracle, MySQL, IBM WebSphere, JIRA.

EDUCATION SUMMARY

Bachelor’s Degree in the field of Computer Science and Engineering from Kakatiya University, 2009.
