
Big Data Hadoop Developer Resume


TX

PROFESSIONAL SUMMARY:

  • 6+ years of IT experience in software development, Big Data technologies, and analytical solutions, with 1+ year of hands-on experience in the design and development of Java and related frameworks and 2+ years' experience in design, architecture, and data modelling as a database developer.
  • Over 4 years' experience as a Hadoop Developer with good knowledge of the Hadoop framework, the Hadoop Distributed File System, parallel processing, and Hadoop ecosystem components: HDFS, MapReduce, Hive, Pig, Python, HBase, Sqoop, Hue, Oozie, Impala, and Spark.
  • Built and deployed industrial-scale data lakes on premise and on cloud platforms.
  • Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Experienced in handling different file formats such as text files, Avro data files, Sequence files, XML, and JSON files.
  • Extensively worked on Spark Core, numeric RDDs, pair RDDs, DataFrames, and caching for developing Spark applications.
  • Expertise in deploying Hadoop, YARN, and Spark, and integrating Spark with Cassandra.
  • Experience and expertise in ETL, data analysis, and designing data warehouse strategies.
  • Good knowledge of using Apache NiFi to automate data movement between different Hadoop systems.
  • Upgraded Hadoop CDH to 5.x and worked with Hortonworks. Installed, upgraded, and maintained Cloudera Hadoop software, Cloudera clusters, and Cloudera Navigator.
  • Good exposure to the column-oriented NoSQL database HBase.
  • Extensive experience writing custom MapReduce programs for data processing and UDFs for both Hive and Pig in Java. Extensively worked on the MRv1 and MRv2 Hadoop architectures.
  • Strong experience in analyzing large data sets by writing PySpark scripts and Hive queries (see the PySpark sketch following this list).
  • Experience using Python packages such as xlrd, NumPy, pandas, SciPy, and scikit-learn, and tools such as Spyder, Anaconda, Jupyter, and IPython.
  • Extensive experience working with structured data using HiveQL and join operations, writing custom UDFs, and optimizing Hive queries.
  • Extensive experience working with semi-structured and unstructured data by implementing complex MapReduce programs using design patterns.
  • Experience importing and exporting data with Sqoop between HDFS and relational databases.
  • Experience with Apache Flume for collecting, aggregating, and moving large volumes of data from sources such as web servers and telnet sources.
  • Experience with Oozie Workflow Engine in running workflow jobs with actions that run Hadoop MapReduce, Hive, Spark jobs.
  • Involved in moving all log files generated from various sources to HDFS and Spark for further processing.
  • Excellent understanding and knowledge of NOSQL databases like Mongo DB, HBase, and Cassandra.
  • Experience in implementing Kerberos authentication protocol in Hadoop for data security.
  • Experience in dimensional modelling, logical modelling, and physical data modelling.
  • Experienced with code versioning and dependency management systems such as Git, SVN, and Maven.
  • Experience testing MapReduce programs with MRUnit, JUnit, ANT, and Maven.
  • Experienced in working with scheduling tools such as UC4, Cisco Tidal Enterprise Scheduler, and Autosys.
  • Adequate knowledge of and working experience in Agile and Waterfall methodologies.
  • Great team player and quick learner with effective communication, motivation, and organizational skills combined with attention to detail and a focus on business improvements.
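
A minimal PySpark sketch of the kind of analysis referenced above (large data sets queried with PySpark scripts and HiveQL); the database, table, and column names are hypothetical placeholders:

    from pyspark.sql import SparkSession

    # Minimal sketch: analyze a large data set with a PySpark script and a Hive query.
    # The database, table, and column names are hypothetical placeholders.
    spark = (SparkSession.builder
             .appName("dataset-analysis")
             .enableHiveSupport()
             .getOrCreate())

    # Run a HiveQL aggregation against an existing Hive table.
    daily_counts = spark.sql("""
        SELECT event_date, COUNT(*) AS events
        FROM analytics.web_events
        GROUP BY event_date
        ORDER BY event_date
    """)

    daily_counts.show(20)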

TECHNICAL SKILLS:

Hadoop ECO Systems: Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Sqoop, ZooKeeper, Flume, Impala, Hue, Oozie, Cloudera Manager, Accumulo, Spark, and MRUnit.

Analytics Software: R, Python

NoSQL: HBase, Cassandra, MongoDB

Databases: MS SQL Server 2000/2005/2008/2012, MySQL, Oracle 9i/10g, MS Access, Teradata V2R5

Languages: Java (JDK 1.4/1.5/1.6, Java 8), C/C++, SQL, Teradata SQL, PL/SQL

Operating Systems: Windows Server 2000/2003/2008, Windows XP/Vista, Mac OS, UNIX, LINUX

Java Technologies: JavaBeans, JDBC, Spring

Frameworks: JUnit, JTest

IDEs & Utilities: IntelliJ, Eclipse, Maven, NetBeans

SQL Server Tools: SQL Server Management Studio

Web Technologies: HTML, XML

Testing & Case Tools: Bugzilla, Selenium, Quality Center, Test Link, Junit, Log4j

Business Intelligence Tools: Tableau, Pentaho

ETL Tools: Informatica, Talend

Methodologies: Agile, UML, Design Patterns

PROFESSIONAL EXPERIENCE:

Confidential, TX

Big Data Hadoop Developer

Role & Responsibilities:

  • Gained experience working with the Cloudera distribution, which is well suited to healthcare domains.
  • Performed gap analysis and worked collaboratively with the Configuration, Claims, and Members teams on HEDIS measurements to build an automation framework in Python that automates report generation and integrates with MapReduce.
  • Loaded data into HBase using bulk load and the HBase API. Created HBase tables and used various HBase filters to store data arriving in different formats from different portfolios.
  • Developed a Python metaprogram applying business rules to the data that automatically creates, spawns, monitors, and then terminates customized programs as dictated by the events and problems detected within the data stream.
  • Developed Python code using pandas DataFrames to read data from Excel, process it, and write the processed data to a result file consumed by HBase and MapReduce jobs (see the pandas sketch following this list).
  • Developed Spark code using Scala and Spark SQL for batch processing of data. Utilized Apache Spark's in-memory processing capability to process data using Spark SQL and Spark Streaming with Scala scripts.
  • Created Spark scripts to load data from source files into RDDs, created DataFrames from the RDDs, performed transformations and aggregations, and collected the output of the process.
  • Applied business transformations using Spark DataFrames/RDDs and utilized HiveContext objects to perform read/write operations.
  • Involved in the performance tuning and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Implemented partitioning, dynamic partitions, and buckets in Hive, and analyzed the partitioned and bucketed data to compute various metrics for reporting.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs. Used HCatalog to access Hive table metadata from MapReduce.
  • Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Spark, Hive, and Sqoop) as well as system-specific jobs.
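
A minimal sketch of the pandas-based Excel processing referenced above; the file names, sheet name, column names, and business rule are hypothetical placeholders:

    import pandas as pd

    # Hypothetical input/output paths and column names, for illustration only.
    INPUT_XLSX = "claims_input.xlsx"
    RESULT_CSV = "claims_result.csv"

    # Read the source workbook into a DataFrame (an Excel engine such as xlrd
    # or openpyxl must be installed).
    claims = pd.read_excel(INPUT_XLSX, sheet_name="claims")

    # Example business rule: keep approved claims and compute a paid ratio.
    processed = claims[claims["status"] == "APPROVED"].copy()
    processed["paid_ratio"] = processed["paid_amount"] / processed["billed_amount"]

    # Write the processed data to the result file consumed by downstream
    # HBase/MapReduce jobs.
    processed.to_csv(RESULT_CSV, index=False)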

Environment: Hadoop, Cloudera, HDFS, Map Reduce, Hive, HBase, Zookeeper, Oozie, Spark, Sqoop, Python, PySpark, Scala, Pandas, Numpy, Tableau (Desktop/Server).

Confidential, CA

Data Engineer with Hadoop

Role & Responsibilities:

  • Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
  • Worked within and across Agile teams to design, develop, test and support technical solutions across a full-stack of development tools and technologies.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms.
  • Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Experience with Cassandra, with the ability to drive its evaluation and potential implementation as a new platform. Implemented analytical engines that pull data from API data sources and then present the data back as an API or persist it into a NoSQL platform.
  • Involved in moving all log files generated from various sources to HDFS and Spark for further processing.
  • Involved in the requirements and design phases of implementing a streaming Lambda architecture for real-time processing using Spark.
  • Migrated an existing on-premises application to AWS. Used AWS services such as EC2 and S3 for processing and storing small data sets, and maintained the Hadoop cluster on AWS EMR.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them (see the PySpark sketch following this list).
  • Implemented a distributed messaging queue to integrate with Cassandra using Zookeeper
  • Experienced in using the Avro data serialization system to handle Avro data files in MapReduce programs.
  • Designed, implemented, tested, and debugged ETL mappings and workflows.
  • Developed ETL routines to source data from client source systems and load it into the data warehouse.
  • Developed Product Catalog and Reporting DataMart databases with supporting ETLs.
  • Implemented ETL processes for the warehouse and designed and implemented code for migrating data to the data lake using Spark.
  • Collected data from distributed sources into Avro models, applied transformations and standardizations, and loaded it into Hive for further processing.
  • Built Platfora Hadoop multi-node cluster test labs using Hadoop distributions (CDH 4/5, Apache Hadoop, MapR, and Hortonworks), Hadoop ecosystem tools, virtualization, and Amazon Web Services components.
  • Installed, upgraded, and maintained Cloudera Hadoop-based software.
  • Experience with hardening Cloudera clusters, Cloudera Navigator, and Cloudera Search.
  • Managed running jobs, scheduled Hadoop jobs, configured the Fair Scheduler, and scheduled Impala queries.
  • Extensively worked on Impala to compare its processing time with Apache Hive for batch applications, in order to implement the former in the project.
  • Extensively used Impala to read, write, and query Hadoop data in HDFS. Developed workflows using custom MapReduce, Pig, Hive, and Sqoop.
  • Built reusable Hive UDF libraries for business requirements, which enabled users to use these UDFs in Hive queries.
  • Used the Struts validation framework for form-level validation.
  • Wrote test cases in JUnit for unit testing of classes.
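
A minimal PySpark sketch of loading data from S3 into an RDD and applying transformations and actions, as referenced above; the bucket, path, and record layout are hypothetical, and the cluster is assumed to have the s3a connector and AWS credentials configured:

    from pyspark import SparkConf, SparkContext

    # Hypothetical S3 path and tab-separated record layout.
    S3_PATH = "s3a://example-bucket/logs/2017/*.log"

    conf = SparkConf().setAppName("s3-log-analysis")
    sc = SparkContext(conf=conf)

    # Load the raw text files from S3 into an RDD.
    lines = sc.textFile(S3_PATH)

    # Transformations: parse each record and keep only error events.
    errors = (lines
              .map(lambda line: line.split("\t"))
              .filter(lambda fields: len(fields) > 2 and fields[2] == "ERROR"))

    # Actions: materialize the results.
    print("error count:", errors.count())
    for record in errors.take(10):
        print(record)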

Environment: Hadoop, HDFS, Pig, Agile, Cloudera, Accumulo, AWS EMR, AWS EC2, AWS S3, MongoDB, Sqoop, Scala, Storm, Python, Spark, MQTT, Kerberos, Impala, XML, ANT 1.6, Perl, Java 8, JavaScript, JUnit 3.8, Avro, Hue.

Confidential, Jacksonville, Florida

Hadoop Developer

Roles and Responsibilities:

  • Installed and configured Hadoop and Hadoop stack on a 16 node cluster.
  • Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive, and Map Reduce.
  • Worked on debugging, performance tuning of Hive & Pig Jobs.
  • Developed data access libraries that bring MapReduce, graph, and RDBMS data to users of Scala and Java.
  • Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDFs, Pig, Sqoop, and Zookeeper.
  • Designed and deployed AWS solutions using EC2, S3, RDS, EBS, Elastic Load Balancer, Auto Scaling groups, and OpsWorks.
  • Used the Amazon S3 object storage service to store and retrieve media files such as images, and Amazon CloudWatch to monitor the application and store logging information.
  • Involved in writing a Java API for AWS Lambda to manage some of the AWS services.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Implemented cluster coordination services through Zookeeper.
  • Worked with Text, Avro, Parquet, and Sequence file formats.
  • Developed scripts to automate routine DBA tasks (e.g., refreshes, backups, vacuuming).
  • Installed and configured Hive and also wrote Hive queries that helped spot market trends.
  • Used Hadoop Streaming to process terabytes of data in XML format (see the streaming mapper sketch following this list).
  • Experienced with code versioning and dependency management systems such as Git, SVN, and Maven.
  • Experienced with Hive customization, i.e. UDFs, UDTFs, and UDAFs.
  • Experienced with Python-Hive integration, including pandas, NumPy, and SciPy.
  • Experienced with cloud infrastructures such as AWS (EC2, S3, EMR) and OpenStack.
  • Expert in designing customized interactive dashboards in Tableau using marks, actions, filters, parameters, security concepts, calculations, and relationships.
  • Exported the analyzed data to relational databases using Sqoop for visualization using Tableau and to generate reports for BI team.
  • Created action filters, parameters and calculated sets for preparing dashboards and worksheets in Tableau.
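
A minimal Hadoop Streaming mapper sketch for the XML processing referenced above. It assumes one complete XML record per input line and a hypothetical eventType element; real jobs often use StreamXmlRecordReader to split multi-line records, and the script would be passed to the hadoop-streaming jar via -mapper.

    #!/usr/bin/env python
    # mapper.py -- minimal Hadoop Streaming mapper sketch.
    # Assumes each input line carries one complete XML record (hypothetical layout).
    import sys
    import xml.etree.ElementTree as ET

    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        try:
            record = ET.fromstring(line)
        except ET.ParseError:
            continue  # skip malformed records
        # Emit a hypothetical eventType key with a count of 1 for the reducer to sum.
        event = record.findtext("eventType", default="UNKNOWN")
        print("%s\t1" % event)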

Environment: CDH4 with Hadoop 1.x, HDFS, Pig, Cloudera, AWS Lambda, Hive, Hbase, zookeeper, MapReduce, Java, Sqoop, Oozie, Linux, UNIX Shell Scripting and Big Data, Python, Tableau.

Confidential

Software Engineer

Role & Responsibilities:

  • Involved in complete requirement analysis, design, coding and testing phases of the project.
  • Implemented the project according to the Software Development Life Cycle (SDLC).
  • Documented the data flow and the relationships between various entities.
  • Actively participated in gathering of User Requirement and System Specification.
  • Created new Database logical and Physical Design to fit the new business requirement and implemented the same using SQL Server. Created Clustered and Non-Clustered Indexes for improved performance.
  • Created Tables, Views and Indexes on the Database, Roles and maintained Database Users. Developed new Stored Procedures, Functions, and Triggers. Developed JavaScript behavior code for user interaction.
  • Used HTML, JavaScript, and JSP and developed UI.
  • Developed logical and physical Redshift data models. Developed Redshift transformations and Redshift SQL.
  • Used JDBC to manage connectivity for inserting, querying, and data management, including stored procedures and triggers. Designed the logical and physical data model, generated DDL scripts, and wrote DML scripts for the SQL Server database. Involved in the design and coding of the data capture templates, presentation templates, and component templates.
  • Implemented the application using JSP, Spring MVC, Spring IOC, Spring Annotations, Spring AOP, Spring Transactions, and Hibernate. Transformed project data requirements into project data models using Erwin.
  • Involved in logical and physical designs and transformed logical models into physical implementations.
  • Enhanced existing data model based on the requirements, maintained data models in Erwin Model Manager.
  • Part of a team which is responsible for metadata maintenance and synchronization of data from database.
  • Developed data models for the DataMart and OLTP databases.
  • Involved in end-to-end development of the DataMart and normalization of the original data sets.
  • Hands-on experience writing Teradata BTEQ scripts to load data into the DataMart tables.
  • Built Control-M jobs to schedule the DataMart jobs and load the tables on the QA and production servers.
  • Automated the load process of the DataMart tables from end to end.
  • Involved in peer review and data validation activities in DataMart.
  • Designed a dimensional model to support business processes and answer complex business questions.
  • Provided assistance to development teams on Tuning Data, Indexes and Queries.
  • Developed an API to write XML documents from database. Developed PostgreSQL Functions.
  • Used JavaScript to design the user interface and check validations.
  • Developed JUnit test cases and validated user input using regular expressions in JavaScript as well as on the server side. Developed complex SQL stored procedures, functions, and triggers. Mapped business objects to the database using Hibernate. Wrote SQL queries, stored procedures, and database triggers as required on the database objects.
  • Analysis and Design with UML and Rational Rose.
  • Created Class Diagrams, Sequence diagrams and Collaboration Diagrams. Used the MVC architecture.
  • Worked on the Jakarta Struts open framework. Wrote Spring configuration for the beans defined and the properties to be injected into them using Spring's dependency injection.
  • Implemented a reliable socket interface using a sliding-window protocol, similar to TCP stream sockets, over an unreliable UDP communication channel, and later tested it using an FTP utility program (see the sliding-window sketch following this list).
  • Strong domain knowledge of TCP/IP, with expertise in socket programming and the IP security domain (IPsec, TLS, SSL, VPNs, firewalls, and NATs).
  • Built reliable communication between the source and destination using socket programming.
  • Hands on experience in writing Spring Restful Web services using JSON / XML.
  • Developed the Spring Features like Spring MVC, Spring DAO, Spring Boot, Spring Batch, Spring Security.
  • Used AngularJS, HTML5, and CSS3; all HTML and DHTML was generated through AngularJS directives.
  • Developed servlets to handle requests for account activity.
  • Developed Controller Servlets and Action Servlets to handle the requests and responses.
  • Developed Servlets and created JSP pages for viewing on a HTML page. Developed the front end using JSP.
  • Developed various EJB's to handle business logic.
  • Designed and developed numerous Session Beans deployed on the WebLogic Application Server.
  • Implemented Database interactions using JDBC with back-end Oracle. Worked on Database designing, Stored Procedures, and PL/SQL. Created triggers and stored procedures using PL/SQL.
  • Written queries to get the data from the Oracle database using SQL.
  • Implemented Backup and Recovery of the databases.
  • Actively participated in User Acceptance Testing, and Debugging of the system.
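
A simplified Go-Back-N sender over UDP, sketched here only to illustrate the sliding-window idea referenced above; the peer address, window size, and payloads are hypothetical, and a matching receiver that ACKs the highest in-order sequence number is assumed:

    import socket

    WINDOW = 4                    # hypothetical window size
    TIMEOUT = 0.5                 # seconds before the window is resent
    PEER = ("127.0.0.1", 9999)    # hypothetical receiver address

    def send_reliably(payloads):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(TIMEOUT)
        base, next_seq = 0, 0
        while base < len(payloads):
            # Fill the window: send every unsent packet that fits.
            while next_seq < base + WINDOW and next_seq < len(payloads):
                packet = ("%d|" % next_seq).encode() + payloads[next_seq]
                sock.sendto(packet, PEER)
                next_seq += 1
            try:
                # The receiver is assumed to reply with the highest in-order
                # sequence number it has accepted.
                ack, _ = sock.recvfrom(1024)
                base = int(ack.decode()) + 1
            except socket.timeout:
                # Go-Back-N: resend everything from the oldest unacked packet.
                next_seq = base

    if __name__ == "__main__":
        send_reliably([b"hello", b"sliding", b"window", b"over", b"udp"])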

Environment: Java, Spring, XML, Hibernate, SQL Server, Maven2, JUnit, J2EE, Servlets, JSP, Struts, Spring RESTful web services, NATS, Hibernate, Oracle, TOAD, WebLogic Server, AngularJS, HTML5, CSS3, HTML, DHTML, dimensional modeling, logical modeling and physical data modeling, Windows 2000 Advanced Server, Windows 2000/XP, MS SQL Server 2000, IIS, MS Visual Studio.
