Hadoop Developer Resume
Cleveland, OH
SUMMARY:
- 5+ years of IT experience in Analysis, Design, Development, Maintenance and Documentation for Data warehouse and related applications using ETL, BI tools, Client/Server and Web applications on UNIX and Windows platforms.
- Well-versed in business domains such as Banking, Healthcare, and Insurance for Big Data implementations.
- Strong experience in Metadata Management using Hadoop User Experience (HUE)
- Expertise with the tools in Hadoop Ecosystem including Pig, Hive, HDFS, MapReduce, Sqoop, Spark, Kafka, Yarn, Oozie, and Zookeeper.
- Experience in the Data Analysis, Data Mining, Data Mapping, Data Quality, and Data Profiling.
- Good working experience with Hadoop data warehousing tools such as Hive and Pig, and with moving data between these tools and relational databases using Sqoop.
- Developed Oozie workflow schedulers to run multiple Hive jobs that run independently, triggered by time and data availability.
- Designed data models for Metadata Semantic Layer in ERwin data modeler tool.
- Extensive experience working in Oracle, DB2, SQL Server and MySQL database.
- Good understanding of NoSQL databases like HBase, Cassandra and MongoDB.
- Responsible for data governance processes and policy solutions using data preparation tools and technologies such as Podium and Hadoop.
- Worked extensively with ERwin and ER/Studio on several projects spanning both OLAP and OLTP applications.
- Created custom data models to accommodate business metadata including KPIs, Metrics and Goals
- Knowledge of complete Software Development Life Cycle
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Used project management services such as JIRA for tracking code issues and bugs, and GitHub for code reviews.
- Administered the database, including performance monitoring, query tuning, and optimization.
- Worked on data lineages between business and technical metadata and had the lineages reviewed and approved by business users and business system analysts.
- Hands-on knowledge of core Java concepts such as exceptions, collections, data structures, I/O, multi-threading, and serialization/deserialization for streaming applications (a serialization sketch follows this list).
- Experience in software design, development, and implementation of Client/Server and Web-based applications using JSTL, jQuery, JavaScript, Java Beans, JDBC, Struts, PL/SQL, SQL, HTML, CSS, PHP, XML, and AJAX.
- Worked on Agile Methodologies and used CA Agile Central
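The serialization and deserialization noted above can be sketched minimally as follows, in Scala (which shares the same java.io APIs as core Java); the Event class and its field are illustrative, not from any specific project.

    import java.io._

    // Illustrative payload type; any Serializable class round-trips the same way.
    @SerialVersionUID(1L)
    class Event(val id: String) extends Serializable

    object EventRoundTrip {
      def main(args: Array[String]): Unit = {
        val buf = new ByteArrayOutputStream()
        val out = new ObjectOutputStream(buf)
        out.writeObject(new Event("evt-1")) // serialize the object to bytes
        out.close()

        val in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
        val back = in.readObject().asInstanceOf[Event] // deserialize it back
        in.close()
        println(back.id) // prints "evt-1"
      }
    }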
TECHNICAL SKILLS:
Big Data Technologies (Hadoop, Pig, Hive, Impala, Sqoop, Oozie, HBase, Spark), NoSQL, Microsoft SQL Server 2014/2012/2010, Oracle, Teradata, PostgreSQL, SSIS/SSAS/SSRS, ERwin, Podium, Windows Server 2012 R2/2008 R2, T-SQL, Java, Python, Scala, HTML, XML, Shell, Git, SVN, CVS, Agile and Waterfall methodologies
PROFESSIONAL EXPERIENCE:
Confidential, Cleveland, OH
Hadoop Developer
Responsibilities:
- Wrote Sqoop scripts to import and export data between RDBMS and HDFS/Hive, and handled incremental loading of customer and transaction data dynamically.
- Used Hive to analyze partitioned and bucketed data and to compute metrics for the reporting dashboard (a query sketch follows this list).
- Worked with data serialization formats for converting complex objects into byte sequences, using Avro, Parquet, and CSV formats.
- Set up Kafka for streaming data and monitored the Kafka cluster.
- Designed and created Kafka producers for data-ingest pipelines, consumed the data with Spark Streaming into Spark DataFrames, and loaded it into Hive tables (a producer sketch follows this list).
- Set up Kafka producers and consumers, as well as Spark and Hadoop MapReduce jobs.
- Implemented the Kerberos authentication protocol for the existing cluster.
- Developed Scala code using Spark API and Spark-SQL/Streaming for faster testing and processing of data.
- Used Spark API over Hadoop YARN to perform analytics on data in Hive.
- Improved the performance and optimization of existing Spark applications in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Performed real-time processing of data sources using Apache Spark.
- Imported data from different sources such as HDFS and HBase into Spark RDDs.
- Converted MapReduce programs into Spark transformations using Spark RDDs in Scala (a word-count sketch follows this list).
- Worked with NoSQL databases such as HBase, MongoDB, and Cassandra.
- Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive and Sqoop as well as system specific jobs.
- Created Tableau dashboards by connecting with the Hive tables.
- Integrated Maven with Git to manage and deploy project-related tags.
- Integrated Git into Jenkins to automate the code check-out process.
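A minimal sketch of the kind of metric computation run over the partitioned, bucketed Hive data described above, in Scala with Spark SQL; the warehouse.transactions table, its columns, and the partition value are assumptions for illustration.

    import org.apache.spark.sql.SparkSession

    object DailyMetrics {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("daily-metrics")
          .enableHiveSupport() // read Hive tables through the metastore
          .getOrCreate()

        // Filtering on the partition column (txn_date, assumed) lets Hive prune
        // the scan down to a single partition of the bucketed table.
        val daily = spark.sql(
          """SELECT region, COUNT(*) AS txn_count, SUM(amount) AS total_amount
            |FROM warehouse.transactions
            |WHERE txn_date = '2018-01-15'
            |GROUP BY region""".stripMargin)

        daily.show()
        spark.stop()
      }
    }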
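A minimal sketch of a Kafka producer for such an ingest pipeline, calling the standard org.apache.kafka Java client from Scala; the broker address, topic name, key, and payload are placeholders.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object TxnProducer {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "broker1:9092") // hypothetical broker
        props.put("key.serializer",
          "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer",
          "org.apache.kafka.common.serialization.StringSerializer")

        val producer = new KafkaProducer[String, String](props)
        // Each record lands on the "transactions" topic, keyed by customer id,
        // where a Spark Streaming consumer can pick it up.
        producer.send(new ProducerRecord("transactions", "cust-42", """{"amount": 19.99}"""))
        producer.close()
      }
    }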
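A minimal sketch of a MapReduce program rewritten as Spark RDD transformations in Scala, as described above; the classic word count stands in for the actual jobs, and the HDFS paths are placeholders.

    import org.apache.spark.sql.SparkSession

    object WordCount {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("word-count").getOrCreate()
        val sc = spark.sparkContext

        // flatMap/map/reduceByKey replace the hand-written Mapper and Reducer.
        sc.textFile("hdfs:///data/input")            // placeholder input path
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
          .saveAsTextFile("hdfs:///data/wordcounts") // placeholder output path
        spark.stop()
      }
    }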
Environment: Hadoop, HDFS, Spark, MapReduce, Hive, Sqoop, Kafka, HBase, Oozie, Scala, Git, Jenkins, Java, SQL scripting, Linux shell scripting, Cloudera.
Confidential, Bloomfield, CT
Hadoop Developer
Responsibilities:
- Worked on the Domains and Communities for Business Glossary and Information Governance Catalog.
- Prepared workflows for scheduling the load of data into Hive using IBIS Connections.
- Enhanced the tool DataModelValidator for the DMCOE team using Core Java and JDBC.
- Worked on a robust automated framework in the data lake for metadata management that integrates various metadata sources, then consolidates and updates Podium with the latest, high-quality metadata using big data technologies such as Hive and Impala.
- Ingested data from a variety of sources (Teradata, DB2, Oracle, SQL Server, and PostgreSQL) into the data lake using Podium, and resolved various data transformation and interpretation issues along the way using Sqoop, Git, and uDeploy.
- Responsible for data governance processes and policy solutions using data preparation tools and technologies such as Podium and Hadoop.
- Created a data-profiling dashboard with Looker reporting by leveraging Podium's internal architecture, which drastically reduced the time needed to analyze data quality.
- Worked on an input-agnostic framework that lets data stewards handle their ever-growing workgroup datasets, and created a business glossary by consolidating those datasets in Hive.
- Built a robust comparison process that checks data modelers' metadata against data stewards' metadata and identifies anomalies using Hive and Podium Data (a comparison sketch follows this list).
- Created technical design documentation for the data models, dimensional models, data flow control process and metadata management.
- Designed data models for Metadata Semantic Layer in ERwin data modeler tool.
- Reverse-engineered and generated data models by connecting to their respective databases.
- Imported metadata from various applications and built end-to-end data lineage using ERwin.
- Used Git extensively as a versioning tool.
- Worked on Agile Methodologies and used CA Agile Central and Kanban Dashboard.
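One way the modeler-versus-steward metadata comparison can be expressed, assuming two hypothetical Hive tables (metadata.modeler_columns and metadata.steward_columns) keyed by table and column name; all names here are illustrative, not Podium's actual schema.

    import org.apache.spark.sql.SparkSession

    object MetadataDiff {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("metadata-diff")
          .enableHiveSupport()
          .getOrCreate()

        // A full outer join surfaces columns that exist on only one side or
        // whose data types disagree between the two metadata sources.
        val anomalies = spark.sql(
          """SELECT m.table_name, m.column_name, m.data_type AS modeler_type,
            |       s.data_type AS steward_type
            |FROM metadata.modeler_columns m
            |FULL OUTER JOIN metadata.steward_columns s
            |  ON m.table_name = s.table_name AND m.column_name = s.column_name
            |WHERE m.data_type IS NULL OR s.data_type IS NULL
            |   OR m.data_type <> s.data_type""".stripMargin)

        anomalies.show(100, truncate = false)
        spark.stop()
      }
    }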
Environment: Podium Data, Data Lake, HDFS, Hue, Hive, Impala, Oozie, Pig, Looker, ERwin 9.64, Linux, HTML, JavaScript, Core Java, Data Warehousing, PostgreSQL, Unix, Teradata
Confidential, Livonia, MI
Hadoop Developer
Responsibilities:
- Performed data analysis and data profiling using complex SQL on various source systems.
- Created SQL scripts to find data quality issues and to identify keys, data anomalies, and data validation issues
- Involved in defining the source to target data mappings, business rules, and data definitions
- Involved in identifying the source data from different systems and mapping the data into the warehouse
- Created HBase tables to store variable data formats of input data coming from different portfolios
- Created technical design documentation for the data models, data flow control process, and metadata management.
- Imported metadata from various applications and built end-to-end data lineage using ERwin.
- Worked with HUE and analyzed the datasets.
- Successfully loaded files into Hive and HDFS from a Teradata database (a load sketch follows this list).
- Loaded and transformed large sets of structured data.
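One way the Teradata-to-Hive load can be expressed with Spark, assuming the Teradata JDBC driver is on the classpath; the host, schema, table names, and credential variables are placeholders.

    import org.apache.spark.sql.{SaveMode, SparkSession}

    object TeradataToHive {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("teradata-to-hive")
          .enableHiveSupport()
          .getOrCreate()

        // Pull the source table over JDBC (placeholder host, schema, credentials).
        val src = spark.read.format("jdbc")
          .option("url", "jdbc:teradata://td-host/DATABASE=portfolio")
          .option("driver", "com.teradata.jdbc.TeraDriver")
          .option("dbtable", "portfolio.positions")
          .option("user", sys.env("TD_USER"))
          .option("password", sys.env("TD_PASSWORD"))
          .load()

        // Persist it as a managed Hive table for downstream analysis.
        src.write.mode(SaveMode.Overwrite).saveAsTable("staging.positions")
        spark.stop()
      }
    }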
Environment: Big Data Technologies (Hadoop, Pig, Hive, Sqoop, HBase, Hadoop MapReduce), MS SQL Server 2014/2008, SSIS packages, SQL BI Suite (SSMS, SSIS, SSRS, SSAS), XML, MS Excel, MS Access 2013, Windows Server 2012, SQL Profiler, ERwin r7.3, .NET 4.5, TFS 2013.
Confidential, Southfield, MI
Data Analyst/SQL Developer
Responsibilities:
- Performed Enhancements and provided production support for HAP Claims Applications
- Analyzed the datasets and loaded the data into SQL Server tables for BI reporting project
- Configured and Maintained Report Manager and Report Server for SSRS.
- Created reports that retrieve data using stored procedures that accept parameters, depending upon the client requirements (a call sketch follows this list)
- Involved in Debugging and Deploying reports on the production server
- Involved in data management processes and ad hoc user requests
- Involved in requirements gathering, source data analysis and identified business rules for data migration and for developing data warehouse/data mart
- Involved in identifying and defining the data inputs, and captured metadata and associated rules from various data sources for the ETL process for the data warehouse
- Worked with Business Analyst to develop business rules that support the transformation of data
- Helped design and implement processes for deploying, upgrading, managing, archiving, and extracting data for reporting
- Performed maintenance duties like performance tuning and optimization of queries, functions and stored procedures
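The parameterized stored-procedure reporting described above can be sketched over JDBC; the sketch is in Scala for consistency with the other examples in this document, and the procedure name, parameters, and connection details are illustrative.

    import java.sql.{Date, DriverManager}

    object ClaimsReport {
      def main(args: Array[String]): Unit = {
        // Placeholder SQL Server connection details.
        val url = "jdbc:sqlserver://db-host;databaseName=Claims"
        val conn = DriverManager.getConnection(url, "report_user", "secret")
        try {
          // Parameterized call, as a report data source would issue it.
          val call = conn.prepareCall("{call dbo.GetClaimsByDateRange(?, ?)}")
          call.setDate(1, Date.valueOf("2012-01-01"))
          call.setDate(2, Date.valueOf("2012-12-31"))
          val rs = call.executeQuery()
          while (rs.next()) {
            println(s"${rs.getString("claim_id")}\t${rs.getBigDecimal("amount")}")
          }
        } finally conn.close()
      }
    }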
Environment: SAS 9.2, SAS Datasets, SQL Server 2008/2012, SSIS, SSRS, Tidal Job Scheduler
Confidential
Java/J2EE Developer
Responsibilities:
- Involved in each phase of Software Development Life Cycle (SDLC) models like Requirement gathering and analysis, Design, Implementation, Testing, Deployment and Maintenance.
- Developed Login, Policy and Claims Screens for customers using HTML 5, CSS3, JavaScript, AJAX, JSP, and jQuery.
- Used Core Java to develop Business Logic.
- Involved in the development of business module applications using J2EE technologies like Servlets, JSP.
- Designed and developed the web-tier using JSP's, Servlets framework.
- Used various Core Java concepts such as Multi-Threading, Exception Handling, Collection APIs to implement various features and enhancements.
- Strong experience in design & development of applications using Java/J2EE components such as Java Server Pages (JSP).
- Developed EJB MDBs and message queues using JMS technology.
- EJB Session Beans were used to process requests from the user interface and CMP entity beans were used to interact with the persistence layer.
- Developed stored procedures, triggers, and queries using T-SQL in SQL Server.
- Used Spring MVC as the framework and JavaScript for the client-side view; used AJAX and jQuery for client-side data validation and dynamic web pages.
- Developed model classes based on the forms to be displayed on the UI.
- Implemented various design patterns in the project, such as Business Delegate, Data Transfer Object, Data Access Object, Service Locator, and Singleton (a pattern sketch follows this list).
- Used SQL statements and procedures to fetch the data from the database.
- Developed test cases and performed unit test using JUnit Framework.
- Used CVS as version control and ANT scripts to fetch, build, and deploy application to development environment.
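A minimal sketch of two of the patterns named above, Data Access Object and Singleton, written in Scala (where a top-level object is a natural singleton) rather than the project's original Java; the class, table, and connection details are illustrative.

    import java.sql.{Connection, DriverManager}

    // Singleton: one connection factory shared by the whole application.
    object ConnectionFactory {
      private val url = "jdbc:mysql://localhost:3306/claims" // placeholder URL
      def open(): Connection = DriverManager.getConnection(url, "app", "secret")
    }

    // Data Access Object: keeps the SQL behind a small, testable interface.
    class ClaimDao {
      def findStatus(claimId: String): Option[String] = {
        val conn = ConnectionFactory.open()
        try {
          val stmt = conn.prepareStatement("SELECT status FROM claims WHERE id = ?")
          stmt.setString(1, claimId)
          val rs = stmt.executeQuery()
          if (rs.next()) Some(rs.getString("status")) else None
        } finally conn.close()
      }
    }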
Environment: Java, HTML, CSS, JavaScript, MySQL, Struts, EJB, Spring MVC.