Data Engineer Resume
Menlo Park, CA
PROFESSIONAL SUMMARY:
- 7+ years of professional IT experience in requirements gathering, analysis, architecture, design, documentation and implementation of Data Warehousing solutions.
- 5+ years of experience in all phases of Hadoop-related technologies such as MapReduce, Pig, Oozie, Hive, ZooKeeper, Sqoop, Scala, HBase, Hortonworks, Hue and Cloudera.
- 2+ years of expertise in designing and developing Tableau visualization and dashboard solutions using Tableau Desktop, along with Tableau Server administration.
- 4+ years of experience with Business Objects Enterprise products (Web Intelligence, Desktop Intelligence, Universe Designer, InfoView, CMC, CCM and CMS).
- Expertise in Informatica and DataStage administration, production deployments and liaison with IBM/Informatica.
- Expertise in Data warehousing concepts, Dimensional Modeling, Data Modeling, OLAP and OLTP systems.
- Experience developing applications using Java/J2EE technologies (Servlets, JSP, Java Web Services, JDBC, XML), Cascading, Spring and Hibernate.
- Expertise in ETL methodology for Extract, Transform and Load environments using Informatica PowerCenter 9.x/8.x (Designer, Repository Manager, Repository Server Administration Console, Workflow Manager, Workflow Monitor, Server Manager).
- Ability to optimize Hadoop usage to get maximum performance from Amazon Web Services, Rackspace and in-house clusters.
- Experience using integrated development environments such as Eclipse, NetBeans, JDeveloper and MyEclipse.
- 7 years’ experience in Teradata, HP Vertica 7.X, Netezza, Oracle 10g/9i/8i, MS SQL Server, Cassandra, Sybase SQL Server, DB2, MS Access and MS Excel.
- Experience working with SQL, T-SQL, PL/SQL scripts, views, indexes, stored procedures and other components of database applications.
- Experience using source code management tools such as Git, SVN and Perforce.
TECHNICAL SKILLS:
Big Data / Hadoop Framework: HDFS, MapReduce v1/v2, YARN, Pig, Hive, Presto, Sqoop, Oozie, ZooKeeper, Flume, HBase, Kafka, Spark
Databases: HP Vertica, Teradata, Netezza, Cassandra, Oracle 9i/10g, Microsoft SQL Server, MySQL
Languages: Java/J2EE, Scala, Spring, Hibernate, Python, Bash, SQL, Pig Latin
BI Tools: Informatica PowerCenter 9.x, Business Objects XI, Tableau Desktop, QlikView, R Studio
Operating Systems: Windows XP/7, CentOS, Ubuntu
Development Tools: IntelliJ IDEA, Eclipse, NetBeans, Visual Studio
Development Methodologies: Six Sigma, Agile/Scrum, Waterfall
WORK EXPERIENCE:
Confidential, Menlo Park, CA
Data Engineer
Responsibilities:
- Performed Data analysis, Data Profiling and Requirement Analysis.
- Analysed massive and highly complex Hive data sets, performing ad-hoc analysis and data manipulation.
- Designed and developed custom data integration pipelines on Facebook's big data stack using Python, YAML, Hive, Vertica and Dataswarm.
- Designed and developed a custom aggregation framework for reporting and analytics in Hive, Presto and Vertica.
- Developed ETL mappings and workflows using Informatica and Dataswarm.
- Developed Hive scripts to transfer data to and from HDFS.
- Prepared Chronos workflows to schedule daily loads based on time or file arrivals.
- Used Informatica to extract, transform & load data from SQL Server to Oracle databases.
- Created dynamic BI reports/dashboards for production support in Excel, PowerPoint, Power BI, Tableau, MySQL and PHP.
- Worked on complex information models, logical relationships and data structures from MySQL, Oracle and Hive/Presto.
- Involved in developing Pig Scripts for change data capture and delta record processing between newly arrived data and already existing data in HDFS.
- Wrote reports using Tableau Desktop to extract data for analysis using filters based on the business use case.
- Created UNIX shell scripts to support file handling for the data loads.
- Performed data accuracy, data analysis and data quality checks before and after loading the data (see the reconciliation sketch after this list).
- Performed Oracle performance tuning and SQL optimization, developing indexes and partitions as needed during the tuning process.
- Prepared status reports covering overall project development status.
- Provided solutions for complex end-user requirements, such as automated notifications when data discrepancies appear in the reporting tables.
- Analysed existing data warehouse objects to find the optimal way to relate new source objects.
- Used Python data science libraries such as NumPy and pandas, and built a proof of concept for sentiment analysis (see the sentiment-analysis sketch after this list).
- Interacted with end users, business analysts and architects to collect and understand business requirements, documented them and translated them into technical/system solutions.
- Involved in different phases of building the Data Marts like analysing business requirements, ETL process design, performance enhancement, go-live activities and maintenance.
- Coordinated with the business analyst team on requirement gathering and the allocation process methodology, and designed the filters for processing the data.
- Interacted with business analysts and data modellers to define mapping documents and the design process for different sources and targets.
- Used Tableau extracts to perform offline analysis.
- Blended data from different data sources using the data-blending features of Tableau Desktop.
- Extensive Tableau experience in an enterprise environment, including Tableau Server administration: technical support, troubleshooting, report design and monitoring of system usage.
- Worked on Business Intelligence standardization to create database layers with user-friendly views in Vertica that can be used for development of various Tableau reports/dashboards.
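A simplified Python sketch of the post-load reconciliation check referenced above (in the actual pipeline the counts came from Hive/Presto and Vertica queries; the dates, row counts and tolerance below are illustrative assumptions):

    # Simplified sketch of a post-load data-quality check: compare source vs.
    # target row counts per day and flag discrepancies. In the real pipeline the
    # counts came from Hive/Presto and Vertica; here they are hard-coded samples.
    import pandas as pd

    source_counts = pd.DataFrame({
        "ds": ["2017-05-01", "2017-05-02", "2017-05-03"],
        "src_rows": [1204331, 1198020, 1210877],
    })
    target_counts = pd.DataFrame({
        "ds": ["2017-05-01", "2017-05-02", "2017-05-03"],
        "tgt_rows": [1204331, 1150000, 1210877],
    })

    TOLERANCE = 0.01  # allow up to 1% variance between source and target

    checks = source_counts.merge(target_counts, on="ds")
    checks["pct_diff"] = (checks["src_rows"] - checks["tgt_rows"]).abs() / checks["src_rows"]
    discrepancies = checks[checks["pct_diff"] > TOLERANCE]

    if discrepancies.empty:
        print("Row counts reconciled within tolerance.")
    else:
        # The production job sent an automated notification instead of printing.
        print("Data discrepancy detected:")
        print(discrepancies.to_string(index=False))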
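A minimal sketch of the NumPy/pandas sentiment-analysis POC noted above, using a simple lexicon-based score; the word lists and sample comments are illustrative only:

    # Minimal lexicon-based sentiment scoring with pandas/NumPy (POC-level only;
    # the word lists and sample comments are illustrative).
    import numpy as np
    import pandas as pd

    POSITIVE = {"great", "fast", "love", "helpful"}
    NEGATIVE = {"slow", "broken", "hate", "confusing"}

    def score(text):
        # Positive words add one, negative words subtract one.
        words = text.lower().split()
        return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

    feedback = pd.DataFrame({"comment": [
        "Love the new dashboard, super fast",
        "Checkout flow is slow and confusing",
        "Helpful support team, great experience",
    ]})
    feedback["sentiment"] = feedback["comment"].apply(score)
    feedback["label"] = np.where(feedback["sentiment"] > 0, "positive",
                        np.where(feedback["sentiment"] < 0, "negative", "neutral"))
    print(feedback)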
Environment: Dataswarm, Hive, Presto, Vertica, Python, MySQL, Oracle, ETL Methods, Informatica, Linux, HDFS, Tableau Desktop 9/10, Data visualization in D3, Microsoft Excel.
Confidential, Bloomingdale’s, NY
Big Data / Hadoop Developer
Responsibilities:
- Worked on a 200+ node Hadoop cluster running CDH 5.4.
- Worked with highly unstructured and semi-structured data of 110 TB in size (300+ TB with a replication factor of 3).
- Extensive experience writing Pig scripts to transform raw data from several data sources into baseline data.
- Developed Hive scripts for end-user/analyst requirements for ad-hoc analysis.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables for optimized performance.
- Performed near real-time analysis of clickstream data using Kafka and Spark for a POC project for Bloomingdale's e-commerce division (see the streaming sketch after this list).
- Created Interactive reporting dashboards by combining multiple views in Tableau Dashboard.
- Designed and created various analytical reports and dashboards to help the business unit identify critical KPIs and facilitate decision making and strategic planning.
- Developed UDFs in Java as needed for use in Pig and Hive queries.
- Experience using SequenceFile, Avro and HAR file formats.
- Extracted the data from Teradata into HDFS using Sqoop.
- Excellent hands-on experience with Teradata utilities such as MLOAD, FASTLOAD, TPUMP, FASTEXPORT, BTEQ and ARCMAIN.
- Created Sqoop job with incremental load to populate Hive External tables.
- Developed Oozie workflow for scheduling and orchestrating the ETL process
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Good working knowledge of HBase.
- Involved in gathering business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Actively participated in code reviews and meetings and helped resolve technical issues.
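A PySpark sketch of the near real-time clickstream POC referenced above (the actual POC was written in Scala; broker addresses, the topic name and the JSON field are assumptions, and running it requires the Spark Kafka connector package):

    # PySpark Structured Streaming sketch: count page views per minute from a
    # Kafka clickstream topic. Brokers, topic and the JSON field are illustrative.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("clickstream-poc").getOrCreate()

    # Read raw click events from Kafka as a streaming DataFrame.
    clicks = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
              .option("subscribe", "clickstream")
              .load())

    # Kafka values arrive as bytes; pull the page name out of the JSON payload.
    events = (clicks.selectExpr("CAST(value AS STRING) AS json", "timestamp")
              .select(F.get_json_object("json", "$.page").alias("page"),
                      F.col("timestamp")))

    # Page views per 1-minute window, tolerating 2 minutes of late events.
    page_views = (events.withWatermark("timestamp", "2 minutes")
                        .groupBy(F.window("timestamp", "1 minute"), "page")
                        .count())

    query = page_views.writeStream.outputMode("update").format("console").start()
    query.awaitTermination()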
Environment: Java 7, Eclipse, Oracle 10g, Tableau 9.X, Hadoop, MapReduce, Hive, HBase, Oozie, Linux, HDFS, CDH, SQL, Toad 9.6, Kafka, Spark and Scala.
Confidential, Chicago, IL
Business Intelligence Developer
Responsibilities:
- Created Reports, Dashboards and Storyboards in Tableau 9.0 and validated the loads from OLTP systems.
- Designed and developed dashboards for various business units like finance, marketing, operations and risk management using Tableau to analyse about five Terabytes of data each day.
- Created Data Quality Dashboards and did Application performance analysis and monitoring using Tableau.
- Created data extracts in Tableau by connecting to the view using Tableau MSSQL connector.
- Extensively used data joining and blending and other advanced features in Tableau on various data sources like Hive Tables, MySQL Tables and Flat files.
- Good experience with configuration, adding users, managing licenses and data connections, scheduling tasks, embedding views on Tableau Server.
- Involved in troubleshooting, performance tuning of reports and resolving issues within Tableau Server and reports.
- Defined best practices for Tableau report development.
- Monitored system objects such as huge files and unused indexes and took the necessary steps to improve the performance of applications as well as batch jobs.
- Performed Apache Hadoop installation and configuration of a multi-node cluster using Cloudera Manager.
- Set up and optimized standalone, pseudo-distributed and fully distributed clusters.
- Built, tuned and maintained HiveQL and Pig scripts for user reporting.
- Experienced in defining Oozie job flows.
- Experienced in managing and reviewing Hadoop log files.
- Developed and supported MapReduce programs running on the cluster (see the Hadoop Streaming sketch after this list).
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive.
- Involved in creating Hive tables, loading data, and writing Hive queries.
- Developed shell scripts to automate routine DBA tasks (e.g. database refreshes, backups, monitoring).
- Tuned/Modified SQL for batch and online processes.
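The MapReduce programs on this cluster were typically written in Java; the Hadoop Streaming sketch below shows the same map/reduce pattern in Python on access-log data (the log layout and field position are assumptions):

    # Hadoop Streaming sketch in Python showing the map/reduce pattern. It counts
    # log lines per HTTP status code; the access-log field position is assumed.
    # Example launch (paths are placeholders):
    #   hadoop jar hadoop-streaming.jar -input /logs/raw -output /logs/status \
    #     -mapper "python streaming_job.py map" \
    #     -reducer "python streaming_job.py reduce" -file streaming_job.py
    import sys

    def mapper():
        # Emit (status_code, 1) for every well-formed log line on stdin.
        for line in sys.stdin:
            fields = line.split()
            if len(fields) > 8:
                print("%s\t1" % fields[8])   # fields[8]: HTTP status code

    def reducer():
        # Streaming sorts mapper output by key, so equal keys arrive together.
        current_key, count = None, 0
        for line in sys.stdin:
            key, value = line.rstrip("\n").split("\t", 1)
            if key != current_key:
                if current_key is not None:
                    print("%s\t%d" % (current_key, count))
                current_key, count = key, 0
            count += int(value)
        if current_key is not None:
            print("%s\t%d" % (current_key, count))

    if __name__ == "__main__":
        reducer() if sys.argv[1:] == ["reduce"] else mapper()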
Environment: CDH Hadoop (HDFS) multi-node installation, Tableau 8.X/9.X, MapReduce, AWS, Hive, Flume, Java, JDK, Flat Files, PL/SQL, UNIX Shell Scripting.
Confidential, Chicago, IL
Data Warehouse Consultant
Responsibilities:
- Responsible for designing and implementing the ETL process to load data from different sources, perform data mining, and analyse users' transactional data using visualization/reporting tools.
- Installed and configured the SAP Integration Kit with Business Objects, integrated Crystal Reports with SAP BW and built reports from BW cubes, InfoSets and R/3 tables.
- Designed Business Objects universes and trained end users and peers, explaining the different functionalities of Business Objects Designer, WebI and DeskI.
- Worked on a project to collect the logs from the physical machines and the OpenStack controller and integrated into Hadoop HDFS using Flume.
- Designed complex dashboards and reports by linking data from multiple data providers, using free hand SQL and functionalities like Combined Queries.
- Resolved Loops, Fan traps and Chasm traps with Aliases and Contexts.
- Created measure objects, custom LOVs and hierarchies for easy user selection and drill-down purposes.
- Gathered and analysed requirements and prepared business rules for migration from Oracle to Informatica.
- Developed complex mappings using Rank, Expression, Lookup, Update Strategy, Sequence Generator, Aggregator, Router and Stored Procedure transformations to implement complex logic.
- Worked with Informatica PowerCenter Designer, Workflow Manager, Workflow Monitor and Repository Manager.
- Developed and maintained ETL (Extract, Transformation and Loading) mappings to extract data from multiple source systems such as Oracle, SQL Server, Netezza, Ab Initio, flat files and Java, and loaded it into Teradata.
- Loaded data from several flat-file sources using Teradata utilities (TPT, BTEQ, MLOAD, FASTLOAD and FASTEXPORT).
- Loaded mainframe files into the Teradata production region to perform value-added processing (VAPS) before providing the data to the vendor.
- Developed Informatica Workflows and sessions associated with the mappings using Workflow Manager.
- Deployed a Sqoop server to perform imports from heterogeneous data sources to HDFS (see the import sketch after this list).
- Identified parameters, lookups, call procedures, SQL overrides and command-line utilities.
- Conducted code reviews, and fine-tuned and analysed mappings and loads.
- Conducted detailed analysis to improve the load performance.
- Designed load validation reports and analysis on reports.
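A sketch of how a Sqoop import into HDFS can be scripted, as referenced above; the JDBC connection string, credentials file, table and target directory are illustrative placeholders, not actual project values:

    # Sketch of scripting a Sqoop import from a relational source into HDFS.
    # Connection string, credentials file, table and target directory are
    # illustrative placeholders only.
    import subprocess

    sqoop_cmd = [
        "sqoop", "import",
        "--connect", "jdbc:oracle:thin:@//dbhost:1521/ORCL",
        "--username", "etl_user",
        "--password-file", "/user/etl/.sqoop_pwd",  # keep the password off the command line
        "--table", "SALES_TXN",
        "--target-dir", "/data/raw/sales_txn",
        "--num-mappers", "4",
        "--fields-terminated-by", "\\t",
    ]

    # Fail loudly so the scheduler can retry or alert on a non-zero exit code.
    subprocess.run(sqoop_cmd, check=True)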
Environment: Hadoop, Informatica, Business Objects XI R2, SQL Server 2000/2003, Oracle 10g/9i, DB2, Tomcat 4.0, Apache Server 1.3, Windows 2003 Server/XP.
Confidential
Java/J2EE Developer
Responsibilities:
- Responsible for understanding the scope of the project and requirement gathering.
- Developed the web tier using JSP, Struts MVC to show account details and summary.
- Created and maintained the configuration of the Spring Application Framework.
- Implemented various design patterns - Singleton, Business Delegate, Value Object and Spring DAO.
- Used Spring JDBC to write some DAO classes to interact with the database to access account information.
- Mapped business objects to database using Hibernate.
- Involved in writing Spring configuration XML files containing bean declarations and the declarations of other dependent objects.
- Used the Tomcat web server for development purposes.
- Involved in creating test cases for JUnit testing.
- Used Oracle as the database and Toad for query execution, and was involved in writing SQL scripts and PL/SQL code for procedures and functions.
- Used CVS and Perforce as configuration management tools for code versioning and releases.
- Developed the application using Eclipse and used Maven as the build and deployment tool.
- Used Log4j to print logging, debugging, warning and info messages on the server console.
Environment: Java, J2EE, JSON, Linux, XML, XSL, CSS, JavaScript, PuTTY, Eclipse