Database Engineer Resume
Billerica, MA
SUMMARY
- Currently working at Confidential as a Database Engineer on the Intellisight product development team.
- Experienced professional in developing highly scalable Big Data, ETL, and BI solutions.
- Strong understanding of the HDFS, MapReduce, and YARN frameworks.
- Good understanding of Hadoop architecture and its ecosystem tools such as Hive, Spark, Sqoop, Oozie, and Kafka for data extraction, storage, and analysis.
- Experienced in using MapReduce components such as Mapper, Combiner, Partitioner, and Reducer, including the shuffle phase, custom partitioning, and bucketing.
- Experience in installing, upgrading, and managing Vertica database clusters. Good experience in creating and managing database users, schemas, roles, tables, views, and projections.
- Experience with query optimization, projection design, and Vertica database performance tuning using configuration parameters and resource pool settings.
- Good understanding of Spark Core and Spark SQL with Hive support to load, transform, and store data (see the sketch following this list).
- Experienced in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Experienced in partitioning, bucketing, and performing different types of joins on Hive tables, and in implementing tables with different SerDes using regular expressions.
- Developed and supported several MapReduce applications in Java to handle semi-structured and unstructured data.
- Experienced with Oozie workflow management for sequential and parallel execution of Spark, Java, MapReduce, Hive, Pig, and Sqoop jobs.
- Experienced in dealing with different file formats such as XML, JSON, CSV, SequenceFile, Avro, and Parquet.
- Strong understanding of encoding and compression techniques on Vertica tables and Parquet files to save disk space and improve performance.
- Experienced in developing shell scripts on Linux to automate processes.
- Experienced in writing Bash and Python scripts to automate operations on UNIX platforms.
- Proficient in object-oriented programming, data structures, and client-server applications using Java.
- Hands-on experience using R and Python for data analysis and for creating data visualizations.
- Experienced in using cloud computing platforms such as Amazon Web Services and Google Cloud Platform.
- Designed web applications using various web technologies such as HTML, CSS, JavaScript, PHP, and ASP.NET.
- Expert in writing SQL queries and stored procedures on various relational databases.
- Strong understanding of data warehousing concepts: database design, OLAP, normalization, multidimensional cubes, and data modeling using star and snowflake schemas.
- Strong knowledge and hands-on experience in data visualization with many BI tools, creating bar charts, pie charts, dot charts, box plots, subplots, histograms, error bars, multiple chart types, time series, etc.
- Experienced working in Agile and Waterfall methodologies across the Software Development Life Cycle (SDLC), including design, development, implementation, testing, deployment, and support/maintenance.
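To illustrate the Spark SQL with Hive support workflow referenced above, here is a minimal PySpark sketch; the telemetry database, table, and column names are hypothetical placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Spark session with Hive support so spark.sql() can see Hive metastore tables.
spark = (SparkSession.builder
         .appName("load-transform-store")
         .enableHiveSupport()
         .getOrCreate())

# Load: read a (hypothetical) Hive table.
raw = spark.sql("SELECT * FROM telemetry.raw_events")

# Transform: aggregate events per device per day.
daily = (raw.groupBy("device_id", F.to_date("event_ts").alias("event_date"))
            .agg(F.count("*").alias("event_count")))

# Store: write back as a Parquet-backed Hive table, partitioned by date.
(daily.write.mode("overwrite")
      .partitionBy("event_date")
      .saveAsTable("telemetry.daily_event_counts"))
```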
TECHNICAL SKILLS
Big Data Tools: Hadoop, HDFS, Hive, Spark, MapReduce, Vertica, Kafka, Pig, Sqoop, YARN
Cloud Platforms: Amazon Web Services, Google Cloud Platform
Programming & Scripting: Java, Python, C++, Shell Scripting, Scala, R
BI/ETL Tools: Microsoft Business Intelligence (SSRS, SSIS & SSAS), Tableau, QlikView, Power BI, OBIEE, IBM InfoSphere DataStage
Web Design Skills: HTML5, CSS3, JavaScript, jQuery, PHP, ASP.NET
Databases/SQL: Vertica (vsql), SQL Server, Oracle, MySQL, PostgreSQL
3rd Party Tools: JIRA, Perforce, Eclipse, TeamCity, Ansible, Microsoft Office suite, etc.
PROFESSIONAL EXPERIENCE
Confidential, BILLERICA, MA
DATABASE ENGINEER
Responsibilities:
- Working as a Database Developer on the product development team of Intellisight, a real-time big data analytics platform that helps telecom organizations process information and uncover insights they can leverage to generate actionable intelligence and analysis based on highly nuanced results.
- Intellisight is a platform deployed on a multi-node cluster on top of the Cloudera Hadoop distribution and the Vertica database.
- Understand the source code and modify it accordingly to develop new enhancements for upcoming releases and to resolve bugs.
- Analyze the source data architecture and design new data architecture to leverage Vertica's capabilities, including analysis of data types, data structures, and encoding and compression techniques in Vertica.
- Profile important Vertica queries and optimize Vertica projection segmentation, table partitioning, and Tuple Mover moveout and mergeout tasks (see the sketch following this list).
- Essential responsibilities as Vertica DBA include installing and upgrading Vertica, creating databases, allocating resource pools, and loading, recovering, and backing up databases.
- Created User Defined Functions (UDFs) for Vertica using C++ and Java.
- Use Python and shell scripting to automate tasks; make changes to Java API-based loading, schema builds, and configuration settings in Vertica.
- Involved in creating tables on various SQL engines such as Spark, Hive, and Impala using Parquet files, loading data into the tables, and writing queries to analyze the data.
- Developed scripts and automated end-to-end data management configuration synchronization across all nodes in the cluster.
- Tune long-running Vertica queries using the query optimizer as requested.
- Developed Spark scripts using Scala shell commands as required.
- Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
- Involved in creating Hive tables, loading data into them, and writing Hive queries to analyze the data.
- Experience in importing and exporting terabytes of data using Sqoop between HDFS and relational database systems.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
- Implemented schema evolution on tables backed by Parquet files.
- Performed performance testing on terabytes of data using various query engines such as Hive, Spark, and Impala.
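A minimal sketch of the Vertica projection-design and query-profiling work described above, assuming the vertica_python client; the host, credentials, table, and projection names are all hypothetical.

```python
import vertica_python

# Hypothetical connection settings for one node of the cluster.
conn_info = {
    "host": "vertica-node1.example.com",
    "port": 5433,
    "user": "dbadmin",
    "password": "secret",
    "database": "intellisight",
}

with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()
    # Projection segmented by hash across all nodes; the ORDER BY and
    # SEGMENTED BY choices drive encoding, pruning, and join performance.
    cur.execute("""
        CREATE PROJECTION events_by_device
        AS SELECT device_id, event_ts, metric
           FROM events
           ORDER BY device_id, event_ts
        SEGMENTED BY HASH(device_id) ALL NODES
    """)
    # PROFILE records per-operator counters for a slow query in the
    # EXECUTION_ENGINE_PROFILES system table for later analysis.
    cur.execute("PROFILE SELECT device_id, COUNT(*) FROM events GROUP BY device_id")
```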
Confidential
IT ANALYST INTERN
Responsibilities:
- Extracted raw data from transactional and other operational applications, transformed it into useful information using SSIS packages, and created cubes using SSAS and visualizations using SSRS and Power BI to reveal insights, which helped the business work faster and more accurately and make actionable decisions across the organization.
- Involved in writing MDX queries and optimizing the performance of SSAS cubes (see the sketch following this list).
- Responsible for developing OLAP cubes, data source views, and partitions queried with MDX.
- Created views in Tableau Desktop that were published to the internal team for review, further data analysis, and customization using filters and actions.
- Developed storytelling dashboards in Tableau Desktop and published them to Tableau Server, allowing end users to understand the data on the fly using quick filters for on-demand information.
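For reference, a minimal sketch of the shape of the MDX queries mentioned above; the cube, measure, and dimension names are hypothetical, and in practice such a query would be executed from SSMS or an XMLA/ADOMD client rather than printed.

```python
# Hypothetical MDX: sales by calendar year from an SSAS cube,
# filtered to a single country via the WHERE (slicer) axis.
mdx = """
SELECT
    { [Measures].[Sales Amount] } ON COLUMNS,
    NON EMPTY [Date].[Calendar Year].MEMBERS ON ROWS
FROM [SalesCube]
WHERE ( [Geography].[Country].[United States] )
"""
print(mdx)
```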
Confidential
BIG DATA ANALYST
Responsibilities:
- Used Hive as an ETL tool to perform transformations, joins, and aggregations before storing data in HDFS.
- Automated all jobs that pull network flow data from relational databases into Hive tables using Sqoop, orchestrated the workflows with Oozie, and enabled email alerts for any failure cases.
- Created DDLs for partitioned external and internal tables on various data staging layers to view the data using Hive.
- Exported the transformed data using Sqoop to make it available for generating reports and dashboards.
- Created and modified UDFs and UDAFs for Hive whenever necessary and developed Hive queries.
- Created workflows and coordinators in Oozie for recurring jobs and to automate loading data into HDFS.
- Configured Oozie coordinator jobs to extract data (full or incremental loads) from various relational databases using Sqoop (see the sketch following this list).
- Created components such as Hive UDFs to supply functionality missing from Hive for analytics.
- Analyzed the data with Hive queries (HiveQL) to study customer behavior.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Worked on various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
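A minimal sketch of a Sqoop incremental import of the kind described above, wrapped in Python for automation; the JDBC URL, table, and column names are hypothetical.

```python
import subprocess

# Incremental append import: pull only rows whose record_id is beyond the
# last value already loaded, landing them in an HDFS directory that a
# partitioned Hive external table can sit on top of.
cmd = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://dbhost.example.com/netflow",
    "--username", "etl_user",
    "--password-file", "/user/etl/.db_password",
    "--table", "flow_records",
    "--target-dir", "/data/staging/flow_records",
    "--incremental", "append",
    "--check-column", "record_id",
    "--last-value", "1048576",
    "--num-mappers", "4",
]
subprocess.run(cmd, check=True)
```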
Confidential
ETL ANALYST
Responsibilities:
- Worked on analyzing data and writing Hadoop MapReduce jobs using the Java API, Pig Latin, and Hive (see the sketch following this list).
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in loading data from the edge node to HDFS using shell scripting.
- Used Sqoop to import data from RDBMSs into the Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop components.
- Implemented various ETL scripts and read and wrote data from/to the Hadoop file system using DataStage.
- Implemented a script to transfer information from Oracle to Hive using Sqoop.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data, and analyzed them by running Hive queries and Pig scripts.
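The MapReduce jobs above were written against the Java API; a compact Hadoop Streaming equivalent in Python sketches the same mapper/reducer pattern, using a hypothetical CSV record layout.

```python
#!/usr/bin/env python
"""Hadoop Streaming sketch; one file serves as both mapper and reducer:
  hadoop jar hadoop-streaming.jar -files count.py \
    -mapper "count.py map" -reducer "count.py reduce" \
    -input /data/in -output /data/out
"""
import sys

def mapper():
    # Emit "event_type<TAB>1" per record; hypothetically, the event
    # type is the third field of each CSV line.
    for line in sys.stdin:
        fields = line.rstrip("\n").split(",")
        if len(fields) > 2:
            print(f"{fields[2]}\t1")

def reducer():
    # Streaming delivers input sorted by key, so counts can be
    # accumulated over each consecutive run of identical keys.
    current, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = key, 0
        count += int(value)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```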
Confidential
BI/ETL DEVELOPER
Responsibilities:
- Experienced in the full SDLC: end-to-end development of software products from requirement analysis through design, coding, testing, debugging, documentation, and implementation, until the move to production.
- Analyzed source systems, designed data models, developed ETL scripts, and created cubes, reporting solutions, dashboards, and KPIs for Human Resources, Finance, and Supply Chain Management.
- Designed data models, defined metrics, and created cubes for finance profit & loss reports.
- Performed data modeling, SQL query writing, data extraction, ETL operations, data loading, testing, database administration, cube building, and BI visualization report development using DataStage and OBIEE.
- Used DataStage as an ETL tool to extract data from source systems and load it into the Oracle database.
- Implemented DataStage jobs to extract data from heterogeneous sources, apply transformation logic to the extracted data, and load it into data warehouse databases.
- Used DataStage Director and its runtime engine to schedule solution runs, test and debug components, and monitor the resulting executables on an ad hoc or scheduled basis.
- Created shell scripts to run DataStage jobs from UNIX and then scheduled those scripts through a scheduling tool (see the sketch following this list).
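A minimal sketch of the kind of wrapper used to drive DataStage jobs, shown here in Python and assuming the dsjob CLI is on the PATH; the project and job names are hypothetical, and the exit-code convention should be verified against the installed DataStage release.

```python
import subprocess
import sys

PROJECT = "hr_warehouse"    # hypothetical DataStage project
JOB = "load_payroll_facts"  # hypothetical DataStage job

# 'dsjob -run -jobstatus' starts the job, waits for it to finish, and
# returns the job status as the exit code (commonly 1 = OK and
# 2 = finished with warnings; check your DataStage version).
result = subprocess.run(
    ["dsjob", "-run", "-jobstatus", PROJECT, JOB],
    capture_output=True, text=True,
)
print(result.stdout)
if result.returncode not in (1, 2):
    sys.exit(f"{JOB} failed with status {result.returncode}")
```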
