Senior Big Data Engineer Resume
Branchburg, NJ
SUMMARY
- 8+ years of experience in IT, including implementation of data warehousing projects with Teradata.
- Strong understanding of the data warehouse project development life cycle. Expertise in Teradata/Netezza/Redshift database design, implementation, and maintenance, mainly in data warehouse environments.
- Experience in Hadoop-based data environments, moving data from source systems such as the DWH into HDFS using Sqoop (for import/export) and analyzing it with Hive.
- Trained and well versed in data pre-processing and visualization techniques in data science (supervised and unsupervised algorithms) using Python libraries (scikit-learn, pandas, seaborn, Matplotlib, etc.).
- Experience in implementing data migration projects from on-prem Oracle and Teradata to a cloud data lake, and building ELT (for bulk and CDC loads) and ETL data pipelines in the Snowflake environment (SnowSQL, Snowpipe, Streams, etc.).
- Experience in handling AWS services such as S3 for storage management and creating/configuring/integrating IAM roles, EC2, etc. Knowledge of handling streaming data using AWS Kinesis.
- In-depth understanding and usage of Teradata OLAP functions. Proficient in Teradata SQL, stored procedures, macros, views, and indexes (Primary, Secondary, PPI, Join indexes, etc.). 3+ years of experience in Teradata production support.
- Well versed with the Hadoop framework and with analysis, design, development, documentation, deployment, and integration using SQL and Big Data technologies.
- Experience in using different Hadoop ecosystem components such as HDFS, YARN, MapReduce, Spark, Sqoop, Hive, and Kafka.
- Experience with data warehousing and data mining using one or more NoSQL databases like HBase, Cassandra, and MongoDB.
- Designed BI applications in Tableau, QlikView, SSIS, SSRS, SSAS, OBIEE, Cognos, and Informatica.
- Part of the Agile BI/ETL team; attended regular user meetings to go through requirements for the Data/BI sprints. Created highly visible data flows, dashboards, and reports based on the user stories.
- Experience in using Sqoop to ingest data from RDBMS to HDFS.
- Experience in Cluster coordination using Zookeeper and worked on file formats like Text, ORC, Avro, Parquet and compression techniques like Gzip and Zlib.
- Experience in using various Python libraries like NumPy, SciPy, python-twitter, and pandas.
- Worked on visualization tools like Tableau for report creation and further analysis.
- Experienced with the Spark processing framework, including Spark SQL, as well as data warehousing and ETL processes.
- Developed end-to-end ETL pipelines using Spark SQL and Scala on the Spark engine; imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs (a PySpark sketch of this pattern appears after this list).
- Experience with Spark Streaming and writing Spark jobs.
- Experience in developing high throughput streaming applications from Kafka queues and writing enriched data back to outbound Kafka queues.
- Good understanding of AWS S3, EC2, Kinesis, and DynamoDB.
- Used RStudio for data pre-processing and building machine learning algorithms on datasets.
- Good knowledge of NLP, statistical models, machine learning, and data mining solutions to various business problems, generated using R and Python.
- Implemented large Lambda architectures using Azure Data platform capabilities like Azure Data Lake, Azure Data Factory, HDInsight, Azure SQL Server, Azure ML, and Power BI.
- Experience in real-time analytics with Spark RDD, Data Frames and Streaming API.
- Used Spark Data Frame API over Cloudera platform to perform analytics on Hive data.
- Knowledge in integration of data from various sources like RDBMS, Spreadsheets, Text Files.
- Good understanding of JIRA and experience maintaining JIRA dashboards.
- Knowledge in using Java IDEs like Eclipse and IntelliJ.
- Used Maven for building projects.
- Designed end-to-end scalable architecture to solve business problems using various Azure components like HDInsight, Data Factory, Data Lake, Storage, and Machine Learning Studio.
- Ability to work independently as well as in a team and able to effectively communicate with customers, peers, and management at all levels in and outside the organization.
- Hands on experience on Hortonworks and Cloudera Hadoop environments.
- Provided production support and involved with root cause analysis, bug Fixing and promptly updating the business users on day-to-day production issues.
- Developed DAGs and automated the process for the data science teams.
- Developed Ad-hoc Queries for moving data from HDFS to HIVE and analyzing the data using HIVE QL.
- Involved in daily SCRUM meetings to discuss the development/progress of Sprints and was active in making scrum meetings more productive.
- Developed data pipelines using ETL tools SQL Server Integration Services (SSIS) and Microsoft Visual Studio (SSDT).
- Experience in designing visualizations using Tableau software and storyline, publishing and presenting dashboards.
- Developed Spark applications in Python (PySpark) on a distributed environment to load huge numbers of CSV files with different schemas into Hive ORC tables.
- Experience in maintaining an Apache Tomcat, MySQL, LDAP, and web service environment.
- Designed ETL workflows on Tableau and deployed data from various sources to HDFS.
- Proven ability to manage all stages of project development; strong problem solving and analytical skills and the ability to make balanced and independent decisions.
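Illustrative sketch (hypothetical): a minimal PySpark version of the S3-to-Spark ETL pattern referenced in the summary above. The original pipelines used Scala and the RDD API; the DataFrame API is shown here for brevity, and all bucket names, paths, and column names are placeholders rather than actual project values.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("s3-etl-sketch")
         .getOrCreate())

# Read raw CSV data from S3 (assumes the S3A connector and credentials are configured).
raw = (spark.read
       .option("header", "true")
       .option("inferSchema", "true")
       .csv("s3a://example-bucket/raw/sales/"))

# Transformations: drop bad rows, normalize types, filter.
cleaned = (raw
           .dropna(subset=["order_id"])
           .withColumn("order_date", F.to_date("order_date"))
           .filter(F.col("amount") > 0))

# Aggregation, then an action that writes the result back to S3 as Parquet.
daily_totals = (cleaned
                .groupBy("order_date", "region")
                .agg(F.sum("amount").alias("total_amount"),
                     F.count("order_id").alias("order_count")))

(daily_totals.write
 .mode("overwrite")
 .partitionBy("order_date")
 .parquet("s3a://example-bucket/curated/daily_sales/"))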
TECHNICAL SKILLS
Big Data / Hadoop Technologies: MapReduce, Spark, Spark SQL, Azure, Spark Streaming, Kafka, PySpark, Pig, Hive, HBase, Flume, YARN, Oozie, Zookeeper, Hue, Ambari Server
Languages: HTML5, DHTML, WSDL, CSS3, C, C++, XML, R/RStudio, SAS Enterprise Guide, SAS, R (Caret, Weka, ggplot), Perl, MATLAB, Mathematica, FORTRAN, DTD, Schemas, JSON, Ajax, Java, Scala, Python (NumPy, SciPy, Pandas, Gensim, Keras), JavaScript, Shell Scripting
NoSQL Databases: Cassandra, HBase, MongoDB, MariaDB
Web Design Tools: HTML, CSS, JavaScript, JSP, jQuery, XML
Development Tools: Microsoft SQL Studio, IntelliJ, Azure Databricks, Eclipse, NetBeans
Public Cloud: EC2, IAM, S3, Autoscaling, CloudWatch, Route53, EMR, RedShift
Development Methodologies: Agile/Scrum, UML, Design Patterns, Waterfall
Build Tools: Jenkins, Toad, SQL Loader, PostgreSQL, Talend, Maven, ANT, RTC, RSA, Control-M, Oozie, Hue, SOAP UI
Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos.
Databases: Microsoft SQL Server, MySQL, Oracle, DB2, Teradata, Netezza.
Operating Systems: All versions of Windows, UNIX, LINUX, Macintosh HD, Sun Solaris
PROFESSIONAL EXPERIENCE
Confidential, Branchburg, NJ
Senior Big Data Engineer
Responsibilities:
- Transforming business problems into Big Data solutions and defining Big Data strategy and roadmap.
- Installing, configuring and maintaining Data Pipelines.
- Designing the business requirement collection approach based on the project scope and SDLC methodology.
- Extracted files from Hadoop and dropped them into S3 on a daily and hourly basis.
- Authoring Python (PySpark) scripts for custom UDFs for row/column manipulations, merges, aggregations, stacking, data labeling, and all cleaning and conforming tasks.
- Writing Pig scripts to generate MapReduce jobs and performing ETL procedures on the data in HDFS.
- Develop solutions to leverage ETL tools and identify opportunities for process improvements using Informatica and Python.
- Conduct root cause analysis and resolve production problems and data issues.
- Performance tuning, code promotion and testing of application changes.
- Conduct performance analysis, optimize data processes, and make recommendations for continuous improvement of the data processing environment.
- Developed a data platform from scratch and took part in the requirement gathering and analysis phase of the project, documenting the business requirements.
- Design and implement multiple ETL solutions with various data sources through extensive SQL scripting, ETL tools, Python, shell scripting, and scheduling tools. Data profiling and data wrangling of XML, web feeds, and file handling using Python, Unix, and SQL.
- Loading data from different sources to a data warehouse to perform data aggregations for business intelligence using Python.
- Designed and implemented Sqoop for the incremental job to read data from DB2 and load to Hive tables and connected to Tableau for generating interactive reports using Hive server2.
- Used Sqoop to channel data between different RDBMS sources and HDFS.
- Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats.
- Used SSIS to build automated multi-dimensional cubes.
- Used Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Python and NoSQL databases such as HBase and Cassandra (see the sketch after this list).
- Collected data using Spark Streaming from an AWS S3 bucket in near-real-time, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS.
- Prepared and uploaded SSRS reports; managed database and SSRS permissions.
- Used Apache NiFi to copy data from the local file system to HDP. Thorough understanding of various modules of AML including watch list filtering, suspicious activity monitoring, CTR, CDD, and EDD.
- Used SQL Server Management Tool to check the data in the database as compared to the requirement given.
- Validated the test data in DB2 tables on Mainframes and on Teradata using SQL queries.
- Experience with Cloud service providers such as Amazon AWS, Microsoft Azure, and Google GCP.
- Data Analysis: Expertise in analyzing data using Pig scripting, Hive queries, Spark (Python), and Impala.
- Architect and implement medium- to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
- Identified and documented Functional/Non-Functional and other business related decisions for implementing Actimize-SAM to comply with AML Regulations.
- Work with region and country AML Compliance leads to support start-up of compliance-led projects at regional and country levels, including defining the subsequent phases: training, UAT, staffing to perform test scripts, data migration, the uplift strategy (updating customer information to bring it to the new KYC standards), and review of customer documentation.
- End-to-end development of Actimize models for trading compliance solutions of the project bank.
- Expertise in Machine learning, graph analytics and text mining techniques such as classification, regression, clustering, feature engineering, label propagation, Page rank, information extraction, topic modeling etc.
- Used machine learning techniques to build learning models that allow the organization to predict outcomes with business implication.
- Automated and scheduled recurring reporting processes using UNIX shell scripting and Teradata utilities such as MLOAD, BTEQ and Fast Load.
- Implemented Actimize Anti-Money Laundering (AML) system to monitor suspicious transactions and enhance regulatory compliance.
- Worked on dimensional and relational data modeling using Star and Snowflake schemas, OLTP/OLAP systems, and conceptual, logical, and physical data modeling using Erwin.
- Automated the data processing with Oozie to automate data loading into the Hadoop Distributed File System.
- Used machine learning models like Random Forest, Decision Trees, and Neural Networks in regression methods over the target database.
- Cleaned input text data using PySpark Machine learning feature extraction API.
- Developed automated regression scripts for validation of ETL processes between multiple databases like AWS Redshift, Oracle, MongoDB, T-SQL, and SQL Server using Python.
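Illustrative sketch (hypothetical): a minimal PySpark Structured Streaming job for the Kafka-to-HDFS flow described above. Broker addresses, topic names, and paths are placeholders; the original work may have used the older DStream API, and the spark-sql-kafka package is assumed to be on the classpath.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("kafka-to-hdfs-sketch")
         .getOrCreate())

# Subscribe to a Kafka topic.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
          .option("subscribe", "learner-events")
          .option("startingOffsets", "latest")
          .load())

# Kafka delivers key/value as binary; cast the value to string and derive a date column.
parsed = (events
          .selectExpr("CAST(value AS STRING) AS payload", "timestamp")
          .withColumn("ingest_date", F.to_date("timestamp")))

# Persist the stream to HDFS as Parquet with a checkpoint for fault tolerance.
query = (parsed.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/learner/events/")
         .option("checkpointLocation", "hdfs:///checkpoints/learner_events/")
         .partitionBy("ingest_date")
         .outputMode("append")
         .start())

query.awaitTermination()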
Environment: Cloudera Manager (CDH5), PySpark, HDFS, NiFi, Pig, Hive, AWS, S3, Kafka, SSIS, Snowflake, PyCharm, Scrum, Git, Sqoop, Azure, HBase, Informatica, SQL, Python, XML, UNIX.
Confidential, Sunnyvale, CA
Sr. Data Engineer
Responsibilities:
- Familiarity with Hive joins; used HQL for querying the databases, eventually leading to complex Hive UDFs.
- Installed OS and administered the Hadoop stack with the CDH5 (with YARN) Cloudera distribution, including configuration management, monitoring, debugging, and performance tuning.
- Worked on installing Cloudera Manager and CDH and installing the JCE policy file to create a Kerberos principal for the Cloudera Manager server, enabling Kerberos using the wizard.
- Conducted exploratory data analysis using Python, Matplotlib, and seaborn to identify underlying patterns and correlations between features.
- Worked with NoSQL databases like HBase in creating tables to load large sets of semi structured data coming from source systems.
- Worked on Configuration Kerberos authentication in the cluster.
- Experience in creating tables, dropping and altering at run time without blocking updates and queries using HBase and Hive.
- Research and develop Machine learning models for security problems in the areas of Networking, applications & data.
- Developed highly scalable classifiers and tools by leveraging machine learning, Apache spark & deep learning.
- Experience in working with different join patterns and implemented both map-side and reduce-side joins.
- Wrote Flume configuration files for importing streaming log data into HBase with Flume.
- Imported several transactional logs from web servers with Flume to ingest the data into HDFS.
- Used Flume and Spool directory for loading the data from the local file system (LFS) to HDFS.
- Installed and configured Pig and wrote Pig Latin scripts to convert the data from text files to Avro format.
- Created partitioned hive tables and worked on them using HiveQL.
- Loading data into HBase using bulk load and Non-bulk load.
- Worked on continuous integration tools like Jenkins and automated JAR builds at the end of each day.
- Worked on Tableau; integrated Hive with Tableau Desktop reports and published them to Tableau Server.
- Developed MapReduce programs in Java for parsing the raw data and populating staging tables.
- Experience in setting up the whole app stack; set up and debugged Logstash to send Apache logs to AWS Elasticsearch.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Analyzed the SQL scripts and designed the solution to implement using Scala.
- Used Spark SQL to load JSON data, create a schema, and load it into Hive tables, and handled structured data using Spark SQL (see the sketch after this list).
- Implemented Spark Scripts using Scala, Spark, Spark SQL to access hive tables into Spark for faster processing of data.
- Extract, transform, and load data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Data ingestion to one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processing the data in Azure Databricks.
- Tested Apache Tez for building high performance batch and interactive data processing applications on Pig and Hive jobs.
- Exploring with Spark to improve the performance and optimization of the existing algorithms in Hadoop using Spark context, Spark-SQL, PostgreSQL, Scala, Data Frame, Impala, OpenShift, Talend, pair RDD’s.
- Set up data pipelines using TDCH, Talend, Sqoop, and PySpark based on the size of the data loads.
- Implemented real-time analytics on Cassandra data using the Thrift API.
- Designed column families in Cassandra, ingested data from RDBMS, performed transformations, and exported the data to Cassandra.
- Leading the testing efforts in support of projects/programs across a large landscape of technologies (Unix, AngularJS, AWS, Sauce Labs, Cucumber JVM, MongoDB, GitHub, Bitbucket, SQL, NoSQL databases, APIs, Java, Jenkins).
- Experience in using MapR File system, Ambari, Cloudera Manager for installation and management of Hadoop Cluster.
- Worked on writing Scala Programs using Spark/Spark-SQL in performing aggregations.
- Developed Web Services in play framework using Scala in building stream data Platform.
- Worked with data modelers to understand Financial data model and provided suggestions to the logical and physical data model.
- Perform Table partitioning, monthly & yearly data Archival activities.
- Developing python scripts for Redshift CloudWatch metrics data collection and automating the data points to redshift database.
- Developed scripts for loading application call logs to S3 and used AWS Glue ETL to load into Redshift for data analytics team.
- Installing IBM HTTP Server, WebSphere Plugins, and WebSphere Application Server Network Deployment (ND).
- Worked on setting up high availability for major production cluster and designed automatic failover control using zookeeper and quorum journal nodes.
- Provide troubleshooting and best practices methodology for development teams.
- This includes process automation and new application onboarding.
- Produce unit tests for Spark transformations and helper methods. Design data processing pipelines.
- Configuring IBM HTTP Server, WebSphere Plugins, and WebSphere Application Server Network Deployment (ND) for user workload distribution.
- Multiple batch jobs were written for processing hourly and daily data received through multiple sources like Adobe, No-SQL databases.
- Testing the processed data through various test cases to meet the business requirements.
- Extract Real time feed using Kafka and Spark Streaming and convert it to RDD and process data in the form of Data Frame and save the data as Parquet format in HDFS.
- Design data solutions for Enterprise Data Warehouse using ETL and ELT methodologies.
- Interact with business stakeholders from various teams such as Finance, Marketing, and e-commerce; understand their analytical and business needs, define metrics, and translate them into BI solutions.
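Illustrative sketch (hypothetical): a minimal PySpark example of loading JSON with an explicit schema and persisting it to a Hive table, as referenced above. Paths, database, table, and column names are placeholders, and a Hive metastore is assumed to be available.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = (SparkSession.builder
         .appName("json-to-hive-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Explicit schema instead of schema inference for the semi-structured JSON input.
schema = StructType([
    StructField("event_id", StringType(), False),
    StructField("user_id", StringType(), True),
    StructField("amount", DoubleType(), True),
    StructField("event_time", TimestampType(), True),
])

events = spark.read.schema(schema).json("hdfs:///landing/events/")

# Handle the structured data with Spark SQL via a temporary view.
events.createOrReplaceTempView("events_stg")
cleaned = spark.sql("""
    SELECT event_id, user_id, amount, event_time
    FROM events_stg
    WHERE event_id IS NOT NULL
""")

# Persist into a Hive table (assumes the analytics database already exists).
cleaned.write.mode("append").format("parquet").saveAsTable("analytics.events")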
Environment: Cloudera CDH5.13, Ambari, IBM WebSphere, Hive, Python, HBase, Spark, Scala, MapReduce, HDFS, Sqoop, AWS, Flume, Linux, Shell Scripting, Tableau, UNIX, Kafka, SQL, NoSQL.
Confidential, New York, NY
Data Engineer
Responsibilities:
- Responsibilities include gathering business requirements, developing strategy for data cleansing and data migration, writing functional and technical specifications, creating source to target mapping, designing data profiling and data validation jobs in Informatica, and creating ETL jobs in Informatica.
- Worked on Hadoop cluster which ranged from 4-8 nodes during pre-production stage and it was sometimes extended up to 24 nodes during production.
- Built APIs that allow customer service representatives to access the data and answer queries.
- Designed changes to transform current Hadoop jobs to HBase.
- Handled fixing defects efficiently and worked with the QA and BA team for clarifications.
- Responsible for Cluster maintenance, Monitoring, commissioning and decommissioning Data nodes, Troubleshooting, Manage and review data backups, Manage & review log Files.
- Extending the functionality of Hive with custom UDFs and UDAFs.
- The new Business Data Warehouse (BDW) improved query/report performance, reduced the time needed to develop reports, and established a self-service reporting model in Cognos for business users.
- Implemented bucketing and partitioning using Hive to assist users with data analysis.
- Used Oozie scripts for deployment of the application and Perforce as the secure versioning software.
- Implemented partitioning, dynamic partitions, and buckets in Hive (see the sketch after this list).
- Develop database management systems for easy access, storage, and retrieval of data.
- Perform DB activities such as indexing, performance tuning, backup and restore.
- Expertise in writing Hadoop jobs for analyzing data using Hive QL (queries), Pig Latin (data flow language), and custom MapReduce programs in Java.
- Did various performance optimizations like using distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Expert in creating Hive UDFs using Java to analyze the data efficiently.
- Responsible for loading the data from BDW Oracle database, Teradata into HDFS using Sqoop.
- Implemented AJAX, JSON, and JavaScript to create interactive web screens.
- Wrote data ingestion systems to pull data from traditional RDBMS platforms such as Oracle and Teradata and store it in NoSQL databases such as MongoDB. Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries.
- Processed the image data through the Hadoop distributed system by using Map Reduce then stored into HDFS.
- Created Session Beans and controller Servlets for handling HTTP requests from Talend.
- Performed Data Visualization and Designed Dashboards with Tableau and generated complex reports including charts, summaries, and graphs to interpret the findings to the team and stakeholders.
- Wrote documentation for each report including purpose, data source, column mapping, transformation, and user group.
- Utilized Waterfall methodology for team and project management.
- Used Git for version control with the Data Engineering team and Data Scientist colleagues. Involved in creating Tableau dashboards using stacked bars, bar graphs, scatter plots, geographical maps, and charts using Show Me functionality; built dashboards and stories as needed using Tableau Desktop and Tableau Server.
- Responsible for daily communications to management and internal organizations regarding the status of all assigned projects and tasks.
- Executed quantitative analysis on chemical products to recommend effective combinations.
- Performed statistical analysis using SQL, Python, R Programming and Excel.
- Worked extensively with Excel VBA Macros, Microsoft Access Forms.
- Import, clean, Filter and analyze data using tools such as SQL, HIVE and PIG.
- Used Python& SAS to extract, transform & load source data from transaction systems, generated reports, insights, and key conclusions.
- Manipulated and summarized data to maximize possible outcomes efficiently.
- Developed storytelling dashboards in Tableau Desktop and published them to Tableau Server, which allowed end users to understand the data on the fly with the use of quick filters for on-demand information.
- Analyzed and recommended improvements for better data consistency and efficiency.
- Designed and Developed data mapping procedures ETL-Data Extraction, Data Analysis and Loading process for integrating data using R programming.
- Effectively communicated plans, project status, project risks, and project metrics to the project team and planned test strategies in accordance with project scope.
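Illustrative sketch (hypothetical): the Hive partitioning and dynamic-partition insert pattern referenced above, issued through spark.sql for consistency with the other examples. Database, table, and column names are placeholders; bucketed tables would add a CLUSTERED BY (...) INTO N BUCKETS clause to the DDL, typically run directly in Hive.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partitioning-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Dynamic partitioning lets the partition value come from the data itself.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

# Partitioned Hive table stored as ORC.
spark.sql("""
    CREATE TABLE IF NOT EXISTS dw.orders_part (
        order_id    STRING,
        customer_id STRING,
        amount      DOUBLE
    )
    PARTITIONED BY (load_dt STRING)
    STORED AS ORC
""")

# One statement populates many partitions because load_dt is taken from each row.
spark.sql("""
    INSERT OVERWRITE TABLE dw.orders_part PARTITION (load_dt)
    SELECT order_id, customer_id, amount, load_dt
    FROM staging.orders_raw
""")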
Environment: Cloudera CDH4.3, Hadoop, Pig, Hive, Informatica, HBase, MapReduce, HDFS, Sqoop, Impala, SQL, Tableau, Python, SAS, Flume, Oozie, Linux.
Confidential
Data Analyst
Responsibilities:
- Gathered all the Sales Analysis report prototypes from the business analysts belonging to different business units.
- Worked with Master SSIS packages to execute a set of packages that load data from various sources onto the Data Warehouse on a timely basis.
- Involved in Data Extraction, Transformation and Loading (ETL) from source systems.
- Responsible with ETL design identifying the source systems, designing source to target relationships, data cleansing, data quality, creating source specifications, ETL design documents.
- The data received from Legacy Systems of customer information were cleansed and then transformed into staging tables and target tables in DB2.
- Used External Tables to Transform and load data from Legacy systems into Target tables.
- Use of data transformation tools such as DTS, SSIS, Informatica or Data Stage.
- Conducted Design reviews with the business analysts, content developers and DBAs.
- Designed, developed, and maintained Enterprise Data Architecture for enterprise data management, including business intelligence systems, data governance, data quality, enterprise metadata tools, data modeling, data integration, operational data stores, data marts, data warehouses, and data standards.
- Incremental loading of fact tables from the source system to staging tables on a daily basis.
- Coding SQL stored procedures and triggers.
- Responsible for data extraction and data ingestion from different data sources into the Hadoop data lake by creating ETL pipelines using Pig and Hive.
- Built pipelines to move hashed and un-hashed data from XML Files to Data lake.
- Developed Spark scripts using Python on Azure HDInsight for data aggregation and validation, and verified their performance against MR jobs (see the sketch after this list).
- Extensively worked with Spark-SQL context to create data frames and datasets to preprocess the model data.
- Experienced in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts.
- Involved in designing the row key in HBase to store Text and JSON as key values in HBase table and designed row key in such a way to get/scan it in a sorted order.
- Wrote Junit tests and Integration test cases for those Micro-services.
- Worked in Azure environment for development and deployment of Custom Hadoop Applications.
- Develop and deploy the outcome using spark and Scala code in Hadoop cluster running on GCP.
- Worked with Python, C++, Spark, SQL, Airflow, and Looker.
- Developed NiFi workflows to pick up multiple files from an FTP location and move them to HDFS on a daily basis.
- Used various transformations in SSIS Data Flow and Control Flow using For Loop containers and Fuzzy Lookups, and implemented event handlers and error handling in SSIS packages.
- Involved in Cloudera Navigator access for auditing and viewing data.
- Extracted tables from various databases for code review.
- Generated document coding to create metadata names for database tables.
- Analyzed metadata and table data for comparison and confirmation.
- Adhered to document deadlines for assigned databases.
- Ran routine reports on a scheduled basis as well as ad hocs based on key performance indicators.
- Develop DataStage jobs to cleanse, transform, and load data to the Data Warehouse, and sequencers to encapsulate the DataStage job flow.
- Designed data visualizations to analyze and communicate findings.
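Illustrative sketch (hypothetical): a minimal PySpark aggregation-plus-validation job of the kind described above for Azure HDInsight. Storage paths and column names are placeholders.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("aggregate-and-validate-sketch").getOrCreate()

# Blob-backed storage path as typically mounted on HDInsight.
txns = spark.read.parquet("wasbs:///data/transactions/")

# Aggregation: daily totals per account.
daily = (txns
         .groupBy("account_id", F.to_date("txn_time").alias("txn_date"))
         .agg(F.sum("amount").alias("total_amount"),
              F.count("*").alias("txn_count")))

# Validation: fail fast if keys are missing or totals do not reconcile with the source.
null_keys = daily.filter(F.col("account_id").isNull()).count()
source_sum = txns.agg(F.sum("amount")).first()[0] or 0.0
agg_sum = daily.agg(F.sum("total_amount")).first()[0] or 0.0
if null_keys > 0 or abs(source_sum - agg_sum) > 1e-6:
    raise ValueError("Validation failed: null keys or totals do not reconcile")

daily.write.mode("overwrite").parquet("wasbs:///data/curated/daily_txns/")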
Environment: Linux, Erwin, SQL Server, Crystal Reports 9.0, HTML, DTS, SSIS, Azure, Informatica, DataStage Version 7.0, Oracle, Toad, MS Excel, Pow.
Confidential
Hadoop Developer
Responsibilities:
- Responsible for data extraction and data ingestion from different data sources into the Hadoop data lake by creating ETL pipelines using Pig and Hive.
- Responsible for importing data to HDFS using Sqoop from different RDBMS servers and exporting data using Sqoop to the RDBMS servers after aggregations for other ETL operations.
- Experience in designing and developing applications in PySpark (Python) to compare the performance of Spark with Hive (see the sketch after this list).
- Mapped client business requirements to internal requirements of trading platform products.
- Supported revenue management using statistical and quantitative analysis, developed several statistical approaches and optimization models.
- Led the business analysis team of four members, in absence of the Team Lead.
- Added value by providing innovative solutions and delivering improved upon methods of data presentation by focusing on the Business need and the Business Value of the solution. Worked for Internet Marketing - Paid Search channels.
- Created performance dashboards in Tableau/ Excel / Power point for the key stakeholders.
- Incorporated predictive modeling (rule engine) to evaluate the Customer/Seller health score using python scripts, performed computations and integrated with the Tableau viz.
- Worked with stakeholders to communicate campaign results, strategy, issues or needs.
- Analyzed marketing campaigns from various perspectives including CTR, conversion rates, seasonal/geographical trends, search queries, landing page, conversion funnel, quality score, competitors, distribution channel, etc. to achieve maximum ROI for clients.
- Worked with business to identify the gaps in mobile tracking and come up with the solution to solve.
- Analyzed click events of Hybrid landing page which includes bounce rate, conversion rate, Jumpback rate, List/Gallery view, etc. and provide valuable information for landing page optimization.
- Implemented Partitioning, Dynamic Partitions and Buckets in HIVE for efficient data access.
- Create/Modify shell scripts for scheduling various data cleansing scripts and ETL load process.
- Developed testing scripts in Python and prepare test procedures, analyze test results data andsuggest improvements of the system and software.
- Used Jira for ticketing and tracking issues and Jenkins for continuous integration and continuous deployment.
- Built a GUI that prompts the user to enter personal information, charity items to donate, and delivery options.
- Developed a fully functioning C# program that connects to SQL Server Management Studio and integrates information that users enter with preexisting information in the database.
- Implemented SQL functions to receive user information from front end C# GUIs and store it into database.
- Utilized SQL functions to select information from database and send it to the front end upon user request.
- Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive, and then loading the data into HDFS.
- Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
- Collecting and aggregating large amounts of log data and staging data in HDFS for further analysis.
- Experience in managing and reviewing Hadoop Log Files.
- Used Sqoop to transfer data between relational databases and Hadoop.
- Worked on HDFS to store and access huge datasets within Hadoop. Good hands on experience with GitHub.
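Illustrative sketch (hypothetical): the Spark side of the Spark-versus-Hive comparison mentioned above. The same aggregation is timed via the DataFrame API and Spark SQL against a Hive table; the equivalent HiveQL would be timed separately in Hive (beeline) as the baseline. Table and column names are placeholders.

import time
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("spark-vs-hive-benchmark-sketch")
         .enableHiveSupport()
         .getOrCreate())

def timed(label, df):
    # Force the computation with an action and report elapsed wall-clock time.
    start = time.time()
    rows = df.count()
    print(f"{label}: {rows} rows in {time.time() - start:.1f}s")

# The same logical query, expressed via the DataFrame API and via Spark SQL.
orders = spark.table("dw.orders")
by_customer_df = orders.groupBy("customer_id").agg(F.sum("amount").alias("total"))
by_customer_sql = spark.sql(
    "SELECT customer_id, SUM(amount) AS total FROM dw.orders GROUP BY customer_id"
)

timed("DataFrame API", by_customer_df)
timed("Spark SQL", by_customer_sql)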
Environment: Spark, Java, Python, Jenkins, HDFS, Sqoop, Hadoop 2.0, Kafka, JSON, Hive, Oozie, Git.