Hadoop/Scala Developer Resume
SUMMARY
- Around six years of experience working with big data using the Hadoop framework and PySpark for the analysis, transformation, deployment, and ingestion of data. Well acquainted with AWS Data Pipelines, data structures, and processing systems; experienced in data mining, data cleaning, and data munging using PySpark, Spark, SQL, Python, and Hive.
- Currently looking for a challenging role as a Data Engineer in an organization where I can apply my analytical skills while also enhancing my knowledge and experience in the software industry.
- Experience in migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
- Experience in developing MapReduce programs using Apache Hadoop for analyzing big data as per requirements.
- Good Experience in implementing and orchestrating data pipelines using Oozie and Airflow.
- Extensive experience working with Spark, performing ETL using Spark SQL and Spark Core, and real-time data processing using Spark Streaming.
- Strong experience working with various file formats such as Avro, Parquet, ORC, JSON, and CSV.
- Experience in developing customized UDFs in Python to extend Hive and Pig Latin functionality.
- Extensively used Python libraries such as PySpark, pytest, PyMongo, cx_Oracle, PyExcel, Boto3, Psycopg, embedPy, NumPy, and Beautiful Soup.
- Good experience in data warehousing applications using Informatica, including designing workflows and worklets.
- Developed Spark applications for cleaning and validating data ingested into the AWS cloud.
- Implemented simple to complex transformations on streaming data and datasets.
- Experience in converting Hive/SQL queries into Spark transformations (see the sketch at the end of this summary).
- Experience working with large data sets and making performance improvements.
- Experience dealing with file formats like CSV, JSON, Parquet, ORC.
- Designed and implemented data pipelines to handle real-time streaming of semi-structured data by integrating 10+ data sources using PySpark.
- Well-experienced in handling finance, marketing, and omnichannel data sets, and proactively involved in solving complex business tasks.
- Automated ETL processes across billions of rows of data, reducing manual workload by more than 20% monthly.
- Developed business critical dashboards in Power BI and Tableau to monitor real-time sales.
- Successfully migrated multiple reporting platforms to Tableau Cloud and set up connections for on-prem databases and online cloud connectors.
- Performed ETL testing activities such as running jobs, extracting data from databases using the necessary queries, transforming it, and uploading it into various data warehouse servers.
- Converted data load pipeline algorithms written in Python and SQL.
- Developed a Python-based API (RESTful web service) to track revenue and perform revenue analysis.
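To make the Hive/SQL-to-Spark conversion mentioned above concrete, here is a minimal sketch in Scala. It assumes a Hive table named sales with region, amount, and order_date columns and a /tmp output path; these names are hypothetical placeholders, not details from any specific project.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum}

object HiveQueryToSpark {
  def main(args: Array[String]): Unit = {
    // Hive-enabled session so spark.sql and spark.table can see the metastore.
    val spark = SparkSession.builder()
      .appName("hive-query-to-spark")
      .enableHiveSupport()
      .getOrCreate()

    // The original Hive/SQL query, run as-is through Spark SQL.
    val viaSql = spark.sql(
      """SELECT region, SUM(amount) AS total_amount
        |FROM sales
        |WHERE order_date >= '2021-01-01'
        |GROUP BY region""".stripMargin)

    // The same logic rewritten as DataFrame transformations.
    val viaTransformations = spark.table("sales")
      .filter(col("order_date") >= "2021-01-01")
      .groupBy("region")
      .agg(sum("amount").alias("total_amount"))

    viaSql.show()
    viaTransformations.write.mode("overwrite").parquet("/tmp/sales_by_region")
    spark.stop()
  }
}
```

Both forms are planned by the same Catalyst optimizer; the DataFrame form is simply easier to compose and unit test inside larger Scala pipelines.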
TECHNICAL SKILLS
Hadoop/Spark Ecosystem: Hadoop, MapReduce, Pig, Hive/Impala, YARN, Kafka, Flume, Sqoop, Oozie, Zookeeper, Spark, Airflow, MongoDB, Cassandra, HBase, and Storm.
Programming Languages & Web Technologies: Java, Python, Hibernate, JDBC, JSON, HTML, CSS
Cloud Technologies: AWS, GCP, BigQuery, Composer
Scripting Languages: Python, Shell scripting (Bash)
Databases: Oracle, MySQL, SQL Server, PostgreSQL, HBase, Snowflake, Cassandra, MongoDB
Web/Application Servers: Apache Tomcat, WebLogic, WebSphere
Tools: Eclipse, NetBeans
Version Control and Build Tools: Git, Maven, SBT, CBT
ETL/Reporting: Power BI, Data Studio, Tableau
PROFESSIONAL EXPERIENCE
Confidential
Hadoop/Scala Developer
Responsibilities:
- Designed and developed large, complex data sets that meet functional / non-functional business requirements.
- Automated manual processes, and optimized data delivery for greater scalability.
- Developed ETL/ELT pipelines for loading data from a wide variety of data source formats using on-premises and cloud technology.
- Worked on developing ETL processes to load data from multiple data sources into HDFS using Flume and Sqoop, and performed structural modifications using MapReduce and Hive.
- Worked collaboratively to manage buildouts of large data clusters and real-time streaming with Spark.
- Developed ETL data pipelines using Spark, Spark Streaming, and Scala.
- Responsible for designing data pipelines from web servers using Sqoop, Kafka, and the Spark Streaming API (see the streaming sketch at the end of this role).
- Loaded the data into the patient analytics database on the Snowflake platform.
- Created Databricks notebooks using SQL and Python, and automated notebook runs using jobs.
- Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for large volumes of data.
- Migrated an existing on-premises application to the Azure cloud.
- Implemented large Lambda architectures using Azure data platform capabilities such as Azure Data Lake, Azure Data Factory, Azure Data Catalog, HDInsight, Azure SQL Server, Azure ML, and Power BI.
- Using Azure Databricks, created Spark clusters and configured high-concurrency clusters to speed up the preparation of high-quality data.
- Worked on migration of data from on-prem SQL Server to cloud databases (Azure Synapse Analytics (DW) and Azure SQL DB).
- Designed and developed a data pipeline in the Azure cloud that pulls customer data from an API and processes it into Azure SQL DB.
- Implemented Spark using Scala and SparkSQL for faster testing and processing of data.
- Developed various UDFs in Map-Reduce and Python for Pig and Hive.
- Developed Pig UDFs for manipulating the data according to business requirements and worked on developing custom Pig loaders.
- Designed and developed Apache NiFi jobs to move files from transaction systems into the data lake raw zone.
- Experienced with the Databricks platform, following best practices for securing network access to cloud applications.
- Installed and configured Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.
- Analyzed the SQL scripts and redesigned them using PySpark SQL for faster performance.
- Used Azure Data Factory, the SQL API, and the MongoDB API, and integrated data from MongoDB, MS SQL, and cloud sources (Blob Storage, Azure SQL DB, Cosmos DB).
Environment: Spark, Spark Streaming, Apache Kafka, Apache NiFi, Hive, Azure, Azure Databricks, Azure Data Grid, Azure Synapse Analytics, Pig, PySpark, Tableau, Teradata, Snowflake, Sqoop, Oozie, Scala, Python, Git, GitHub.
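As referenced above, much of the pipeline work in this role centered on Kafka and Spark Streaming. The sketch below shows the general shape of such a pipeline in Scala using Spark Structured Streaming; the broker address, topic name, JSON schema, and data lake paths are illustrative assumptions, not details of the actual system.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{StringType, StructType, TimestampType}

object KafkaToRawZone {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-streaming-etl")
      .getOrCreate()

    // Illustrative schema for the incoming JSON events.
    val eventSchema = new StructType()
      .add("patient_id", StringType)
      .add("event_type", StringType)
      .add("event_time", TimestampType)

    // Read the raw event stream from Kafka (placeholder broker and topic).
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "patient-events")
      .load()
      .select(from_json(col("value").cast("string"), eventSchema).as("event"))
      .select("event.*")
      .filter(col("patient_id").isNotNull) // basic cleaning/validation step

    // Land the cleaned stream in the data lake raw zone as Parquet.
    val query = events.writeStream
      .format("parquet")
      .option("path", "/datalake/raw/patient_events")
      .option("checkpointLocation", "/datalake/checkpoints/patient_events")
      .start()

    query.awaitTermination()
  }
}
```

The checkpoint location lets the stream restart without dropping or double-processing events, and downstream batch jobs (for example, Hive or Snowflake loads) can read from the raw-zone Parquet output.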
Confidential
Business Intelligence Developer
Responsibilities:
- Managed multiple tasks for customer sales, inventory, and merchandising reports, dashboards, portals, analytics solutions, and scorecards from requirements gathering to final delivery.
- Developed various dashboards in Power BI used for customer sales and inventory.
- Validated newly created reports and worked with other developers on support.
- Developed various reports for every quarter on gift card users based on different locations.
- Designed complex, data-intensive reports in Power BI utilizing various graph features such as gauge, funnel, and line charts for better business analysis.
- Tested Complex ETL Mapping and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
- Responsible for various data mapping activities from source systems to the EDW, ODS, and data marts.
- Worked on tuning SQL queries to bring down run time by working on indexes and execution plans.
- Worked on DirectQuery in Power BI to compare legacy data with current data, and generated reports and dashboards.
- Designed SSIS packages to extract, transform, and load (ETL) existing data into SQL Server from different environments for the SSAS (OLAP) cubes.
- Worked with SQL Server Reporting Services (SSRS): created and formatted crosstab, conditional, drill-down, Top N, summary, form, OLAP, sub-reports, ad-hoc reports, parameterized reports, interactive reports, and custom reports.
- Created action filters, parameters, and calculated sets for preparing dashboards and worksheets using Power BI.
Confidential
Business Intelligence Lead
Responsibilities:
- Developed Impromptu and ReportNet reports and PowerPlay cubes and reports using Cognos Series 7 applications.
- Conducted training for Power Users and web users. Trained personnel on PowerPlay client/web, IWR, Report and Query Studio and Impromptu client.
- Developed 250 date-prompted, scheduled, and web-enabled hotel cost reports by region, state, and city; the reports showed the monthly income generated from the hotels, providing access to this sales data.
- Analyzed consumer behavior and key trends, finding ways to maximize department revenue.
- Used statistical methods to analyze large data sets collected from different users to perform a time study.
- Solved business problems by analyzing data using MS Excel, macros, VLOOKUP, and pivot tables.
- Designed, developed & administered Power BI Dashboards/Reports and published them on to Power BI Sites.
- Extensively involved in requirements gathering for building operational cubes, meeting with business partners to baseline the details.
- Worked on database testing and wrote complex SQL queries to verify transactions and business logic, such as identifying duplicate rows, using SQL Developer and PL/SQL Developer.
- Used Teradata for OLTP systems, generating models to support revenue management applications that connect to SaaS.
- Created SSIS Packages for import and export of data between Oracle database and others like MS Excel and Flat Files.
- Compiled and validated data from all departments and presented the results to the Director of Operations.
- Built a KPI calculator sheet and maintained it within SharePoint.
- Created Tableau reports with complex calculations and worked on ad-hoc reporting using Power BI.